From 63268f4ee54bb432defd629df542a4dc52c53533 Mon Sep 17 00:00:00 2001 From: Fabian Wiesel Date: Tue, 10 Mar 2026 18:38:12 +0100 Subject: [PATCH] Documentation of the controller states/conditions and their interaction This is a documentation of the various controllers and their interactions. --- controller-state-diagram.md | 714 ++++++++++++++++++++++++++++++++++++ 1 file changed, 714 insertions(+) create mode 100644 controller-state-diagram.md diff --git a/controller-state-diagram.md b/controller-state-diagram.md new file mode 100644 index 0000000..6205495 --- /dev/null +++ b/controller-state-diagram.md @@ -0,0 +1,714 @@ + + +# OpenStack Hypervisor Operator - Controller State Diagram + +This document describes how the controllers in the OpenStack Hypervisor Operator interact through conditions on the Hypervisor resource. + +## Overview + +The operator manages the lifecycle of OpenStack compute hypervisors through multiple controllers that coordinate via Kubernetes conditions on the `Hypervisor` custom resource. + +## Controllers + +### Lifecycle Controllers + +1. **HypervisorController** (`hypervisor_controller.go`) + - Creates Hypervisor CR from Kubernetes Node (with owner reference) + - On creation sets defaults: `HighAvailability=true`, `InstallCertificate=true` + - Syncs Node → Hypervisor CR (all one-directional): + - **Labels:** `kubernetes.io/hostname`, `topology.kubernetes.io/region`, `topology.kubernetes.io/zone`, `kubernetes.metal.cloud.sap/{bb,cluster,name,type}`, `worker.garden.sapcloud.io/group`, `worker.gardener.cloud/pool` (plus any keys from global LabelSelector) + - **Annotations → Spec:** `nova.openstack.cloud.sap/aggregates` → `Spec.Aggregates` (comma-split, zone appended), `nova.openstack.cloud.sap/custom-traits` → `Spec.CustomTraits` (comma-split) + - **Label → Spec:** `cobaltcore.cloud.sap/node-hypervisor-lifecycle` presence → `Spec.LifecycleEnabled=true`, value `"skip-tests"` → `Spec.SkipTests=true` + - **Status:** Node internal IP → `Status.InternalIP` + - Propagates Node `Terminating=True` condition → sets `Terminating` and `Ready=False` on Hypervisor CR + +2. **OnboardingController** (`onboarding_controller.go`) + - Manages initial hypervisor onboarding + - Runs smoke tests to verify hypervisor functionality + - Coordinates with AggregatesController and TraitsController for completion + - Waits for `HaEnabled=True` (set by HypervisorInstanceHaController) before completing onboarding + +3. **AggregatesController** (`aggregates_controller.go`) + - Manages OpenStack aggregate membership + - Applies different aggregates based on lifecycle phase + - Coordinates with onboarding and termination flows + +4. **HypervisorMaintenanceController** (`hypervisor_maintenance_controller.go`) + - Manages the compute service state and VM evacuation during maintenance + - Reacts to `Spec.Maintenance`: + - `""` (unset): Re-enables the OpenStack compute service, deletes any Eviction CR, and restores `Ready=True` + - `"manual"` / `"auto"` / `"termination"`: Disables compute service and creates an Eviction CR to migrate VMs off the hypervisor. The three modes behave identically in this controller; they differ only in origin (`manual` = external user, `auto` = automated maintenance, `termination` = node terminating) + - `"ha"`: Disables compute service but does **not** create an Eviction CR (the HA service handles evacuation) + +5. **EvictionController** (`eviction_controller.go`) + - Executes VM migration off the hypervisor + - Manages live and cold migration of instances + +6. **DecommissionController** (`decomission_controller.go`) + - Handles hypervisor offboarding + - Cleans up OpenStack service and resource provider + - Runs during node termination + +7. **GardenerNodeLifecycleController** (`gardener_node_lifecycle_controller.go`) + - Signals node state to Gardener Machine Controller Manager + - Creates blocking PodDisruptionBudgets + - Manages readiness signaling deployment + - Waits for `HaEnabled=False` (set by HypervisorInstanceHaController) before allowing node deletion during offboarding + +### Auxiliary Controllers + +8. **HypervisorInstanceHaController** (`hypervisor_instance_ha_controller.go`) + - Manages instance HA registration with `kvm-ha-service` + - Reads `Spec.HighAvailability`, the `Onboarding` condition, and the `Evicting` condition: + - **Enables HA** (`HaEnabled=True`) when `Spec.HighAvailability=true`, onboarding has reached Handover or Succeeded, and no eviction has completed + - **Disables HA** (`HaEnabled=False`) when any of: + - `Spec.HighAvailability=false` (reason: `Succeeded` — "HA disabled per spec") + - `Evicting=False` (eviction completed, hypervisor is empty; reason: `Evicted`) + - Onboarding is still in Initial/Testing phase or was Aborted (reason: `Onboarding`) + - Sets `HaEnabled=Unknown/Failed` if the HTTP call to `kvm-ha-service` fails + - Skips the HTTP call if the condition already reflects the desired state (idempotent) + - Does nothing if the `Onboarding` condition has never been set + +9. **TraitsController** (`traits_controller.go`) + - Syncs `Spec.CustomTraits` to the OpenStack Placement API + - Operates after onboarding completes (`Onboarding=False`) and also during the Handover phase (`Onboarding=True/Handover`) + - Does NOT run during Initial or Testing phases, or when node is terminating + - Sets `TraitsUpdated` condition + +10. **HypervisorTaintController** (`hypervisor_taint_controller.go`) + - Detects if the Hypervisor CR was edited via `kubectl` + - Sets `Tainted` condition as a warning signal + +11. **NodeCertificateController** (`node_certificate_controller.go`) + - Creates cert-manager `Certificate` CRs for libvirt TLS + - Generates RSA 4096-bit certificates covering all node IPs/DNS names + +## Spec Fields + +Spec fields are the primary inputs for controllers: + +| Field | Type | Read By | Description | +|-------|------|---------|-------------| +| `Spec.LifecycleEnabled` | bool | OnboardingController | Gates onboarding and termination. When `false`, onboarding is aborted and the hypervisor is left unmanaged. | +| `Spec.SkipTests` | bool | OnboardingController | Skips smoke tests during onboarding. Set from the `cobaltcore.cloud.sap/node-hypervisor-lifecycle=skip-tests` node label. | +| `Spec.Maintenance` | string | HypervisorMaintenanceController | Triggers maintenance mode. Values: `""` (none), `manual`, `auto`, `ha`, `termination`. | +| `Spec.HighAvailability` | bool | HypervisorInstanceHaController | Enables or disables instance HA registration with `kvm-ha-service`. | +| `Spec.Aggregates` | []string | AggregatesController | OpenStack aggregates to apply once onboarding reaches the Handover phase. Set from the `nova.openstack.cloud.sap/aggregates` node annotation. | +| `Spec.CustomTraits` | []string | TraitsController | Placement API traits to sync. Set from the `nova.openstack.cloud.sap/custom-traits` node annotation. | + +## Conditions + +Conditions are used to track state and coordinate between controllers: + +| Condition | Description | Set By | Required for `Ready=True` | +|-----------|-------------|--------|---------------------------| +| `Ready` | Overall readiness of the hypervisor | HypervisorController, OnboardingController, HypervisorMaintenanceController, DecommissionController | *(this is the readiness condition itself)* | +| `Onboarding` | Onboarding process state | OnboardingController | `False` (reason: Succeeded) — while `True`, Ready is held at False (reason: Onboarding) | +| `Terminating` | Node is being terminated | HypervisorController | Absent or `False` — when `True`, sets Ready=False (only if Ready was still True, to avoid overwriting other controllers) | +| `HypervisorDisabled` | OpenStack compute service disabled | HypervisorMaintenanceController | `False` — when `True`, Ready=False (reason: Maintenance) | +| `Evicting` | VMs are being migrated off | HypervisorMaintenanceController (on Hypervisor), EvictionController (on Eviction CR) | Absent — when present with reason Running: Ready=False (reason: Evicting); when reason Succeeded: Ready=False (reason: Evicted) | +| `Offboarded` | Cleanup completed | DecommissionController | Absent — when `True`, Ready=False (reason: Offboarded); this is a terminal state | +| `AggregatesUpdated` | OpenStack aggregates synced | AggregatesController | `True` — required for onboarding to complete (which sets Ready=True); not directly re-checked at runtime | +| `TraitsUpdated` | Placement API traits synced | TraitsController | `True` — required for onboarding to complete (which sets Ready=True); not directly re-checked at runtime | +| `HaEnabled` | Instance HA enabled/disabled state | HypervisorInstanceHaController | `True` — required for onboarding to complete when `Spec.HighAvailability=true`; `False` required before GardenerNodeLifecycleController allows node deletion during offboarding | + +### Condition Reasons + +**Ready Condition Reasons:** +- `Ready` - Hypervisor is operational +- `Maintenance` - In maintenance mode +- `Evicted` - VMs evacuated +- `Evicting` - VM evacuation in progress +- `Onboarding` - Initial setup in progress +- `Decommissioning` - Node decommissioning in progress (literal string, not a defined constant) +- `Offboarded` - Offboarding complete (literal string, not a defined constant) + +**Onboarding Condition Reasons:** +- `Initial` - Starting onboarding +- `Testing` - Running smoke tests +- `Handover` - Waiting for other controllers +- `Succeeded` - Onboarding complete +- `Aborted` - Onboarding cancelled + +**AggregatesUpdated Reasons:** +- `Succeeded` - Spec aggregates applied +- `Failed` - OpenStack API error +- `TestAggregates` - Test aggregate applied during onboarding +- `WaitingForTraits` - During Handover, waiting for TraitsController to complete before applying spec aggregates +- `Terminating` - Aggregates cleared for termination +- `EvictionInProgress` - Aggregates unchanged during eviction + +**Evicting Reasons (on Eviction CR):** +- `Running` - Eviction in progress +- `Succeeded` - Eviction completed +- `Failed` - Eviction failed + +**TraitsUpdated Reasons:** +- `Succeeded` - Traits synced to Placement API +- `Failed` - Placement API error + +**HaEnabled Condition Reasons:** +- `Succeeded` - HA enabled, or HA disabled per `Spec.HighAvailability=false` +- `Evicted` - HA disabled because eviction completed (hypervisor is empty) +- `Onboarding` - HA disabled because onboarding is in Initial/Testing phase or was Aborted +- `Failed` - HTTP call to `kvm-ha-service` failed (condition status: `Unknown`) + +## State Diagrams + +### High-Level Overview + +This diagram shows the phases a hypervisor goes through during its lifecycle: + +```mermaid +stateDiagram-v2 + [*] --> Onboarding: Node created with
hypervisor label + [*] --> Termination: Node deleted
before onboarding + + Onboarding --> Ready: Onboarding
complete + + Ready --> Maintenance: Spec.Maintenance
set + + Maintenance --> Ready: Maintenance
cleared + + Onboarding --> Termination: Terminating
(any phase) + Ready --> Termination: Terminating + Maintenance --> Termination: Terminating + + Termination --> [*]: Offboarded + + note right of Onboarding + Initial setup and testing + - Create Hypervisor CR + - Apply test aggregates + - Run smoke tests + - Enable HA (HypervisorInstanceHaController) + end note + + note right of Ready + Normal operation + - Scheduling VMs + - HA enabled + - Aggregates active + end note + + note right of Maintenance + Planned maintenance + - Disable service + - Evict VMs + - Perform maintenance + - Re-enable service + end note + + note right of Termination + Node decommissioning + - Abort onboarding if needed + - Evict VMs + - Disable HA (HypervisorInstanceHaController) + - Remove aggregates + - Delete service + - Clean up resources + end note +``` + +### Detailed Phase Diagrams + +#### 1. Onboarding Phase + +```mermaid +sequenceDiagram + participant Node as Kubernetes Node + participant HC as HypervisorController + participant Hyp as Hypervisor CR + participant OC as OnboardingController + participant AC as AggregatesController + participant TC as TraitsController + participant HAC as HypervisorInstanceHaController + participant Nova as Nova / Placement API + participant HA as kvm-ha-service + + Node->>HC: Node created with hypervisor label + HC->>Hyp: Create Hypervisor CR (owner ref to Node) + + OC->>Hyp: Set Onboarding=True/Initial, Ready=False/Onboarding + OC->>Nova: ensureNovaProperties() → discover HypervisorID, ServiceID + + AC->>Hyp: Read Onboarding=True/Initial + AC->>Nova: Apply zone + tenant_filter_tests aggregates + AC->>Hyp: Set AggregatesUpdated=False/TestAggregates + + OC->>Hyp: Read Status.Aggregates matches [zone, test] + OC->>Nova: Enable compute service (ForcedDown=false) + OC->>Hyp: Set Onboarding=True/Testing + + Note over OC,Nova: Test VM validation + OC->>Nova: Create test VM in test project + OC->>Nova: Monitor boot, validate console output + OC->>Nova: Delete test VM + + OC->>Hyp: Set Onboarding=True/Handover + + Note over TC,HAC: Sequential ordering during Handover + TC->>Hyp: Read Onboarding=True/Handover + TC->>Nova: Sync Spec.CustomTraits to Placement API + TC->>Hyp: Set TraitsUpdated=True/Succeeded + + AC->>Hyp: Read Onboarding=True/Handover, TraitsUpdated=True + AC->>Nova: Apply Spec.Aggregates + AC->>Hyp: Set AggregatesUpdated=True/Succeeded + + HAC->>Hyp: Read Onboarding=True/Handover (HA should be enabled) + HAC->>HA: enableInstanceHA() — POST {enabled: true} + HAC->>Hyp: Set HaEnabled=True/Succeeded + + OC->>Hyp: Read AggregatesUpdated=True AND TraitsUpdated=True AND HaEnabled=True + OC->>Hyp: Set Onboarding=False/Succeeded, Ready=True/Ready +``` + +#### 2. Maintenance Phase + +```mermaid +sequenceDiagram + participant User as User / System + participant Hyp as Hypervisor CR + participant MC as MaintenanceController + participant Evic as Eviction CR + participant EC as EvictionController + participant Nova as Nova API + + User->>Hyp: Set Spec.Maintenance (manual/auto/ha) + + MC->>Nova: Disable compute service + MC->>Hyp: Set HypervisorDisabled=True, Ready=False/Maintenance + + alt Maintenance mode is manual, auto, or termination + MC->>Evic: Create Eviction CR + MC->>Hyp: Set Evicting=True/Running + + EC->>Hyp: Read HypervisorDisabled=True (preflight) + EC->>Nova: List VMs (hypervisors.GetExt WithServers) + EC->>Evic: Set Preflight=True, populate OutstandingInstances + + loop For each VM (sequential) + Note over EC,Nova: ACTIVE → live migration
Stopped → cold migration
ERROR → move to end of queue
Deleting → skip + EC->>Nova: Trigger migration + EC->>Evic: Set MigratingInstance condition + EC->>Evic: Remove from OutstandingInstances + end + + EC->>Evic: Set Evicting=False/Succeeded + + MC->>Hyp: Set Evicting=False/Succeeded, Ready=False/Evicted + MC->>Hyp: Set Status.Evicted=true + else Maintenance mode is ha + Note over MC: No Eviction CR created
HA service handles evacuation + end + + Note over Hyp: Hypervisor is empty, service disabled
HA disabled (HypervisorInstanceHaController reacts to Evicting=False) +``` + +#### 2b. Maintenance Recovery (Return to Ready) + +```mermaid +sequenceDiagram + participant User as User / System + participant Hyp as Hypervisor CR + participant MC as MaintenanceController + participant Evic as Eviction CR + participant Nova as Nova API + + User->>Hyp: Clear Spec.Maintenance (set to "") + + MC->>Nova: Enable compute service + MC->>Hyp: Set HypervisorDisabled=False, Ready=True/Ready + MC->>Hyp: Set Status.Evicted=false + MC->>Hyp: Remove Evicting condition + MC->>Evic: Delete Eviction CR + + Note over Hyp: Service re-enabled, ready to accept VMs
HAC re-enables HA (Evicting condition removed → no longer evicted) +``` + +#### 3. Termination Phase + +```mermaid +sequenceDiagram + participant MCM as Gardener MCM + participant Node as Kubernetes Node + participant HC as HypervisorController + participant Hyp as Hypervisor CR + participant OC as OnboardingController + participant AC as AggregatesController + participant DC as DecommissionController + participant HAC as HypervisorInstanceHaController + participant GC as GardenerNodeLifecycleController + participant Nova as Nova / Placement API + participant HA as kvm-ha-service + + Note over MCM,HA: Preamble (termination-specific) + + MCM->>Node: Set condition Terminating=True + + HC->>Hyp: Set Terminating=True, Ready=False/Terminating + HC->>Hyp: Set Spec.Maintenance=termination + + opt If onboarding was in progress + OC->>Nova: Delete test VMs + OC->>Hyp: Set Onboarding=False/Aborted + end + + rect rgb(200, 220, 240) + Note over Hyp,Nova: Maintenance Phase (see diagram 2 above)
Triggered by Spec.Maintenance=termination
Disable service → Evict VMs → Mark evicted (Evicting=False) + end + + Note over HAC: Reacts to Evicting=False (eviction completed, concurrent with postamble) + HAC->>HA: disableInstanceHA() — POST {enabled: false} + HAC->>Hyp: Set HaEnabled=False/Evicted + + Note over MCM,HA: Postamble (termination-specific) + + Note over AC: Waits for Terminating=True AND Evicting=False + AC->>Nova: Remove from ALL aggregates + AC->>Hyp: Set AggregatesUpdated=False/Terminating, Status.Aggregates=[] + + Note over DC: Waits for Evicting=False AND Aggregates=[] + DC->>Nova: Verify no VMs remain (RunningVMs=0) + DC->>Nova: Delete compute service + DC->>Nova: Clean up resource provider + DC->>Hyp: Set Offboarded=True, Ready=False/Offboarded + + Note over GC: Waits for Offboarded=True, Onboarding=False, Node Terminating, HaEnabled=False + GC->>Hyp: Scale PDB minAvailable=0 + GC->>Hyp: Scale deployment to 0 replicas + + Note over Node: MCM can now delete the node
Owner reference cascades Hypervisor CR deletion +``` + +## Detailed Flow Descriptions + +### 0. Hypervisor CR Creation (New Node -> Hypervisor CR) + +**HypervisorController** detects a new Node with the hypervisor label and creates a Hypervisor CR with: + +- Owner reference to the Node +- Hardcoded defaults: `HighAvailability=true`, `InstallCertificate=true` +- Synced from Node labels/annotations (see HypervisorController description above for full list): + - `LifecycleEnabled` and `SkipTests` from `cobaltcore.cloud.sap/node-hypervisor-lifecycle` label + - `Spec.Aggregates` from `nova.openstack.cloud.sap/aggregates` annotation (+ zone) + - `Spec.CustomTraits` from `nova.openstack.cloud.sap/custom-traits` annotation + +### 1. Onboarding Flow (Hypervisor CR -> Ready) + +**Preconditions:** +- `Spec.LifecycleEnabled` is `true` +- `Terminating` condition is not `True` + +**Steps:** + +1. **OnboardingController** starts Initial phase + - Sets `Ready=False/Onboarding`, `Onboarding=True/Initial` + - Discovers `HypervisorID` and `ServiceID` from Nova via `ensureNovaProperties()` + +2. **AggregatesController** applies test aggregates + - Once `Onboarding=True/Initial`: + - Adds hypervisor to zone aggregate + test aggregate (`tenant_filter_tests`) + - Sets `AggregatesUpdated=False/TestAggregates` + +3. **OnboardingController** enters Testing phase + - Once `Status.Aggregates` matches `[zone, tenant_filter_tests]`: + - Enables Nova compute service (`ForcedDown=false`) + - Sets `Onboarding=True/Testing` + - Creates test VM in test project on this specific hypervisor + - Monitors VM creation and boot + - Validates console output contains server name + - Deletes test VM upon success + +4. **OnboardingController** enters Handover phase (first pass of `completeOnboarding`) + - Deletes remaining test VMs + - Sets `Onboarding=True/Handover` + +5. **TraitsController** syncs custom traits + - Once `Onboarding=True/Handover` (or `Onboarding=False`): + - Syncs `Spec.CustomTraits` to OpenStack Placement API + - Sets `TraitsUpdated=True/Succeeded` + +6. **AggregatesController** applies production aggregates (sequential after traits) + - Once `Onboarding=True/Handover`: + - If `TraitsUpdated` is not `True`: keeps test aggregates, sets `AggregatesUpdated=False/WaitingForTraits` + - Once `TraitsUpdated=True`: applies aggregates from `Spec.Aggregates`, sets `AggregatesUpdated=True/Succeeded` + +6b. **HypervisorInstanceHaController** enables HA + - Once `Onboarding=True/Handover` (and `Spec.HighAvailability=true`): + - Calls `enableInstanceHA()` (HTTP POST to kvm-ha-service) + - Sets `HaEnabled=True/Succeeded` + - Runs concurrently with steps 5 and 6 — all three can complete in any order + +7. **OnboardingController** completes onboarding (second pass of `completeOnboarding`) + - Once `AggregatesUpdated=True` AND `TraitsUpdated=True` AND (if `Spec.HighAvailability=true`) `HaEnabled=True`: + - Sets `Onboarding=False/Succeeded`, `Ready=True/Ready` + +8. **GardenerNodeLifecycleController** signals readiness + - Once `Onboarding=False`: + - Updates deployment startup probe to `/bin/true` (pod becomes ready) + - Gardener MCM detects node is ready + +### 2. Maintenance Flow (Ready -> Maintenance -> Ready) + +**Preconditions:** +- `Onboarding` condition is not `True` (absent or `False`) + +**Steps:** + +1. **User/System** sets `Spec.Maintenance` (manual, auto, or ha) + +2. **HypervisorMaintenanceController** disables compute service + - Disables compute service via OpenStack API (prevents new VM scheduling) + - Sets `HypervisorDisabled=True/Succeeded`, `Ready=False/Maintenance` + +3. **HypervisorMaintenanceController** creates Eviction CR + - Only for modes `manual`, `auto`, or `termination` (not `ha` — HA eviction is handled externally) + - Creates Eviction custom resource with `Spec.Hypervisor` set + - Sets `Evicting=True/Running` on Hypervisor + +4. **EvictionController** runs preflight checks + - After `HypervisorDisabled=True` on Hypervisor: + - Queries OpenStack for VM list (using `hypervisors.GetExt` with `WithServers=true`) + - Populates `Status.OutstandingInstances` array + - Sets `Preflight=True/Succeeded` on Eviction CR + +5. **EvictionController** migrates VMs sequentially + - For each VM in `OutstandingInstances` (processes last element): + - ACTIVE or PowerState=1: triggers live migration (`BlockMigration=false`) + - Stopped: triggers cold migration + - ERROR: moves to front of array (tried last), returns error + - MIGRATING/RESIZE: waits and polls (10s) + - task_state=deleting: moves to front (skip for now) + - Already on different host: removes from list + - Updates `MigratingInstance` condition on Eviction CR + - Removes from list when migration completes + +6. **EvictionController** completes eviction + - Once `OutstandingInstances` is empty: + - Sets `Evicting=False/Succeeded` on Eviction CR + +7. **HypervisorMaintenanceController** marks hypervisor as evicted + - Once Eviction CR has `Evicting=False`: + - Sets `Evicting=False/Succeeded`, `Ready=False/Evicted` on Hypervisor + - Sets `Status.Evicted=true` + +8. **User/System** clears `Spec.Maintenance` after maintenance + - Sets `Spec.Maintenance=""` or removes field + +9. **HypervisorMaintenanceController** re-enables service + - Once `Spec.Maintenance=""`: + - Enables compute service via OpenStack API + - Sets `HypervisorDisabled=False/Succeeded`, `Ready=True/Ready` + - Sets `Status.Evicted=false` + - Removes `Evicting` condition and deletes Eviction CR + +**Note:** Instance HA is disabled during `manual`/`auto`/`termination` maintenance once eviction completes and `Evicting=False` is set. After maintenance recovery (service re-enabled, `Evicting` condition removed), `HypervisorInstanceHaController` re-enables HA. For `ha` maintenance mode (no Eviction CR created), the `Evicting` condition is never set to `False`, so HA is not disabled by the controller — the HA service handles evacuation directly. + +### 3. Termination Flow (Ready -> Offboarded) + +Termination is a special case of the [Maintenance Flow](#2-maintenance-flow-ready---maintenance---ready): the same eviction core (steps 2-7) runs, preceded by termination-specific setup and followed by cleanup steps that replace the normal recovery (steps 8-9). + +**Preconditions:** +- `Spec.LifecycleEnabled` is `true` +- Node has `Terminating=True` condition (set by Gardener MCM) + +**Steps:** + +1. **HypervisorController** propagates termination + - Sets `Terminating=True`, `Ready=False/Terminating` on Hypervisor (only if Ready is currently True or nil) + - Sets `Spec.Maintenance=termination` + +2. **OnboardingController** aborts if onboarding was in progress + - Deletes any test VMs + - Sets `Onboarding=False/Aborted`, `Ready=False/Onboarding` (only if Ready was set by onboarding) + +3. **Maintenance Flow steps 2-7** — eviction core + - `Spec.Maintenance=termination` triggers the same sequence as any other maintenance mode: disable compute service, create Eviction CR, migrate VMs, mark evicted + - See [Maintenance Flow](#2-maintenance-flow-ready---maintenance---ready) for full details + - After eviction completes, the flow continues below instead of waiting for a user to clear `Spec.Maintenance` (steps 8-9 of Maintenance do not apply) + +4. **AggregatesController** removes from aggregates + - Once `Terminating=True` AND `Evicting=False`: + - Removes hypervisor from ALL OpenStack aggregates + - Sets `AggregatesUpdated=False/Terminating`, `Status.Aggregates=[]` + - Must wait for eviction to complete — removing aggregates during migration could cause scheduling failures + +5. **DecommissionController** deletes OpenStack resources + - Once `Evicting=False` (if `ServiceID` is set) AND `Status.Aggregates=[]`: + - Queries OpenStack for hypervisor details (with `WithServers=true`) + - Verifies no VMs remain (`RunningVMs=0`, `Servers=[]`) + - Deletes OpenStack compute service + - Cleans up placement resource provider + - Sets `Offboarded=True/Offboarded`, `Ready=False/Offboarded` + +6. **HypervisorInstanceHaController** disables HA + - Reacts to `Evicting=False` (set by HypervisorMaintenanceController when eviction completes, during step 3) + - Runs concurrently with steps 4 and 5 (AggregatesController and DecommissionController also wait for `Evicting=False`) + - Calls `disableInstanceHA()` (HTTP POST to kvm-ha-service) + - Sets `HaEnabled=False/Evicted` + +7. **GardenerNodeLifecycleController** allows node deletion + - Once `Offboarded=True` AND `Onboarding=False` AND `HaEnabled=False` (when Node is Terminating): + - Scales PodDisruptionBudget `minAvailable=0` + - Scales deployment to 0 replicas + - Gardener MCM can now delete the node + +8. **Node deletion** cascades to Hypervisor CR deletion via owner reference + +### 4. Abort Onboarding Flow + +**Causes:** + +- **Lifecycle disabled** — `Spec.LifecycleEnabled` set to `false` +- **Node terminating** — Node gets `Terminating=True` condition + +**Behavior:** OnboardingController detects either condition during any onboarding phase and: + +- Deletes test VMs from test project +- Sets `Onboarding=False/Aborted` +- Sets `Ready=False/Onboarding` (message: "Aborted") if Ready was set by onboarding controller + +**Aftermath:** + +- **If lifecycle disabled:** Hypervisor remains in the cluster but unmanaged. AggregatesController leaves current aggregates in place. +- **If terminating:** Proceeds to Termination Flow (see section 3). AggregatesController will handle aggregate removal as part of that flow. + +## Key Synchronization Points + +### Controller Coordination Mechanisms + +1. **Onboarding Completion** (Traits → Aggregates → HA enable → Complete) + - During Handover phase, TraitsController syncs custom traits first (`TraitsUpdated=True`) + - AggregatesController waits for `TraitsUpdated=True` before applying spec aggregates (`AggregatesUpdated=True`) + - HypervisorInstanceHaController enables HA and sets `HaEnabled=True` (runs concurrently with Traits/Aggregates) + - OnboardingController waits for all three: `AggregatesUpdated=True` AND `TraitsUpdated=True` AND (if `Spec.HighAvailability=true`) `HaEnabled=True` + - Only then sets: `Onboarding=False/Succeeded`, `Ready=True/Ready` + +2. **Eviction Start** + - EvictionController waits for: `HypervisorDisabled=True` (on Hypervisor) + - Validates hypervisor is disabled before migrating VMs + +3. **Aggregate Removal During Termination** + - AggregatesController waits for: `Terminating=True` AND `Evicting=False` (on Hypervisor) + - Ensures VMs are migrated before removing aggregate membership + +4. **Service Deletion** + - DecommissionController waits for: `Evicting=False` AND `Status.Aggregates=[]` + - Ensures hypervisor is evacuated and isolated before deletion + +5. **HA Disable Before Gardener Node Lifecycle** + - HypervisorInstanceHaController disables HA when `Evicting=False` (eviction completed), sets `HaEnabled=False/Evicted` + - GardenerNodeLifecycleController allows termination when: + - `Offboarded=True` + - When additionally `Onboarding=False` AND Node has `Terminating` condition: waits for `HaEnabled=False` before scaling down PDB/deployment + - Blocks termination during onboarding or operation via PDB with `minAvailable=1` + +## Aggregate Management Strategy + +The AggregatesController applies different aggregate sets based on lifecycle phase: + +| Phase | Aggregates Applied | Condition Status | +|-------|-------------------|------------------| +| Onboarding (Initial/Testing) | zone + `tenant_filter_tests` | `AggregatesUpdated=False/TestAggregates` | +| Onboarding (Handover, traits pending) | zone + `tenant_filter_tests` (unchanged) | `AggregatesUpdated=False/WaitingForTraits` | +| Onboarding (Handover, traits ready) | `Spec.Aggregates` | `AggregatesUpdated=True/Succeeded` | +| Normal Operation | `Spec.Aggregates` | `AggregatesUpdated=True/Succeeded` | +| Terminating (before eviction) | Current aggregates unchanged | `AggregatesUpdated=False/EvictionInProgress` | +| Terminating (after eviction) | Empty (removed from all) | `AggregatesUpdated=False/Terminating` | + +- Test aggregate during onboarding isolates test VMs from production traffic +- Production aggregates are applied only after testing succeeds +- Aggregates are retained during eviction to ensure VM placement remains valid +- Aggregates are removed after eviction to prevent scheduling on the decommissioned hypervisor + +## HA Management + +Instance HA is managed by `HypervisorInstanceHaController`, which owns the `HaEnabled` condition and all HTTP calls to `kvm-ha-service`. + +| Desired State | Trigger | Action | Condition Set | +|---------------|---------|--------|---------------| +| Enable HA | `Onboarding=True/Handover` (or `Onboarding=False/Succeeded`) with `Spec.HighAvailability=true`, `Evicting` not False | `enableInstanceHA()` - POST `{"enabled": true}` | `HaEnabled=True/Succeeded` | +| Disable HA (evicted) | `Evicting=False` (eviction completed) | `disableInstanceHA()` - POST `{"enabled": false}` | `HaEnabled=False/Evicted` | +| Disable HA (spec) | `Spec.HighAvailability=false` | `disableInstanceHA()` - POST `{"enabled": false}` | `HaEnabled=False/Succeeded` | +| Disable HA (onboarding) | `Onboarding=True` with reason not `Handover`, or `Onboarding=False/Aborted` | `disableInstanceHA()` - POST `{"enabled": false}` | `HaEnabled=False/Onboarding` | + +**URL pattern:** `https://kvm-ha-service-..cloud.sap/api/hypervisors/` + +**Key behaviors:** +- HA is enabled during the Handover phase, after `Onboarding=True/Handover` is set +- The HTTP call is skipped if the condition already reflects the desired state (idempotent) +- HA is disabled during `manual`/`auto`/`termination` maintenance once eviction completes (`Evicting=False`), and re-enabled after recovery when the `Evicting` condition is removed +- For `ha` maintenance mode, no `Evicting=False` condition is set, so the controller does not disable HA +- `disableInstanceHA()` accepts HTTP 404 (Not Found) as a success response; `enableInstanceHA()` requires HTTP 200 +- The controller does nothing if the `Onboarding` condition has never been set + +## Error Handling + +### Onboarding Failures + +- **Test VM fails:** OnboardingController deletes failed VM, updates condition message, requeues after 1 minute +- **Service not found:** `ensureNovaProperties()` requeues after 1 minute, waits for OpenStack registration +- **Aggregate sync fails:** AggregatesController sets `AggregatesUpdated=False/Failed`, returns error for retry + +### Eviction Failures + +- **VM migration ERROR state:** Moves VM to front of array (tried last), sets `MigratingInstance=False/Failed`, returns error +- **Migration API failure:** EvictionController sets `MigratingInstance=False/Failed`, returns error for retry +- **Deleting VM:** Moves to front of array (skip for now), sets `MigratingInstance=False/Failed` +- **Hypervisor not found in Nova:** If expected, sets `Evicting=False/Failed`; if not expected (no IDs), sets `Evicting=False/Succeeded` + +### Termination Failures + +- **VMs still present:** DecommissionController sets `Ready=False/Decommissioning` with message, requeues +- **Service deletion fails:** Reports error, retries +- **Aggregates not cleared:** Blocks decommission, sets `Ready=False/Decommissioning` with message + +## Testing the Flows + +### Test Onboarding + +```bash +# Watch hypervisor progress +kubectl get hypervisor -w + +# Check detailed conditions +kubectl get hypervisor -o jsonpath='{.status.conditions}' | jq + +# Expected condition progression: +# 1. Onboarding=True/Initial +# 2. Onboarding=True/Testing +# 3. Onboarding=True/Handover +# 4. Onboarding=False/Succeeded, Ready=True +``` + +### Test Maintenance + +```bash +# Enter maintenance +kubectl patch hypervisor --type=merge -p '{"spec":{"maintenance":"manual","maintenanceReason":"OS update"}}' + +# Watch eviction progress +kubectl get eviction -w +kubectl get eviction -o jsonpath='{.status.outstandingInstances}' + +# Exit maintenance +kubectl patch hypervisor --type=merge -p '{"spec":{"maintenance":""}}' +``` + +### Test Termination + +```bash +# Mark node for termination (depends on MCM) +kubectl annotate node node.machine.sapcloud.io/trigger-deletion-by-mcm=true' + +# Watch progression +kubectl get hypervisor -o jsonpath='{.status.conditions}' | jq -r '.[] | "\(.type)=\(.status)/\(.reason)"' + +# Expected: Terminating=True -> Evicting=False -> Offboarded=True +```