Documentation of the controller states/conditions and their interaction#264

Open
fwiesel wants to merge 1 commit into main from documentation

@fwiesel fwiesel commented Mar 12, 2026

This is a documentation of the various controllers and their interactions.

@fwiesel fwiesel marked this pull request as ready for review March 12, 2026 13:30
@notandy notandy self-requested a review March 12, 2026 17:38

@notandy notandy left a comment

The file name is incorrect, since the file includes far more than just a diagram.
Please consider splitting it into parts.
Also, please put any docs into a docs/ directory, as is common in Go project layouts.


### Lifecycle Controllers

1. **HypervisorController** (`hypervisor_controller.go`)

A numbered list is a bit confusing here, since the order doesn't matter.

Comment on lines +23 to +26
- **Labels:** `kubernetes.io/hostname`, `topology.kubernetes.io/region`, `topology.kubernetes.io/zone`, `kubernetes.metal.cloud.sap/{bb,cluster,name,type}`, `worker.garden.sapcloud.io/group`, `worker.gardener.cloud/pool` (plus any keys from global LabelSelector)
- **Annotations → Spec:** `nova.openstack.cloud.sap/aggregates` → `Spec.Aggregates` (comma-split, zone appended), `nova.openstack.cloud.sap/custom-traits` → `Spec.CustomTraits` (comma-split)
- **Label → Spec:** `cobaltcore.cloud.sap/node-hypervisor-lifecycle` presence → `Spec.LifecycleEnabled=true`, value `"skip-tests"` → `Spec.SkipTests=true`
- **Status:** Node internal IP → `Status.InternalIP`

Could this be split up into one key per line?
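
For illustration, the annotation and label mapping quoted above can be sketched in Go. This is a minimal sketch under stated assumptions, not the operator's actual code: `HypervisorSpec`, `splitList`, and `specFromNode` are hypothetical names, the zone is assumed to come from the `topology.kubernetes.io/zone` label, and trimming whitespace around list entries is an added assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Annotation and label keys as quoted in the reviewed document.
const (
	aggregatesAnnotation   = "nova.openstack.cloud.sap/aggregates"
	customTraitsAnnotation = "nova.openstack.cloud.sap/custom-traits"
	lifecycleLabel         = "cobaltcore.cloud.sap/node-hypervisor-lifecycle"
	zoneLabel              = "topology.kubernetes.io/zone"
)

// HypervisorSpec is a hypothetical stand-in for the real spec type.
type HypervisorSpec struct {
	Aggregates       []string
	CustomTraits     []string
	LifecycleEnabled bool
	SkipTests        bool
}

// splitList splits a comma-separated value and drops empty entries.
// Trimming surrounding whitespace is an assumption of this sketch.
func splitList(value string) []string {
	var out []string
	for _, part := range strings.Split(value, ",") {
		if p := strings.TrimSpace(part); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// specFromNode derives the spec fields from node labels and annotations.
func specFromNode(labels, annotations map[string]string) HypervisorSpec {
	spec := HypervisorSpec{
		Aggregates:   splitList(annotations[aggregatesAnnotation]),
		CustomTraits: splitList(annotations[customTraitsAnnotation]),
	}
	// Assumption: "zone appended" means the zone label value becomes
	// the last aggregate.
	if zone := labels[zoneLabel]; zone != "" {
		spec.Aggregates = append(spec.Aggregates, zone)
	}
	// Presence of the label enables the lifecycle; the value
	// "skip-tests" additionally sets SkipTests.
	if value, ok := labels[lifecycleLabel]; ok {
		spec.LifecycleEnabled = true
		spec.SkipTests = value == "skip-tests"
	}
	return spec
}

func main() {
	spec := specFromNode(
		map[string]string{zoneLabel: "zone-a", lifecycleLabel: "skip-tests"},
		map[string]string{aggregatesAnnotation: "agg1,agg2", customTraitsAnnotation: "CUSTOM_X"},
	)
	fmt.Printf("%+v\n", spec)
}
```

Keeping each key as a named constant also makes a one-key-per-line listing in the documentation easy to keep in sync with the code.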

Comment on lines +35 to +38
3. **AggregatesController** (`aggregates_controller.go`)
- Manages OpenStack aggregate membership
- Applies different aggregates based on lifecycle phase
- Coordinates with onboarding and termination flows

A short explanation of the related spec and status fields would be good.

- `"ha"`: Disables compute service but does **not** create an Eviction CR (the HA service handles evacuation)

5. **EvictionController** (`eviction_controller.go`)
- Executes VM migration off the hypervisor

Main point missing: it reconciles on Eviction CRDs.

6. **DecommissionController** (`decomission_controller.go`)
- Handles hypervisor offboarding
- Cleans up OpenStack service and resource provider
- Runs during node termination

I'd like a little more information here, such as how it monitors node termination and what the steps are.

- Manages readiness signaling deployment
- Waits for `HaEnabled=False` (set by HypervisorInstanceHaController) before allowing node deletion during offboarding

### Auxiliary Controllers

What's the difference between lifecycle and auxiliary controllers?


### Auxiliary Controllers

8. **HypervisorInstanceHaController** (`hypervisor_instance_ha_controller.go`)

This controller should not be in here.

- Creates cert-manager `Certificate` CRs for libvirt TLS
- Generates RSA 4096-bit certificates covering all node IPs/DNS names

## Spec Fields

We should generate the spec documentation with go-docs directly from the spec. This is doomed to become outdated quickly.


11. **NodeCertificateController** (`node_certificate_controller.go`)
- Creates cert-manager `Certificate` CRs for libvirt TLS
- Generates RSA 4096-bit certificates covering all node IPs/DNS names

This is a hallucination.

| `TraitsUpdated` | Placement API traits synced | TraitsController | `True` — required for onboarding to complete (which sets Ready=True); not directly re-checked at runtime |
| `HaEnabled` | Instance HA enabled/disabled state | HypervisorInstanceHaController | `True` — required for onboarding to complete when `Spec.HighAvailability=true`; `False` required before GardenerNodeLifecycleController allows node deletion during offboarding |
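
The two roles of `HaEnabled` described in the row above can be sketched as a pair of gate checks. This is a hedged illustration, not the controllers' actual code: `Condition`, `conditionStatus`, `onboardingHaSatisfied`, and `nodeDeletionAllowed` are hypothetical names, a plain struct stands in for the Kubernetes condition type, and treating a missing condition as "not satisfied" is an assumption.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for a Kubernetes
// status condition.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// conditionStatus returns the status of the named condition and
// whether the condition is present at all.
func conditionStatus(conds []Condition, condType string) (string, bool) {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status, true
		}
	}
	return "", false
}

// onboardingHaSatisfied reports whether the HA requirement for
// completing onboarding is met: HaEnabled=True is only required when
// high availability is requested in the spec.
func onboardingHaSatisfied(conds []Condition, highAvailability bool) bool {
	if !highAvailability {
		return true
	}
	status, ok := conditionStatus(conds, "HaEnabled")
	return ok && status == "True"
}

// nodeDeletionAllowed reports whether offboarding may proceed to node
// deletion: HaEnabled must be explicitly False.
// Assumption: a missing condition blocks deletion.
func nodeDeletionAllowed(conds []Condition) bool {
	status, ok := conditionStatus(conds, "HaEnabled")
	return ok && status == "False"
}

func main() {
	conds := []Condition{{Type: "HaEnabled", Status: "False"}}
	fmt.Println("onboarding HA satisfied:", onboardingHaSatisfied(conds, true))
	fmt.Println("node deletion allowed:", nodeDeletionAllowed(conds))
}
```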

### Condition Reasons

Not sure this section is helpful at all. Conditions have human-readable messages for a reason, and those should already explain the same thing.
