fix: activate NextReconcileTime guard to prevent reconciliation storms #1657
Merged
gianlucam76 merged 9 commits into projectsveltos:main on Mar 19, 2026
Conversation
…memory cooldown
When multiple controllers update the same ClusterSummary, watch events from
status patches bypass RequeueAfter delays, causing tight reconciliation loops
(~100ms cycles, 3 CPU cores consumed). The existing NextReconcileTime guard
in skipReconciliation() was effectively dead code — only set in one obscure
undeploy path.
Changes:
- Set NextReconcileTime before every RequeueAfter return (10 call sites)
- Add in-memory cooldown map (survives status-patch conflicts)
- Swallow conflict errors in scope.Close() to prevent bypass of backoff
- Skip redundant status patches when feature already provisioned with same hash
- Configure 1s base-delay rate limiter on ClusterSummary workqueue
- Use precise remaining time in skip path instead of hardcoded 10s
Member
Thank you so much @muthusk for the detailed description and the PR. I will review this by today. Thank you so much again. Really appreciated
gianlucam76 reviewed Mar 19, 2026

DeletedInstances map[types.NamespacedName]time.Time
NextReconcileTimes map[types.NamespacedName]time.Time // in-memory cooldown, survives status-patch conflicts
Member
this is minor, in reconcileDelete, we should delete the clusterSummary instance if present in this map
gianlucam76 approved these changes Mar 19, 2026
Member
gianlucam76 left a comment
This is a thoughtful PR. Thank you.
I added a comment, but will address in a separate PR.
As for the test failing in the fv-agentless sanity run, I will debug it offline.
Problem

We've been running into reconciliation storms in production when managing multiple child clusters with 10+ Helm services each. The addon-controller gets stuck in a tight reconciliation loop (~100ms cycles), consuming 2-3 CPU cores and stalling `observedGeneration` advancement for 30+ minutes.

After digging into it, the core issue is that `skipReconciliation()` checks `Status.NextReconcileTime` to throttle reconciliation, but this field is almost never actually set during normal deploy flow; it is only written in one obscure path (`HandOverError` during undeploy). So the guard exists but is effectively dead code.

This means that whenever `scope.Close()` patches the status (e.g., changing the FeatureSummary hash from nil to a value), the resulting watch event immediately re-enqueues the ClusterSummary, completely bypassing the `RequeueAfter` delay that `proceedDeployingClusterSummary` returns. The controller thinks it's asking for a 60-second delay, but it gets called back in milliseconds.

Related upstream issue in k0rdent: k0rdent/kcm#2552
What this PR does

1. Activates the NextReconcileTime guard: calls `setNextReconcileTime()` before every `return reconcile.Result{RequeueAfter: X}` across 8 call sites in `reconcileNormal` and `proceedDeployingClusterSummary`. This is the main fix; it makes the existing `skipReconciliation()` check actually work.

2. Adds an in-memory cooldown map: `NextReconcileTimes map[NamespacedName]time.Time` on the reconciler struct, following the same pattern as the existing `DeletedInstances` map. This is needed because when `scope.Close()` encounters a conflict error (another controller updated the same resource), the `NextReconcileTime` status field never reaches etcd. The in-memory map survives that.

3. Swallows conflict errors in `scope.Close()`: instead of propagating conflict errors to controller-runtime (which immediately requeues, bypassing backoff), we log and swallow them. The watch event from whatever caused the conflict will re-enqueue the resource anyway, and the in-memory cooldown ensures the backoff is honored.
4. Skips redundant status patches: when a feature is already `Provisioned` with the same hash, we skip calling `updateFeatureStatus`. This prevents unnecessary status patches that generate watch events and re-trigger reconciliation for no reason.

5. Custom rate limiter: configures the ClusterSummary controller workqueue with a 1-second base delay (vs the default 5ms) for exponential backoff on failed items.
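The redundant-patch check from item 4 amounts to a compare-before-write; the sketch below uses hypothetical, stdlib-only names (`FeatureSummary` and `needsStatusPatch` are stand-ins, not the project's actual types):

```go
package main

import "fmt"

// FeatureSummary sketch: a minimal stand-in for the real status entry.
type FeatureSummary struct {
	Status string
	Hash   []byte
}

// needsStatusPatch returns false when the feature is already Provisioned
// with the same hash, so no status patch (and therefore no watch event
// re-triggering reconciliation) is generated.
func needsStatusPatch(current FeatureSummary, newHash []byte) bool {
	if current.Status == "Provisioned" && string(current.Hash) == string(newHash) {
		return false
	}
	return true
}

func main() {
	fs := FeatureSummary{Status: "Provisioned", Hash: []byte{0xab}}
	fmt.Println(needsStatusPatch(fs, []byte{0xab})) // false
	fmt.Println(needsStatusPatch(fs, []byte{0xcd})) // true
}
```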
6. Precise requeue timing: when `skipReconciliation()` fires, uses the actual remaining cooldown time instead of the hardcoded `normalRequeueAfter` (10s).

How the fixes work together
Testing

- `go build` / `go vet` clean (`lib/clusterops`, `pkg/scope`, `controllers/dependencymanager`)
- Unit tests pass, except a `PolicyRef Tier` test that also fails intermittently on upstream main

Production results
Deployed to a k0rdent cluster managing 3 child clusters with 11 services each:

- Conflict errors (`"the object has been modified"`) dropped to zero
- `skipReconciliation` guard firing correctly (520 skips/minute)

The remaining elevated CPU (~374m vs expected ~50m) is caused by a separate issue in the k0rdent controller manager writing Profile specs in non-deterministic order, tracked in k0rdent/kcm#2552.