feat(service/graph): add partial reload support for pipeline components#14513
blakerouse wants to merge 17 commits into open-telemetry:main
When only receiver configurations change, restart just the affected receivers instead of tearing down and rebuilding the entire service. This preserves running processors, exporters, connectors, and extensions, reducing reload disruption and data loss. The feature is gated behind the Alpha featuregate "service.receiverPartialReload" and falls back to a full reload when non-receiver sections of the config have changed.
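The fallback rule described above can be sketched as follows. This is a minimal illustration rather than the PR's code: `Config`, `ReloadMode`, and `decideReload` are hypothetical names, the config is reduced to a receivers map plus one map for every other section, and the gate state is passed in as a plain boolean.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// Config is a hypothetical, simplified stand-in for the collector configuration.
// Receivers holds per-receiver settings; Other holds every remaining section.
type Config struct {
	Receivers map[string]any
	Other     map[string]any
}

// ReloadMode describes how much of the service has to be restarted.
type ReloadMode int

const (
	NoReload ReloadMode = iota
	PartialReload
	FullReload
)

// decideReload returns the reload mode and, for a partial reload, the sorted
// IDs of receivers whose configuration was added, removed, or modified.
func decideReload(gateEnabled bool, oldCfg, newCfg Config) (ReloadMode, []string) {
	if reflect.DeepEqual(oldCfg, newCfg) {
		return NoReload, nil
	}
	// Fall back to a full reload when the gate is off or a non-receiver section changed.
	if !gateEnabled || !reflect.DeepEqual(oldCfg.Other, newCfg.Other) {
		return FullReload, nil
	}
	var changed []string
	for id, newRecv := range newCfg.Receivers {
		oldRecv, ok := oldCfg.Receivers[id]
		if !ok || !reflect.DeepEqual(oldRecv, newRecv) {
			changed = append(changed, id)
		}
	}
	for id := range oldCfg.Receivers {
		if _, ok := newCfg.Receivers[id]; !ok {
			changed = append(changed, id)
		}
	}
	sort.Strings(changed)
	return PartialReload, changed
}

func main() {
	oldCfg := Config{
		Receivers: map[string]any{"otlp": map[string]any{"endpoint": "localhost:4317"}},
		Other:     map[string]any{"exporters": map[string]any{"debug": nil}},
	}
	newCfg := Config{
		Receivers: map[string]any{"otlp": map[string]any{"endpoint": "localhost:4318"}},
		Other:     map[string]any{"exporters": map[string]any{"debug": nil}},
	}
	mode, changed := decideReload(true, oldCfg, newCfg)
	fmt.Println(mode == PartialReload, changed)
}
```

With the gate disabled, or with any difference in `Other`, the same call reports `FullReload`, which mirrors the fallback behavior described in the commit message.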
Implement partial reload capability that allows the collector to reload only the affected components when configuration changes, rather than tearing down and rebuilding the entire service.

Pipeline structure: receiver -> capabilitiesNode -> processors -> fanOutNode -> exporters

When a component is recreated, all upstream components must also be recreated because each component stores a reference to its next consumer. For example, if an exporter config changes:
- The exporter is recreated
- The fanOutNode is rebuilt to reference the new exporter
- All processors are recreated to get new references to the fanOutNode
- The capabilitiesNode is rebuilt to reference the first processor
- All receivers are recreated to get new references to the capabilitiesNode

The reload process has five phases:
1. Identify changes: Determine which components changed and which pipelines are affected
2. Shutdown: Stop components that need recreation (receivers first, then downstream)
3. Update graph: Remove old nodes, create new nodes, wire edges
4. Build: Create new component instances (downstream first, then upstream)
5. Start: Start recreated components (downstream first, then receivers last)
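The upstream propagation rule and the shutdown/start ordering can be sketched at the level of a single pipeline. This is an illustrative model only: `Pipeline`, `Plan`, `planPipeline`, `shutdownOrder`, and `startOrder` are hypothetical names, component IDs are plain strings with a made-up `kind/name` prefix, and the real graph code in the PR works on graph nodes rather than string slices.

```go
package main

import "fmt"

// Pipeline is a hypothetical, simplified model of one collector pipeline.
type Pipeline struct {
	Receivers  []string
	Processors []string
	Exporters  []string
}

// Plan lists what has to be recreated or rebuilt in one pipeline.
type Plan struct {
	Exporters           []string
	RebuildFanOut       bool
	Processors          []string
	RebuildCapabilities bool
	Receivers           []string
}

// filter keeps only the IDs that are marked as changed.
func filter(ids []string, changed map[string]bool) []string {
	var out []string
	for _, id := range ids {
		if changed[id] {
			out = append(out, id)
		}
	}
	return out
}

// planPipeline applies the rule that everything upstream of a recreated
// component must be recreated too, because it holds a reference to it.
func planPipeline(p Pipeline, changed map[string]bool) Plan {
	plan := Plan{Exporters: filter(p.Exporters, changed)}
	plan.RebuildFanOut = len(plan.Exporters) > 0
	procChanged := len(filter(p.Processors, changed)) > 0
	if plan.RebuildFanOut || procChanged {
		// A changed exporter or processor forces all processors and receivers.
		plan.Processors = append([]string(nil), p.Processors...)
		plan.RebuildCapabilities = true
		plan.Receivers = append([]string(nil), p.Receivers...)
		return plan
	}
	// Only receivers changed: nothing downstream is touched.
	plan.Receivers = filter(p.Receivers, changed)
	return plan
}

// shutdownOrder stops receivers first, then processors, then exporters.
func shutdownOrder(plan Plan) []string {
	order := append([]string(nil), plan.Receivers...)
	order = append(order, plan.Processors...)
	return append(order, plan.Exporters...)
}

// startOrder is the reverse of shutdownOrder: downstream first, receivers last.
func startOrder(plan Plan) []string {
	down := shutdownOrder(plan)
	up := make([]string, len(down))
	for i, id := range down {
		up[len(down)-1-i] = id
	}
	return up
}

func main() {
	p := Pipeline{
		Receivers:  []string{"receiver/otlp"},
		Processors: []string{"processor/memory_limiter", "processor/batch"},
		Exporters:  []string{"exporter/debug", "exporter/otlp"},
	}
	plan := planPipeline(p, map[string]bool{"exporter/debug": true})
	fmt.Println("shutdown:", shutdownOrder(plan))
	fmt.Println("start:   ", startOrder(plan))
}
```

In this sketch an unchanged exporter in the same pipeline (`exporter/otlp`) stays out of the plan, while a change to a single receiver produces a plan that contains only that receiver.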
I know this comes as a very large PR. I am willing to split this up in any way you would like to get it merged. I just wanted to provide it as a full unit so it could be tested and so I could gauge the overall appetite for adding something like this to the collector. I wanted to show how it would look and how it could be done in a way that would not risk the current operating mode of the collector.
Codecov Report
❌ Your patch check has failed because the patch coverage (82.54%) is below the target coverage (95.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## main #14513 +/- ##
==========================================
- Coverage 91.83% 91.67% -0.17%
==========================================
Files 677 678 +1
Lines 42705 43487 +782
==========================================
+ Hits 39220 39865 +645
- Misses 2429 2517 +88
- Partials 1056 1105 +49
☔ View full report in Codecov by Sentry.
Description
Implements partial reload capability that allows the collector to reload only the affected components when configuration changes, rather than tearing down and rebuilding the entire service.
When a component is identified as needing to be recreated, all upstream components must also be recreated because each component stores a reference to its next consumer. This PR uses that rule to recreate only what is required. If a processor is changed, then the processors and the receivers are recreated for that pipeline, but not the exporter or connector. If a receiver is changed in one pipeline and is not used in another pipeline, then that other pipeline is not touched.
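Determining which pipelines a change touches can be sketched as a simple membership check. The names below (`Pipeline`, `affectedPipelines`, the `kind/name` ID strings) are hypothetical and only illustrate the idea; a connector is modeled as appearing in the exporter list of one pipeline and in the receiver list of another.

```go
package main

import (
	"fmt"
	"sort"
)

// Pipeline is a hypothetical, simplified model of one collector pipeline.
type Pipeline struct {
	Receivers  []string
	Processors []string
	Exporters  []string
}

// usesAny reports whether any of the given IDs is marked as changed.
func usesAny(ids []string, changed map[string]bool) bool {
	for _, id := range ids {
		if changed[id] {
			return true
		}
	}
	return false
}

// affectedPipelines returns the sorted names of pipelines that reference at
// least one changed component; all other pipelines are left untouched.
func affectedPipelines(pipelines map[string]Pipeline, changed map[string]bool) []string {
	var out []string
	for name, p := range pipelines {
		if usesAny(p.Receivers, changed) || usesAny(p.Processors, changed) || usesAny(p.Exporters, changed) {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	pipelines := map[string]Pipeline{
		"traces/in": {
			Receivers: []string{"receiver/otlp"},
			Exporters: []string{"connector/forward"},
		},
		"traces/out": {
			Receivers: []string{"connector/forward"},
			Exporters: []string{"exporter/debug"},
		},
		"metrics": {
			Receivers: []string{"receiver/hostmetrics"},
			Exporters: []string{"exporter/debug"},
		},
	}
	// Only the metrics pipeline uses the changed receiver.
	fmt.Println(affectedPipelines(pipelines, map[string]bool{"receiver/hostmetrics": true}))
	// A connector change touches both pipelines it joins.
	fmt.Println(affectedPipelines(pipelines, map[string]bool{"connector/forward": true}))
}
```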
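For example, if an exporter config changes:
- The exporter is recreated
- The fanOutNode is rebuilt to reference the new exporter
- All processors are recreated to get new references to the fanOutNode
- The capabilitiesNode is rebuilt to reference the first processor
- All receivers are recreated to get new references to the capabilitiesNode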
Note: This is using reflect.DeepEqual to limit the change in this PR. Hashing the config objects and comparing the hashes would result in even better performance. The benchmarks show that partial reload is still faster than a full reload when using reflect.DeepEqual.
Link to tracking issue
Fixes #5966
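The hash-based comparison mentioned in the note of the Description is not part of this PR; one possible shape for it is sketched below. The `configHash` helper is hypothetical, and the sketch assumes component configs can be marshaled to JSON, which may not hold for every real config type. `encoding/json` writes map keys in sorted order, so equal maps produce equal hashes.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// configHash returns a hex-encoded SHA-256 digest of the JSON encoding of cfg.
// It is a hypothetical helper; the PR itself compares configs with reflect.DeepEqual.
func configHash(cfg any) (string, error) {
	data, err := json.Marshal(cfg)
	if err != nil {
		return "", fmt.Errorf("marshal config: %w", err)
	}
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	oldCfg := map[string]any{"endpoint": "localhost:4317", "timeout": "5s"}
	newCfg := map[string]any{"timeout": "5s", "endpoint": "localhost:4317"}

	oldHash, err := configHash(oldCfg)
	if err != nil {
		panic(err)
	}
	newHash, err := configHash(newCfg)
	if err != nil {
		panic(err)
	}
	// The hash could be stored next to the running component and compared on
	// the next reload instead of walking both config objects again.
	fmt.Println("changed:", oldHash != newHash)
}
```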
Testing
Unit tests were added that cover all scenarios that I could think of, from simple ones, such as a single receiver changing, to more complex ones, such as a connector between pipelines (the first pipeline is fully restarted because the connector is its exporter, and the second pipeline restarts only the connector because it is its receiver).
Benchmarks were also added to show that partial reload is faster than a full reload in all scenarios. The less that changes, the faster it is; as the change set approaches all components, it is still faster, but only slightly.
Documentation
Documentation was added to docs/internal-architecture.md. It covers how partial reload works, the feature gate that turns it on, and the phases that it performs.