feat(service/graph): add partial reload support for pipeline components #14513

Open
blakerouse wants to merge 17 commits into open-telemetry:main from blakerouse:partial-reload

@blakerouse

Description

Implements partial reload capability that allows the collector to reload only the affected components when configuration changes, rather than tearing down and rebuilding the entire service.

When a component needs to be recreated, all of its upstream components must also be recreated, because each component stores a reference to its next consumer. This PR uses that rule to recreate only what is required. If a processor changes, the processors and receivers of that pipeline are recreated, but not the exporter or connector. If a receiver changes in one pipeline and is not used in another pipeline, the other pipeline is not touched.

For example, if an exporter config changes:

  • The exporter is recreated
  • The fanOutNode is rebuilt to reference the new exporter
  • All processors are recreated to get new references to the fanOutNode
  • The capabilitiesNode is rebuilt to reference the first processor
  • All receivers are recreated to get new references to the capabilitiesNode

Note: This PR uses reflect.DeepEqual to detect config changes, which keeps the size of the change limited. Hashing the config objects and comparing the hashes should perform even better. The benchmarks show that partial reload is still faster than a full reload even when using reflect.DeepEqual.
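
To make the two comparison strategies concrete, here is a minimal sketch. The `exporterConfig` type and both helper functions are hypothetical and exist only for illustration. The hash variant assumes the config marshals faithfully to JSON, which may not hold for real component configs (for example, fields that are redacted or skipped during marshaling), so a different encoding could be needed in practice.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
	"reflect"
)

// exporterConfig is a hypothetical component config used only for illustration.
type exporterConfig struct {
	Endpoint string            `json:"endpoint"`
	Headers  map[string]string `json:"headers"`
}

// changedDeepEqual reports whether two configs differ by walking both values.
func changedDeepEqual(oldCfg, newCfg any) bool {
	return !reflect.DeepEqual(oldCfg, newCfg)
}

// configHash returns the SHA-256 of the JSON encoding of cfg.
// encoding/json sorts map keys, so equal configs yield equal bytes.
// A hash computed once per config can be stored and compared cheaply later.
func configHash(cfg any) ([sha256.Size]byte, error) {
	b, err := json.Marshal(cfg)
	if err != nil {
		return [sha256.Size]byte{}, err
	}
	return sha256.Sum256(b), nil
}

func main() {
	a := exporterConfig{Endpoint: "localhost:4317", Headers: map[string]string{"k": "v"}}
	b := exporterConfig{Endpoint: "localhost:4317", Headers: map[string]string{"k": "v"}}
	c := exporterConfig{Endpoint: "localhost:4318", Headers: map[string]string{"k": "v"}}

	fmt.Println(changedDeepEqual(a, b), changedDeepEqual(a, c)) // false true

	ha, _ := configHash(a)
	hb, _ := configHash(b)
	hc, _ := configHash(c)
	fmt.Println(ha == hb, ha == hc) // true false
}
```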

Link to tracking issue

Fixes #5966

Testing

Unit tests were added that cover all the scenarios I could think of. They range from simple scenarios, such as a single receiver changing, to more complex ones, such as a connector between pipelines (the first pipeline is fully restarted because the connector is its exporter, and the second pipeline restarts only the connector because it is its receiver).

Benchmarks were also added to confirm that partial reload is faster than full reload in all scenarios. The less that changes, the larger the speedup; as the change approaches every component, partial reload is still faster, but only slightly.

| Benchmark | Partial Reload | Full Reload | Speedup |
|---|---|---|---|
| AddReceiver | 0.63ms | 1.26ms | 2.01x |
| AddProcessor | 0.62ms | 1.26ms | 2.04x |
| AddExporter | 0.61ms | 1.26ms | 2.08x |
| AddPipeline | 0.68ms | 1.54ms | 2.24x |
| FullChange (all new pipelines) | 0.79ms | 1.29ms | 1.63x |
| LargeConfig (all components changed) | 10.01ms | 11.87ms | 1.19x |

Documentation

Documentation was added to docs/internal-architecture.md. It covers how partial reload works, the feature flag that turns it on, and the phases it performs.

When only receiver configurations change, restart just the affected
receivers instead of tearing down and rebuilding the entire service.
This preserves running processors, exporters, connectors, and
extensions, reducing reload disruption and data loss.

The feature is gated behind the Alpha featuregate
"service.receiverPartialReload" and falls back to a full reload
when non-receiver sections of the config have changed.

Implement partial reload capability that allows the collector to reload
only the affected components when configuration changes, rather than
tearing down and rebuilding the entire service.

Pipeline structure: receiver -> capabilitiesNode -> processors -> fanOutNode -> exporters

When a component is recreated, all upstream components must also be
recreated because each component stores a reference to its next consumer.
For example, if an exporter config changes:
- The exporter is recreated
- The fanOutNode is rebuilt to reference the new exporter
- All processors are recreated to get new references to the fanOutNode
- The capabilitiesNode is rebuilt to reference the first processor
- All receivers are recreated to get new references to the capabilitiesNode

The reload process has five phases:
1. Identify changes: Determine which components changed and which pipelines are affected
2. Shutdown: Stop components that need recreation (receivers first, then downstream)
3. Update graph: Remove old nodes, create new nodes, wire edges
4. Build: Create new component instances (downstream first, then upstream)
5. Start: Start recreated components (downstream first, then receivers last)
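
The ordering rules in phases 2 and 5 can be illustrated with a small sketch. It is not the implementation in this PR: the `component` type and the two ordering functions are hypothetical, and the sketch orders components only by kind (receiver, processor, exporter) rather than by walking the actual graph.

```go
package main

import (
	"fmt"
	"sort"
)

// kind ranks a component by its position in the data flow.
type kind int

const (
	receiver kind = iota // most upstream
	processor
	exporter // most downstream
)

// component is a hypothetical stand-in for a graph node that must be recreated.
type component struct {
	ID   string
	Kind kind
}

// ordered returns the IDs of cs sorted by less, keeping the input order
// among components of the same kind.
func ordered(cs []component, less func(a, b kind) bool) []string {
	sorted := append([]component(nil), cs...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return less(sorted[i].Kind, sorted[j].Kind)
	})
	ids := make([]string, len(sorted))
	for i, c := range sorted {
		ids[i] = c.ID
	}
	return ids
}

// shutdownOrder stops receivers first so no new data enters, then downstream.
func shutdownOrder(cs []component) []string {
	return ordered(cs, func(a, b kind) bool { return a < b })
}

// startOrder starts downstream components first so that every consumer is
// ready before receivers begin to emit data.
func startOrder(cs []component) []string {
	return ordered(cs, func(a, b kind) bool { return a > b })
}

func main() {
	cs := []component{
		{ID: "debug", Kind: exporter},
		{ID: "otlp", Kind: receiver},
		{ID: "batch", Kind: processor},
	}
	fmt.Println(shutdownOrder(cs)) // [otlp batch debug]
	fmt.Println(startOrder(cs))    // [debug batch otlp]
}
```
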
@blakerouse blakerouse requested a review from a team as a code owner February 2, 2026 16:18
@blakerouse blakerouse requested a review from jmacd February 2, 2026 16:18
@blakerouse
Author

I know this is a very large PR. I am willing to split it up in any way you would like in order to get it merged. I wanted to provide it as a full unit so that it could be tested and so I could gauge the overall interest in adding something like this to the collector. I also wanted to show how it would look and how it could be done in a way that does not risk the collector's current operating mode.

@blakerouse blakerouse changed the title service/graph: Add partial reload support for pipeline components feat(service/graph): Add partial reload support for pipeline components Feb 2, 2026
@blakerouse blakerouse changed the title feat(service/graph): Add partial reload support for pipeline components feat(service/graph): add partial reload support for pipeline components Feb 2, 2026
@codecov

codecov bot commented Feb 4, 2026

Codecov Report

❌ Patch coverage is 82.54777% with 137 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.67%. Comparing base (20cbfc0) to head (53ec97f).
⚠️ Report is 4 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| service/internal/graph/graph.go | 83.21% | 76 Missing and 45 partials ⚠️ |
| otelcol/collector.go | 80.48% | 4 Missing and 4 partials ⚠️ |
| service/service.go | 27.27% | 8 Missing ⚠️ |

❌ Your patch check has failed because the patch coverage (82.54%) is below the target coverage (95.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #14513      +/-   ##
==========================================
- Coverage   91.83%   91.67%   -0.17%     
==========================================
  Files         677      678       +1     
  Lines       42705    43487     +782     
==========================================
+ Hits        39220    39865     +645     
- Misses       2429     2517      +88     
- Partials     1056     1105      +49     

Development

Successfully merging this pull request may close these issues.

Configuration reloading logic and customizability