chore: migrate to golang tracer instead of openmetrics in prometheus #3959
mark-vass-konghq wants to merge 4 commits into main from
Conversation
📝 Walkthrough
Converted the runtime metrics collector into a lightweight placeholder with a new constructor; removed it from the TelemetryHandler signature; added config to optionally unregister the Prometheus default Go/Process collectors; updated DI/wiring and made minor logging refinements across services.
Sequence Diagram(s): Skipped — changes are refactor/DI adjustments, config, and lightweight lifecycle; no new multi-component sequential flow requiring a diagram.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Force-pushed 45ec170 to 033341a
@CodeRabbit review
✅ Actions performed: Review triggered.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/config/telemetry.go`:
- Around line 265-266: The CollectorsEnabled field is inverted (true causes the
default Go/Process collectors to be unregistered) and the ConfigureTelemetry()
path doesn't explicitly set a default for
telemetry.metrics.exporters.prometheus.collectorsEnabled; rename the field to a
clearer name (e.g., DisableDefaultCollectors or UseOTelRuntimeMetrics) in the
telemetry config struct (replace CollectorsEnabled with the new name and update
all usages such as where collectors are unregistered in
app/common/telemetry.go), and update ConfigureTelemetry() to explicitly set a
sensible default for telemetry.metrics.exporters.prometheus.collectorsEnabled
(true/false as appropriate for backward compatibility) so the behavior is
discoverable and consistent.
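Since the key under discussion only appears as a dotted path in the review, here is a hypothetical YAML sketch of the config shape it implies; the surrounding keys, indentation, and defaults are assumptions for illustration, not the project's actual config file:

```yaml
# Hypothetical sketch inferred from the key path
# telemetry.metrics.exporters.prometheus.collectorsEnabled.
telemetry:
  metrics:
    exporters:
      prometheus:
        enabled: true
        # As reviewed: true currently causes the default Go/Process
        # collectors to be unregistered, which is why a clearer name
        # such as disableDefaultCollectors is suggested.
        collectorsEnabled: true
```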
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: dd63c0eb-86a5-4c61-ae25-d86b4182fc1c
📒 Files selected for processing (7)
- app/common/telemetry.go
- app/config/telemetry.go
- cmd/balance-worker/wire_gen.go
- cmd/billing-worker/wire_gen.go
- cmd/notification-service/wire_gen.go
- cmd/server/wire_gen.go
- cmd/sink-worker/wire_gen.go
🧹 Nitpick comments (1)
app/common/telemetry.go (1)
231-235: Consider only unregistering collectors when Prometheus is enabled. The unregistration happens even if `Prometheus.Enabled` is false (checked later at line 239). While not a bug, it's a bit wasteful to modify the global registry when Prometheus metrics aren't being exposed anyway.
🔧 Optional refactor to scope the unregistration

```diff
-	if metricsConf.Exporters.Prometheus.DisableDefaultCollectors {
-		prometheus.Unregister(collectors.NewGoCollector())
-		prometheus.Unregister(collectors.NewProcessCollector(collectors.ProcessCollectorOpts{}))
-	}
-
 	router := chi.NewRouter()
 	router.Mount("/debug", middleware.Profiler())

 	if metricsConf.Exporters.Prometheus.Enabled {
+		if metricsConf.Exporters.Prometheus.DisableDefaultCollectors {
+			prometheus.Unregister(collectors.NewGoCollector())
+			prometheus.Unregister(collectors.NewProcessCollector(collectors.ProcessCollectorOpts{}))
+		}
 		router.Handle("/metrics", promhttp.Handler())
 	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/common/telemetry.go` around lines 231-235: the code currently unregisters the default Prometheus collectors (calling prometheus.Unregister on collectors.NewGoCollector and collectors.NewProcessCollector) even when Prometheus is not enabled later; change the logic to check Prometheus.Enabled first and only perform the unregister when both metricsConf.Exporters.Prometheus.DisableDefaultCollectors and Prometheus.Enabled are true, so the global Prometheus registry is only modified when Prometheus metrics are actually being exposed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@app/common/telemetry.go`:
- Around line 231-235: The code currently unregisters default Prometheus
collectors unconditionally via
metricsConf.Exporters.Prometheus.DisableDefaultCollectors (calling
prometheus.Unregister on collectors.NewGoCollector and
collectors.NewProcessCollector) even when Prometheus is not enabled later;
change the logic to check Prometheus.Enabled first and only perform the
unregister when both metricsConf.Exporters.Prometheus.DisableDefaultCollectors
is true and Prometheus.Enabled is true, so the global Prometheus registry is
only modified when Prometheus metrics are actually being exposed.
📒 Files selected for processing (2)
- app/common/telemetry.go
- app/config/telemetry.go
🚧 Files skipped from review as they are similar to previous changes (1)
- app/config/telemetry.go
Force-pushed c309633 to 78ad9e0
Force-pushed 78ad9e0 to 45ef76d
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/common/telemetry.go`:
- Around line 297-312: The Wire provider for runtime metrics
(NewRuntimeMetricsCollector which calls runtime.Start) is being dropped unless
referenced; add a RuntimeMetricsCollector field to the Application struct in
both cmd/balance-worker/wire.go and cmd/notification-service/wire.go so Wire
keeps the provider in those graphs, then re-run wire to regenerate the DI code;
ensure the Application structs reference the RuntimeMetricsCollector type (not
just the marker) so NewRuntimeMetricsCollector is retained.
- Around line 231-234: The current block unregisters both
collectors.NewGoCollector() and collectors.NewProcessCollector(...); change it
so only the Go runtime collector is unregistered. Keep
prometheus.Unregister(collectors.NewGoCollector()) gated on
metricsConf.Exporters.Prometheus.DisableDefaultCollectors, and remove the
prometheus.Unregister(collectors.NewProcessCollector(collectors.ProcessCollectorOpts{}))
call so OS process metrics remain exposed.
📒 Files selected for processing (7)
- app/common/telemetry.go
- app/config/telemetry.go
- cmd/balance-worker/wire_gen.go
- cmd/billing-worker/wire_gen.go
- cmd/notification-service/wire_gen.go
- cmd/server/wire_gen.go
- cmd/sink-worker/wire_gen.go
🚧 Files skipped from review as they are similar to previous changes (2)
- cmd/sink-worker/wire_gen.go
- cmd/billing-worker/wire_gen.go
```go
if metricsConf.Exporters.Prometheus.DisableDefaultCollectors {
	prometheus.Unregister(collectors.NewGoCollector())
	prometheus.Unregister(collectors.NewProcessCollector(collectors.ProcessCollectorOpts{}))
}
```
🧩 Analysis chain
🌐 Web query:
Does the OpenTelemetry Go package go.opentelemetry.io/contrib/instrumentation/runtime collect OS process metrics comparable to Prometheus client_golang's ProcessCollector, or only Go runtime metrics?
💡 Result:
go.opentelemetry.io/contrib/instrumentation/runtime collects Go runtime metrics (e.g., go.memory.*, go.goroutine.count, go.processor.limit, GC-related values), sourced from Go’s runtime/metrics (and optionally some deprecated runtime.* metrics). It does not collect OS-level process metrics like CPU seconds, RSS, open FDs, etc. that Prometheus ProcessCollector provides. [1] [2]
Prometheus ProcessCollector explicitly exports process metrics such as CPU, memory, file-descriptor usage, and process start time. [3]
So: OpenTelemetry .../instrumentation/runtime == Go runtime metrics only; Prometheus ProcessCollector == OS process metrics. [1] [3]
Sources:
[1] https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/runtime
[2] https://github.com/open-telemetry/opentelemetry-go-contrib/blob/instrumentation/runtime/v0.67.0/instrumentation/runtime/runtime.go
[3] https://pkg.go.dev/github.com/prometheus/client_golang/prometheus/collectors
Only unregister NewGoCollector() here—keep ProcessCollector().
OTel's runtime.Start() replaces Go runtime metrics, but it doesn't cover OS process metrics like CPU time, memory usage, and file descriptor counts. Those come exclusively from Prometheus's ProcessCollector, so disabling it would silently drop those signals from /metrics. If you need to disable the default Go collector, just remove that one.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/common/telemetry.go` around lines 231-234: The current block
unregisters both collectors.NewGoCollector() and
collectors.NewProcessCollector(...); change it so only the Go runtime collector
is unregistered. Keep prometheus.Unregister(collectors.NewGoCollector()) gated
on metricsConf.Exporters.Prometheus.DisableDefaultCollectors, and remove the
prometheus.Unregister(collectors.NewProcessCollector(collectors.ProcessCollectorOpts{}))
call so OS process metrics remain exposed.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
openmeter/sink/sink.go (1)
720-725: ⚠️ Potential issue | 🔴 Critical
Data race: goroutine mutates the outer `err` variable without synchronization. The goroutine launched at line 720 assigns to `err` from the enclosing `Run` function scope on line 723 (`if err = json.Unmarshal(...)`). Since the main consumer loop and other code paths also write to this same variable, concurrent modifications create a race condition that can corrupt error handling and clobber unrelated error values. The pattern in openmeter/ingest/kafkaingest/collector.go (line 193) handles the identical Kafka stats scenario correctly by using a short declaration instead.
🔧 Suggested fix

```diff
 go func() {
 	var stats kafkastats.Stats
-	if err = json.Unmarshal([]byte(e.String()), &stats); err != nil {
+	if err := json.Unmarshal([]byte(e.String()), &stats); err != nil {
 		logger.WarnContext(ctx, "failed to unmarshal Kafka client stats", slog.Any("error", err))
+		return
 	}

-	ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
+	statsCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
 	defer cancel()

-	s.kafkaMetrics.Add(ctx, &stats)
+	s.kafkaMetrics.Add(statsCtx, &stats)
 }()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openmeter/sink/sink.go` around lines 720 - 725, The goroutine handling Kafka stats is mutating the outer Run-scope variable err (via `if err = json.Unmarshal(...)`), causing a data race; change that assignment to use a new local error variable (e.g., `if err := json.Unmarshal(...)`) inside the anonymous function so the goroutine uses its own `err` and does not modify the enclosing `err`. Locate the anonymous func where `var stats kafkastats.Stats` and `json.Unmarshal([]byte(e.String()), &stats)` are used in sink.go and replace the inner `if err = ...` with a short declaration to eliminate the race.
♻️ Duplicate comments (1)
app/common/telemetry.go (1)
231-234: ⚠️ Potential issue | 🟠 Major
Small gotcha: keep the Prometheus process collector registered. `runtime.Start()` only replaces Go runtime metrics. Line 233 unregisters `collectors.NewProcessCollector(...)` too, which means `/metrics` loses process CPU, RSS, start-time, and FD signals when this flag is enabled. I'd only unregister `NewGoCollector()` here.
🛠️ Suggested tweak

```diff
 if metricsConf.Exporters.Prometheus.DisableDefaultCollectors {
 	prometheus.Unregister(collectors.NewGoCollector())
-	prometheus.Unregister(collectors.NewProcessCollector(collectors.ProcessCollectorOpts{}))
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/common/telemetry.go` around lines 231 - 234, The code unregisters both runtime and process collectors when metricsConf.Exporters.Prometheus.DisableDefaultCollectors is true; change the block so only collectors.NewGoCollector() is unregistered and leave collectors.NewProcessCollector(...) registered. Locate the conditional that checks metricsConf.Exporters.Prometheus.DisableDefaultCollectors and replace the prometheus.Unregister call that targets collectors.NewProcessCollector(collectors.ProcessCollectorOpts{}) so it is not called (only call prometheus.Unregister(collectors.NewGoCollector())). Ensure the rest of the export setup remains unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@openmeter/sink/sink.go`:
- Around line 720-725: The goroutine handling Kafka stats is mutating the outer
Run-scope variable err (via `if err = json.Unmarshal(...)`), causing a data
race; change that assignment to use a new local error variable (e.g., `if err :=
json.Unmarshal(...)`) inside the anonymous function so the goroutine uses its
own `err` and does not modify the enclosing `err`. Locate the anonymous func
where `var stats kafkastats.Stats` and `json.Unmarshal([]byte(e.String()),
&stats)` are used in sink.go and replace the inner `if err = ...` with a short
declaration to eliminate the race.
---
Duplicate comments:
In `@app/common/telemetry.go`:
- Around line 231-234: The code unregisters both runtime and process collectors
when metricsConf.Exporters.Prometheus.DisableDefaultCollectors is true; change
the block so only collectors.NewGoCollector() is unregistered and leave
collectors.NewProcessCollector(...) registered. Locate the conditional that
checks metricsConf.Exporters.Prometheus.DisableDefaultCollectors and replace the
prometheus.Unregister call that targets
collectors.NewProcessCollector(collectors.ProcessCollectorOpts{}) so it is not
called (only call prometheus.Unregister(collectors.NewGoCollector())). Ensure
the rest of the export setup remains unchanged.
📒 Files selected for processing (14)
- app/common/openmeter_sinkworker.go
- app/common/telemetry.go
- app/config/telemetry.go
- cmd/balance-worker/wire.go
- cmd/balance-worker/wire_gen.go
- cmd/billing-worker/wire.go
- cmd/billing-worker/wire_gen.go
- cmd/notification-service/wire.go
- cmd/notification-service/wire_gen.go
- cmd/server/wire_gen.go
- cmd/sink-worker/wire.go
- cmd/sink-worker/wire_gen.go
- openmeter/ingest/kafkaingest/collector.go
- openmeter/sink/sink.go
Migrate to golang tracer instead of openmetrics in prometheus.
Summary by CodeRabbit
- Refactor
- Chores
- Style