hmon: add heartbeat monitor#67
Conversation
License Check Results🚀 The license check job ran with the Bazel command: bazel run --lockfile_mode=error //:license-checkStatus: Click to expand output |
|
The created documentation from the pull request is available at: docu-html |
There was a problem hiding this comment.
Pull request overview
Adds a new heartbeat monitor (HMON) to the Rust health monitoring library and integrates it into the existing monitoring worker/supervisor notification flow.
Changes:
- Introduces
heartbeatmodule (monitor + atomic state) and integrates heartbeat monitors intoHealthMonitorBuilder/HealthMonitor. - Updates the monitor evaluation interface to accept a shared
hmon_starting_pointand wires it through the monitoring worker thread. - Refactors
SupervisorAPIClientinto a dedicated module with selectable implementations via Cargo features.
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| src/health_monitoring_lib/rust/worker.rs | Passes a shared HMON start instant into monitor evaluations; moves supervisor client trait out. |
| src/health_monitoring_lib/rust/supervisor_api_client/mod.rs | Adds feature-selected SupervisorAPIClient + implementation alias. |
| src/health_monitoring_lib/rust/supervisor_api_client/stub_supervisor_api_client.rs | New stub client implementation. |
| src/health_monitoring_lib/rust/supervisor_api_client/score_supervisor_api_client.rs | New SCORE client implementation. |
| src/health_monitoring_lib/rust/lib.rs | Adds heartbeat monitors to builder + start flow; uses new supervisor client impl selector. |
| src/health_monitoring_lib/rust/heartbeat/mod.rs | Exposes heartbeat monitor API. |
| src/health_monitoring_lib/rust/heartbeat/heartbeat_state.rs | Adds atomic packed heartbeat state and tests. |
| src/health_monitoring_lib/rust/heartbeat/heartbeat_monitor.rs | Implements heartbeat monitor logic + tests (incl. loom). |
| src/health_monitoring_lib/rust/deadline/deadline_monitor.rs | Adapts deadline evaluation to new evaluator signature and shared start instant. |
| src/health_monitoring_lib/rust/common.rs | Extends evaluation error types; adds duration_to_u32; updates evaluator trait signature. |
| src/health_monitoring_lib/Cargo.toml | Adds optional monitor_rs, loom target dep, and feature defaults. |
| src/health_monitoring_lib/BUILD | Enables score_supervisor_api_client feature in Bazel builds. |
| Cargo.toml | Updates workspace defaults and adds cfg(loom) lint configuration. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/health_monitoring_lib/rust/supervisor_api_client/score_supervisor_api_client.rs
Show resolved
Hide resolved
f07c0db to
e65f6a6
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
e65f6a6 to
bd438c3
Compare
bd438c3 to
c5f4b84
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/health_monitoring_lib/rust/supervisor_api_client/score_supervisor_api_client.rs
Show resolved
Hide resolved
src/health_monitoring_lib/rust/supervisor_api_client/stub_supervisor_api_client.rs
Outdated
Show resolved
Hide resolved
c5f4b84 to
b012f37
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/health_monitoring_lib/rust/supervisor_api_client/score_supervisor_api_client.rs
Show resolved
Hide resolved
b012f37 to
862da21
Compare
862da21 to
59c92ee
Compare
59c92ee to
cf14efb
Compare
cf14efb to
67fe6cc
Compare
a8b11fc to
ef78463
Compare
ef78463 to
3816630
Compare
|
FYI: We are still dropping a refactor on this code today, as we discussed after review with @arkjedrz |
3816630 to
27766ce
Compare
27766ce to
77da130
Compare
77da130 to
125905c
Compare
Add heartbeat monitor HMON.
Rework heartbeat monitor state into two atomics.
- Add new constructor to `TimeRange`. - Rework `time_offset`. - Update state using `compare_exchange`. - Other code, docs, comments fixes.
Use swap to read and reset state in single operation.
| // Check current counter state. | ||
| let counter = snapshot.counter(); | ||
| // Disallow multiple heartbeats in same heartbeat cycle. | ||
| if counter > 1 { |
There was a problem hiding this comment.
Do I understand correctly that there are some hard constraints on the internal_processing_cycle time for this to work. It must be chosen so that evaluate() is always called at the right time such at it never spans across two alive intervals.
In case multiple heartbeat monitors are configured, I wonder if you can find a suitable internal_processing_cycle such that this works?
I guess it works when you choose a very frequent internal_processing_cycle. Is this understanding correct?
There was a problem hiding this comment.
Not sure if I understand Your diagram correctly:
- If it represents a single monitor, then a correct heartbeat is a beginning of a new processing cycle - everything after the first heartbeat is irrelevant in the first line.
- If it represents multiple monitors then heartbeats land at different monitors and their timelines are separate.
Internal processing cycle frequency must be twice as large as largest heartbeat frequency required by any monitor. Not sure if that's a Nyquist frequency, but the rough idea remains the same.
There was a problem hiding this comment.
Yes, its has to be < min(mon1.min, mon2.min,...)*2. This is what I thin we discussed at the beginning, that the internal cycle is separate from the supervisor api notification cycle to be able to tune it because it makse no sense to have ie hearbeat every 10ms and detecting failure only every 100ms. Should we change that idea, the more sophisticated alorithmi is needed, since You need to store up to N samples (then how N is defined etc)
There was a problem hiding this comment.
Ok understood, thanks for the clarification. This means use cases where there is no clear lower bound for the heartbeat are not supported by the heartbeat monitor and in this case you might use a deadline instead.
There was a problem hiding this comment.
Np. We can always decide later that we need extensions here, but this will not affect the user API
| // Check current counter state. | ||
| let counter = snapshot.counter(); | ||
| // Disallow multiple heartbeats in same heartbeat cycle. | ||
| if counter > 1 { |
There was a problem hiding this comment.
Yes, its has to be < min(mon1.min, mon2.min,...)*2. This is what I thin we discussed at the beginning, that the internal cycle is separate from the supervisor api notification cycle to be able to tune it because it makse no sense to have ie hearbeat every 10ms and detecting failure only every 100ms. Should we change that idea, the more sophisticated alorithmi is needed, since You need to store up to N samples (then how N is defined etc)
- Change `from_interval` params. - Add `new_internal` for later FFI usage. - Change message in `duration_to_int`.
Add heartbeat monitor HMON.
Resolves #68