
feat(node): subnet read_state cache#9239

Open
andrewbattat wants to merge 13 commits into master from andrew/read-state-cache

Conversation

@andrewbattat (Contributor) commented Mar 7, 2026:

NODE-1863

This change adds an in-memory cache in the boundary node for subnet read_state responses, keyed by (subnet_id, request paths).

When the same subnet/path request repeats within the TTL (default 30s), the BN serves the cached response directly instead of forwarding to the replica.
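As a rough illustration of the mechanism described above (not the PR's actual code), a TTL-keyed map over (subnet_id, request paths) might look like this, using only the standard library and hypothetical stand-in types:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical stand-ins for the real SubnetId / paths types.
type SubnetId = String;
type Paths = Vec<String>;
type CacheKey = (SubnetId, Paths);

struct TtlCache {
    ttl: Duration,
    entries: HashMap<CacheKey, (Instant, Vec<u8>)>,
}

impl TtlCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    // Return the cached body only if it is still within the TTL.
    fn get(&self, key: &CacheKey) -> Option<&Vec<u8>> {
        self.entries
            .get(key)
            .filter(|(inserted, _)| inserted.elapsed() < self.ttl)
            .map(|(_, body)| body)
    }

    fn insert(&mut self, key: CacheKey, body: Vec<u8>) {
        self.entries.insert(key, (Instant::now(), body));
    }
}
```

The real implementation uses moka (see the `.time_to_live(ttl)` snippet further down), which also handles eviction; the sketch only shows the keying and TTL check.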

@andrewbattat andrewbattat self-assigned this Mar 7, 2026
@github-actions github-actions bot added the feat label Mar 7, 2026
@andrewbattat changed the title from "feat(node): read state cache" to "feat(node): subnet read_state cache" on Mar 17, 2026
@andrewbattat andrewbattat marked this pull request as ready for review March 18, 2026 03:22
@andrewbattat andrewbattat requested a review from a team as a code owner March 18, 2026 03:22
@github-actions github-actions bot added the @node label Mar 18, 2026
}

#[derive(Clone)]
struct CachedResponse {
@blind-oracle (Contributor) commented Mar 18, 2026:

We're missing response headers here completely.

Not sure it's a problem, but I'd just drop CachedResponse and use a normal Response<Bytes> as the cache value. You will still have to split it into parts after fetching from the cache and reassemble it as Response<AxumBody>, but that's probably unavoidable.

@andrewbattat (Author) replied:

Thanks Igor! WDYT? 8fb4414

@blind-oracle (Contributor) commented Mar 19, 2026:

Very good, body mapping is even more concise than .into_parts()/from_parts() 👍
And we avoid panic-prone .expect() too, which would probably never happen, but still.
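The body-mapping pattern being praised here can be sketched with a minimal stand-in type; the real code would use http::Response, whose map method has the same shape and also preserves headers:

```rust
// Minimal stand-in for http::Response<T>, just enough to show the
// `.map()` body-conversion pattern. Field names are illustrative,
// not the real crate's.
struct Response<T> {
    status: u16,
    body: T,
}

impl<T> Response<T> {
    // Convert the body type while keeping status (and, in the real
    // type, headers) intact -- this avoids the
    // into_parts()/from_parts() round trip and any .expect().
    fn map<U>(self, f: impl FnOnce(T) -> U) -> Response<U> {
        Response { status: self.status, body: f(self.body) }
    }
}
```

In the PR this would be a mapping between the buffered Bytes body and the streaming AxumBody, but the shape of the conversion is the same.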


if response.status().is_success() {
let (parts, body) = response.into_parts();
let body_bytes = axum::body::to_bytes(body, 10 * 1024 * 1024)
Contributor:

Probably better to use buffer_body from ic-bn-lib with a configurable limit from the CLI instead of a hardcoded one. Maybe reuse cache_max_item_size, which is applied to normal cached requests, or add a separate option to your new CLI section (and add a timeout too, which buffer_body needs).

@andrewbattat (Author) replied:

8f6a260

Thanks, Igor! Used 1 MB for the subnet_read_state_cache_max_item_size default instead of cache_max_item_size because 10 MB seemed too high, but I'm happy to reuse cache_max_item_size for simplicity.

Contributor:

👍 I guess we also (separately) need to improve the caching CLI opts (cache_...) to add the timeout there too. Currently it falls back to a default of 1 minute and is not configurable. But that's out of scope here.

pub subnet_read_state_cache_ttl: Duration,

/// Maximum number of cached subnet read_state entries
#[clap(env, long, default_value = "1000")]
Contributor:
I think it would be nicer to use bytes here, not the number of entries. You can look at how that's done in the generic HTTP cache (https://github.com/dfinity/ic-bn-lib/blob/main/ic-bn-lib/src/http/cache.rs) using weigh_entry.
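A std-only sketch of the idea, sizing by bytes rather than entries (moka's real API takes a weigher closure plus a max_capacity in the same units; the struct below is purely illustrative and does not evict):

```rust
use std::collections::HashMap;

// Illustrative weight-bounded cache: track a running byte total and
// enforce a byte budget instead of an entry count. A real cache
// (like moka with a weigher) would evict older entries to make room
// rather than refusing the insert.
struct WeightedCache {
    max_bytes: usize,
    used_bytes: usize,
    entries: HashMap<String, Vec<u8>>,
}

impl WeightedCache {
    fn new(max_bytes: usize) -> Self {
        Self { max_bytes, used_bytes: 0, entries: HashMap::new() }
    }

    // Insert only if the entry fits the byte budget.
    fn insert(&mut self, key: String, body: Vec<u8>) -> bool {
        if self.used_bytes + body.len() > self.max_bytes {
            return false;
        }
        self.used_bytes += body.len();
        self.entries.insert(key, body);
        true
    }
}
```

The advantage over an entry count is that memory use stays bounded even when individual responses vary widely in size.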

/// TTL for cached subnet read_state responses.
/// Set to 0 to disable caching.
#[clap(env, long, default_value = "30s", value_parser = parse_duration)]
pub subnet_read_state_cache_ttl: Duration,
Contributor:
If we want this thing to be enabled by default, I'd add a separate bool flag e.g. subnet_read_state_cache_disable and rely on that to enable/disable it instead of zero TTL. This would be more explicit.

}
}

fn build_cache_key(subnet_id: SubnetId, ctx: &RequestContext) -> Option<CacheKey> {
Contributor:
There's some path, @r-birkner told me, that leads to the metrics endpoint of the subnets. Probably it's worth figuring it out and bypass the cache if it's metrics (so that we don't serve stale metrics). Not sure if that's critical, but probably worth doing.


Contributor:

Ah yes, I forgot the ticket context, that it should be an opt-in set of paths that we cache.
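An opt-in path filter along these lines could gate the cache; the prefixes and function name below are hypothetical, not from the PR:

```rust
// Hypothetical allowlist: cache only an opt-in set of path prefixes,
// bypassing everything else (e.g. subnet metrics, so stale metrics
// are never served from cache).
const CACHEABLE_PREFIXES: &[&str] = &["time", "subnet"];

fn is_cacheable(paths: &[&str]) -> bool {
    // Every requested path must match the allowlist; an empty request
    // is not cached.
    !paths.is_empty()
        && paths
            .iter()
            .all(|p| CACHEABLE_PREFIXES.iter().any(|pre| p.starts_with(*pre)))
}
```

An allowlist is the safer default here: any path not explicitly opted in (including a metrics path) falls through to the replica.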

.time_to_live(ttl)
.build();

let hits = register_int_counter_with_registry!(
@blind-oracle (Contributor) commented Mar 18, 2026:

Once you change the limit from number of entries to memory, I'd also add a gauge that shows how much memory it consumes, and the number of entries too.

I think they're stored as atomics in Moka and should be cheap to read on each call.

https://docs.rs/moka/latest/moka/sync/struct.Cache.html#method.entry_count
https://docs.rs/moka/latest/moka/sync/struct.Cache.html#method.weighted_size

let (parts, body) = response.into_parts();
let body_bytes = buffer_body(body, state.max_item_size, state.body_timeout)
.await
.map_err(|e| ErrorCause::Other(format!("failed to buffer response body: {e}")))?;
Contributor:

nit: there are special ErrorCause::UnableToReadBody / ErrorCause::BodyTimedOut and ErrorCause::PayloadTooLarge for such cases.

Ideally we should map the library error causes to our local ones, like this:

```rust
let body = buffer_body(body, MAX_REQUEST_BODY_SIZE, Duration::from_secs(60))
    .await
    .map_err(|e| match e {
        HttpError::BodyReadingFailed(v) => ErrorCause::UnableToReadBody(v),
        HttpError::BodyTooBig => ErrorCause::PayloadTooLarge(MAX_REQUEST_BODY_SIZE),
        HttpError::BodyTimedOut => ErrorCause::BodyTimedOut,
        _ => ErrorCause::Other(e.to_string()),
    })?;
```

Better still, create a helper function that wraps buffer_body and returns the correct error cause, to avoid duplication.

P.S.
Damn, here we also have a hardcoded timeout, need to improve :)
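The suggested helper boils down to centralizing the error mapping in one function. A self-contained sketch with stand-in enums (HttpError and ErrorCause here are simplified stand-ins for the ic-bn-lib and local types, not the real definitions):

```rust
// Stand-in for the library's body-buffering error.
#[derive(Debug, PartialEq)]
enum HttpError {
    BodyReadingFailed(String),
    BodyTooBig,
    BodyTimedOut,
}

// Stand-in for the local error type returned to callers.
#[derive(Debug, PartialEq)]
enum ErrorCause {
    UnableToReadBody(String),
    PayloadTooLarge(usize),
    BodyTimedOut,
}

// One place that translates library errors into local error causes,
// so every buffer_body call site gets consistent mapping for free.
fn map_body_error(e: HttpError, limit: usize) -> ErrorCause {
    match e {
        HttpError::BodyReadingFailed(v) => ErrorCause::UnableToReadBody(v),
        HttpError::BodyTooBig => ErrorCause::PayloadTooLarge(limit),
        HttpError::BodyTimedOut => ErrorCause::BodyTimedOut,
    }
}
```

The real helper would additionally take the body, limit, and timeout, call buffer_body, and apply this mapping via map_err.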

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct CacheKey {
subnet_id: SubnetId,
paths: ReadStatePaths,
@blind-oracle (Contributor) commented Mar 19, 2026:

nit: how big can the paths be? I wonder if we shouldn't store them directly in the cache, and instead just calculate some hash over them before passing to moka.

We do something like this here:
https://github.com/dfinity/ic-bn-lib/blob/b58540c74c224266be8fce3f7ebc41840dbe16c4/ic-bn-lib/src/http/cache.rs#L696-L718

Not sure it will save us a lot, though. Especially since we will scope the paths that we cache to only a small subset.

Comment on lines +34 to +35
pub hits: IntCounter,
pub misses: IntCounter,
Contributor:

nit: do these need to be public?


3 participants