feat(echo-cas): content-addressed blob store (Phase 1) #263
Conversation
…oryTier)

Introduces echo-cas, a leaf crate providing the BlobStore trait and an in-memory implementation (MemoryTier) for content-addressed storage keyed by BLAKE3 hash. Sufficient for the in-browser website demo; disk/cold tiers and GC are deferred to Phase 3.
Walkthrough

A new echo-cas crate is added to the workspace, providing a content-addressed blob store: a 32-byte BLAKE3 BlobHash type, a blob_hash() helper, a CasError enum, a sync object-safe BlobStore trait (put / put_verified / get / has / pin / unpin), and an in-memory MemoryTier implementation with optional byte budgeting and tests.
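For orientation, here is a minimal sketch of the API surface the walkthrough describes, pieced together from this review thread. The put and get signatures are quoted verbatim in the comments below; the remaining signatures, the derives, and the CasError variant shape are assumptions, not the crate's actual code:

```rust
use std::sync::Arc;

/// 32-byte BLAKE3 digest identifying a blob.
#[repr(transparent)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct BlobHash([u8; 32]);

/// Store errors (shape assumed; the review only names HashMismatch).
#[derive(Debug)]
pub enum CasError {
    HashMismatch,
}

/// Sync, object-safe content-addressed store.
pub trait BlobStore {
    fn put(&mut self, bytes: &[u8]) -> BlobHash; // quoted in review
    fn put_verified(&mut self, expected: BlobHash, bytes: &[u8]) -> Result<(), CasError>; // assumed
    fn get(&self, hash: &BlobHash) -> Option<Arc<[u8]>>; // quoted in review
    fn has(&self, hash: &BlobHash) -> bool; // assumed
    fn pin(&mut self, hash: &BlobHash); // assumed
    fn unpin(&mut self, hash: &BlobHash); // assumed
}
```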
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~23 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 12
🤖 Fix all issues with AI agents
In `@crates/echo-cas/Cargo.toml`:
- Line 19: Remove the dangling empty [dev-dependencies] header in Cargo.toml or
make it intentional: either delete the entire [dev-dependencies] section if
there are no dev dependencies, or populate it (e.g., add proptest/criterion) or
replace the header with a commented TODO (e.g., "# TODO: add dev-dependencies")
so the presence of the section is explicit; locate the orphaned
"[dev-dependencies]" token in the Cargo.toml and apply one of these options.
- Line 17: Update the thiserror dependency in Cargo.toml from the 1.x line to
the 2.x line: replace the dependency spec `thiserror = "1"` with `thiserror =
"2"` (or a caret/semver range like `^2`) and then run cargo update / cargo build
to verify proc-macro derives still compile for wasm32; if there is a documented
downstream blocker, add a comment next to the thiserror entry explaining why v2
cannot be used.
In `@crates/echo-cas/README.md`:
- Line 1: The README's SPDX header ("Apache-2.0 OR MIND-UCAL-1.0") conflicts
with the workspace Cargo.toml which uses license.workspace = true and the root
workspace license = "Apache-2.0"; fix by making the license declarations
consistent: either update the workspace/root Cargo.toml to include the
dual-license string "Apache-2.0 OR MIND-UCAL-1.0" (so all crates inherit the OR
license) or remove "MIND-UCAL-1.0" from the README SPDX header; check and update
the crate's Cargo.toml/license or the root workspace license field accordingly
to ensure Cargo.toml, license.workspace, and README.md match.
In `@crates/echo-cas/src/lib.rs`:
- Around line 60-62: Change BlobHash to hide its inner field (make the [u8; 32]
private) so callers cannot construct it directly; keep #[repr(transparent)] and
the derives, expose a single safe public constructor blob_hash(bytes: [u8;32])
(already present) for normal creation, add a from_bytes([u8;32]) constructor
intended for deserialization/wire use with a doc comment "caller asserts this
came from BLAKE3", and keep an as_bytes(&self) -> &[u8;32] accessor for
read-only access; if you intentionally allowed public construction, add a clear
doc warning on the BlobHash struct that constructing from non-BLAKE3 bytes is
legal but semantically meaningless.
- Around line 130-132: The current trait method signature fn get(&self, hash:
&BlobHash) -> Option<Arc<[u8]>> bakes an Arc-backed in-memory representation
into the trait contract; add a FIXME comment above the get signature (reference
symbols: get, BlobHash, Arc<[u8]>) stating FIXME(phase3): avoid forcing
Arc<[u8]> — consider returning bytes::Bytes or using an associated type (e.g.,
type Blob: AsRef<[u8]>) or another zero-copy representation in Phase 3, so
future DiskTier/ColdTier implementations can choose mmap/Bytes/Cow/streams
instead of allocating an Arc.
- Around line 71-78: The Display impl for BlobHash (fn fmt on BlobHash) does 32
separate write! calls via the Formatter; instead, build the full hex string
first and do a single write to the Formatter. In fmt, allocate a String with
capacity self.0.len()*2, use std::fmt::Write::write_fmt or write! into that
String in the loop to append each byte as "{:02x}", then call
f.write_str(&hex_string) once; this keeps the existing Display behavior but
reduces Formatter invocations while remaining safe (no unsafe code).
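Taken together, the two lib.rs suggestions above (private field plus single-write Display) might look like the following sketch; names match the prompts, while the derives and doc wording are illustrative rather than copied from the crate:

```rust
use std::fmt;

/// BLAKE3 digest. The inner bytes are private; use blob_hash() to create one
/// from content, or from_bytes() when deserializing a trusted wire value.
#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub struct BlobHash([u8; 32]);

impl BlobHash {
    /// Caller asserts these bytes came from BLAKE3 (deserialization/wire use).
    pub fn from_bytes(bytes: [u8; 32]) -> Self {
        Self(bytes)
    }

    /// Read-only access to the raw digest.
    pub fn as_bytes(&self) -> &[u8; 32] {
        &self.0
    }
}

impl fmt::Display for BlobHash {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        use fmt::Write;
        // Build the full hex string once, then hand it to the Formatter in one call.
        let mut hex = String::with_capacity(self.0.len() * 2);
        for byte in &self.0 {
            write!(hex, "{byte:02x}")?;
        }
        f.write_str(&hex)
    }
}
```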
In `@crates/echo-cas/src/memory.rs`:
- Around line 223-234: Add a unit test that verifies pin idempotence: create a
MemoryTier via MemoryTier::new(), put some data with put(...) to get a hash,
call pin(&hash) twice, assert pinned_count() == 1, then unpin(&hash) and assert
is_pinned(&hash) is false; this ensures pin uses set semantics rather than a
reference count (refer to MemoryTier::new, put, pin, pinned_count, unpin,
is_pinned).
- Around line 154-168: Add a new unit test that exercises the happy path and
idempotence of MemoryTier::put_verified: create a MemoryTier via
MemoryTier::new, compute the correct hash for some test data using blob_hash,
call put_verified(hash, data) and assert it returns Ok, then assert store.len()
== 1, store.byte_count() == data.len(), and that store.get(&hash) returns the
stored bytes; finally call put_verified again with the same hash/data and assert
it still returns Ok and that len() and byte_count() have not increased
(idempotence). Ensure the test is named (e.g., put_verified_happy_path) and
mirrors the style of existing tests.
- Around line 27-32: Add a doc comment on MemoryTier explaining that byte_count
is a monotonically increasing tally of stored bytes for Phase 1 (no GC) and
therefore is_over_budget() will remain true once exceeded until explicit blob
removals are implemented in Phase 3; when implementing remove/evict in the
future you must decrement byte_count by the removed blob's length (compute via
the Arc<[u8]>::len() at removal time) so the budget can recover, and update the
is_over_budget() doc to state this invariant and recovery condition.
- Around line 98-106: The put method currently calls blob_hash then probes the
blobs map twice via contains_key and insert; change both put and put_verified to
use the HashMap entry API (e.g., blobs.entry(hash)) to perform a single lookup:
on Vacant(entry) insert Arc::from(bytes) and increment byte_count only when you
actually insert, and on Occupied return the existing hash without modifying
byte_count; apply the same entry-based logic in put_verified to avoid the double
lookup and ensure byte_count stays correct.
- Around line 108-118: The put_verified function currently does a double lookup
by calling self.blobs.contains_key(&computed) then insert, which can be replaced
with the HashMap entry API to do a single lookup and update byte_count only when
a new blob is actually inserted. Change put_verified to use
self.blobs.entry(computed) and on Vacant insert Arc::from(bytes) while
incrementing self.byte_count only for the new insertion (avoid incrementing when
the entry is Occupied); reference function name put_verified and map self.blobs
to locate the code to update.
- Around line 86-88: The is_over_budget method currently uses strict >
(self.max_bytes.is_some_and(|max| self.byte_count > max)), so a stored
byte_count equal to max_bytes is considered within budget; update the API docs
to explicitly state the budget is inclusive (i.e., exactly max_bytes is NOT over
budget) and mention the related symbols is_over_budget, with_limits, max_bytes,
and byte_count so callers understand the off-by-one behavior; if you prefer
exclusive semantics instead, change the comparison to >= in is_over_budget and
update tests/docs accordingly.
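Sketches of the two tests requested above, assuming MemoryTier exposes the helpers the prompts name (pinned_count, is_pinned, len, byte_count) and that put_verified takes the expected hash by value:

```rust
#[test]
fn pin_is_idempotent() {
    let mut store = MemoryTier::new();
    let hash = store.put(b"pinned blob");
    store.pin(&hash);
    store.pin(&hash); // second pin is a no-op under set semantics
    assert_eq!(store.pinned_count(), 1);
    store.unpin(&hash);
    assert!(!store.is_pinned(&hash));
}

#[test]
fn put_verified_happy_path() {
    let mut store = MemoryTier::new();
    let data = b"verified blob";
    let hash = blob_hash(data);
    assert!(store.put_verified(hash, data).is_ok());
    assert_eq!(store.len(), 1);
    assert_eq!(store.byte_count(), data.len());
    assert_eq!(store.get(&hash).as_deref(), Some(&data[..]));
    // Idempotence: a second verified put changes nothing.
    assert!(store.put_verified(hash, data).is_ok());
    assert_eq!(store.len(), 1);
    assert_eq!(store.byte_count(), data.len());
}
```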
```diff
@@ -0,0 +1,6 @@
+<!-- SPDX-License-Identifier: Apache-2.0 OR MIND-UCAL-1.0 -->
```
License mismatch: README declares Apache-2.0 OR MIND-UCAL-1.0 but Cargo.toml inherits workspace license Apache-2.0 only.
This is a compliance inconsistency. Cargo.toml → license.workspace = true → root workspace declares license = "Apache-2.0". This README introduces a second license (MIND-UCAL-1.0) that doesn't appear anywhere in the manifest chain. Either:
- Update the workspace/crate license field to match the dual-license, or
- Drop MIND-UCAL-1.0 from the README header.
Shipping contradictory license declarations is a legal landmine.
```rust
fn put(&mut self, bytes: &[u8]) -> BlobHash {
    let hash = blob_hash(bytes);
    if self.blobs.contains_key(&hash) {
        return hash;
    }
    self.byte_count += bytes.len();
    self.blobs.insert(hash, Arc::from(bytes));
    hash
}
```
🧹 Nitpick | 🔵 Trivial
put computes hash then probes the map twice on the cold path.
contains_key followed by insert is two lookups. Use the entry API:
Proposed fix:

```diff
 fn put(&mut self, bytes: &[u8]) -> BlobHash {
     let hash = blob_hash(bytes);
-    if self.blobs.contains_key(&hash) {
-        return hash;
-    }
-    self.byte_count += bytes.len();
-    self.blobs.insert(hash, Arc::from(bytes));
+    if let std::collections::hash_map::Entry::Vacant(e) = self.blobs.entry(hash) {
+        self.byte_count += bytes.len();
+        e.insert(Arc::from(bytes));
+    }
     hash
 }
```

Same applies to put_verified (Lines 113-115).
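The equivalent entry-based shape for put_verified might look like the sketch below, which also folds in the follow-up suggestion later in this thread to skip hashing when the blob is already present. The signature and the fieldless HashMismatch variant are assumptions drawn from the review prompts:

```rust
use std::collections::hash_map::Entry;

fn put_verified(&mut self, expected: BlobHash, bytes: &[u8]) -> Result<(), CasError> {
    // Fast path: already stored under the expected hash, nothing to hash or verify.
    if self.blobs.contains_key(&expected) {
        return Ok(());
    }
    // Hash only when an insert is actually possible.
    let computed = blob_hash(bytes);
    if computed != expected {
        return Err(CasError::HashMismatch); // variant fields, if any, omitted here
    }
    // Single lookup; byte_count grows only on a real insert.
    if let Entry::Vacant(e) = self.blobs.entry(computed) {
        self.byte_count += bytes.len();
        e.insert(Arc::from(bytes));
    }
    Ok(())
}
```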
- Upgrade thiserror 1 → 2 (leaf crate, no compatibility concern)
- Add put_verified happy-path test (store, retrieve, idempotence)
- Add pin idempotence test (set semantics: double-pin keeps count at 1)
- Document is_over_budget budget semantics (exclusive) and byte_count monotonic invariant (no removal until Phase 3 GC)
- Remove empty [dev-dependencies] section
- Update bytes 1.11.0 → 1.11.1 (RUSTSEC-2026-0007, workspace-wide)
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@crates/echo-cas/src/memory.rs`:
- Around line 286-296: Add a test that asserts the critical boundary where
stored bytes equal the configured limit remains within budget: create a
MemoryTier via MemoryTier::with_limits(n), call put to store exactly n bytes,
assert !store.is_over_budget(), then put one more byte and assert
store.is_over_budget(); reference the existing with_limits, put, and
is_over_budget methods so the test will catch off-by-one regressions (e.g., `>`
vs `>=`) and ensure the budget semantics remain exclusive.
- Around line 110-120: In put_verified, avoid computing blob_hash(bytes) when
the expected blob is already present: first check
self.blobs.contains_key(&expected) and return Ok(()) if found; only then compute
let computed = blob_hash(bytes) and compare to expected, returning
CasError::HashMismatch on mismatch; if absent, insert into self.blobs and update
self.byte_count — this also removes the redundant second contains_key probe
before insert. Use the existing symbols put_verified, expected, blob_hash,
self.blobs, and self.byte_count to locate and change the logic.
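A sketch of that boundary test, assuming with_limits takes the byte budget directly and noting that the two blobs hash differently (their contents differ):

```rust
#[test]
fn exactly_at_limit_is_within_budget() {
    let mut store = MemoryTier::with_limits(4);
    store.put(&[0u8; 4]); // byte_count == max_bytes
    assert!(!store.is_over_budget()); // exactly at the limit: still within budget
    store.put(&[1u8]); // a second, distinct blob pushes byte_count to 5
    assert!(store.is_over_budget());
}
```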
- Make BlobHash inner field private; add from_bytes() for deserialization
- Optimize Display impl: fold into pre-allocated String, single write_str
- Add FIXME(phase3) on get() return type (Arc<[u8]> coupling)
- Use entry API in put() to eliminate double map lookup
- Optimize put_verified() to skip hashing when blob already stored
- Add budget boundary test for exactly-at-limit off-by-one protection
Summary
- `echo-cas` leaf crate with `BlobStore` trait and `MemoryTier` implementation (`wasm32-unknown-unknown` target checks clean)

What's deferred

- `serde` derives, async methods, iteration API

Test plan

- `cargo check -p echo-cas`
- `cargo test -p echo-cas` — 14/14 pass
- `cargo clippy -p echo-cas -- -D warnings` — zero warnings
- `cargo check -p echo-cas --target wasm32-unknown-unknown`
- `cargo doc -p echo-cas --no-deps`