
Conversation

@flyingrobots (Owner)

Summary

  • Adds echo-cas leaf crate with BlobStore trait and MemoryTier implementation
  • Content-addressed storage keyed by BLAKE3 hash — no dependency on warp-core
  • Advisory byte budget tracking for browser contexts (enforcement deferred to Phase 3 GC)
  • Set-based pin/unpin for retention roots (pre-pin on missing blobs is legal)
  • 14 unit tests covering round-trips, idempotence, hash verification, pinning invariants, and an 8 MB payload smoke test (a usage sketch follows this list)
  • WASM-compatible (wasm32-unknown-unknown target checks clean)
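
For flavor, here is a round-trip in the style those tests exercise. This is a sketch only; MemoryTier::new, put, get, pin, and is_pinned all appear in the code and review below, but exact signatures live in the crate.

use echo_cas::{BlobStore, MemoryTier};

fn main() {
    let mut store = MemoryTier::new();
    let hash = store.put(b"hello");
    assert_eq!(store.put(b"hello"), hash); // idempotent: same bytes, same hash
    assert_eq!(store.get(&hash).as_deref(), Some(&b"hello"[..])); // round-trip
    store.pin(&hash); // retention root (set semantics)
    assert!(store.is_pinned(&hash));
}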

What's deferred

  • DiskTier / ColdTier (Phase 3)
  • Wire protocol: WANT / PROVIDE / FRAME (Phase 3)
  • GC sweep / eviction (Phase 3)
  • serde derives, async methods, iteration API

Test plan

  • cargo check -p echo-cas
  • cargo test -p echo-cas — 14/14 pass
  • cargo clippy -p echo-cas -- -D warnings — zero warnings
  • cargo check -p echo-cas --target wasm32-unknown-unknown
  • cargo doc -p echo-cas --no-deps
  • Pre-push hook passes (fmt, clippy, tests, rustdoc, patterns, determinism)


Introduces echo-cas, a leaf crate providing the BlobStore trait and an
in-memory implementation (MemoryTier) for content-addressed storage keyed
by BLAKE3 hash. Sufficient for the in-browser website demo; disk/cold
tiers and GC are deferred to Phase 3.

coderabbitai bot commented Feb 7, 2026

Warning

Rate limit exceeded

@flyingrobots has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 23 minutes and 44 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

Walkthrough

A new echo-cas crate is added to the workspace, providing a content‑addressed blob store: a 32‑byte BLAKE3 BlobHash type, blob_hash() helper, CasError enum, a sync object‑safe BlobStore trait (put / put_verified / get / has / pin / unpin), and an in‑memory MemoryTier implementation with optional byte budgeting and tests.
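
To make that API shape concrete, here is a minimal sketch of the trait as described; the exact signatures in crates/echo-cas/src/lib.rs may differ (receiver mutability and the put_verified return type are assumptions here).

use std::sync::Arc;
// BlobHash and CasError are the lib.rs types summarized under "Core API" below.

pub trait BlobStore {
    /// Hash `bytes` with BLAKE3, store them, and return the hash. Idempotent.
    fn put(&mut self, bytes: &[u8]) -> BlobHash;
    /// Store `bytes` under a caller-supplied hash, verifying the hash first.
    fn put_verified(&mut self, expected: BlobHash, bytes: &[u8]) -> Result<(), CasError>;
    /// Fetch a blob by hash, if present.
    fn get(&self, hash: &BlobHash) -> Option<Arc<[u8]>>;
    /// Presence check without cloning the payload.
    fn has(&self, hash: &BlobHash) -> bool;
    /// Mark a hash as a retention root (set semantics; pinning a missing blob is legal).
    fn pin(&mut self, hash: &BlobHash);
    /// Drop a retention root.
    fn unpin(&mut self, hash: &BlobHash);
}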

Changes

Workspace Configuration (Cargo.toml)
  Added crates/echo-cas to [workspace] members and echo-cas = { version = "0.1.0", path = "crates/echo-cas" } to [workspace.dependencies].

Crate Manifest & Docs (crates/echo-cas/Cargo.toml, crates/echo-cas/README.md)
  New crate manifest with metadata and dependencies (blake3, thiserror); README with license and description for echo-cas.

Core API (crates/echo-cas/src/lib.rs)
  Introduces BlobHash (32-byte wrapper with Display and as_bytes), the blob_hash() helper, CasError::HashMismatch, and the BlobStore trait. Re-exports MemoryTier.

In-Memory Tier (crates/echo-cas/src/memory.rs)
  Adds MemoryTier implementing BlobStore with HashMap<BlobHash, Arc<[u8]>> for payloads and a HashSet for pins; optional max_bytes advisory budgeting; observability helpers (len, byte_count, is_over_budget, pinned_count, is_pinned); comprehensive unit tests (state sketched below).
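
The review comments below reference the tier's internal fields; for orientation, a sketch of that state. The field name pins is illustrative, while blobs, byte_count, and max_bytes appear verbatim in the comments.

use std::collections::{HashMap, HashSet};
use std::sync::Arc;

pub struct MemoryTier {
    blobs: HashMap<BlobHash, Arc<[u8]>>, // content-addressed payloads
    pins: HashSet<BlobHash>,             // retention roots (set, not refcount)
    byte_count: usize,                   // running tally of stored bytes
    max_bytes: Option<usize>,            // advisory budget; None = unlimited
}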

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~23 minutes

Poem

🧩 A tiny crate with hashes bright,
Stores each blob by blake3 light,
Pin and proof, verify with care,
MemoryTier holds bytes in air —
Neat, tested, tidy; review beware.

🚥 Pre-merge checks: ✅ 3 of 3 passed

Title check: ✅ Passed. The title 'feat(echo-cas): content-addressed blob store (Phase 1)' directly describes the main change: adding a new echo-cas crate with a content-addressed blob store implementation, matching the PR's primary objective.

Description check: ✅ Passed. The description provides a comprehensive summary of the changeset, covering the BlobStore trait, MemoryTier implementation, testing approach, deferred work, and verification performed, all directly related to the PR's scope.

Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.


@coderabbitai bot left a comment

Actionable comments posted: 12

🤖 Fix all issues with AI agents
In `@crates/echo-cas/Cargo.toml`:
- Line 19: Remove the dangling empty [dev-dependencies] header in Cargo.toml or
make it intentional: either delete the entire [dev-dependencies] section if
there are no dev dependencies, or populate it (e.g., add proptest/criterion) or
replace the header with a commented TODO (e.g., "# TODO: add dev-dependencies")
so the presence of the section is explicit; locate the orphaned
"[dev-dependencies]" token in the Cargo.toml and apply one of these options.
- Line 17: Update the thiserror dependency in Cargo.toml from the 1.x line to
the 2.x line: replace the dependency spec `thiserror = "1"` with `thiserror =
"2"` (or a caret/semver range like `^2`) and then run cargo update / cargo build
to verify proc-macro derives still compile for wasm32; if there is a documented
downstream blocker, add a comment next to the thiserror entry explaining why v2
cannot be used.

In `@crates/echo-cas/README.md`:
- Line 1: The README's SPDX header ("Apache-2.0 OR MIND-UCAL-1.0") conflicts
with the workspace Cargo.toml which uses license.workspace = true and the root
workspace license = "Apache-2.0"; fix by making the license declarations
consistent: either update the workspace/root Cargo.toml to include the
dual-license string "Apache-2.0 OR MIND-UCAL-1.0" (so all crates inherit the OR
license) or remove "MIND-UCAL-1.0" from the README SPDX header; check and update
the crate's Cargo.toml/license or the root workspace license field accordingly
to ensure Cargo.toml, license.workspace, and README.md match.

In `@crates/echo-cas/src/lib.rs`:
- Around line 60-62: Change BlobHash to hide its inner field (make the [u8; 32]
private) so callers cannot construct it directly; keep #[repr(transparent)] and
the derives, expose a single safe public constructor blob_hash(bytes: [u8;32])
(already present) for normal creation, add a from_bytes([u8;32]) constructor
intended for deserialization/wire use with a doc comment "caller asserts this
came from BLAKE3", and keep an as_bytes(&self) -> &[u8;32] accessor for
read-only access; if you intentionally allowed public construction, add a clear
doc warning on the BlobHash struct that constructing from non-BLAKE3 bytes is
legal but semantically meaningless.
- Around line 130-132: The current trait method signature fn get(&self, hash:
&BlobHash) -> Option<Arc<[u8]>> bakes an Arc-backed in-memory representation
into the trait contract; add a FIXME comment above the get signature (reference
symbols: get, BlobHash, Arc<[u8]>) stating FIXME(phase3): avoid forcing
Arc<[u8]> — consider returning bytes::Bytes or using an associated type (e.g.,
type Blob: AsRef<[u8]>) or another zero-copy representation in Phase 3, so
future DiskTier/ColdTier implementations can choose mmap/Bytes/Cow/streams
instead of allocating an Arc.
- Around line 71-78: The Display impl for BlobHash (fn fmt on BlobHash) makes 32
separate write! calls against the Formatter; instead, build the full hex string
first and write it once. In fmt, allocate a String with capacity
self.0.len()*2, write! each byte into that String as "{:02x}", then call
f.write_str(&hex_string) once; this preserves the existing Display output while
reducing Formatter invocations and uses no unsafe code (a sketch follows this
list).

In `@crates/echo-cas/src/memory.rs`:
- Around line 223-234: Add a unit test that verifies pin idempotence: create a
MemoryTier via MemoryTier::new(), put some data with put(...) to get a hash,
call pin(&hash) twice, assert pinned_count() == 1, then unpin(&hash) and assert
is_pinned(&hash) is false; this ensures pin uses set semantics rather than a
reference count (refer to MemoryTier::new, put, pin, pinned_count, unpin,
is_pinned).
- Around line 154-168: Add a new unit test that exercises the happy path and
idempotence of MemoryTier::put_verified: create a MemoryTier via
MemoryTier::new, compute the correct hash for some test data using blob_hash,
call put_verified(hash, data) and assert it returns Ok, then assert store.len()
== 1, store.byte_count() == data.len(), and that store.get(&hash) returns the
stored bytes; finally call put_verified again with the same hash/data and assert
it still returns Ok and that len() and byte_count() have not increased
(idempotence). Ensure the test is named (e.g., put_verified_happy_path) and
mirrors the style of existing tests.
- Around line 27-32: Add a doc comment on MemoryTier explaining that byte_count
is a monotonically increasing tally of stored bytes for Phase 1 (no GC) and
therefore is_over_budget() will remain true once exceeded until explicit blob
removals are implemented in Phase 3; when implementing remove/evict in the
future you must decrement byte_count by the removed blob's length (compute via
the Arc<[u8]>::len() at removal time) so the budget can recover, and update the
is_over_budget() doc to state this invariant and recovery condition.
- Around line 98-106: The put method currently calls blob_hash then probes the
blobs map twice via contains_key and insert; change both put and put_verified to
use the HashMap entry API (e.g., blobs.entry(hash)) to perform a single lookup:
on Vacant(entry) insert Arc::from(bytes) and increment byte_count only when you
actually insert, and on Occupied return the existing hash without modifying
byte_count; apply the same entry-based logic in put_verified to avoid the double
lookup and ensure byte_count stays correct.
- Around line 108-118: The put_verified function currently does a double lookup
by calling self.blobs.contains_key(&computed) then insert, which can be replaced
with the HashMap entry API to do a single lookup and update byte_count only when
a new blob is actually inserted. Change put_verified to use
self.blobs.entry(computed) and on Vacant insert Arc::from(bytes) while
incrementing self.byte_count only for the new insertion (avoid incrementing when
the entry is Occupied); reference function name put_verified and map self.blobs
to locate the code to update.
- Around line 86-88: The is_over_budget method currently uses strict >
(self.max_bytes.is_some_and(|max| self.byte_count > max)), so a stored
byte_count equal to max_bytes is considered within budget; update the API docs
to explicitly state the budget is inclusive (i.e., exactly max_bytes is NOT over
budget) and mention the related symbols is_over_budget, with_limits, max_bytes,
and byte_count so callers understand the off-by-one behavior; if you prefer
exclusive semantics instead, change the comparison to >= in is_over_budget and
update tests/docs accordingly.
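
For the Display item above, a sketch of the single-write approach, assuming the impl lives in the same module as BlobHash so the now-private inner field stays accessible:

use std::fmt::{self, Write as _};

impl fmt::Display for BlobHash {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Build the 64-char hex string once, then hand the Formatter a single call.
        let mut hex = String::with_capacity(self.0.len() * 2);
        for byte in self.0 {
            write!(hex, "{byte:02x}")?; // writing into a String cannot fail
        }
        f.write_str(&hex)
    }
}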

@@ -0,0 +1,6 @@
<!-- SPDX-License-Identifier: Apache-2.0 OR MIND-UCAL-1.0 -->

⚠️ Potential issue | 🟠 Major

License mismatch: README declares Apache-2.0 OR MIND-UCAL-1.0 but Cargo.toml inherits workspace license Apache-2.0 only.

This is a compliance inconsistency. Cargo.toml sets license.workspace = true, and the root workspace declares license = "Apache-2.0". This README introduces a second license (MIND-UCAL-1.0) that doesn't appear anywhere in the manifest chain. Either:

  1. Update the workspace/crate license field to match the dual-license, or
  2. Drop MIND-UCAL-1.0 from the README header.

Shipping contradictory license declarations is a legal landmine.


Comment on lines 98 to 106
fn put(&mut self, bytes: &[u8]) -> BlobHash {
    let hash = blob_hash(bytes);
    if self.blobs.contains_key(&hash) {
        return hash;
    }
    self.byte_count += bytes.len();
    self.blobs.insert(hash, Arc::from(bytes));
    hash
}

🧹 Nitpick | 🔵 Trivial

put computes hash then probes the map twice on the cold path.

contains_key followed by insert is two lookups. Use the entry API:

Proposed fix
 fn put(&mut self, bytes: &[u8]) -> BlobHash {
     let hash = blob_hash(bytes);
-    if self.blobs.contains_key(&hash) {
-        return hash;
-    }
-    self.byte_count += bytes.len();
-    self.blobs.insert(hash, Arc::from(bytes));
+    if let std::collections::hash_map::Entry::Vacant(e) = self.blobs.entry(hash) {
+        self.byte_count += bytes.len();
+        e.insert(Arc::from(bytes));
+    }
     hash
 }

Same applies to put_verified (Lines 113-115).


- Upgrade thiserror 1 → 2 (leaf crate, no compatibility concern)
- Add put_verified happy-path test (store, retrieve, idempotence)
- Add pin idempotence test (set semantics: double-pin keeps count at 1)
- Document is_over_budget semantics (strict >: a byte_count of exactly
  max_bytes is within budget) and the byte_count monotonic invariant
  (no removal until Phase 3 GC)
- Remove empty [dev-dependencies] section
- Update bytes 1.11.0 → 1.11.1 (RUSTSEC-2026-0007, workspace-wide)
@coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@crates/echo-cas/src/memory.rs`:
- Around line 286-296: Add a test that asserts the critical boundary where
stored bytes equal the configured limit remains within budget: create a
MemoryTier via MemoryTier::with_limits(n), call put to store exactly n bytes,
assert !store.is_over_budget(), then put one more byte and assert
store.is_over_budget(); reference the existing with_limits, put, and
is_over_budget methods so the test will catch off-by-one regressions (e.g., `>`
vs `>=`) and lock in the boundary semantics: exactly max_bytes is within budget.
- Around line 110-120: In put_verified, avoid computing blob_hash(bytes) when
the expected blob is already present: first check
self.blobs.contains_key(&expected) and return Ok(()) if found; only then compute
let computed = blob_hash(bytes) and compare to expected, returning
CasError::HashMismatch on mismatch; if absent, insert into self.blobs and update
self.byte_count — this also removes the redundant second contains_key probe
before insert. Use the existing symbols put_verified, expected, blob_hash,
self.blobs, and self.byte_count to locate and change the logic (a sketch
follows this list).
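
A sketch of that reordered fast path, as it would appear inside the MemoryTier impl; the signature matches the earlier put snippet, and the HashMismatch variant's field names are assumptions.

fn put_verified(&mut self, expected: BlobHash, bytes: &[u8]) -> Result<(), CasError> {
    // Fast path: blob already stored under `expected`; skip hashing entirely.
    if self.blobs.contains_key(&expected) {
        return Ok(());
    }
    // Cold path: verify the bytes actually hash to `expected` before storing.
    let computed = blob_hash(bytes);
    if computed != expected {
        return Err(CasError::HashMismatch { expected, computed }); // field names assumed
    }
    self.byte_count += bytes.len();
    self.blobs.insert(expected, Arc::from(bytes));
    Ok(())
}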

CI Bot added 2 commits February 12, 2026 12:37
- Make BlobHash inner field private; add from_bytes() for deserialization
- Optimize Display impl: fold into pre-allocated String, single write_str
- Add FIXME(phase3) on get() return type (Arc<[u8]> coupling)
- Use entry API in put() to eliminate double map lookup
- Optimize put_verified() to skip hashing when blob already stored
- Add budget boundary test for exactly-at-limit off-by-one protection
@flyingrobots merged commit 8ff8017 into main on February 12, 2026; 18 of 19 checks passed.
@flyingrobots deleted the echo-cas-phase1 branch on February 12, 2026 at 21:23.