Conversation
Add an S3Store implementation that batches multiple heights into single S3 objects (chunks) to reduce object count and request costs. Writes are buffered in memory and flushed at checkpoint boundaries (SetSyncState).
- New S3Config with bucket, prefix, region, endpoint, chunk_size
- Store factory in main.go (sqlite/s3 switch)
- 13 mock-based unit tests covering round-trips, chunk boundaries, buffer read-through, idempotent writes, commitment index lookups
- Config validation for S3-specific required fields

Also includes profiling support, backfill progress tracking, and metrics formatting fixes from the working tree.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
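For context on the chunking scheme, here is a minimal sketch of how heights could map to chunk objects; the chunkFor/chunkKey helpers and the key layout are assumptions for illustration, not the actual S3Store code.

```go
// Illustrative sketch only: chunkFor/chunkKey and the key layout are
// hypothetical, not the actual S3Store implementation.
package main

import "fmt"

// chunkFor returns the index of the chunk object that contains height,
// assuming chunkSize heights are packed into each S3 object.
func chunkFor(height, chunkSize uint64) uint64 {
	return height / chunkSize
}

// chunkKey builds an S3 object key for a chunk under the configured prefix.
func chunkKey(prefix string, chunk uint64) string {
	return fmt.Sprintf("%s/headers/chunk-%08d", prefix, chunk)
}

func main() {
	// With chunk_size = 100, heights 100-199 all land in chunk 1,
	// so backfilling 100 heights costs one PutObject instead of 100.
	fmt.Println(chunkKey("apex", chunkFor(150, 100))) // apex/headers/chunk-00000001
}
```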
- Document SetMetrics must-call-before-use contract
- Use dedicated nsMu for PutNamespace to avoid holding buffer mutex during S3 network I/O
- Add flushMu to serialize flush() calls, preventing concurrent read-modify-write races on S3 chunk objects
- Pre-allocate merged slice in flushHeaderChunk (lint fix)
- Guard division by zero in backfill progress when done==0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
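A rough sketch of the locking discipline these fixes describe, with the buffer swapped out under the buffer mutex and S3 I/O done only under flushMu; the struct fields and the writeChunks helper are illustrative, not the real S3Store.

```go
package store

import (
	"context"
	"sync"
)

// Sketch only: field and method names are illustrative.
type s3Store struct {
	mu      sync.Mutex        // guards the in-memory write buffer
	flushMu sync.Mutex        // serializes flush() so chunk read-modify-write never overlaps
	buffer  map[uint64][]byte // height -> encoded payload awaiting flush
}

func (s *s3Store) flush(ctx context.Context) error {
	s.flushMu.Lock()
	defer s.flushMu.Unlock()

	// Swap the buffer out while holding mu, then release it before any
	// network I/O so writers are never blocked on S3 latency.
	s.mu.Lock()
	pending := s.buffer
	s.buffer = make(map[uint64][]byte)
	s.mu.Unlock()

	return s.writeChunks(ctx, pending) // S3 calls happen outside mu
}

// writeChunks stands in for the chunked PutObject logic.
func (s *s3Store) writeChunks(ctx context.Context, pending map[uint64][]byte) error {
	_, _ = ctx, pending
	return nil
}
```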
Caution: Review failed. The pull request is closed.

📝 Walkthrough

Introduces subscriber capacity limits and enforcement in the notifier, updates notifier APIs to return errors on subscribe failures, adds HTTP server Read/Write timeouts sourced from config, expands configuration options and validation, and implements retry-with-backoff for transient RPC errors in the fetcher.
Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant gRPC as gRPC Handler
    participant Service
    participant Notifier
    Client->>gRPC: Subscribe request
    gRPC->>Service: call BlobSubscribe()/HeaderSubscribe()
    Service->>Notifier: Subscribe(namespaces)
    alt maxSubscribers exceeded
        Notifier-->>Service: error (ResourceExhausted)
        Service-->>gRPC: return error
        gRPC-->>Client: ResourceExhausted
    else success
        Notifier-->>Service: subscription
        Service-->>gRPC: subscription returned
        gRPC-->>Client: stream events
    end
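For readers who prefer code to diagrams, a minimal sketch of the capacity check the subscription path implies; the Notifier internals shown here (fields, error text) are assumptions, and only the error-on-limit behaviour follows the walkthrough.

```go
package api

import (
	"fmt"
	"sync"
)

// Sketch only: the real Notifier differs; this shows the subscriber cap check.
type Event struct{ Height uint64 }

type Subscription struct {
	id     uint64
	events chan Event
}

type Notifier struct {
	mu             sync.Mutex
	nextID         uint64
	bufferSize     int
	maxSubscribers int
	subscribers    map[uint64]*Subscription
}

func (n *Notifier) Subscribe(namespaces [][]byte) (*Subscription, error) {
	n.mu.Lock()
	defer n.mu.Unlock()

	if len(n.subscribers) >= n.maxSubscribers {
		// Surfaced to the gRPC handler, which maps it to codes.ResourceExhausted.
		return nil, fmt.Errorf("subscriber limit reached (%d)", n.maxSubscribers)
	}
	sub := &Subscription{id: n.nextID, events: make(chan Event, n.bufferSize)}
	n.nextID++
	n.subscribers[sub.id] = sub
	return sub, nil
}
```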
sequenceDiagram
    participant Fetcher
    participant RPCNode
    participant Logger
    Fetcher->>RPCNode: perform RPC call
    alt transient error (timeout/EOF/net)
        RPCNode-->>Fetcher: error
        Fetcher->>Logger: log retry attempt
        Fetcher->>RPCNode: retry with backoff (retryDelay)
        RPCNode-->>Fetcher: response or final error
    else success
        RPCNode-->>Fetcher: response
    end
    Fetcher-->>Caller: response or error
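And a sketch of the retry loop the second diagram depicts. The names callRawWithRetry, retryDelay, and defaultRPCRetryDelay appear elsewhere in this thread, but the constants, signature, and transient-error classification below are guesses for illustration, not the fetcher's actual code; the real retryDelay also adds jitter, discussed further down.

```go
package fetch

import (
	"context"
	"encoding/json"
	"errors"
	"io"
	"net"
	"time"
)

// Sketch only: constants and classification are assumptions.
const (
	maxRPCRetries        = 3
	defaultRPCRetryDelay = 500 * time.Millisecond
)

// isTransient reports whether an RPC failure looks retryable.
func isTransient(err error) bool {
	var netErr net.Error
	return errors.Is(err, io.EOF) ||
		errors.Is(err, context.DeadlineExceeded) ||
		(errors.As(err, &netErr) && netErr.Timeout())
}

// retryDelay grows the wait exponentially with the attempt number
// (jitter omitted here for brevity).
func retryDelay(attempt int) time.Duration {
	return defaultRPCRetryDelay * time.Duration(1<<attempt)
}

// callRawWithRetry retries a raw JSON-RPC call on transient errors only.
func callRawWithRetry(ctx context.Context, call func(context.Context) (json.RawMessage, error)) (json.RawMessage, error) {
	var lastErr error
	for attempt := 0; attempt <= maxRPCRetries; attempt++ {
		raw, err := call(ctx)
		if err == nil {
			return raw, nil
		}
		lastErr = err
		if !isTransient(err) {
			break // permanent error: surface immediately
		}
		select {
		case <-ctx.Done():
			return nil, ctx.Err()
		case <-time.After(retryDelay(attempt)):
		}
	}
	return nil, lastErr
}
```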
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Summary of Changes

Hello @tac0turtle, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly improves the performance, robustness, and configurability of the Apex indexer. Key changes include the introduction of an S3-compatible storage backend for scalable data persistence, integration of pprof for performance diagnostics, and enhanced RPC client resilience with retry mechanisms. Additionally, API subscription management is made more robust with configurable limits, and default synchronization parameters have been tuned for better indexing speed. These updates collectively aim to make Apex more efficient and reliable in various operational environments.
6bde87a to a09f24b
Code Review
This pull request introduces significant performance and scalability improvements, including an S3 storage backend, RPC call retries, and more detailed metrics and profiling. The code is well-structured, with good refactoring of the main application logic and configuration validation. I've identified a few areas for improvement, mainly concerning potential race conditions in the new S3 store when used in a multi-instance setup, and a minor weakness in the RPC retry jitter implementation.
I am having trouble creating individual review comments. Click here to see my feedback.
pkg/store/s3.go (619-650)
The flushBlobChunk function implements a read-modify-write pattern on S3 objects. This is not atomic and can lead to race conditions and data loss if multiple indexer processes are configured to write to the same S3 bucket and prefix. The flushMu mutex only prevents races within a single process.
A more robust approach for handling concurrent writes to S3 is to use conditional puts with ETags (see the sketch after this comment). The pattern would be:
1. GetObject to get the current object and its ETag.
2. Perform the merge in memory.
3. PutObject with the If-Match condition set to the ETag from step 1.
4. If the PutObject fails with a PreconditionFailed error, it means another process modified the object. Retry the entire read-modify-write cycle.

This same issue applies to flushHeaderChunk and PutNamespace.
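A hedged sketch of that read-modify-write loop; objectStore, PutIfMatch, and ErrPreconditionFailed are stand-ins rather than real SDK calls, since the exact conditional-write API depends on the S3 client and on backend support for If-Match.

```go
package store

import (
	"bytes"
	"context"
	"errors"
)

// Hypothetical stand-ins: a real implementation would use the S3 client's
// GetObject ETag and a conditional put (If-Match) where supported, and
// would cap the number of retries.
var ErrPreconditionFailed = errors.New("precondition failed")

type objectStore interface {
	Get(ctx context.Context, key string) (data []byte, etag string, err error)
	PutIfMatch(ctx context.Context, key string, data []byte, etag string) error
}

// mergeChunk repeats the read-modify-write cycle until no other writer
// changes the object between the read and the conditional write.
func mergeChunk(ctx context.Context, s objectStore, key string, update []byte,
	merge func(existing, update []byte) []byte) error {
	for {
		existing, etag, err := s.Get(ctx, key)
		if err != nil {
			return err
		}
		merged := merge(existing, update)
		if bytes.Equal(merged, existing) {
			return nil // nothing new to write
		}
		err = s.PutIfMatch(ctx, key, merged, etag)
		if errors.Is(err, ErrPreconditionFailed) {
			continue // another writer won the race; redo the whole cycle
		}
		return err
	}
}
```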
cmd/apex/main.go (216-221)
Using a type switch to set metrics on different store implementations works, but it's not easily extensible. If more store types are added in the future, this switch statement will need to be modified.
A more maintainable approach would be to define an optional interface, for example:
type MeteredStore interface {
    SetMetrics(metrics.Recorder)
}

Then you could simply do a type assertion:

if ms, ok := db.(MeteredStore); ok {
    ms.SetMetrics(rec)
}

This decouples runIndexer from the concrete store implementations.
config.yaml (9)
The example configuration contains a hardcoded IP address 100.97.172.78:36668. While this is a private IP (in the Carrier-Grade NAT range), default configuration files should ideally use localhost or a placeholder like your-celestia-node-ip:26658 to avoid confusion and prevent accidental connection attempts to internal infrastructure.
celestia_node_url: "http://localhost:26658"

pkg/fetch/celestia_node.go (301-312)
The current jitter implementation in retryDelay uses time.Now().UnixNano(). This is not a good source of randomness, especially if multiple goroutines call this function at nearly the same time, as they might get very similar jitter values, partially defeating the purpose of jitter. Using math/rand would provide better random distribution. Remember to add import "math/rand" and seed it once at application startup.
func retryDelay(attempt int) time.Duration {
    base := defaultRPCRetryDelay * time.Duration(1<<attempt)
    jitterCap := base / 2
    if jitterCap <= 0 {
        return base
    }
    // Note: math/rand should be seeded once at application startup.
    jitter := rand.Int63n(int64(jitterCap))
    return base + time.Duration(jitter)
}
/gemini review

@CodeRabbit review

✅ Actions performed: Review triggered.
Code Review
The pull request introduces several important performance and reliability enhancements, such as HTTP timeouts, increased synchronization concurrency, and a robust retry mechanism for RPC calls. These changes significantly improve the indexer's resilience under load. I have identified a redundant validation check in the configuration loading logic and a minor improvement for the jitter implementation in the retry logic.
if err := validateProfiling(&cfg.Profiling); err != nil {
    return err
}
pkg/fetch/celestia_node.go (Outdated)
n := time.Now().UnixNano()
if n < 0 {
    n = -n
}
return base + time.Duration(n%int64(jitterCap))
Using time.Now().UnixNano() % jitterCap is not a recommended way to generate jitter, as it can produce predictable patterns if calls are frequent. It is better to use a proper random number generator. Since this project targets Go 1.22+, math/rand/v2 is preferred for its improved API and performance.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
docs/running.md (1)
70-81: ⚠️ Potential issue | 🟡 Minor
Full Config Reference is missing newly added fields.
The "Full Config Reference" section doesn't document three new config fields introduced in this PR:
- rpc.read_timeout and rpc.write_timeout (defaulting to 30s in config/config.go)
- subscription.max_subscribers (defaulting to 1024 in config/config.go)

These are present in the defaultConfigYAML template in config/load.go but omitted from the user-facing docs here.

Proposed fix
 rpc:
   listen_addr: ":8080"        # JSON-RPC + health
   grpc_listen_addr: ":9090"   # gRPC
+  read_timeout: 30            # HTTP read timeout in seconds
+  write_timeout: 30           # HTTP write timeout in seconds
 sync:
   start_height: 0             # 0 = genesis
   batch_size: 256             # headers per backfill batch
   concurrency: 12             # concurrent fetch workers
 subscription:
   buffer_size: 64             # event buffer per subscriber
+  max_subscribers: 1024       # maximum concurrent subscribers

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/running.md` around lines 70 - 81: Docs missing three new config fields that exist in defaultConfigYAML; update the "Full Config Reference" in docs/running.md to include rpc.read_timeout and rpc.write_timeout (defaults from config/config.go set to 30s) and subscription.max_subscribers (default 1024), matching the entries in config/load.go's defaultConfigYAML and using the same names and default values so the user-facing docs align with the actual defaults.

config/load.go (1)
205-210: 🛠️ Refactor suggestion | 🟠 Major
Duplicate profiling validation.

The profiling.listen_addr check on lines 208–210 is already performed inside validateProfiling (lines 264–268), which is called on line 205. The inline check is now dead code.

Proposed fix

 if err := validateProfiling(&cfg.Profiling); err != nil {
 	return err
 }
-if cfg.Profiling.Enabled && cfg.Profiling.ListenAddr == "" {
-	return fmt.Errorf("profiling.listen_addr is required when profiling is enabled")
-}
 if !validLogLevels[cfg.Log.Level] {

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config/load.go` around lines 205 - 210: The code performs duplicate validation of profiling.listen_addr: validateProfiling(&cfg.Profiling) already enforces that Profiling.ListenAddr is set when Profiling.Enabled is true, so remove the redundant inline check that follows the call (the if block referencing cfg.Profiling.Enabled && cfg.Profiling.ListenAddr == "" and its fmt.Errorf return). Keep the validateProfiling function as the single validation point for Profiling configuration (refer to validateProfiling and cfg.Profiling fields) and ensure no other duplicate checks remain in load.go.

pkg/fetch/celestia_node.go (1)
183-198: ⚠️ Potential issue | 🟡 Minor
Included cannot use callRawWithRetry due to return type mismatch with the ProofForwarder interface.

The ProofForwarder.Included signature returns (bool, error), but callRawWithRetry only wraps functions returning (json.RawMessage, error). This is a hard constraint from the interface contract. However, GetProof does return json.RawMessage and could potentially benefit from the retry pattern like GetHeader, GetBlobs, and GetNetworkHead. If proof operations intentionally skip retry logic (e.g., different semantics or latency requirements), a comment would clarify the design.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/fetch/celestia_node.go` around lines 183 - 198, The Included method currently cannot use callRawWithRetry because callRawWithRetry only wraps functions returning (json.RawMessage, error) while ProofForwarder.Included must return (bool, error); update the code by either (A) adding a thin retry helper (e.g., callBoolWithRetry) that mirrors callRawWithRetry but handles (bool, error) and use it from CelestiaNodeFetcher.Included, or (B) if proofs intentionally skip retries, add a short clarifying comment above CelestiaNodeFetcher.Included explaining why retry semantics are omitted; additionally consider applying callRawWithRetry to GetProof (which returns json.RawMessage) to reuse retry logic used by GetHeader/GetBlobs/GetNetworkHead. Ensure you reference and update the functions/methods: callRawWithRetry, CelestiaNodeFetcher.Included, CelestiaNodeFetcher.GetProof, and ProofForwarder.Included.
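One possible shape for option (A), assuming the isTransient and retryDelay helpers sketched earlier in this thread exist in the same package; the generic signature below is an assumption, not existing fetcher code.

```go
// Sketch of option (A): one generic wrapper that both callRawWithRetry and a
// bool-returning Included path could share. isTransient, retryDelay, and
// maxRPCRetries are assumed to exist as sketched earlier in this thread.
func callWithRetry[T any](ctx context.Context, call func(context.Context) (T, error)) (T, error) {
	var zero T
	var lastErr error
	for attempt := 0; attempt <= maxRPCRetries; attempt++ {
		v, err := call(ctx)
		if err == nil {
			return v, nil
		}
		lastErr = err
		if !isTransient(err) {
			break // permanent error: do not retry
		}
		select {
		case <-ctx.Done():
			return zero, ctx.Err()
		case <-time.After(retryDelay(attempt)):
		}
	}
	return zero, lastErr
}
```

Included could then delegate to callWithRetry with a closure returning (bool, error), mirroring how the raw-message paths use callRawWithRetry.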
🧹 Nitpick comments (4)
config/load_test.go (1)
10-26: Test name no longer reflects its scope; new defaults lack coverage.
TestDefaultConfigProfilingDefaults now also asserts sync defaults. Consider renaming it (e.g., TestDefaultConfigDefaults) and adding assertions for the other new fields: RPC.ReadTimeout, RPC.WriteTimeout, and Subscription.MaxSubscribers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config/load_test.go` around lines 10 - 26, The test TestDefaultConfigProfilingDefaults now checks sync defaults too and its name is misleading; rename the test (e.g., TestDefaultConfigDefaults) and extend its assertions to cover the newly added defaults from DefaultConfig: verify cfg.RPC.ReadTimeout and cfg.RPC.WriteTimeout equal the expected default durations and verify cfg.Subscription.MaxSubscribers equals its expected default integer; update any error messages in t.Fatal/t.Fatalf to reference the new test name and the exact expected values to make failures clear.pkg/api/grpc/server_test.go (1)
21-33: Consider extracting shared mock types to reduce duplication across test packages.

The mockStore and mockFetcher types are nearly identical across pkg/api/service_test.go, pkg/api/jsonrpc/server_test.go, and pkg/api/grpc/server_test.go. A shared internal/testutil (or pkg/api/apitest) package with reusable mock implementations would reduce maintenance burden when the Store or DataFetcher interfaces evolve.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/api/grpc/server_test.go` around lines 21 - 33, The mockStore and mockFetcher implementations (types mockStore, mockFetcher and constructor newMockStore / newMockFetcher) are duplicated across service_test.go, jsonrpc/server_test.go and grpc/server_test.go; extract these shared test mocks into a single test helper package (e.g. internal/testutil or pkg/api/apitest) containing the mock types, constructors and any helper methods, update each test file to import and use those mock types (replace local mockStore/mockFetcher declarations and newMockStore/newMockFetcher calls with the shared versions), and run tests to ensure imports and interfaces (Store, DataFetcher) are satisfied so future interface changes are fixed in one place.pkg/api/notifier_test.go (1)
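A sketch of what such a shared helper package might look like; the method set shown is a guess at the Store/DataFetcher surface and would need to match the real interfaces.

```go
// Package apitest is a hypothetical shared home for the test doubles; the
// GetHeader/GetBlobs methods are placeholders for whatever Store and
// DataFetcher actually require.
package apitest

import "context"

type MockStore struct {
	Headers map[uint64][]byte
}

func NewMockStore() *MockStore {
	return &MockStore{Headers: make(map[uint64][]byte)}
}

func (m *MockStore) GetHeader(ctx context.Context, height uint64) ([]byte, error) {
	return m.Headers[height], nil
}

type MockFetcher struct {
	Blobs map[uint64][][]byte
}

func NewMockFetcher() *MockFetcher {
	return &MockFetcher{Blobs: make(map[uint64][][]byte)}
}

func (m *MockFetcher) GetBlobs(ctx context.Context, height uint64) ([][]byte, error) {
	return m.Blobs[height], nil
}
```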
209-225: Good coverage of the max-subscribers limit; consider adding cleanup and using t.Fatal for the first two subscribes.

Two minor observations:
- Subscriptions 1 and 2 are never unsubscribed. While not harmful in a short-lived test, adding defer n.Unsubscribe(...) would be consistent with every other test in this file and would exercise the cleanup path.
- Using t.Errorf for the first two subscribes means the test continues even if the notifier is fundamentally broken. t.Fatalf would fail fast and make diagnostics clearer (matching the pattern in other tests).

Proposed cleanup
 func TestNotifierMaxSubscribers(t *testing.T) {
 	n := NewNotifier(1, 2, zerolog.Nop())
-	_, err1 := n.Subscribe(nil)
-	_, err2 := n.Subscribe(nil)
+	sub1, err1 := n.Subscribe(nil)
+	sub2, err2 := n.Subscribe(nil)
 	_, err3 := n.Subscribe(nil)
 	if err1 != nil {
-		t.Errorf("first subscribe failed: %v", err1)
+		t.Fatalf("first subscribe failed: %v", err1)
 	}
+	defer n.Unsubscribe(sub1)
 	if err2 != nil {
-		t.Errorf("second subscribe failed: %v", err2)
+		t.Fatalf("second subscribe failed: %v", err2)
 	}
+	defer n.Unsubscribe(sub2)
 	if err3 == nil {
 		t.Error("third subscribe should have failed")
 	}
 }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/api/notifier_test.go` around lines 209 - 225, In TestNotifierMaxSubscribers, ensure early failures abort the test and subscribers are cleaned up: after calling NewNotifier and n.Subscribe (the first two calls), use t.Fatalf (or t.Fatal) instead of t.Errorf when err1/err2 != nil so the test fails fast, and immediately defer n.Unsubscribe(...) for each successful subscription returned by Subscribe to exercise the cleanup path and match the pattern in other tests.pkg/fetch/celestia_node.go (1)
301-312: Consider using math/rand/v2 for jitter instead of UnixNano.

time.Now().UnixNano() is not random — concurrent calls in the same nanosecond get identical jitter, which defeats its purpose. For a single-fetcher scenario this is unlikely to matter, but rand.IntN (or rand.Int64N) from math/rand/v2 is trivially correct and avoids the subtlety.

Proposed fix
+import "math/rand/v2" + func retryDelay(attempt int) time.Duration { base := defaultRPCRetryDelay * time.Duration(1<<attempt) jitterCap := base / 2 if jitterCap <= 0 { return base } - n := time.Now().UnixNano() - if n < 0 { - n = -n - } - return base + time.Duration(n%int64(jitterCap)) + return base + time.Duration(rand.Int64N(int64(jitterCap))) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/fetch/celestia_node.go` around lines 301 - 312: The jitter calculation in retryDelay uses time.Now().UnixNano(), which can produce identical values across concurrent calls; replace that logic with math/rand/v2's random integer generator: compute jitter := rand.Int64N(int64(jitterCap)) and return base + time.Duration(jitter). math/rand/v2's global functions need no explicit seeding. Keep the existing base and jitterCap calculations (defaultRPCRetryDelay referenced) and handle jitterCap <= 0 as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@config/load.go`:
- Around line 67-69: The generated config template in config/load.go currently
leaves write_timeout commented out which misleads users; update the template so
write_timeout: 30 is active (uncomment the line) to match DefaultConfig() and
the behavior of Load(), ensuring the generated YAML shows both read_timeout and
write_timeout with explicit values.
---
Outside diff comments:
In `@config/load.go`:
- Around line 205-210: The code performs duplicate validation of
profiling.listen_addr: validateProfiling(&cfg.Profiling) already enforces that
Profiling.ListenAddr is set when Profiling.Enabled is true, so remove the
redundant inline check that follows the call (the if block referencing
cfg.Profiling.Enabled && cfg.Profiling.ListenAddr == "" and its fmt.Errorf
return). Keep the validateProfiling function as the single validation point for
Profiling configuration (refer to validateProfiling and cfg.Profiling fields)
and ensure no other duplicate checks remain in load.go.
In `@docs/running.md`:
- Around line 70-81: Docs missing three new config fields that exist in
defaultConfigYAML; update the "Full Config Reference" in docs/running.md to
include rpc.read_timeout and rpc.write_timeout (defaults from config/config.go
set to 30s) and subscription.max_subscribers (default 1024), matching the
entries in config/load.go's defaultConfigYAML and using the same names and
default values so the user-facing docs align with the actual defaults.
In `@pkg/fetch/celestia_node.go`:
- Around line 183-198: The Included method currently cannot use callRawWithRetry
because callRawWithRetry only wraps functions returning (json.RawMessage, error)
while ProofForwarder.Included must return (bool, error); update the code by
either (A) adding a thin retry helper (e.g., callBoolWithRetry) that mirrors
callRawWithRetry but handles (bool, error) and use it from
CelestiaNodeFetcher.Included, or (B) if proofs intentionally skip retries, add a
short clarifying comment above CelestiaNodeFetcher.Included explaining why retry
semantics are omitted; additionally consider applying callRawWithRetry to
GetProof (which returns json.RawMessage) to reuse retry logic used by
GetHeader/GetBlobs/GetNetworkHead. Ensure you reference and update the
functions/methods: callRawWithRetry, CelestiaNodeFetcher.Included,
CelestiaNodeFetcher.GetProof, and ProofForwarder.Included.
---
Nitpick comments:
In `@config/load_test.go`:
- Around line 10-26: The test TestDefaultConfigProfilingDefaults now checks sync
defaults too and its name is misleading; rename the test (e.g.,
TestDefaultConfigDefaults) and extend its assertions to cover the newly added
defaults from DefaultConfig: verify cfg.RPC.ReadTimeout and cfg.RPC.WriteTimeout
equal the expected default durations and verify cfg.Subscription.MaxSubscribers
equals its expected default integer; update any error messages in
t.Fatal/t.Fatalf to reference the new test name and the exact expected values to
make failures clear.
In `@pkg/api/grpc/server_test.go`:
- Around line 21-33: The mockStore and mockFetcher implementations (types
mockStore, mockFetcher and constructor newMockStore / newMockFetcher) are
duplicated across service_test.go, jsonrpc/server_test.go and
grpc/server_test.go; extract these shared test mocks into a single test helper
package (e.g. internal/testutil or pkg/api/apitest) containing the mock types,
constructors and any helper methods, update each test file to import and use
those mock types (replace local mockStore/mockFetcher declarations and
newMockStore/newMockFetcher calls with the shared versions), and run tests to
ensure imports and interfaces (Store, DataFetcher) are satisfied so future
interface changes are fixed in one place.
In `@pkg/api/notifier_test.go`:
- Around line 209-225: In TestNotifierMaxSubscribers, ensure early failures
abort the test and subscribers are cleaned up: after calling NewNotifier and
n.Subscribe (the first two calls), use t.Fatalf (or t.Fatal) instead of t.Errorf
when err1/err2 != nil so the test fails fast, and immediately defer
n.Unsubscribe(...) for each successful subscription returned by Subscribe to
exercise the cleanup path and match the pattern in other tests.
In `@pkg/fetch/celestia_node.go`:
- Around line 301-312: The jitter calculation in retryDelay uses time.Now().UnixNano(), which can produce identical values across concurrent calls; replace that logic with math/rand/v2's random integer generator: compute jitter := rand.Int64N(int64(jitterCap)) and return base + time.Duration(jitter). math/rand/v2's global functions need no explicit seeding. Keep the existing base and jitterCap calculations (defaultRPCRetryDelay referenced) and handle jitterCap <= 0 as before.
Overview
Summary by CodeRabbit
New Features
Bug Fixes
Chores
Documentation
Tests