Skip to content

Fix DuckLake transaction conflict and SSL connection loss handling#389

Merged
fuziontech merged 3 commits intomainfrom
fix/ducklake-transaction-conflict-retry
Apr 3, 2026
Merged

Fix DuckLake transaction conflict and SSL connection loss handling#389
fuziontech merged 3 commits intomainfrom
fix/ducklake-transaction-conflict-retry

Conversation

@fuziontech
Copy link
Copy Markdown
Member

Summary

  • Fix isTransientDuckLakeError missing SSL variant: the function checked for "server closed the connection unexpectedly" but production logs show "SSL connection has been closed unexpectedly" (35 occurrences). Also adds "Current transaction is aborted" for cascading PG failures.
  • Add retryOnConflict for DuckLake transaction conflicts: exponential backoff with jitter (50ms initial, 5 retries, 2s cap) for autocommit queries in both control-plane (Flight SQL) and standalone modes. User-managed transactions propagate the error.
  • Map transaction conflicts to SQLSTATE 40001 (serialization_failure) so PG-aware clients like SQLMesh know to retry.
  • Add Prometheus counters: duckgres_{worker_,}ducklake_conflict_{total,retries_total,retry_successes_total,retries_exhausted_total} for monitoring conflict rates and retry effectiveness.

Test plan

  • go build ./server/... ./duckdbservice/... ./controlplane/...
  • go test ./duckdbservice/... -count=1 — all pass
  • go test ./server/... -count=1 — all pass
  • go vet ./server/... ./duckdbservice/... — clean
  • just test-ducklake-concurrency — verify reduced client-visible conflicts
  • Deploy to staging and monitor Prometheus metrics + error logs

🤖 Generated with Claude Code

fuziontech and others added 3 commits April 2, 2026 13:15
Two distinct failure modes were causing elevated commit failures after
the DuckDB 1.5.1/DuckLake 0.4 upgrade:

1. SSL connection loss (35 occurrences): isTransientDuckLakeError checked
   for "server closed the connection unexpectedly" but the actual error
   says "SSL connection has been closed unexpectedly" — the SSL variant
   was never detected, so retryOnTransient never fired.

2. Transaction conflicts: concurrent DuckLake model builds conflict on
   commit due to global snapshot IDs. Add retryOnConflict with exponential
   backoff + jitter (50ms initial, 5 retries, 2s cap) for autocommit
   queries in both control-plane and standalone modes.

Also maps transaction conflicts to SQLSTATE 40001 (serialization_failure)
so PG-aware clients know to retry, and adds Prometheus counters for
conflict tracking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix indentation in executeQueryDirect retryOnConflict closure
- Preserve last DuckLake error in retryOnConflict exhaustion message
  (wraps with %w so callers see the actual conflict details)
- Add conflict retry to executeSelectQuery for symmetry with Flight SQL
  handler's GetFlightInfoStatement (SELECT can conflict during schema probing)
- Add comments explaining transient→conflict retry chaining behavior
- Fix misleading test names and comments for non-conflict error case
- Update exhaustion tests for the extra final fn() call

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
retryOnConflict now captures the last error from within the retry loop
rather than making a hidden 6th attempt after exhaustion. The
conflictMaxRetries constant now accurately reflects the actual number
of retry attempts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@fuziontech fuziontech merged commit 08446c1 into main Apr 3, 2026
27 of 28 checks passed
@fuziontech fuziontech deleted the fix/ducklake-transaction-conflict-retry branch April 3, 2026 01:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant