fix(lapis): health check: consider LAPIS "UP" even when SILO is down#1594
fix(lapis): health check: consider LAPIS "UP" even when SILO is down#1594fengelniederhammer merged 3 commits intomainfrom
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
Pull request overview
Adjusts LAPIS’ Spring Boot Actuator health reporting so that the service remains reported as UP even if the dependent SILO service is unavailable, while still surfacing SILO-related diagnostics in health details.
Changes:
- Always returns
Health.up()fromSiloHealthIndicator. - Adds SILO diagnostic details (versions on success;
siloStatus/error info on failure) instead of marking overall health as DOWN.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
lapis/src/main/kotlin/org/genspectrum/lapis/health/SiloHealthIndicator.kt
Outdated
Show resolved
Hide resolved
lapis/src/main/kotlin/org/genspectrum/lapis/health/SiloHealthIndicator.kt
Show resolved
Hide resolved
| Health | ||
| .up() // LAPIS should always be "up", independent of SILO. | ||
| .let { | ||
| try { | ||
| val info = cachedSiloClient.callInfo() | ||
| it | ||
| .withDetail("dataVersion", info.dataVersion) | ||
| .withDetail("siloVersion", info.siloVersion ?: "unknown") | ||
| } catch (e: SiloNotReachableException) { | ||
| it | ||
| .withDetail("siloStatus", "DOWN") | ||
| .withDetail("error", "SILO not reachable") | ||
| .withDetail("message", e.message) | ||
| } catch (e: SiloUnavailableException) { | ||
| it | ||
| .withDetail("siloStatus", "DOWN") | ||
| .withDetail("error", "SILO unavailable (HTTP 503)") | ||
| .withDetail("message", e.message) | ||
| .withDetail("retryAfter", e.retryAfter) | ||
| } catch (e: Exception) { | ||
| it | ||
| .withDetail("siloStatus", "DOWN") | ||
| .withDetail("error", "Unexpected error checking SILO") | ||
| .withDetail("message", e.message) | ||
| } | ||
| } | ||
| .build() |
There was a problem hiding this comment.
SiloHealthIndicator now always reports Spring Actuator status UP (even when SILO is down) and only encodes SILO failure via details. This is easy to misinterpret because /actuator/health components will still show this indicator as UP; if the intent is to keep overall LAPIS health UP, consider moving SILO checks into a separate health group/contributor (e.g., readiness) or otherwise returning a real DOWN status for SILO while excluding it from the liveness group.
| Health | |
| .up() // LAPIS should always be "up", independent of SILO. | |
| .let { | |
| try { | |
| val info = cachedSiloClient.callInfo() | |
| it | |
| .withDetail("dataVersion", info.dataVersion) | |
| .withDetail("siloVersion", info.siloVersion ?: "unknown") | |
| } catch (e: SiloNotReachableException) { | |
| it | |
| .withDetail("siloStatus", "DOWN") | |
| .withDetail("error", "SILO not reachable") | |
| .withDetail("message", e.message) | |
| } catch (e: SiloUnavailableException) { | |
| it | |
| .withDetail("siloStatus", "DOWN") | |
| .withDetail("error", "SILO unavailable (HTTP 503)") | |
| .withDetail("message", e.message) | |
| .withDetail("retryAfter", e.retryAfter) | |
| } catch (e: Exception) { | |
| it | |
| .withDetail("siloStatus", "DOWN") | |
| .withDetail("error", "Unexpected error checking SILO") | |
| .withDetail("message", e.message) | |
| } | |
| } | |
| .build() | |
| try { | |
| val info = cachedSiloClient.callInfo() | |
| Health.up() | |
| .withDetail("dataVersion", info.dataVersion) | |
| .withDetail("siloVersion", info.siloVersion ?: "unknown") | |
| .build() | |
| } catch (e: SiloNotReachableException) { | |
| Health.down() | |
| .withDetail("siloStatus", "DOWN") | |
| .withDetail("error", "SILO not reachable") | |
| .withDetail("message", e.message) | |
| .build() | |
| } catch (e: SiloUnavailableException) { | |
| Health.down() | |
| .withDetail("siloStatus", "DOWN") | |
| .withDetail("error", "SILO unavailable (HTTP 503)") | |
| .withDetail("message", e.message) | |
| .withDetail("retryAfter", e.retryAfter) | |
| .build() | |
| } catch (e: Exception) { | |
| Health.down() | |
| .withDetail("siloStatus", "DOWN") | |
| .withDetail("error", "Unexpected error checking SILO") | |
| .withDetail("message", e.message) | |
| .build() | |
| } |
There was a problem hiding this comment.
This sounds valid; we're in the 'SiloHealthIndicator', shouldn't there be a LapisHealthIndicator somewhere maybe?
Edit: This was in response to what ChatGPT said above.
There was a problem hiding this comment.
The Lapis health is indicated automatically by Spring.
If we don't want to have SILO in here at all, then we should maybe completely remove the SiloHealthIndicator? I tried to make it "info only" without affecting the LAPIS health.
lapis/src/main/kotlin/org/genspectrum/lapis/health/SiloHealthIndicator.kt
Show resolved
Hide resolved
fhennig
left a comment
There was a problem hiding this comment.
What do you think about having UP, DOWN and DEGRADED?
| Health | ||
| .up() // LAPIS should always be "up", independent of SILO. | ||
| .let { | ||
| try { | ||
| val info = cachedSiloClient.callInfo() | ||
| it | ||
| .withDetail("dataVersion", info.dataVersion) | ||
| .withDetail("siloVersion", info.siloVersion ?: "unknown") | ||
| } catch (e: SiloNotReachableException) { | ||
| it | ||
| .withDetail("siloStatus", "DOWN") | ||
| .withDetail("error", "SILO not reachable") | ||
| .withDetail("message", e.message) | ||
| } catch (e: SiloUnavailableException) { | ||
| it | ||
| .withDetail("siloStatus", "DOWN") | ||
| .withDetail("error", "SILO unavailable (HTTP 503)") | ||
| .withDetail("message", e.message) | ||
| .withDetail("retryAfter", e.retryAfter) | ||
| } catch (e: Exception) { | ||
| it | ||
| .withDetail("siloStatus", "DOWN") | ||
| .withDetail("error", "Unexpected error checking SILO") | ||
| .withDetail("message", e.message) | ||
| } | ||
| } | ||
| .build() |
There was a problem hiding this comment.
This sounds valid; we're in the 'SiloHealthIndicator', shouldn't there be a LapisHealthIndicator somewhere maybe?
Edit: This was in response to what ChatGPT said above.
🤖 I have created a release *beep* *boop* --- ## [0.8.0](v0.7.2...v0.8.0) (2026-03-19) ### ⚠ BREAKING CHANGES * **lapis:** Backslash is now an escape character inside quoted strings. In regex queries that use `\` as a regex escape, you'll need to pass `\\` now, e.g. before: `division.regex=Basel\{1,2\}` -> after: `division.regex=Basel\\{1,2\\}` ### Features * **lapis:** support escaping single quotes in advanced query string values ([#1599](#1599)) ([88d59e9](88d59e9)) ### Bug Fixes * **lapis:** health check: consider LAPIS "UP" even when SILO is down ([#1594](#1594)) ([b9abdf9](b9abdf9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
So that Kubernetes will not restart LAPIS just because SILO is down (in which case restarting LAPIS is pointless)
PR Checklist
- [ ] All necessary documentation has been adapted.- [ ] All necessary changes are explained in thellms.txt.- [ ] The implemented feature is covered by an appropriate test.