Skip to content

fix(lapis): health check: consider LAPIS "UP" even when SILO is down#1594

Merged
fengelniederhammer merged 3 commits intomainfrom
upStatus
Mar 18, 2026
Merged

fix(lapis): health check: consider LAPIS "UP" even when SILO is down#1594
fengelniederhammer merged 3 commits intomainfrom
upStatus

Conversation

@fengelniederhammer
Copy link
Contributor

@fengelniederhammer fengelniederhammer commented Mar 12, 2026

So that Kubernetes will not restart LAPIS just because SILO is down (in which case restarting LAPIS is pointless)

PR Checklist

- [ ] All necessary documentation has been adapted.
- [ ] All necessary changes are explained in the llms.txt.
- [ ] The implemented feature is covered by an appropriate test.

@vercel
Copy link

vercel bot commented Mar 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
lapis Ready Ready Preview, Comment Mar 17, 2026 10:21am

Request Review

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adjusts LAPIS’ Spring Boot Actuator health reporting so that the service remains reported as UP even if the dependent SILO service is unavailable, while still surfacing SILO-related diagnostics in health details.

Changes:

  • Always returns Health.up() from SiloHealthIndicator.
  • Adds SILO diagnostic details (versions on success; siloStatus/error info on failure) instead of marking overall health as DOWN.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +15 to +41
Health
.up() // LAPIS should always be "up", independent of SILO.
.let {
try {
val info = cachedSiloClient.callInfo()
it
.withDetail("dataVersion", info.dataVersion)
.withDetail("siloVersion", info.siloVersion ?: "unknown")
} catch (e: SiloNotReachableException) {
it
.withDetail("siloStatus", "DOWN")
.withDetail("error", "SILO not reachable")
.withDetail("message", e.message)
} catch (e: SiloUnavailableException) {
it
.withDetail("siloStatus", "DOWN")
.withDetail("error", "SILO unavailable (HTTP 503)")
.withDetail("message", e.message)
.withDetail("retryAfter", e.retryAfter)
} catch (e: Exception) {
it
.withDetail("siloStatus", "DOWN")
.withDetail("error", "Unexpected error checking SILO")
.withDetail("message", e.message)
}
}
.build()
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SiloHealthIndicator now always reports Spring Actuator status UP (even when SILO is down) and only encodes SILO failure via details. This is easy to misinterpret because /actuator/health components will still show this indicator as UP; if the intent is to keep overall LAPIS health UP, consider moving SILO checks into a separate health group/contributor (e.g., readiness) or otherwise returning a real DOWN status for SILO while excluding it from the liveness group.

Suggested change
Health
.up() // LAPIS should always be "up", independent of SILO.
.let {
try {
val info = cachedSiloClient.callInfo()
it
.withDetail("dataVersion", info.dataVersion)
.withDetail("siloVersion", info.siloVersion ?: "unknown")
} catch (e: SiloNotReachableException) {
it
.withDetail("siloStatus", "DOWN")
.withDetail("error", "SILO not reachable")
.withDetail("message", e.message)
} catch (e: SiloUnavailableException) {
it
.withDetail("siloStatus", "DOWN")
.withDetail("error", "SILO unavailable (HTTP 503)")
.withDetail("message", e.message)
.withDetail("retryAfter", e.retryAfter)
} catch (e: Exception) {
it
.withDetail("siloStatus", "DOWN")
.withDetail("error", "Unexpected error checking SILO")
.withDetail("message", e.message)
}
}
.build()
try {
val info = cachedSiloClient.callInfo()
Health.up()
.withDetail("dataVersion", info.dataVersion)
.withDetail("siloVersion", info.siloVersion ?: "unknown")
.build()
} catch (e: SiloNotReachableException) {
Health.down()
.withDetail("siloStatus", "DOWN")
.withDetail("error", "SILO not reachable")
.withDetail("message", e.message)
.build()
} catch (e: SiloUnavailableException) {
Health.down()
.withDetail("siloStatus", "DOWN")
.withDetail("error", "SILO unavailable (HTTP 503)")
.withDetail("message", e.message)
.withDetail("retryAfter", e.retryAfter)
.build()
} catch (e: Exception) {
Health.down()
.withDetail("siloStatus", "DOWN")
.withDetail("error", "Unexpected error checking SILO")
.withDetail("message", e.message)
.build()
}

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@fhennig fhennig Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds valid; we're in the 'SiloHealthIndicator', shouldn't there be a LapisHealthIndicator somewhere maybe?

Edit: This was in response to what ChatGPT said above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Lapis health is indicated automatically by Spring.

If we don't want to have SILO in here at all, then we should maybe completely remove the SiloHealthIndicator? I tried to make it "info only" without affecting the LAPIS health.

Copy link
Contributor

@fhennig fhennig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about having UP, DOWN and DEGRADED?

Comment on lines +15 to +41
Health
.up() // LAPIS should always be "up", independent of SILO.
.let {
try {
val info = cachedSiloClient.callInfo()
it
.withDetail("dataVersion", info.dataVersion)
.withDetail("siloVersion", info.siloVersion ?: "unknown")
} catch (e: SiloNotReachableException) {
it
.withDetail("siloStatus", "DOWN")
.withDetail("error", "SILO not reachable")
.withDetail("message", e.message)
} catch (e: SiloUnavailableException) {
it
.withDetail("siloStatus", "DOWN")
.withDetail("error", "SILO unavailable (HTTP 503)")
.withDetail("message", e.message)
.withDetail("retryAfter", e.retryAfter)
} catch (e: Exception) {
it
.withDetail("siloStatus", "DOWN")
.withDetail("error", "Unexpected error checking SILO")
.withDetail("message", e.message)
}
}
.build()
Copy link
Contributor

@fhennig fhennig Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds valid; we're in the 'SiloHealthIndicator', shouldn't there be a LapisHealthIndicator somewhere maybe?

Edit: This was in response to what ChatGPT said above.

@fengelniederhammer fengelniederhammer merged commit b9abdf9 into main Mar 18, 2026
9 checks passed
@fengelniederhammer fengelniederhammer deleted the upStatus branch March 18, 2026 09:58
fengelniederhammer pushed a commit that referenced this pull request Mar 19, 2026
🤖 I have created a release *beep* *boop*
---


## [0.8.0](v0.7.2...v0.8.0)
(2026-03-19)


### ⚠ BREAKING CHANGES

* **lapis:** Backslash is now an escape character inside quoted strings.
In regex queries that use `\` as a regex escape, you'll need to pass
`\\` now, e.g. before: `division.regex=Basel\{1,2\}` -> after:
`division.regex=Basel\\{1,2\\}`

### Features

* **lapis:** support escaping single quotes in advanced query string
values ([#1599](#1599))
([88d59e9](88d59e9))


### Bug Fixes

* **lapis:** health check: consider LAPIS "UP" even when SILO is down
([#1594](#1594))
([b9abdf9](b9abdf9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants