fix(engine): isolate search store from async load crashes #369
Open
QuinnWilton wants to merge 1 commit into elixir-lang:main from
This was referenced Feb 10, 2026
Force-pushed from 3e18e7d to ac24ce1
Replace Task.async with Task.Supervisor.async_nolink in
State.prepare_backend_async/2 so a crashing index build no longer
kills the Store GenServer.
Add a {:DOWN, ...} handler that logs the error and clears
async_load_ref, allowing the Store to retry on the next
project_compiled event. Demonitor the ref on successful completion
to avoid processing the normal DOWN.
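The commit message depends on the difference between a linked and a monitored-only task. The standalone script below is a sketch of that difference, not code from the PR. `LinkVsNolinkSketch` and `owner_survives?/1` are hypothetical names, and an unlinked process started with `spawn_monitor/1` stands in for the `Store` GenServer.

```elixir
# Hypothetical sketch: compares Task.async/1 (linked) with
# Task.Supervisor.async_nolink/2 (monitored only) when the task raises.
defmodule LinkVsNolinkSketch do
  # Spawns an "owner" process (a stand-in for the Store) that starts a
  # crashing task via `start_task`, and reports whether the owner survived.
  def owner_survives?(start_task) do
    parent = self()

    {owner, owner_ref} =
      spawn_monitor(fn ->
        start_task.(fn -> raise "index build failed" end)
        # Give the task time to crash; only reached if the owner was not killed.
        Process.sleep(200)
        send(parent, :owner_still_alive)
      end)

    receive do
      :owner_still_alive ->
        # The owner exits normally afterwards; drop that :DOWN too.
        Process.demonitor(owner_ref, [:flush])
        true

      {:DOWN, ^owner_ref, :process, ^owner, _reason} ->
        # The crash propagated over the link and took the owner down.
        false
    after
      1_000 -> raise "timed out waiting for the owner"
    end
  end
end
```

Usage, with the supervisor started by hand (the PR uses a dedicated `Engine.TaskSupervisor`):

```elixir
{:ok, sup} = Task.Supervisor.start_link()
LinkVsNolinkSketch.owner_survives?(&Task.async/1)                         # false
LinkVsNolinkSketch.owner_survives?(&Task.Supervisor.async_nolink(sup, &1)) # true
```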
Force-pushed from ac24ce1 to 291a8b8
mhanberg reviewed Feb 14, 2026
Comment on lines +490 to +491
    # Allow time for the task to crash and the DOWN message to be processed.
    Process.sleep(100)
I suggest having the function that raises send a message to the test process instead; then we can `assert_receive` on that instead of waiting.
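A minimal sketch of this suggestion, under stated assumptions: `failing_build` is a hypothetical stand-in for the raising index build, and the test process owns the task directly instead of the real `Store`. The function signals the test process just before it raises, so the test waits on messages rather than on a fixed sleep.

```elixir
ExUnit.start()

defmodule CrashSignalSketchTest do
  use ExUnit.Case

  test "waits for the crash with assert_receive instead of Process.sleep" do
    {:ok, sup} = Task.Supervisor.start_link()
    test_pid = self()

    # Hypothetical failing build: notify the test process, then raise.
    failing_build = fn ->
      send(test_pid, :build_started)
      raise "simulated index build failure"
    end

    task = Task.Supervisor.async_nolink(sup, failing_build)
    ref = task.ref

    # Returns as soon as the message arrives; fails after 1s otherwise.
    assert_receive :build_started, 1_000

    # In this sketch the test process owns the task, so it gets the :DOWN too.
    assert_receive {:DOWN, ^ref, :process, _pid, reason}, 1_000
    refute reason == :normal
  end
end
```

In the `Store` test itself, the `:DOWN` goes to the `Store` rather than to the test process, so a follow-up check such as the existing `assert_eventually alive?()` would likely still be needed after the `assert_receive`.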
          {:DOWN, ref, :process, _pid, reason},
          {update_ref, %State{async_load_ref: ref} = state}
        ) do
      Logger.error("Search index async load crashed: #{inspect(reason)}")
Suggested change
    - Logger.error("Search index async load crashed: #{inspect(reason)}")
    + Logger.error("Failed to prepare search store backend: #{inspect(reason)}")
    assert_eventually alive?()

    on_exit(fn ->
      after_each_test(Ets, project)
Suggested change
    - after_each_test(Ets, project)
    + destroy_backend(Ets, project)
This `after_each_test` function can probably be deleted, but we can deal with that in a different PR (not you, just a note for myself).
This is part of a set of 4 PRs that arose out of some static analysis tooling I'm working on:
That means these aren't crashes or issues I've observed in practice; however, based on my reading of the code, they do represent issues worth addressing.
Problem:
`State.prepare_backend_async/2` uses `Task.async/1` to build the search index. If this task raises, the `Store` crashes, and the task silently fails. The store is supervised under a `:one_for_one` strategy, and as far as I can tell, no indexed data is permanently lost; however:

- Calls to the `Store` will fail with `:noproc` instead of returning `{:error, :loading}`, and the crash will cascade into the callers.
- If any `project_compiled` events are dispatched between the crash and the restart, the `Store` will miss them. Since these events are never replayed, they are lost, and the server will remain unloaded until the next compilation.

Solution:
By using `Task.Supervisor.async_nolink/2` to start the search index task under a dedicated `Engine.TaskSupervisor`, we can instead handle a crash in the task by logging the error and clearing `async_load_ref`, so that the server is immediately ready to process further requests. During this time, the server remains available, callers are isolated from the crash, no `project_compiled` events are lost while waiting for a restart, and previously loaded indexes remain loaded.

In the successful case, the monitor is flushed to avoid `:DOWN` messages building up and causing a mailbox leak.
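The pattern described above can be sketched as a small GenServer. This is a hypothetical, simplified stand-in for the `Store`, not the actual Engine code: `SketchStore`, `load/2`, and the `build_fun` argument are illustrative names, and the log message borrows the reviewer's suggested wording.

```elixir
# Hypothetical sketch of the Store pattern; not the real Engine module.
defmodule SketchStore do
  use GenServer
  require Logger

  def start_link(task_sup), do: GenServer.start_link(__MODULE__, task_sup)

  # Starts an index build using the given zero-arity function.
  def load(pid, build_fun), do: GenServer.call(pid, {:load, build_fun})

  def state(pid), do: :sys.get_state(pid)

  @impl true
  def init(task_sup), do: {:ok, %{task_sup: task_sup, async_load_ref: nil, index: nil}}

  @impl true
  def handle_call({:load, build_fun}, _from, %{async_load_ref: nil} = state) do
    # Monitored, not linked: a crash arrives as :DOWN instead of killing us.
    task = Task.Supervisor.async_nolink(state.task_sup, build_fun)
    {:reply, :ok, %{state | async_load_ref: task.ref}}
  end

  def handle_call({:load, _build_fun}, _from, state) do
    # A build is already in flight.
    {:reply, {:error, :loading}, state}
  end

  @impl true
  # Success: the task replied with {ref, result} before exiting normally.
  def handle_info({ref, index}, %{async_load_ref: ref} = state) do
    # Remove the monitor and flush the :normal :DOWN so it never piles up.
    Process.demonitor(ref, [:flush])
    {:noreply, %{state | async_load_ref: nil, index: index}}
  end

  # Failure: the task crashed before replying.
  def handle_info({:DOWN, ref, :process, _pid, reason}, %{async_load_ref: ref} = state) do
    Logger.error("Failed to prepare search store backend: #{inspect(reason)}")
    # Clearing the ref lets the next load attempt start a fresh build.
    {:noreply, %{state | async_load_ref: nil}}
  end

  # Ignore anything else rather than crashing on an unexpected message.
  def handle_info(_msg, state), do: {:noreply, state}
end
```

After a crashing build, the process stays alive with `async_load_ref` cleared, so a later build can succeed:

```elixir
{:ok, sup} = Task.Supervisor.start_link()
{:ok, store} = SketchStore.start_link(sup)
:ok = SketchStore.load(store, fn -> raise "boom" end)
Process.sleep(200)
Process.alive?(store)                  # true
SketchStore.state(store).async_load_ref # nil
```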