@becomeStar
Contributor

@becomeStar becomeStar commented Jan 25, 2026

Handshake failures that occur before any writes are buffered can currently be lost to downstream inbound handlers. In this case, the failure is surfaced via the write / promise path, but exceptionCaught is never observed by handlers placed after WriteBufferingAndExceptionHandler.

This makes the original handshake error difficult to diagnose and inconsistent with failures that occur after buffering has started.

This change propagates the exception via fireExceptionCaught before closing the channel when handling the first failure on an active channel. Doing so preserves the original failure while the pipeline is still intact and avoids losing the exception due to close-triggered teardown or reentrancy.
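To illustrate why the ordering matters, here is a minimal sketch that does not depend on Netty. `ToyChannel`, `ToyBufferingHandler`, and the `propagateBeforeClose` flag are hypothetical stand-ins, not the actual grpc-java code, and the sketch assumes that closing detaches the downstream handler. The close-first branch is only a contrast case, not a copy of the existing behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Toy model with no Netty dependency; every name here is hypothetical. */
final class FirstFailureSketch {

    /** Stand-in for a channel whose downstream handler is detached on close (an assumption). */
    static final class ToyChannel {
        final List<String> log = new ArrayList<>();
        private Consumer<Throwable> downstream =
            t -> log.add("exceptionCaught: " + t.getMessage());

        void fireExceptionCaught(Throwable t) {
            if (downstream != null) {
                downstream.accept(t);
            }
        }

        void close() {
            log.add("closed");
            downstream = null; // models close-triggered teardown of the pipeline
        }
    }

    /** Stand-in for the buffering handler's failure bookkeeping. */
    static final class ToyBufferingHandler {
        private Throwable failCause;

        void onFailure(ToyChannel ch, Throwable cause, boolean propagateBeforeClose) {
            if (failCause != null) {
                return; // only the first failure is propagated
            }
            failCause = cause;
            if (propagateBeforeClose) {
                ch.fireExceptionCaught(cause); // pipeline is still intact here
                ch.close();
            } else {
                ch.close();
                ch.fireExceptionCaught(cause); // contrast case: nobody is left to observe it
            }
        }
    }

    /** Reports two failures in a row and returns what the downstream side observed. */
    static List<String> run(boolean propagateBeforeClose) {
        ToyChannel ch = new ToyChannel();
        ToyBufferingHandler handler = new ToyBufferingHandler();
        handler.onFailure(ch, new IllegalStateException("handshake failed"), propagateBeforeClose);
        handler.onFailure(ch, new IllegalStateException("secondary"), propagateBeforeClose);
        return ch.log;
    }

    public static void main(String[] args) {
        System.out.println(run(true));
        System.out.println(run(false));
    }
}
```

In this model, `run(true)` records the handshake failure before the close event, while `run(false)` records only the close. The second failure is ignored in both cases because `failCause` is already set.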

Fixes #8495

When a handshake failure occurs before any writes are buffered on the server
side, WriteBufferingAndExceptionHandler can record the failure internally
but never surface it to downstream inbound handlers.

This makes the original handshake error unobservable and complicates
debugging and instrumentation.

Propagate only the first failure via exceptionCaught, gated on the absence
of a previous failure, so that the canonical error becomes observable while
avoiding duplicate propagation and preserving existing close semantics.
@kannanjgithub
Contributor

I replied with my thoughts on issue #8495.

@becomeStar
Contributor Author

@kannanjgithub

Thank you very much for your detailed analysis and for taking the time to simulate the failure. Your observation about the object handle changing is incredibly helpful and provides a clear clue as to why the original root cause may be getting lost.

It does seem that failCause can effectively be reset when a new instance of WriteBufferingAndExceptionHandler is introduced into the pipeline, which explains why a secondary exception ends up being surfaced instead of the original handshake failure.

I’ll dig further into where and why the handler instance is being replaced and look for a way to ensure the first meaningful exception is preserved across instances.
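As a rough illustration of the state-loss problem, the sketch below contrasts a per-instance `failCause` field with a holder shared across handler instances. It does not use Netty, and `PerInstanceHandler`, `SharedStateHandler`, and the shared `AtomicReference` are hypothetical names. The shared holder is only one possible approach, not a settled fix for this PR.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Toy model with no Netty dependency; every name here is hypothetical. */
final class FailCauseSketch {

    /** Per-instance state: a replacement handler starts with no recorded failure. */
    static final class PerInstanceHandler {
        private Throwable failCause;

        Throwable recordFailure(Throwable cause) {
            if (failCause == null) {
                failCause = cause;
            }
            return failCause;
        }
    }

    /** Shared state: every instance consults the same holder, so the first failure wins. */
    static final class SharedStateHandler {
        private final AtomicReference<Throwable> firstFailure;

        SharedStateHandler(AtomicReference<Throwable> firstFailure) {
            this.firstFailure = firstFailure;
        }

        Throwable recordFailure(Throwable cause) {
            firstFailure.compareAndSet(null, cause); // no-op if a failure is already recorded
            return firstFailure.get();
        }
    }

    /** Simulates a handler being replaced between the first and second failure. */
    static String perInstance() {
        new PerInstanceHandler().recordFailure(new IllegalStateException("handshake failed"));
        return new PerInstanceHandler()
            .recordFailure(new IllegalStateException("secondary"))
            .getMessage();
    }

    /** Same sequence, but both instances share one holder. */
    static String shared() {
        AtomicReference<Throwable> holder = new AtomicReference<>();
        new SharedStateHandler(holder).recordFailure(new IllegalStateException("handshake failed"));
        return new SharedStateHandler(holder)
            .recordFailure(new IllegalStateException("secondary"))
            .getMessage();
    }

    public static void main(String[] args) {
        System.out.println(perInstance());
        System.out.println(shared());
    }
}
```

With per-instance state the second instance reports the secondary exception. With the shared holder it still reports the original handshake failure.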

Based on your feedback, I’ll work toward a refined solution that addresses this state-loss issue directly. Once I have a clearer fix, I can either update this PR or follow up with a new one, depending on what you think makes the most sense.

Thanks again for the detailed investigation and guidance — it’s been extremely helpful.

Development

Successfully merging this pull request may close these issues.

Netty server loses exception during handshake
