fix(mcp): add timeout to background thread join in stop() to prevent hangs by giulio-leone · Pull Request #1792 · strands-agents/sdk-python

giulio-leone · 2026-03-01T01:53:14Z

Summary

When an Agent holding an MCPClient is created inside a function and goes out of scope, the Agent.__del__ finalizer calls MCPClient.stop() which invokes _background_thread.join(). If the background thread cannot exit promptly (e.g. transport subprocess teardown is slow), join() blocks indefinitely, causing the entire process to hang on exit.

Changes

Add a timeout (equal to startup_timeout, default 30s) to the join() call in stop()
If the thread does not exit within the timeout, log a warning and proceed with cleanup
The thread is already a daemon thread (daemon=True), so it will be cleaned up by the interpreter on process exit

Testing

Updated existing tests to verify join() is called with timeout
Added new test test_stop_does_not_hang_when_join_times_out covering the timeout path

Fixes #1732

Copilot

Pull request overview

Prevents process hangs on interpreter shutdown by ensuring MCPClient.stop() doesn’t block indefinitely when joining the background thread (notably when called from Agent.__del__).

Changes:

Add a timeout (based on startup_timeout) to _background_thread.join() in MCPClient.stop()
Log a warning when the join times out and continue cleanup
Update/add tests to assert join(timeout=...) is used and exercise the timeout path

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
`src/strands/tools/mcp/mcp_client.py`	Adds `join(timeout=...)` + warning when the background thread doesn’t exit promptly.
`tests/strands/tools/mcp/test_mcp_client.py`	Updates existing stop() tests for the new join timeout behavior and adds a new timeout-path test.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/strands/tools/mcp/mcp_client.py

tests/strands/tools/mcp/test_mcp_client.py

giulio-leone · 2026-03-01T04:00:31Z

All CI checks pass. Ready for review.

Refs: strands-agents#1792

giulio-leone · 2026-03-06T02:37:43Z

Hi! Gentle ping — this PR is rebased, CI passes, and ready for review. Happy to address any feedback. Thanks!

giulio-leone · 2026-03-06T11:13:09Z

Hi @lizradway — I have verified the tests locally and all 52 tests in test_mcp_client.py pass, including both test_stop_closes_event_loop and test_mcp_client_state_reset_after_timeout.

The event_loop.close() call is on line 364 (in the elif branch after the is_alive() check), and all state (including self._background_thread) is reset unconditionally on lines 372-380.

Could you try re-running the CI? It may have been a transient issue.

…hangs When an Agent holding an MCPClient goes out of scope inside a function, the Agent.__del__ finalizer calls MCPClient.stop() which calls _background_thread.join(). If the background thread cannot exit promptly (e.g. transport subprocess teardown is slow), join() blocks indefinitely, causing the entire process to hang on exit. Add a timeout (equal to startup_timeout, default 30s) to the join() call. If the thread does not exit within the timeout, log a warning and proceed with cleanup. The thread is already a daemon thread (daemon=True), so it will be cleaned up by the interpreter on process exit. Fixes strands-agents#1732 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…imeout When join() times out and the thread is still alive, return early without closing the event loop or resetting internal state. This prevents RuntimeError from closing a running loop and avoids allowing a second start() while teardown is still in progress. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Expand the docstring to explicitly describe the expected behavior: skip event loop close and state reset when thread is still alive.

Refs: strands-agents#1792

Applied Copilot suggestion: remove duplicate stray ")" and "return" lines that were introduced by a bad suggestion application. The timeout/early-return block in stop() now has exactly one return statement after the logger.warning() call.

…d exits The event_loop.close() call was accidentally dropped during a prior review-fix commit. Restore it on the normal (non-timeout) path so the asyncio event loop is properly closed when the background thread exits cleanly, preventing event loop leaks. The timeout early-return path correctly skips this to avoid closing a loop that may still be running in the background thread. Refs: strands-agents#1792

When stop() is called after start() timeout, the background thread is still alive. Previously stop() returned early without resetting state, which broke start() timeout cleanup (test_mcp_client_state_reset_after_timeout). Now stop() skips only the event loop close (unsafe while thread runs) but still resets all state fields so the client can be reused. Refs: strands-agents#1792

giulio-leone · 2026-03-06T12:01:26Z

Hi @lizradway — rebased on latest main (v1.29.0) and confirmed all 52 tests in test_mcp_client.py pass cleanly, including test_stop_closes_event_loop and test_mcp_client_state_reset_after_timeout:

52 passed, 13 warnings in 5.60s

The warnings are pre-existing unawaited-coroutine warnings from MagicMock interactions in other test functions — unrelated to this PR.

Could you try pulling the latest force-push and re-running? Happy to investigate further if you still see failures.

giulio-leone · 2026-03-06T13:34:16Z

Hi @lizradway — the test_mcp_client_state_reset_after_timeout failure has been fixed in commit e2f711a.

Root cause: The stop() method had an early return in the join-timeout path (when background_thread.is_alive() was True). This skipped the state reset block at the end of stop(), leaving self._background_thread pointing at the old thread instead of being set to None.

Fix: Removed the early return and restructured the conditionals so that:

Event loop close is still skipped when the thread is alive (to avoid RuntimeError from closing a running loop)
State reset (_background_thread = None, _background_thread_session = None, etc.) always executes regardless of timeout

All 52 tests in test_mcp_client.py pass cleanly, including both test_stop_closes_event_loop and test_mcp_client_state_reset_after_timeout.

Covers the elif branch in stop() where _background_thread is None but _background_thread_event_loop is not None, ensuring the event loop is still properly closed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Address review feedback: when join() times out and the thread is still alive, return early without resetting internal state. This prevents: 1. RuntimeError from closing event loop while it's still running 2. A second background thread being spawned via start() while the first is still tearing down Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

giulio-leone · 2026-03-07T21:43:44Z

Closing to reduce PR volume. Happy to resubmit if the team finds this fix useful.

Copilot AI review requested due to automatic review settings March 1, 2026 01:53

github-actions bot added the size/s label Mar 1, 2026

giulio-leone requested a deployment to manual-approval March 1, 2026 01:53 — with GitHub Actions Waiting

Copilot started reviewing on behalf of giulio-leone March 1, 2026 01:53 View session

Copilot AI reviewed Mar 1, 2026

View reviewed changes

src/strands/tools/mcp/mcp_client.py Show resolved Hide resolved

tests/strands/tools/mcp/test_mcp_client.py Outdated Show resolved Hide resolved

github-actions bot added size/s and removed size/s labels Mar 1, 2026

giulio-leone requested a deployment to manual-approval March 1, 2026 03:36 — with GitHub Actions Waiting

github-actions bot added size/s and removed size/s labels Mar 1, 2026

giulio-leone requested a deployment to manual-approval March 1, 2026 06:34 — with GitHub Actions Waiting

github-actions bot added size/s and removed size/s labels Mar 2, 2026

giulio-leone requested a deployment to manual-approval March 2, 2026 15:42 — with GitHub Actions Waiting

giulio-leone added a commit to giulio-leone/sdk-python that referenced this pull request Mar 2, 2026

fix: apply code review suggestion

92f0581

Refs: strands-agents#1792

github-actions bot added size/s and removed size/s labels Mar 2, 2026

giulio-leone requested a deployment to manual-approval March 2, 2026 17:30 — with GitHub Actions Waiting

lizradway self-requested a review March 2, 2026 17:30

github-actions bot added size/s and removed size/s labels Mar 2, 2026

giulio-leone requested a deployment to manual-approval March 2, 2026 18:40 — with GitHub Actions Waiting

github-actions bot removed the size/s label Mar 2, 2026

giulio-leone and others added 9 commits March 6, 2026 13:00

test: clarify docstring for join timeout test

3dc0ae5

Expand the docstring to explicitly describe the expected behavior: skip event loop close and state reset when thread is still alive.

fix: apply code review suggestions

0125df1

fix: apply code review suggestion

2f34e36

Refs: strands-agents#1792

fix: apply code review suggestions

3da7d2c

giulio-leone force-pushed the fix/mcp-client-stop-join-timeout branch from 62e7472 to e2f711a Compare March 6, 2026 12:01

github-actions bot removed the size/s label Mar 6, 2026

giulio-leone requested a deployment to manual-approval March 6, 2026 12:01 — with GitHub Actions Waiting

github-actions bot added the size/s label Mar 6, 2026

giulio-leone requested a deployment to manual-approval March 6, 2026 12:01 — with GitHub Actions Waiting

github-actions bot added size/s and removed size/s labels Mar 6, 2026

giulio-leone requested a deployment to manual-approval March 6, 2026 16:34 — with GitHub Actions Waiting

github-actions bot added size/s and removed size/s labels Mar 7, 2026

giulio-leone requested a deployment to manual-approval March 7, 2026 13:15 — with GitHub Actions Waiting

giulio-leone closed this Mar 7, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(mcp): add timeout to background thread join in stop() to prevent hangs#1792

fix(mcp): add timeout to background thread join in stop() to prevent hangs#1792
giulio-leone wants to merge 11 commits intostrands-agents:mainfrom
giulio-leone:fix/mcp-client-stop-join-timeout

giulio-leone commented Mar 1, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

giulio-leone commented Mar 1, 2026

Uh oh!

giulio-leone commented Mar 6, 2026

Uh oh!

giulio-leone commented Mar 6, 2026

Uh oh!

giulio-leone commented Mar 6, 2026

Uh oh!

giulio-leone commented Mar 6, 2026

Uh oh!

giulio-leone commented Mar 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

giulio-leone commented Mar 1, 2026

Summary

Changes

Testing

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

giulio-leone commented Mar 1, 2026

Uh oh!

giulio-leone commented Mar 6, 2026

Uh oh!

giulio-leone commented Mar 6, 2026

Uh oh!

giulio-leone commented Mar 6, 2026

Uh oh!

giulio-leone commented Mar 6, 2026

Uh oh!

giulio-leone commented Mar 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants