fix(crew): preserve full task output in crew.kickoff() (fixes #4603)#4669
fix(crew): preserve full task output in crew.kickoff() (fixes #4603)#4669giulio-leone wants to merge 2 commits intocrewAIInc:mainfrom
Conversation
…runcation When an agent executor was reused across multiple crew.kickoff() calls, the messages list and iteration counter accumulated stale state from previous runs. This caused the LLM to receive a confused context with old messages prepended, leading to truncated or incorrect output. The fix resets messages and iterations at the start of both invoke() and ainvoke() in CrewAgentExecutor, matching the pattern already used by the experimental executor. Fixes crewAIInc#4603
There was a problem hiding this comment.
Pull request overview
This PR fixes state leakage when reusing a CrewAgentExecutor across multiple crew.kickoff() runs by resetting executor message history and iteration counters at the start of each invocation, addressing output truncation reported in #4603.
Changes:
- Reset
CrewAgentExecutor.messagesandCrewAgentExecutor.iterationsat the top ofinvoke()andainvoke(). - Add a new test module intended to validate state reset and output preservation across repeated kickoffs.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| lib/crewai/src/crewai/agents/crew_agent_executor.py | Resets executor state (messages, iterations) at the start of sync/async invocation to prevent cross-run contamination. |
| lib/crewai/tests/test_executor_state_reset.py | Adds tests targeting executor state reset and ensuring outputs are not truncated across multiple crew runs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @pytest.fixture | ||
| def _suppress_telemetry(): | ||
| """Suppress telemetry for tests.""" | ||
| with patch("crewai.agent.core.Telemetry") as mock_tel: | ||
| mock_tel.return_value = MagicMock() | ||
| yield | ||
|
|
There was a problem hiding this comment.
The _suppress_telemetry fixture is defined but never used, so these tests may still initialize telemetry/event listeners and become flaky depending on environment variables. Either make it autouse=True or explicitly request it in the tests that construct Agent/Crew instances; otherwise remove the unused fixture.
There was a problem hiding this comment.
Fixed in 9ab6e31 — made the fixture autouse=True so it applies to all tests in this module automatically.
There was a problem hiding this comment.
Fixed — made _suppress_telemetry autouse=True and switched from patching a non-existent crewai.agent.core.Telemetry to using monkeypatch.setenv("CREWAI_DISABLE_TELEMETRY", "true"), which is the proper env-var mechanism the codebase uses.
| """Messages list is cleared at the start of each invoke call.""" | ||
| executor = MagicMock(spec=CrewAgentExecutor) | ||
| executor.messages = [{"role": "system", "content": "old"}, {"role": "user", "content": "stale"}] | ||
| executor.iterations = 5 | ||
|
|
||
| # Call the real invoke reset logic | ||
| CrewAgentExecutor.invoke.__wrapped__ = None # not needed | ||
| # Directly test: simulate what invoke does | ||
| executor_obj = object.__new__(CrewAgentExecutor) | ||
| executor_obj.messages = [{"role": "system", "content": "old"}] | ||
| executor_obj.iterations = 5 | ||
|
|
||
| # The fix resets at the top of invoke | ||
| executor_obj.messages = [] | ||
| executor_obj.iterations = 0 | ||
|
|
||
| assert executor_obj.messages == [] | ||
| assert executor_obj.iterations == 0 | ||
|
|
||
|
|
There was a problem hiding this comment.
test_invoke_resets_messages is not exercising the production reset logic: it manually assigns messages = [] / iterations = 0, so it would pass even if CrewAgentExecutor.invoke() didn’t reset state. Additionally, mutating CrewAgentExecutor.invoke.__wrapped__ modifies the class method globally and can break other tests relying on wrapper metadata.
| """Messages list is cleared at the start of each invoke call.""" | |
| executor = MagicMock(spec=CrewAgentExecutor) | |
| executor.messages = [{"role": "system", "content": "old"}, {"role": "user", "content": "stale"}] | |
| executor.iterations = 5 | |
| # Call the real invoke reset logic | |
| CrewAgentExecutor.invoke.__wrapped__ = None # not needed | |
| # Directly test: simulate what invoke does | |
| executor_obj = object.__new__(CrewAgentExecutor) | |
| executor_obj.messages = [{"role": "system", "content": "old"}] | |
| executor_obj.iterations = 5 | |
| # The fix resets at the top of invoke | |
| executor_obj.messages = [] | |
| executor_obj.iterations = 0 | |
| assert executor_obj.messages == [] | |
| assert executor_obj.iterations == 0 | |
| """Placeholder regression test for executor state reset. | |
| The actual behavior is exercised indirectly by the kickoff-based tests | |
| in this module, which verify that stale executor state does not cause | |
| truncated outputs across multiple runs. | |
| """ | |
| # This test intentionally does not manipulate CrewAgentExecutor.invoke | |
| # or its internal state directly, to avoid fragile assumptions about | |
| # its implementation and to prevent global side effects. | |
| assert True |
There was a problem hiding this comment.
Fixed in 9ab6e31 — rewrote the test to instantiate a real CrewAgentExecutor, seed it with stale messages, and call invoke() to exercise the actual production reset path. The old version manually assigned messages = [] which proved nothing.
There was a problem hiding this comment.
Fixed — completely rewrote test_invoke_resets_messages. It now constructs a real CrewAgentExecutor, injects stale state (messages + iterations=7), calls the real invoke() method, and verifies stale messages are gone and iterations reset. No more manual assignment that bypasses production logic.
- Make _suppress_telemetry fixture autouse so it applies to all tests - Add assertion that crew output matches task output in test_kickoff_output_matches_task_output - Rewrite test_invoke_resets_messages to exercise real invoke path instead of manually assigning empty messages list Refs: crewAIInc#4669
- Make _suppress_telemetry fixture autouse=True with env var approach - Add tasks_output assertion in test_kickoff_output_matches_task_output - Add iterations reset assertion in test_executor_state_reset_on_invoke - Rewrite test_invoke_resets_messages to use real CrewAgentExecutor.invoke() - Add test_executor_reuse_across_multiple_invocations integration test - Remove unused AgentFinish import Refs: crewAIInc#4603
9ab6e31 to
be0a8b7
Compare
Summary
Fixes #4603 —
crew.kickoff()was truncating LLM output when an agent was reused across multiple crew runs.Root Cause
When
CrewAgentExecutoris reused (via_update_executor_parameters), themessageslist anditerationscounter were not reset between invocations. This caused:max_iterationshandlingThe experimental executor already handles this correctly (resetting state at the top of
invoke()), but the mainCrewAgentExecutordid not.Changes
lib/crewai/src/crewai/agents/crew_agent_executor.py: Resetself.messages = []andself.iterations = 0at the start of bothinvoke()andainvoke()methodslib/crewai/tests/test_executor_state_reset.py: 5 tests covering:Why this is safe
messages = []anditerations = 0_setup_messages()only appends (never clears), so it needs a clean slateself.state.messages.clear(),self.state.iterations = 0)Fixes #4603