Skip to content

fix(crew): preserve full task output in crew.kickoff() (fixes #4603)#4669

Open
giulio-leone wants to merge 2 commits intocrewAIInc:mainfrom
giulio-leone:fix/issue-4603-kickoff-output-truncation
Open

fix(crew): preserve full task output in crew.kickoff() (fixes #4603)#4669
giulio-leone wants to merge 2 commits intocrewAIInc:mainfrom
giulio-leone:fix/issue-4603-kickoff-output-truncation

Conversation

@giulio-leone
Copy link

Summary

Fixes #4603crew.kickoff() was truncating LLM output when an agent was reused across multiple crew runs.

Root Cause

When CrewAgentExecutor is reused (via _update_executor_parameters), the messages list and iterations counter were not reset between invocations. This caused:

  1. Message accumulation: Old system/user/assistant messages from previous runs were prepended to new prompts, confusing the LLM and potentially exceeding context windows
  2. Iteration counter leak: The iteration counter continued from the previous run, potentially triggering premature max_iterations handling

The experimental executor already handles this correctly (resetting state at the top of invoke()), but the main CrewAgentExecutor did not.

Changes

  • lib/crewai/src/crewai/agents/crew_agent_executor.py: Reset self.messages = [] and self.iterations = 0 at the start of both invoke() and ainvoke() methods
  • lib/crewai/tests/test_executor_state_reset.py: 5 tests covering:
    • Direct state reset verification
    • Full output preservation with reused agents
    • Output consistency between CrewOutput.raw and TaskOutput.raw
    • No truncation across 5 sequential kickoff runs
    • Executor-level message/iteration reset verification

Why this is safe

  • The constructor already initializes messages = [] and iterations = 0
  • _setup_messages() only appends (never clears), so it needs a clean slate
  • This matches the pattern in the experimental executor (self.state.messages.clear(), self.state.iterations = 0)

Fixes #4603

…runcation

When an agent executor was reused across multiple crew.kickoff() calls,
the messages list and iteration counter accumulated stale state from
previous runs. This caused the LLM to receive a confused context with
old messages prepended, leading to truncated or incorrect output.

The fix resets messages and iterations at the start of both invoke()
and ainvoke() in CrewAgentExecutor, matching the pattern already used
by the experimental executor.

Fixes crewAIInc#4603
Copilot AI review requested due to automatic review settings March 2, 2026 03:39
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes state leakage when reusing a CrewAgentExecutor across multiple crew.kickoff() runs by resetting executor message history and iteration counters at the start of each invocation, addressing output truncation reported in #4603.

Changes:

  • Reset CrewAgentExecutor.messages and CrewAgentExecutor.iterations at the top of invoke() and ainvoke().
  • Add a new test module intended to validate state reset and output preservation across repeated kickoffs.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
lib/crewai/src/crewai/agents/crew_agent_executor.py Resets executor state (messages, iterations) at the start of sync/async invocation to prevent cross-run contamination.
lib/crewai/tests/test_executor_state_reset.py Adds tests targeting executor state reset and ensuring outputs are not truncated across multiple crew runs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 39 to 45
@pytest.fixture
def _suppress_telemetry():
"""Suppress telemetry for tests."""
with patch("crewai.agent.core.Telemetry") as mock_tel:
mock_tel.return_value = MagicMock()
yield

Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _suppress_telemetry fixture is defined but never used, so these tests may still initialize telemetry/event listeners and become flaky depending on environment variables. Either make it autouse=True or explicitly request it in the tests that construct Agent/Crew instances; otherwise remove the unused fixture.

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9ab6e31 — made the fixture autouse=True so it applies to all tests in this module automatically.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — made _suppress_telemetry autouse=True and switched from patching a non-existent crewai.agent.core.Telemetry to using monkeypatch.setenv("CREWAI_DISABLE_TELEMETRY", "true"), which is the proper env-var mechanism the codebase uses.

Comment on lines 60 to 79
"""Messages list is cleared at the start of each invoke call."""
executor = MagicMock(spec=CrewAgentExecutor)
executor.messages = [{"role": "system", "content": "old"}, {"role": "user", "content": "stale"}]
executor.iterations = 5

# Call the real invoke reset logic
CrewAgentExecutor.invoke.__wrapped__ = None # not needed
# Directly test: simulate what invoke does
executor_obj = object.__new__(CrewAgentExecutor)
executor_obj.messages = [{"role": "system", "content": "old"}]
executor_obj.iterations = 5

# The fix resets at the top of invoke
executor_obj.messages = []
executor_obj.iterations = 0

assert executor_obj.messages == []
assert executor_obj.iterations == 0


Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_invoke_resets_messages is not exercising the production reset logic: it manually assigns messages = [] / iterations = 0, so it would pass even if CrewAgentExecutor.invoke() didn’t reset state. Additionally, mutating CrewAgentExecutor.invoke.__wrapped__ modifies the class method globally and can break other tests relying on wrapper metadata.

Suggested change
"""Messages list is cleared at the start of each invoke call."""
executor = MagicMock(spec=CrewAgentExecutor)
executor.messages = [{"role": "system", "content": "old"}, {"role": "user", "content": "stale"}]
executor.iterations = 5
# Call the real invoke reset logic
CrewAgentExecutor.invoke.__wrapped__ = None # not needed
# Directly test: simulate what invoke does
executor_obj = object.__new__(CrewAgentExecutor)
executor_obj.messages = [{"role": "system", "content": "old"}]
executor_obj.iterations = 5
# The fix resets at the top of invoke
executor_obj.messages = []
executor_obj.iterations = 0
assert executor_obj.messages == []
assert executor_obj.iterations == 0
"""Placeholder regression test for executor state reset.
The actual behavior is exercised indirectly by the kickoff-based tests
in this module, which verify that stale executor state does not cause
truncated outputs across multiple runs.
"""
# This test intentionally does not manipulate CrewAgentExecutor.invoke
# or its internal state directly, to avoid fragile assumptions about
# its implementation and to prevent global side effects.
assert True

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9ab6e31 — rewrote the test to instantiate a real CrewAgentExecutor, seed it with stale messages, and call invoke() to exercise the actual production reset path. The old version manually assigned messages = [] which proved nothing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — completely rewrote test_invoke_resets_messages. It now constructs a real CrewAgentExecutor, injects stale state (messages + iterations=7), calls the real invoke() method, and verifies stale messages are gone and iterations reset. No more manual assignment that bypasses production logic.

giulio-leone added a commit to giulio-leone/crewAI that referenced this pull request Mar 2, 2026
- Make _suppress_telemetry fixture autouse so it applies to all tests
- Add assertion that crew output matches task output in
  test_kickoff_output_matches_task_output
- Rewrite test_invoke_resets_messages to exercise real invoke path
  instead of manually assigning empty messages list

Refs: crewAIInc#4669
- Make _suppress_telemetry fixture autouse=True with env var approach
- Add tasks_output assertion in test_kickoff_output_matches_task_output
- Add iterations reset assertion in test_executor_state_reset_on_invoke
- Rewrite test_invoke_resets_messages to use real CrewAgentExecutor.invoke()
- Add test_executor_reuse_across_multiple_invocations integration test
- Remove unused AgentFinish import

Refs: crewAIInc#4603
@giulio-leone giulio-leone force-pushed the fix/issue-4603-kickoff-output-truncation branch from 9ab6e31 to be0a8b7 Compare March 2, 2026 04:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG]crew.kickoff() truncates LLM output but task.execute_sync() works correctly

2 participants