feat(hooks): add resume flag to AfterInvocationEvent #1767
mkmeral wants to merge 7 commits into strands-agents:main
…-invocation Add a writable 'resume' field (AgentInput | None) to AfterInvocationEvent. When a hook sets resume to a non-None value, the agent automatically re-invokes with that input, triggering a full new invocation cycle including BeforeInvocationEvent.
result: The result of the agent invocation, if available.
This will be None when invoked from structured_output methods, as those return typed output directly rather
than AgentResult.
resume: When set to a non-None agent input by a hook callback, the agent will
Did we brainstorm any other names? My gut reaction is continue instead of resume though I can see arguments for both.
Nope, we didn't. I had the original slack canvas where we discussed, I see you mentioned continue there too 😅
It's a simple find-and-replace, so I just wanted to have the discussion in PR. I'm okay with continue
From a reviewer perspective, both names are reasonable:
- resume: Implies continuing from where you left off, which aligns with the interrupt handling use case
- continue: More general, suggests iteration/looping, but continue is a Python keyword which might cause confusion in some contexts
Given that this feature can be used both for "resuming after interrupts" AND for "continuing with new input for validation loops", I slightly lean toward resume as it's more distinctive and not a Python keyword. However, I defer to the team's preference since either works.
Agreed that "resume" makes sense for the interrupt case, but it feels off when the agent was never paused and you're instead forcing it to keep going.
continue still makes more sense if you're not interrupting.
When a hook sets resume on AfterInvocationEvent while an interrupt is active, the resume loop now calls _interrupt_state.resume() to process the interrupt responses before continuing. This enables hooks to automatically handle interrupts by providing interruptResponse input via event.resume, keeping the agent loop running without returning to the caller.
If a hook provides invalid input (e.g. a plain string) while an interrupt is active, _interrupt_state.resume() raises TypeError, same as the normal caller-side interrupt resume path.
Added four tests covering:
- Hook handles interrupt automatically via resume with responses
- Invalid resume input during interrupt raises TypeError
- Interrupt without resume still returns to caller normally
- Interrupt during a resumed invocation handled end-to-end
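The interrupt path described in this commit can be modelled with a toy stand-in. This is a sketch only, not the SDK's `_interrupt_state` implementation: the class name `ToyInterruptState` is hypothetical, and the `interruptResponse`, `interruptId`, and `response` keys are assumed here for illustration.

```python
from typing import Any


class ToyInterruptState:
    """Toy stand-in for the agent's interrupt state (assumed shape, not SDK code)."""

    def __init__(self) -> None:
        self.activated = False
        self.responses: dict[str, Any] = {}

    def resume(self, agent_input: Any) -> None:
        """Consume interrupt responses; reject anything else while interrupted."""
        if not self.activated:
            return  # nothing pending, so any input is acceptable
        if not isinstance(agent_input, list):
            raise TypeError("expected a list of interruptResponse blocks")
        # Validate every block before recording anything.
        for block in agent_input:
            if not isinstance(block, dict) or "interruptResponse" not in block:
                raise TypeError("expected a list of interruptResponse blocks")
        for block in agent_input:
            response = block["interruptResponse"]
            self.responses[response["interruptId"]] = response["response"]
        self.activated = False  # interrupt handled, the loop may continue
```

In this model a hook that sets `event.resume` to a plain string during an active interrupt hits the `TypeError` branch, while a list of response blocks clears the interrupt and lets the loop continue.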
/strands review
than AgentResult.
resume: When set to a non-None agent input by a hook callback, the agent will
re-invoke itself with this input. The value can be any valid AgentInput
(str, content blocks, messages, etc.). Defaults to None (no resume).
Issue: Missing safety guidance in documentation
The docstring should warn users about the risk of infinite loops if resume is always set to a non-None value, and recommend implementing termination conditions in their hooks.
Suggestion: Add a note like:
Note:
Hooks using ``resume`` should implement explicit termination conditions to prevent
infinite loops. For example, track iteration count in ``invocation_state`` or check
for specific conditions in the result before setting ``resume``.
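A hook-side termination condition of the kind this note describes might look like the sketch below. It uses a stand-in dataclass rather than the real event class, and it assumes (unverified here) that the event exposes an `invocation_state` dict that persists across resumed invocations. `MAX_RESUMES`, `FakeAfterInvocationEvent`, and `retry_until_ok` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

MAX_RESUMES = 3  # hypothetical budget chosen for this sketch


@dataclass
class FakeAfterInvocationEvent:
    """Stand-in for AfterInvocationEvent; not the SDK class."""

    invocation_state: dict[str, Any] = field(default_factory=dict)
    result: Optional[str] = None
    resume: Optional[str] = None


def retry_until_ok(event: FakeAfterInvocationEvent) -> None:
    """Set resume only while the result looks bad and the budget is not spent."""
    attempts = event.invocation_state.get("resume_attempts", 0)
    if event.result == "ok" or attempts >= MAX_RESUMES:
        return  # leave resume as None so the agent stops
    event.invocation_state["resume_attempts"] = attempts + 1
    event.resume = f"Attempt {attempts + 1} failed, please fix: {event.result}"
```

If the state does not in fact persist between iterations, the counter could instead live on the hook object itself.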
This is the pattern that we have with retries for tools and model providers. I wanted to keep the same.
)

# Convert resume input to messages for next iteration, or None to stop
if after_invocation_event.resume is not None:
Issue: No safety mechanism to prevent infinite resume loops
A hook that always sets event.resume (e.g., due to a bug or misconfiguration) could cause the agent to loop indefinitely. This could lead to runaway API costs and resource exhaustion.
Suggestion: Consider adding a configurable max_resume_iterations parameter with a sensible default (e.g., 10) to prevent infinite loops. This could be set at the Agent level or passed through invocation_state. At minimum, this risk should be documented clearly in the docstring for AfterInvocationEvent.resume.
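To make the suggestion concrete, an agent-side cap could be sketched as follows. This is a toy driver loop, not the PR's code: `run_with_resume_cap`, `ResumeLimitExceeded`, and the `max_resume_iterations` parameter are hypothetical, and the model call and hook are reduced to plain callables.

```python
from typing import Callable, Optional


class ResumeLimitExceeded(RuntimeError):
    """Hypothetical error raised when the resume budget is exhausted."""


def run_with_resume_cap(
    invoke: Callable[[str], str],
    after_hook: Callable[[str], Optional[str]],
    prompt: str,
    max_resume_iterations: int = 10,
) -> str:
    """Invoke once, then re-invoke while the hook asks for it, up to the cap."""
    resumes = 0
    current = prompt
    while True:
        result = invoke(current)
        next_input = after_hook(result)  # None means "do not resume"
        if next_input is None:
            return result
        if resumes >= max_resume_iterations:
            raise ResumeLimitExceeded(
                f"hook requested more than {max_resume_iterations} resumes"
            )
        resumes += 1
        current = next_input
```

With this shape a hook that always asks to resume costs at most `max_resume_iterations + 1` invocations before the error surfaces.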
This is the pattern that we have with retries for tools and model providers. I wanted to keep the same.
Question: The PR links to issue #1181 ("Add support for skills"), but that issue appears to be about loading knowledge files based on task context, which seems unrelated to the resume/re-invocation functionality. Is there a different issue that this PR addresses, or is this feature part of the broader skills implementation? If this is a standalone feature, consider creating a dedicated issue or clarifying the relationship in the PR description.
Issue: Documentation PR section is incomplete
The PR description lists "TBD" for the Documentation PR section. Per the contribution guidelines, PRs that introduce new public API features should either:
This feature adds a significant new capability (autonomous looping via hooks) that users need to understand. Documentation should cover:
Action Required: Please either link a documentation PR or provide justification for why documentation is not needed.
Assessment: Request Changes
Well-designed feature that enables powerful autonomous looping patterns through the hooks system. The implementation is clean and follows existing SDK patterns.
Review Categories
- Documentation (Blocking): The Documentation PR section is marked "TBD" - this must be addressed before merge
- Safety: Consider adding guidance or guardrails for infinite loop prevention
- API Design: Open discussion on resume vs continue naming should be resolved
Good test coverage with comprehensive edge case handling, including interrupt scenarios.
On the agent review above: I wanted to keep the same logic as retrying for tools and model providers through hooks. Those "gaps" also exist there, and I don't want to overload this feature for now. Docs are WIP.
Description
Motivation
Hooks that run after an agent invocation sometimes need to trigger a follow-up invocation automatically. For example, a coding agent hook could run tests and linting after each response, and if checks fail, resume the agent with the error output so it can fix the issues autonomously. Similarly, a validation hook could inspect the agent's output against acceptance criteria and loop until the result passes. Today there's no way to do this from a hook; callers have to build the retry loop externally, which duplicates lifecycle logic and bypasses the hook system entirely.
This PR adds a resume field to AfterInvocationEvent. When a hook sets resume to any valid AgentInput, the agent re-invokes itself with that input, firing a full new invocation cycle (including BeforeInvocationEvent). When resume is None (the default), behavior is unchanged.
Public API Changes
AfterInvocationEvent gains a writable resume attribute:
Each resume triggers a complete invocation cycle: BeforeInvocationEvent → event loop → AfterInvocationEvent, so all existing hooks participate in every iteration. The resume input is converted to messages via the same _convert_prompt_to_messages path as normal agent calls.
Use Cases
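As one illustration, the validation loop mentioned under Motivation can be modelled with a toy agent loop. This sketch does not use the SDK: `AfterEvent`, `run_agent`, `fake_model`, and `reject_todos` are hypothetical stand-ins, and the real event loop is reduced to a single model call. It only shows that each resume runs the before hooks, the loop, and the after hooks again.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AfterEvent:
    """Stand-in for AfterInvocationEvent with a writable resume field."""

    result: str
    resume: Optional[str] = None


def run_agent(
    prompt: str,
    model: Callable[[str], str],
    before_hooks: list[Callable[[str], None]],
    after_hooks: list[Callable[[AfterEvent], None]],
) -> tuple[str, list[str]]:
    """Run full invocation cycles until no after hook sets resume."""
    trace: list[str] = []
    result = ""
    current: Optional[str] = prompt
    while current is not None:
        for before in before_hooks:
            before(current)
        trace.append("before")
        result = model(current)  # stands in for the event loop
        trace.append("loop")
        event = AfterEvent(result=result)
        for after in after_hooks:
            after(event)
        trace.append("after")
        current = event.resume  # None ends the loop
    return result, trace


def fake_model(text: str) -> str:
    """Toy model: produces a clean answer only when asked to fix the TODO."""
    return "final answer" if "TODO" in text else "draft with TODO"


def reject_todos(event: AfterEvent) -> None:
    """Validation hook: resume with feedback while the output contains a TODO."""
    if "TODO" in event.result:
        event.resume = "Please remove the TODO and finish the answer."
```

Calling `run_agent("Write the summary", fake_model, [], [reject_todos])` runs two full cycles in this toy model: the first result fails validation, and the resumed invocation passes.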
Related Issues
Resolves: #1181
Documentation PR
TBD
Type of Change
New feature
Testing
4 unit tests covering resume triggering, no-resume default, BeforeInvocationEvent firing on resume, and multi-step chaining
3 unit tests for the resume field on AfterInvocationEvent (default value, writability, input type acceptance)
Manually tested with a Jupyter notebook demo