# Add KCP manifest and TL;DR summaries: 76% fewer agent navigation calls (#4658)
**Status:** Open — **totto** wants to merge 6 commits into `crewAIInc:main` from `totto:main`.
## Commits
- `25517b0` Add KCP manifest, TL;DR files, and benchmark results
- `c826224` fix: use REPO_ROOT derived from __file__ instead of hardcoded path
- `28f79d1` fix: remove hardcoded path from BENCHMARK.md; restrict file access to…
- `04df41f` fix: harden path validation — use pathlib.relative_to and restrict gl…
- `5363656` fix: restore base_dir support with validation; use -e for grep pattern
- `f532932` fix(kcp): correct summary_of relationships for combined TL;DR units
# KCP Benchmark Results — CrewAI

## Summary

**76% reduction in tool calls** when using the Knowledge Context Protocol (KCP) manifest compared to unguided repository exploration.

- Baseline total: **123 tool calls**
- KCP total: **30 tool calls**
- Saved: **93 tool calls** across 8 queries
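The headline number follows directly from the totals above; a quick Python check, mirroring the rounding used in the benchmark script:

```python
# Verify the headline reduction from the reported totals.
baseline_total = 123  # tool calls without the KCP manifest
kcp_total = 30        # tool calls with the manifest

saved = baseline_total - kcp_total
pct = round((1 - kcp_total / baseline_total) * 100)

print(saved)  # 93
print(pct)    # 76
```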
## Results Table

| Query | Baseline | KCP | Saved |
| :---- | -------: | --: | ----: |
| What is the difference between Flows and Crews in CrewAI? | 14 | 2 | 12 |
| How do I create my first agent and assign it a task? | 7 | 3 | 4 |
| How do I create a custom tool for my agent? | 8 | 3 | 5 |
| How do I add memory to my crew? | 7 | 3 | 4 |
| Which LLM providers does CrewAI support? | 17 | 5 | 12 |
| How do I build a flow that triggers a crew? | 15 | 2 | 13 |
| How do I implement a hierarchical crew with a manager agent? | 22 | 9 | 13 |
| How do I add knowledge (RAG) to my crew? | 33 | 3 | 30 |
| **TOTAL** | **123** | **30** | **93** |
## Methodology

Each query was run twice against a local clone of the CrewAI repository:

1. **Baseline**: The agent was told the repository path and instructed to explore it freely using `read_file`, `glob_files`, and `grep_content` tools to find the answer.
2. **KCP**: The agent was instructed to first read `knowledge.yaml`, match the query against unit triggers, and read only the files pointed to by matching units — preferring TL;DR summary files when available.

Both runs used `claude-haiku-4-5-20251001` with `max_tokens=2048` and up to 20 turns. Tool call counts measure retrieval efficiency only (not answer quality).
## Findings

The KCP manifest delivered a **76% reduction in tool calls**, with the largest gains on broad or unfamiliar queries. The "knowledge (RAG)" query showed the most dramatic improvement (33 → 3 calls, 91% reduction): without KCP the agent recursively explored the docs directory; with KCP it read `knowledge.yaml`, matched the `rag crew` trigger directly to `tools-memory-tldr.mdx`, and answered immediately. The hierarchical crew query had the smallest relative gain (22 → 9), because the answer required reading the full `crews.mdx` and `tasks.mdx` even with guidance — demonstrating that KCP eliminates exploration overhead but cannot shrink inherently large source files.
## Benchmark script

```python
import anthropic
import os
import glob as glob_module
import subprocess
from pathlib import Path

client = anthropic.Anthropic()

REPO_ROOT = os.path.dirname(os.path.abspath(__file__))
_REPO_ROOT_REAL = Path(os.path.realpath(REPO_ROOT))


def _within_repo(path: str) -> bool:
    """Return True if path resolves to a location inside REPO_ROOT."""
    try:
        Path(os.path.realpath(path)).relative_to(_REPO_ROOT_REAL)
        return True
    except (ValueError, OSError):
        return False


TOOLS = [
    {
        "name": "read_file",
        "description": "Read the content of a file",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"]
        }
    },
    {
        "name": "glob_files",
        "description": "Find files matching a pattern",
        "input_schema": {
            "type": "object",
            "properties": {
                "pattern": {"type": "string"},
                "base_dir": {"type": "string"}
            },
            "required": ["pattern"]
        }
    },
    {
        "name": "grep_content",
        "description": "Search for text in files",
        "input_schema": {
            "type": "object",
            "properties": {
                "pattern": {"type": "string"},
                "path": {"type": "string"}
            },
            "required": ["pattern", "path"]
        }
    }
]


def execute_tool(tool_name, tool_input):
    if tool_name == "read_file":
        path = tool_input["path"]
        if not _within_repo(path):
            return "Error: access denied — path is outside the repository"
        try:
            with open(path, 'r', encoding='utf-8', errors='replace') as f:
                content = f.read()
            if len(content) > 8000:
                content = content[:8000] + "\n...[truncated]"
            return content
        except Exception as e:
            return f"Error: {e}"
    elif tool_name == "glob_files":
        pattern = tool_input["pattern"]
        base = tool_input.get("base_dir", REPO_ROOT)
        if not _within_repo(base):
            base = REPO_ROOT
        if not pattern.startswith("/"):
            pattern = os.path.join(base, pattern)
        matches = [m for m in glob_module.glob(pattern, recursive=True) if _within_repo(m)]
        return "\n".join(matches[:20]) if matches else "No files found"
    elif tool_name == "grep_content":
        pattern = tool_input["pattern"]
        path = tool_input["path"]
        if not _within_repo(path):
            return "Error: access denied — path is outside the repository"
        try:
            result = subprocess.run(
                ["grep", "-r", "-l", "-m", "5", "-e", pattern, path],
                capture_output=True, text=True, timeout=10
            )
            return result.stdout[:2000] if result.stdout else "No matches"
        except Exception as e:
            return f"Error: {e}"
    return "Unknown tool"


def run_agent(system_prompt, query, max_turns=20):
    messages = [{"role": "user", "content": query}]
    tool_count = 0
    for _ in range(max_turns):
        response = client.messages.create(
            model="claude-haiku-4-5-20251001",
            max_tokens=2048,
            system=system_prompt,
            tools=TOOLS,
            messages=messages
        )
        tool_uses = [b for b in response.content if b.type == "tool_use"]
        tool_count += len(tool_uses)
        if response.stop_reason == "end_turn" or not tool_uses:
            return "", tool_count
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []
        for tool_use in tool_uses:
            result = execute_tool(tool_use.name, tool_use.input)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": tool_use.id,
                "content": result
            })
        messages.append({"role": "user", "content": tool_results})
    return "", tool_count


BASELINE_PROMPT = f"""You are a helpful assistant answering questions about the CrewAI framework.
The repository is at {REPO_ROOT}.
Use the available tools to read files and find the answer.
Start by exploring the repository structure to understand where to find information."""

KCP_PROMPT = f"""You are a helpful assistant answering questions about the CrewAI framework.
The repository is at {REPO_ROOT}.
IMPORTANT: First read {REPO_ROOT}/knowledge.yaml to understand the repository structure.
Match the question to the triggers in knowledge.yaml and read only the files pointed to by matching units.
If a unit has summary_available: true, read the summary_unit file first (it's much smaller)."""

QUERIES = [
    "What is the difference between Flows and Crews in CrewAI?",
    "How do I create my first agent and assign it a task?",
    "How do I create a custom tool for my agent?",
    "How do I add memory to my crew?",
    "Which LLM providers does CrewAI support?",
    "How do I build a flow that triggers a crew?",
    "How do I implement a hierarchical crew with a manager agent?",
    "How do I add knowledge (RAG) to my crew?",
]

if __name__ == "__main__":
    print("CrewAI KCP Benchmark")
    print("=" * 60)
    results = []
    for i, query in enumerate(QUERIES):
        print(f"\nQuery {i+1}: {query[:60]}...")
        _, baseline = run_agent(BASELINE_PROMPT, query)
        print(f"  Baseline: {baseline} tool calls")
        _, kcp = run_agent(KCP_PROMPT, query)
        print(f"  KCP: {kcp} tool calls")
        results.append((query, baseline, kcp))

    print("\n" + "=" * 60)
    total_baseline = sum(r[1] for r in results)
    total_kcp = sum(r[2] for r in results)
    print(f"\n{'Query':<55} {'Base':>5} {'KCP':>5} {'Saved':>6}")
    print("-" * 75)
    for query, b, k in results:
        print(f"{query[:55]:<55} {b:>5} {k:>5} {b-k:>6}")
    print("-" * 75)
    print(f"{'TOTAL':<55} {total_baseline:>5} {total_kcp:>5} {total_baseline-total_kcp:>6}")
    pct = round((1 - total_kcp/total_baseline) * 100) if total_baseline > 0 else 0
    print(f"\nReduction: {pct}% fewer tool calls with KCP")
```
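The KCP prompt relies on a `knowledge.yaml` manifest whose exact schema is not included in this diff. The sketch below is a hypothetical minimal shape (all unit names, triggers, and paths are illustrative, inferred from the fields the prompt and commit messages mention) plus the trigger-matching step the prompt describes:

```python
# Hypothetical manifest shape (the real knowledge.yaml schema is not shown in this PR).
MANIFEST = {
    "units": [
        {
            "name": "tools-memory",
            "triggers": ["memory", "rag crew", "knowledge"],
            "file": "docs/concepts/memory.mdx",
            "summary_available": True,
            "summary_unit": "docs/tldr/tools-memory-tldr.mdx",
        },
        {
            "name": "flows",
            "triggers": ["flow", "router", "listen"],
            "file": "docs/concepts/flows.mdx",
            "summary_available": False,
        },
    ]
}


def files_to_read(query: str) -> list[str]:
    """Return the files a KCP-guided agent would read: prefer the TL;DR
    summary when one is available, else the full source file."""
    query_lower = query.lower()
    out = []
    for unit in MANIFEST["units"]:
        if any(t in query_lower for t in unit["triggers"]):
            if unit.get("summary_available"):
                out.append(unit["summary_unit"])
            else:
                out.append(unit["file"])
    return out


print(files_to_read("How do I add knowledge (RAG) to my crew?"))
# ['docs/tldr/tools-memory-tldr.mdx']
```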
---
title: Agents & Tasks (TL;DR)
description: The 5 key agent attributes and how to define tasks — quick reference
icon: robot
---

## Agent: The 5 Key Attributes

An `Agent` is an autonomous unit with a role, a goal, and the tools to get things done.

| Attribute | Parameter | What it does |
| :--- | :--- | :--- |
| **Role** | `role` | Defines the agent's function and expertise |
| **Goal** | `goal` | The individual objective guiding decisions |
| **Backstory** | `backstory` | Context and personality enriching interactions |
| **Tools** | `tools` | List of capabilities the agent can use (default: `[]`) |
| **LLM** | `llm` | The language model powering the agent (default: `gpt-4o`) |
### Minimal Agent Example

```python
from crewai import Agent

researcher = Agent(
    role="Research Analyst",
    goal="Find accurate, up-to-date information on any topic",
    backstory="An expert at gathering data from multiple sources and identifying key insights.",
    tools=[],  # add tools here, e.g. SerperDevTool()
    verbose=True,  # enable logs for debugging
)
```
## Task: Description + Expected Output + Agent

A `Task` is a specific assignment given to an agent. Two fields are required; agent assignment is strongly recommended.

| Attribute | Parameter | What it does |
| :--- | :--- | :--- |
| **Description** | `description` | What the agent must do |
| **Expected Output** | `expected_output` | What a successful completion looks like |
| **Agent** | `agent` | Which agent handles this task |
| **Context** | `context` | Other tasks whose outputs feed into this one |
| **Output File** | `output_file` | Save output to a file path |
### Minimal Task Example

```python
from crewai import Task

research_task = Task(
    description="Research the top 5 AI frameworks released in 2025 and summarize their key features.",
    expected_output="A markdown list of 5 frameworks with name, key feature, and one-sentence summary.",
    agent=researcher,
)
```
## Process Types: How Tasks Are Executed

```python
from crewai import Crew, Process

# Sequential (default): tasks run in order, output feeds into the next
crew = Crew(agents=[...], tasks=[...], process=Process.sequential)

# Hierarchical: a manager LLM assigns tasks based on agent capabilities
crew = Crew(
    agents=[...],
    tasks=[...],
    process=Process.hierarchical,
    manager_llm="gpt-4o",  # required for hierarchical
)
```

**Sequential** — use when tasks have a clear order and each builds on the previous.
**Hierarchical** — use when tasks should be dynamically assigned by a manager agent.
## Full Reference

- All agent attributes: [concepts/agents.mdx](/en/concepts/agents)
- All task attributes: [concepts/tasks.mdx](/en/concepts/tasks)
---
title: Flows (TL;DR)
description: Event-driven workflows with state management — quick reference
icon: arrow-progress
---

## What is a Flow?

A Flow is the control plane of your CrewAI application. It chains tasks together with state, conditional logic, and event-driven triggers. Crews run *inside* Flow steps when you need autonomous agent intelligence.
## Key Decorators

| Decorator | Purpose |
| :--- | :--- |
| `@start()` | Entry point — runs when `flow.kickoff()` is called |
| `@listen(method)` | Runs after the specified method completes, receives its output |
| `@router(method)` | Routes execution to different branches based on return value |
| `@and_(a, b)` | Runs only after **both** `a` and `b` complete |
| `@or_(a, b)` | Runs when **either** `a` or `b` completes |
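The join semantics of `@and_` and `@or_` can be illustrated without crewai. A toy completion tracker (illustrative only, not crewai's implementation):

```python
# Toy illustration of AND/OR join semantics, not crewai's implementation.
class Joins:
    def __init__(self):
        self.done = set()

    def complete(self, step: str):
        self.done.add(step)

    def and_ready(self, *steps) -> bool:
        # Like @and_(a, b): fire only once every listed step has completed
        return all(s in self.done for s in steps)

    def or_ready(self, *steps) -> bool:
        # Like @or_(a, b): fire as soon as any listed step has completed
        return any(s in self.done for s in steps)


j = Joins()
j.complete("fetch_data")
print(j.or_ready("fetch_data", "load_cache"))   # True (one branch is done)
print(j.and_ready("fetch_data", "load_cache"))  # False (still waiting on the other)
```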
## Minimal Flow + Crew Example

```python
from crewai import Agent, Task, Crew
from crewai.flow.flow import Flow, start, listen


class ResearchFlow(Flow):
    # State is a dict accessible as self.state throughout the flow
    model = "gpt-4o-mini"

    @start()
    def get_topic(self):
        # Set initial state
        self.state["topic"] = "AI agent frameworks"
        return self.state["topic"]

    @listen(get_topic)
    def run_research_crew(self, topic):
        # Spin up a Crew inside a Flow step
        researcher = Agent(
            role="Research Analyst",
            goal=f"Research {topic} thoroughly",
            backstory="Expert at finding and synthesizing information.",
        )
        task = Task(
            description=f"Research the latest developments in {topic}.",
            expected_output="A 3-bullet summary of key findings.",
            agent=researcher,
        )
        crew = Crew(agents=[researcher], tasks=[task], verbose=False)
        result = crew.kickoff()
        self.state["research"] = str(result)
        return str(result)

    @listen(run_research_crew)
    def save_result(self, research):
        # Flow step: save to file (plain Python, no agent needed)
        with open("research_output.txt", "w") as f:
            f.write(research)
        print("Saved!")
        return research


flow = ResearchFlow()
result = flow.kickoff()
```
## Routing Example

```python
from crewai.flow.flow import Flow, start, listen, router


class BranchingFlow(Flow):
    @start()
    def check_input(self):
        return "short"  # or "long"

    @router(check_input)
    def route_by_length(self, result):
        if result == "short":
            return "handle_short"
        return "handle_long"

    @listen("handle_short")
    def short_path(self):
        return "Quick answer"

    @listen("handle_long")
    def long_path(self):
        return "Detailed analysis"
```
## State Management

- `self.state` is a dict persisted across all steps in the flow
- Every flow instance gets a unique UUID at `self.state["id"]`
- State is accessible in every `@start`, `@listen`, and `@router` method
## Full Reference

- Complete Flows docs: [concepts/flows.mdx](/en/concepts/flows)
- Step-by-step tutorial: [guides/flows/first-flow.mdx](/en/guides/flows/first-flow)