Add AGENTS.md with project conventions for AI coding agents #15529
RussellSpitzer wants to merge 4 commits into apache:main
Introduces an AGENTS.md file at the repository root following the open standard for providing AI coding agents with project-specific context. The conventions were synthesized from analysis of 58,000+ review comments across 4,300+ merged PRs spanning the project history. Covers module boundaries, high-sensitivity areas, design patterns, coding style, naming, serialization, error handling, performance, testing, and REST/OpenAPI conventions. Made-with: Cursor
4d83e00 to 9d0b7ea
Following the pattern established by apache/airflow's AGENTS.md, add the three remaining recommended sections: executable build/test commands, PR/commit conventions, and explicit safety boundaries. Made-with: Cursor
jbonofre
left a comment
I would suggest adding a link to https://www.apache.org/legal/generative-tooling.html, as the ASF is working on guidelines for projects. There are several efforts at the foundation level, so this page will probably evolve (and may define policies). So I think it's important to be aligned at the project level.
jbonofre
left a comment
LGTM; the mention of the ASF guidelines is already included in the corresponding Iceberg documentation.
Never mind, already done in the corresponding Iceberg documentation.
AGENTS.md
Outdated
- Compute expected values, don't hardcode. Tests belong in the module that owns the code.
- Write the most direct test for the bug. Parameterized tests for type variations.
- JUnit 5 + AssertJ: `@Test` (no `test` prefix), `assertThat`, `assertThatThrownBy`.
- `waitUntilAfter` for time-dependent tests. `boolean threw` for cleanup. Separate tests over combined.
What does "`boolean threw` for cleanup" mean?
This is the pattern of

Do a whole lot of stuff
boolean threw = false
try {
  something
} catch {
  threw = true
  cleanup try stuff
}
if (threw) {
  cleanup stuff from outside try block
}

I don't think we actually do this that often. As I mentioned to @nastra, I think it's just overrepresented in our comments. I think we should just remove it.
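For readers unfamiliar with the idiom, here is a minimal Java sketch of the `boolean threw` cleanup pattern in its try/finally form. The class, method, and resource names are hypothetical and are not taken from the Iceberg codebase; the "file" is just a string, so the cleanup path can be observed without touching disk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; none of these names come from Iceberg.
class ThrewPatternSketch {
  // Records each action so the effect of the pattern is observable.
  static final List<String> log = new ArrayList<>();

  // "Commits" a resource created before the try block and cleans it up
  // only when the commit fails.
  static void writeAndCommit(boolean failCommit) {
    String file = "metadata-temp"; // resource created outside the try block
    boolean threw = true;
    try {
      if (failCommit) {
        throw new IllegalStateException("commit failed");
      }
      log.add("committed " + file);
      threw = false; // reached only when nothing above threw
    } finally {
      if (threw) {
        log.add("deleted " + file); // cleanup runs only on the failure path
      }
    }
  }

  public static void main(String[] args) {
    writeAndCommit(false);
    try {
      writeAndCommit(true);
    } catch (IllegalStateException e) {
      log.add("caller saw: " + e.getMessage());
    }
    System.out.println(log);
  }
}
```

The flag starts as `true` and is cleared as the last statement of the `try`, so the `finally` block can tell success from failure without catching and rethrowing; the original exception still propagates to the caller unchanged.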
AGENTS.md
Outdated
- **Builder pattern**: For complex creation. Never require passing `null` for optional parameters.
- **Package-private by default**: Only make things public with demonstrated need.
- **Postel's Law**: Accept case-insensitive input, produce canonical output.
- **`boolean threw`**: `boolean threw = true; try { ...; threw = false; } finally { if (threw) cleanup(); }`.
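As an illustration of how the first three conventions quoted here might combine, the following is a rough Java sketch. `WriteOptions`, `Codec`, and the default values are invented for this example and are not Iceberg APIs; choosing lower case as the canonical output form is likewise an assumption.

```java
import java.util.Locale;

// Hypothetical example; WriteOptions and Codec are not Iceberg classes.
// Everything is package-private: nothing here has a demonstrated need to be public.
class WriteOptions {
  enum Codec { GZIP, ZSTD, NONE }

  private final Codec codec;
  private final int batchSize;

  private WriteOptions(Codec codec, int batchSize) {
    this.codec = codec;
    this.batchSize = batchSize;
  }

  static Builder builder() {
    return new Builder();
  }

  // Canonical output: always lower case, whatever casing the caller used.
  String codecName() {
    return codec.name().toLowerCase(Locale.ROOT);
  }

  int batchSize() {
    return batchSize;
  }

  static class Builder {
    // Optional parameters carry defaults, so callers never pass null.
    private Codec codec = Codec.NONE;
    private int batchSize = 1000;

    private Builder() {}

    Builder codec(String name) {
      // Liberal input: trim and ignore case before resolving the enum.
      this.codec = Codec.valueOf(name.trim().toUpperCase(Locale.ROOT));
      return this;
    }

    Builder batchSize(int size) {
      this.batchSize = size;
      return this;
    }

    WriteOptions build() {
      return new WriteOptions(codec, batchSize);
    }
  }

  public static void main(String[] args) {
    WriteOptions options = WriteOptions.builder().codec(" ZsTd ").build();
    System.out.println(options.codecName() + " " + options.batchSize()); // prints "zstd 1000"
  }
}
```

A caller who only cares about the codec sets just that one option; the batch size falls back to its default instead of being passed as `null` or a placeholder.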
I don't think this is a common pattern? We only use it in a very small set of cases
We probably just have lots of comments about it in TableOperations modifications, so it is talked about a lot even if it doesn't happen a lot. This is based on comments, not actual code usage.
I also have my own local CLAUDE.MD generated recently. I think we can add some more rules like below to AGENTS.md.
I think many people are using Claude today as well as Cursor. CLAUDE.md is a special filename that Claude Code automatically loads into context at the start of every conversation. Should we also create a CLAUDE.md file containing a single line that says "Read AGENTS.md first"?
I don't think we can just use a vendor-specific file name in this project. It's probably better to pressure Claude Code to support AGENTS.md or some more neutral standard. Individual users can always make their own claude.md file and add it to a global .gitignore or something like that.
Using my Anthropic Dollars Appropriately
I've recently been using my own personal review skill to help me go over PRs and quickly get up to date on new requests before I take my own look at a PR, and I realized this is probably useful to far more folks than just me. I decided to get ambitious, analyze absolutely everything in the repo, and try to build a good amalgam skill based on everything that's happened so far. Below is the result.
As noted below, this was produced by Cursor, and I used claude-4.6-opus-high for the work.
To avoid anyone else having to wait for all the comments to download I made a copy here for you to pull if you want to do your own analysis.
The raw review comment dataset is available as a public gist for reproducibility: https://gist.github.com/RussellSpitzer/8dddd1915d0c9fb9e027ab5fd5331c87
--- Bot Text Follows --
Summary
Adds an `AGENTS.md` file at the repository root following the AGENTS.md open standard, a convention adopted by 60,000+ repositories for providing AI coding agents with project-specific context. This file is automatically discovered by tools like Cursor, Copilot, Claude Code, Codex, and others.

The conventions were synthesized from 58,381 review comments across 4,309 merged PRs spanning the full history of the Apache Iceberg project. Comments were collected via the GitHub GraphQL API from every merged PR with 3+ reviews, filtered to PMC members and committers.
Creation Process
`gh api graphql` with pagination across the full PR history.
What's Covered
What's NOT included
Test plan
Made with Cursor