Add AGENTS.md with project conventions for AI coding agents by RussellSpitzer · Pull Request #15529 · apache/iceberg

RussellSpitzer · 2026-03-08T02:03:56Z

Using my Anthropic Dollars Appropriately

Recently I've been using my own personal review skill to help me go over PRs and quickly get up to date on new requests before I take my own look, and I realized this is probably useful to far more folks than just me. I decided to get ambitious, analyze absolutely everything in the repo, and try to build a good amalgam skill based on everything that's happened so far. Below is the result.

As noted below, this was produced by Cursor, and I used claude-4.6-opus-high for the work.

To save anyone else from having to wait for all the comments to download, I made a copy here for you to pull if you want to do your own analysis.

The raw review comment dataset is available as a public gist for reproducibility: https://gist.github.com/RussellSpitzer/8dddd1915d0c9fb9e027ab5fd5331c87

--- Bot Text Follows --

Summary

Adds an AGENTS.md file at the repository root following the AGENTS.md open standard — a convention adopted by 60,000+ repositories for providing AI coding agents with project-specific context. This file is automatically discovered by tools like Cursor, Copilot, Claude Code, Codex, and others.

The conventions were synthesized from 58,381 review comments across 4,309 merged PRs spanning the full history of the Apache Iceberg project. Comments were collected via the GitHub GraphQL API from every merged PR with 3+ reviews, filtered to PMC members and committers.

Creation Process

1. Data collection: Fetched all review comments from merged PRs using gh api graphql with pagination across the full PR history.
2. Topic clustering: Classified comments into ~17 topic buckets (API design, naming, testing, performance, serialization, REST/spec, error handling, configuration, code style, module boundaries, etc.) using keyword-based pattern matching.
3. Sampling: Within each bucket, selected the most substantive comments (>80 chars, diverse file paths, de-duplicated).
4. Rule synthesis: Extracted concrete, actionable conventions from each topic cluster, focusing on patterns that appeared repeatedly and consistently across different reviewers and time periods.
5. Depersonalization: All rules are generic, with no attribution to specific reviewers or behavioral profiles.
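
As a rough illustration of the topic clustering and sampling steps described above, a minimal Java sketch might look like the following. This is not the script used to produce this PR. The bucket names come from the list above, but the keyword lists, the `Comment` class, and the `classify` and `sample` helpers are hypothetical. The 80-character threshold and the de-duplication by body and file path follow the description of the sampling step.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class CommentBuckets {
  // Hypothetical stand-in for one review comment: file path plus body text.
  static final class Comment {
    final String path;
    final String body;

    Comment(String path, String body) {
      this.path = path;
      this.body = body;
    }
  }

  // Hypothetical keyword lists; the analysis described above used ~17 buckets.
  static final Map<String, List<String>> KEYWORDS = new LinkedHashMap<>();

  static {
    KEYWORDS.put("naming", List.of("rename", "naming", "prefix"));
    KEYWORDS.put("testing", List.of("test", "assert"));
    KEYWORDS.put("error handling", List.of("exception", "throw"));
  }

  // Returns every bucket with at least one keyword in the body (case-insensitive).
  static Set<String> classify(Comment comment) {
    String text = comment.body.toLowerCase(Locale.ROOT);
    Set<String> buckets = new LinkedHashSet<>();
    for (Map.Entry<String, List<String>> entry : KEYWORDS.entrySet()) {
      if (entry.getValue().stream().anyMatch(text::contains)) {
        buckets.add(entry.getKey());
      }
    }
    return buckets;
  }

  // Keeps comments longer than 80 chars, skipping repeated bodies and repeated file paths.
  static List<Comment> sample(List<Comment> comments, int limit) {
    Set<String> seenBodies = new LinkedHashSet<>();
    Set<String> seenPaths = new LinkedHashSet<>();
    List<Comment> picked = new ArrayList<>();
    for (Comment comment : comments) {
      if (picked.size() >= limit) {
        break;
      }
      if (comment.body.length() <= 80) {
        continue;
      }
      if (!seenBodies.add(comment.body)) {
        continue;
      }
      if (!seenPaths.add(comment.path)) {
        continue;
      }
      picked.add(comment);
    }
    return picked;
  }

  public static void main(String[] args) {
    Comment comment = new Comment("api/Foo.java", "Please rename this and add a test");
    System.out.println(classify(comment));
  }
}
```

A comment can land in more than one bucket here, which seems like the natural behavior for keyword matching. Whether the original analysis allowed overlap is not stated.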

What's Covered

Architecture: Module boundaries, high-sensitivity areas
Design patterns: Refinement, CloseableIterable, null-over-Optional, builder, boolean-threw, Tasks, immutable metadata, etc.
Coding conventions: API design, naming, code style, placement, serialization, error handling, performance, configuration, testing, REST/OpenAPI

What's NOT included

No reviewer behavioral profiles or personal attributions // Russ note - It really wanted to make comments about what specific folks like to say and I thought that was inappropriate

Test plan

Verify file renders correctly on GitHub
Verify conventions are accurate against existing codebase patterns
Solicit feedback from PMC members on coverage and accuracy

Made with Cursor

Introduces an AGENTS.md file at the repository root following the open standard for providing AI coding agents with project-specific context. The conventions were synthesized from analysis of 58,000+ review comments across 4,300+ merged PRs spanning the project history. Covers module boundaries, high-sensitivity areas, design patterns, coding style, naming, serialization, error handling, performance, testing, and REST/OpenAPI conventions. Made-with: Cursor

Following the pattern established by apache/airflow's AGENTS.md, add the three remaining recommended sections: executable build/test commands, PR/commit conventions, and explicit safety boundaries. Made-with: Cursor

jbonofre

I would suggest adding a link to https://www.apache.org/legal/generative-tooling.html, as the ASF is working on guidelines for its projects. There are several efforts at the foundation level, so this page will probably evolve (and may define policies). So I think it's important to be aligned at the project level.

jbonofre

LGTM; the mention of the ASF guidelines is already included in the corresponding Iceberg documentation.

jbonofre · 2026-03-08T17:12:01Z

> I would suggest adding a link to https://www.apache.org/legal/generative-tooling.html, as the ASF is working on guidelines for its projects. There are several efforts at the foundation level, so this page will probably evolve (and may define policies). So I think it's important to be aligned at the project level.

Never mind, this is already done in the corresponding Iceberg documentation.

pvary · 2026-03-09T09:59:50Z

AGENTS.md

+- Compute expected values, don't hardcode. Tests belong in the module that owns the code.
+- Write the most direct test for the bug. Parameterized tests for type variations.
+- JUnit 5 + AssertJ: `@Test` (no `test` prefix), `assertThat`, `assertThatThrownBy`.
+- `waitUntilAfter` for time-dependent tests. `boolean threw` for cleanup. Separate tests over combined.


What does "boolean threw" for cleanup mean?

This is the pattern of

  Do a whole lot of stuff
  boolean threw = false
  try { something }
  catch { cleanup try stuff }
  if (threw) { cleanup stuff from outside try block }

I don't think we actually do this that often. As I mentioned to @nastra, I think it's just overrepresented in our comments. I think we should just remove it.

+1 for removal
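
For readers unfamiliar with the idiom discussed in this thread, here is a minimal, self-contained sketch of the `boolean threw` cleanup pattern. It uses the form quoted in the AGENTS.md diff in this PR, where `threw` starts as `true` and is cleared on success. The `commit` method, its `Runnable` arguments, and the `EVENTS` list are hypothetical and are not taken from Iceberg code.

```java
import java.util.ArrayList;
import java.util.List;

final class BooleanThrewExample {
  // Hypothetical helper state that records what happened so the flow is observable.
  static final List<String> EVENTS = new ArrayList<>();

  // Runs the action; if it throws, runs cleanup and lets the original exception propagate.
  static void commit(Runnable action, Runnable cleanup) {
    boolean threw = true;
    try {
      action.run();
      threw = false; // only reached when the action completed normally
    } finally {
      if (threw) {
        cleanup.run(); // for example, delete files written before the failure
      }
    }
  }

  public static void main(String[] args) {
    // Success path: cleanup is skipped.
    commit(() -> EVENTS.add("write"), () -> EVENTS.add("cleanup"));

    // Failure path: cleanup runs, then the exception reaches the caller.
    try {
      commit(
          () -> {
            throw new IllegalStateException("commit failed");
          },
          () -> EVENTS.add("cleanup"));
    } catch (IllegalStateException e) {
      EVENTS.add("caught: " + e.getMessage());
    }

    System.out.println(EVENTS); // prints [write, cleanup, caught: commit failed]
  }
}
```

The flag lets the `finally` block tell success from failure without catching and rethrowing, so cleanup runs for any exception type while the original exception still reaches the caller.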

nastra · 2026-03-09T13:35:00Z

AGENTS.md

+- **Builder pattern**: For complex creation. Never require passing `null` for optional parameters.
+- **Package-private by default**: Only make things public with demonstrated need.
+- **Postel's Law**: Accept case-insensitive input, produce canonical output.
+- **`boolean threw`**: `boolean threw = true; try { ...; threw = false; } finally { if (threw) cleanup(); }`.


I don't think this is a common pattern? We only use it in a very small set of cases

We probably just have lots of comments about it in TableOperations modifications, so it is probably talked about a lot even if it doesn't happen a lot. This is based on comments, not actual code usage.

ajantha-bhat · 2026-03-09T13:44:26Z

I also have my own local CLAUDE.MD generated recently.

I think we can add some more rules like the ones below to AGENTS.md.
But I think this file will keep evolving, so we can do this in follow-up PRs too.

Build/Test:
  - ./gradlew build (full build with tests)
  - ./gradlew spotlessApply -DallModules (format all Spark/Flink versions)
  - Multi-version testing: -DsparkVersions=3.4,3.5,4.0,4.1, -DflinkVersions=1.20,2.0,2.1, -DscalaVersion=2.13
  - ./gradlew integrationTest (requires Docker)
  - Test logs location: build/testlogs/ per module

Code Style (enforced by tooling):
  - Google Java Format 1.22.0 via Spotless; Scalafmt 3.9.7 for Scala
  - Logger field must be named LOG (not log): error-prone LoggerEnclosingClass check
  - No wildcard imports (individual imports only); specific allowed static wildcards (Preconditions, Collections, Collectors, AssertJ, Spark functions)
  - Banned packages: repackaged/shaded Guava, sun.*, deprecated commons libs
  - Preconditions.checkNotNull() must include a message parameter
  - No C-style array declarations (int arr[] → int[] arr)
  - One statement per line, no more than two consecutive blank lines
  - @Override required on all overriding methods
  - Inner classes that can be static must be static (ClassCanBeStatic error-prone check)
  - LF only line endings (no CRLF)

Architecture:
  - Explicit module dependency hierarchy (API → Core → Format → Catalog → Cloud → Engine)
  - iceberg-common and iceberg-data modules not mentioned
  - Catalog abstraction trio: Catalog (tables), ViewCatalog (views), SupportsNamespaces (namespaces)
  - REST client chain: RESTCatalog → RESTSessionCatalog → HTTP client
  - FileIO/InputFile/OutputFile abstraction explained
  - Expression system uses visitor pattern

Binary Compatibility:
  - RevAPI checks core API modules (api, core, parquet, orc, common, data) against baseline in .palantir/revapi.yml

Config Locations:
  - .baseline/checkstyle/checkstyle.xml and suppressions
  - baseline.gradle for Spotless/error-prone config
  - .baseline/copyright/copyright-header-java.txt for license template
  - project/scalastyle_config.xml for Scala

Naming:
  - Test classes must use Test prefix (TestNewFeature, NOT NewFeatureTest)
  - Type variables: single capital letter or CamelCase ending in T
  - Constants: UPPER_SNAKE_CASE
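
A few of the style and naming rules listed above can be shown in a small dependency-free sketch. The `StyleSketch` class and everything in it are hypothetical and exist only for illustration. Note that `Objects.requireNonNull` is used here as a JDK stand-in for the `Preconditions.checkNotNull()` call the rule refers to, so that the example compiles without Guava.

```java
import java.util.Objects;

final class StyleSketch {
  // Constants use UPPER_SNAKE_CASE.
  static final int DEFAULT_BATCH_SIZE = 3;

  // The type variable is a single capital letter. A nested class that does not
  // need an outer instance is declared static.
  static final class Box<T> {
    private final T value;

    Box(T value) {
      // Null checks carry a message (JDK stand-in for Preconditions.checkNotNull).
      this.value = Objects.requireNonNull(value, "Invalid value: null");
    }

    T value() {
      return value;
    }

    // @Override is present on every overriding method.
    @Override
    public String toString() {
      return "Box(" + value + ")";
    }
  }

  static int[] batch() {
    // Java-style array declaration: int[] arr, not int arr[].
    int[] arr = new int[DEFAULT_BATCH_SIZE];
    arr[0] = 1;
    return arr;
  }

  public static void main(String[] args) {
    System.out.println(new Box<>("a"));
    System.out.println(batch().length);
  }
}
```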

zheliu2

I think many people are using Claude today as well as Cursor. CLAUDE.md is a special filename that Claude Code automatically loads into context at the start of every conversation. Should we also create a CLAUDE.md file containing one line that says "Read AGENTS.md first"?

RussellSpitzer · 2026-03-09T17:59:27Z

> I think many people are using Claude today as well as Cursor. CLAUDE.md is a special filename that Claude Code automatically loads into context at the start of every conversation. Should we also create a CLAUDE.md file containing one line that says "Read AGENTS.md first"?

I don't think we can just use a vendor-specific file name in this project. It's probably better to pressure Claude Code to support AGENTS.md or some more neutral standard. Individual users can always make their own claude.md file and add it to a global .gitignore or something like that.

RussellSpitzer force-pushed the AddAgentsMD branch from 4d83e00 to 9d0b7ea on March 8, 2026 02:10

Add commands, PR conventions, and boundaries sections

9b91b28

RussellSpitzer marked this pull request as ready for review March 8, 2026 02:13

jbonofre reviewed Mar 8, 2026

jbonofre approved these changes Mar 8, 2026

pvary reviewed Mar 9, 2026

nastra reviewed Mar 9, 2026

nastra reviewed Mar 9, 2026

Address Reviewer Comments

0f5d444

zheliu2 reviewed Mar 9, 2026

huaxingao reviewed Mar 11, 2026

Add PR Title Example

637fb49

7 participants
