Add benchmark for `infer_json_schema` by Rafferty97 · Pull Request #9546 · apache/arrow-rs

Rafferty97 · 2026-03-12T23:49:29Z

Which issue does this PR close?

Split out from #9494 to make review easier. It simply adds a benchmark for JSON schema inference.

Rationale for this change

I have an open PR that significantly refactors the JSON schema inference code, so I want confidence not only that the new code is correct, but also that it performs better than the existing code.

What changes are included in this PR?

Adds a benchmark for `infer_json_schema` in `arrow-json/benches/json_reader.rs`.

Are these changes tested?

N/A

Are there any user-facing changes?

No

alamb · 2026-03-13T09:53:18Z

arrow-json/benches/json_reader.rs

```diff
+    }
+
+    let mut data = vec![];
+    for row in pseudorandom_sequence::<Row>(ROWS) {
```


I think our other benchmarks use `seedable_rng` to get repeatable pseudorandom numbers. Is there a reason we shouldn't follow the same pattern here?

Rafferty97 · 2026-03-13

I did explore using `seedable_rng`, but I can't remember why I abandoned that approach. I agree, though, that it's better to use the established pattern and avoid an extra dependency. Since this PR has already been merged, I'll open a follow-up PR that addresses this when I get the time.
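For context on the pattern alamb mentions: arrow-rs benchmarks build their input from a pseudorandom generator with a fixed seed, so every run measures identical data. As far as I know, arrow's `seedable_rng` test helper returns a rand `StdRng` created with `seed_from_u64` and a constant seed. The sketch below shows only that determinism property using std alone; `SplitMix64` and `sample` are hypothetical names, and the generator is a stand-in for `StdRng` so the snippet compiles without the `rand` crate. It is not how the arrow-rs benchmarks are actually written.

```rust
/// Hypothetical std-only stand-in for rand's `StdRng`, implemented as a
/// SplitMix64 generator. Any fixed seed always yields the same sequence.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    /// Mirrors the shape of rand's `SeedableRng::seed_from_u64` constructor.
    fn seed_from_u64(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    /// Advance the state and return the next 64-bit pseudorandom value.
    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Draw `n` values from a generator seeded with `seed`, as a benchmark
/// would when building its input data.
fn sample(seed: u64, n: usize) -> Vec<u64> {
    let mut rng = SplitMix64::seed_from_u64(seed);
    (0..n).map(|_| rng.next_u64()).collect()
}
```

Calling `sample(42, 8)` twice returns identical vectors, while `sample(43, 8)` returns a different one. In the real benchmark the equivalent would be drawing values from `seedable_rng()` (or `StdRng::seed_from_u64`) rather than from a hand-written generator.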

alamb · 2026-03-13T09:54:55Z

arrow-json/Cargo.toml

```diff
 bytes = "1.4"
 criterion = { workspace = true, default-features = false }
 rand = { version = "0.9", default-features = false, features = ["std", "std_rng", "thread_rng"] }
+arbitrary = { version = "1.4.2", features = ["derive"] }
```


I would prefer not to add a new dependency (even a dev-dependency) unless it is really necessary, as each one is one more thing to chase down and maintain. I think you could get the same effect by using a random number generator directly.

Rafferty97 · 2026-03-13

I'll also address this in the follow-up PR.
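alamb's suggestion can be sketched roughly as follows: instead of deriving `Arbitrary` for a row type, draw each field straight from a seeded generator and write the JSON text directly. This is a minimal std-only sketch under stated assumptions: `SplitMix64` is a hypothetical stand-in for rand's `StdRng` (rand is already listed in arrow-json's `Cargo.toml` with the `std_rng` feature, per the excerpt above), and `generate_ndjson` with its fields `a`, `b`, `c` is illustrative, not the benchmark's actual `Row` type.

```rust
/// Hypothetical std-only stand-in for rand's `StdRng` (SplitMix64).
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn seed_from_u64(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Value in `0..n`; the slight modulo bias does not matter for test data.
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

/// Build `rows` lines of newline-delimited JSON from a seeded generator.
/// Field names and types are illustrative: `a` is sometimes missing and
/// `b` is a float, a string, or absent, so inference must reconcile rows.
fn generate_ndjson(seed: u64, rows: usize) -> String {
    let mut rng = SplitMix64::seed_from_u64(seed);
    let mut out = String::new();
    for _ in 0..rows {
        let mut fields: Vec<String> = Vec::new();
        // Include `a` roughly three times out of four.
        if rng.below(4) != 0 {
            fields.push(format!("\"a\":{}", rng.below(1000)));
        }
        // `b` alternates between a float, a string, and being omitted.
        match rng.below(3) {
            0 => fields.push(format!("\"b\":{}.5", rng.below(100))),
            1 => fields.push(format!("\"b\":\"s{}\"", rng.below(100))),
            _ => {}
        }
        // `c` is always present as a boolean.
        fields.push(format!("\"c\":{}", rng.below(2) == 0));
        out.push('{');
        out.push_str(&fields.join(","));
        out.push_str("}\n");
    }
    out
}
```

The same seed always produces the same text, so the benchmark stays repeatable without any derive machinery. Assuming `infer_json_schema` accepts a `BufRead` reader, the resulting bytes could be passed to it through a `std::io::Cursor`.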

alamb · 2026-03-13T09:55:18Z

Thanks @Rafferty97 for this

Commit 5b89669: Add benchmark for `infer_json_schema`

github-actions bot added the `arrow` label (Changes to the arrow crate) on Mar 12, 2026

Dandandan approved these changes Mar 13, 2026


Dandandan merged commit c214c3c into apache:main Mar 13, 2026
24 checks passed

alamb reviewed Mar 13, 2026


Rafferty97 mentioned this pull request on Mar 13, 2026 in #9550, "Remove dependency on arbitrary" (open)


Dandandan merged 1 commit into apache:main from Rafferty97:json-infer-benchmark
