Skip to content

Add benchmark for infer_json_schema#9546

Merged
Dandandan merged 1 commit intoapache:mainfrom
Rafferty97:json-infer-benchmark
Mar 13, 2026
Merged

Add benchmark for infer_json_schema#9546
Dandandan merged 1 commit intoapache:mainfrom
Rafferty97:json-infer-benchmark

Conversation

@Rafferty97
Copy link
Contributor

Which issue does this PR close?

Split out from #9494 to make review easier. It simply adds a benchmark for JSON schema inference.

Rationale for this change

I have an open PR that significantly refactors the JSON schema inference code, so I want confidence that not only is the new code correct, but also has better performance than the existing code.

What changes are included in this PR?

Adds a benchmark.

Are these changes tested?

N/A

Are there any user-facing changes?

No

@github-actions github-actions bot added the arrow Changes to the arrow crate label Mar 12, 2026
@Dandandan Dandandan merged commit c214c3c into apache:main Mar 13, 2026
24 checks passed
}

let mut data = vec![];
for row in pseudorandom_sequence::<Row>(ROWS) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think other benchmarks we have use seedable_rng to get repeatable psuedo random numbers. Is there a reason we shouldn't follow the same pattern here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did explore using seedable_rng, but I can't remember why I abandoned that approach. But I agree, better to use the established pattern and avoid an extra dependency. Since this is already merged, I'll create a follow up PR when I get the time that addresses this.

bytes = "1.4"
criterion = { workspace = true, default-features = false }
rand = { version = "0.9", default-features = false, features = ["std", "std_rng", "thread_rng"] }
arbitrary = { version = "1.4.2", features = ["derive"] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer not to add a new dependency (even a dev one) unless really necessary as that is then one more thing to chase down / maintain. I think you could get the same effect using a random number generator directly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will also address this in the follow up PR

@alamb
Copy link
Contributor

alamb commented Mar 13, 2026

Thanks @Rafferty97 for this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants