Add benchmark for infer_json_schema #9546
}

let mut data = vec![];
for row in pseudorandom_sequence::<Row>(ROWS) {
I think our other benchmarks use seedable_rng to get repeatable pseudorandom numbers. Is there a reason we shouldn't follow the same pattern here?
I did explore using seedable_rng, but I can't remember why I abandoned that approach. I agree that it's better to use the established pattern and avoid an extra dependency. Since this is already merged, I'll open a follow-up PR to address this when I get the time.
bytes = "1.4"
criterion = { workspace = true, default-features = false }
rand = { version = "0.9", default-features = false, features = ["std", "std_rng", "thread_rng"] }
arbitrary = { version = "1.4.2", features = ["derive"] }
I would prefer not to add a new dependency (even a dev one) unless really necessary, as that is one more thing to chase down and maintain. I think you could get the same effect using a random number generator directly.
Will also address this in the follow-up PR.
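For illustration, here is a minimal sketch of what "using a random number generator directly" could look like for building repeatable benchmark input. Everything in it is an assumption rather than code from this PR: the `Row` fields, `generate_rows`, and `to_ndjson` are hypothetical, and a small hand-rolled SplitMix64 generator stands in for the repository's seedable_rng helper so the sketch needs no dependencies. In the real benchmark the existing helper or a seeded `rand` generator would take its place.

```rust
/// SplitMix64: a tiny deterministic PRNG, used here only as a
/// dependency-free stand-in for a seeded generator.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    /// Advance the state and return the next 64-bit value.
    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical row shape; the real benchmark's `Row` may differ.
#[derive(Debug, PartialEq)]
struct Row {
    id: u64,
    score: f64,
    active: bool,
}

/// Build `rows` rows from a fixed seed, so every run sees the same data.
fn generate_rows(seed: u64, rows: usize) -> Vec<Row> {
    let mut rng = SplitMix64::new(seed);
    (0..rows)
        .map(|_| Row {
            id: rng.next_u64() % 1_000,
            // Top 53 bits scaled into the half-open range [0, 1).
            score: (rng.next_u64() >> 11) as f64 / (1u64 << 53) as f64,
            active: rng.next_u64() & 1 == 1,
        })
        .collect()
}

/// Serialize the rows as newline-delimited JSON bytes.
fn to_ndjson(rows: &[Row]) -> Vec<u8> {
    let mut data = Vec::new();
    for row in rows {
        let line = format!(
            "{{\"id\":{},\"score\":{},\"active\":{}}}\n",
            row.id, row.score, row.active
        );
        data.extend_from_slice(line.as_bytes());
    }
    data
}
```

Calling `to_ndjson(&generate_rows(42, ROWS))` once during benchmark setup would yield the same bytes on every run, which keeps timings comparable across changes without deriving `Arbitrary` for the row type.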
Thanks @Rafferty97 for this.
Which issue does this PR close?
Split out from #9494 to make review easier. It simply adds a benchmark for JSON schema inference.
Rationale for this change
I have an open PR that significantly refactors the JSON schema inference code, so I want confidence that the new code is not only correct but also faster than the existing code.
What changes are included in this PR?
Adds a benchmark.
Are these changes tested?
N/A
Are there any user-facing changes?
No