Re-enable EvaluateOverhead to subtract method-call overhead from benchmark results#5142
lewing wants to merge 1 commit into dotnet:main
Conversation
…hmark results

BenchmarkDotNet PR dotnet/BenchmarkDotNet#3007 (merged Feb 16, 2026) changed the EvaluateOverhead default from true to false. This means BDN no longer runs idle/overhead iterations to measure and subtract the cost of the benchmark method call itself. On native JIT platforms the call overhead is <1ns and imperceptible, but on interpreted WASM it is ~8-10ns and on AOT WASM ~1-3ns. Combined with a 2.5-month WASM perf data gap (Dec 5, 2025 – Feb 26, 2026), this caused the auto-filer to report 2300+ false regressions:

- dotnet/perf-autofiling-issues#69444: 1416 interpreted WASM regressions
- dotnet/perf-autofiling-issues#69430: 864 AOT WASM regressions

The regression ratio is inversely correlated with baseline duration, the signature of a constant additive overhead rather than a real runtime change. Re-enabling EvaluateOverhead restores the previous measurement methodology for all platforms.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
That was the exact discussion we had before the change was made. cc @tannergooding
Yeah, from the experience of triaging, we often run into spurious regressions where we end up at 0ns time spent, as well as what is mentioned in the linked issue about fast tests being super noisy. Now that being said, if we have larger, variable overheads in the WASM tests we should try and turn them on for just that scenario. I don't have a lot of experience looking at the WASM tests so I trust your judgement @lewing. I imagine that should be fairly easy, and am happy to add to the PR when I get back from a walk.
Right. While the intuitive thing would be that measuring and subtracting overhead is "better", it's often actually quite the opposite in practice due to the precision of the hardware timers and other factors. Here it sounds rather like the WASM interpreter is simply slow enough, in contrast to other scenarios, that no longer subtracting the overhead is showing up as measurable.

What would be interesting to know is whether the WASM results are overall more stable now that the overhead isn't being subtracted. If they are more stable, then while there is a "spike" we would be at a better and probably more representative baseline for what user code actually sees. This would also help reduce noise in future triage. However, if they stay at the same overall stability levels, then it doesn't really matter and we can toggle subtraction back on for WASM and revisit that in the future if the overhead is ever reduced.
closing in favor of #5143 |
Summary
Re-enable `EvaluateOverhead` in the default benchmark job configuration. BenchmarkDotNet PR #3007 (merged Feb 16, 2026) changed the `EvaluateOverhead` default from `true` to `false`, which means BDN no longer runs idle/overhead iterations to measure and subtract the cost of the benchmark method call itself from workload measurements.

Problem
This one-line default change, combined with a 2.5-month WASM perf data gap (Dec 5, 2025 – Feb 26, 2026), caused the auto-filer to report 2,300+ false regressions:

- dotnet/perf-autofiling-issues#69444: 1,416 interpreted WASM regressions
- dotnet/perf-autofiling-issues#69430: 864 AOT WASM regressions
On native JIT platforms the method-call overhead is <1ns (imperceptible), but on the WASM interpreter it's ~8-10ns and on AOT WASM ~1-3ns. Without overhead subtraction, every WASM microbenchmark reports raw time including this call overhead, creating an additive bias that makes short-baseline benchmarks appear dramatically regressed.
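The additive bias is easy to see numerically. In the sketch below, the ~8ns overhead is the interpreted-WASM figure quoted above, while the baseline durations are purely hypothetical examples:

```python
# Illustration of how a constant additive overhead inflates short benchmarks
# far more than long ones. The ~8 ns overhead is the interpreted-WASM figure
# quoted above; the baseline durations are hypothetical.

def apparent_regression(baseline_ns: float, overhead_ns: float) -> float:
    """Ratio the perf pipeline sees when call overhead is no longer subtracted."""
    return (baseline_ns + overhead_ns) / baseline_ns

OVERHEAD_NS = 8.0  # ~8 ns call overhead on interpreted WASM

for baseline in (2.0, 10.0, 100.0, 1000.0):
    ratio = apparent_regression(baseline, OVERHEAD_NS)
    print(f"{baseline:7.1f} ns baseline -> {ratio:.2f}x apparent regression")
```

A 2ns benchmark appears to regress 5x while a 1000ns benchmark barely moves, which is exactly the inverse correlation between regression ratio and baseline duration described here.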
Evidence
The regression distribution across all 2,300+ benchmarks perfectly matches an additive constant, not a multiplicative factor. Time-series data from the perf portal confirms a step function coinciding exactly with the BDN methodology change — no gradual drift, and no runtime code changes in the window. See the detailed analysis in the issue comments.
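The additive-versus-multiplicative distinction can be checked mechanically: under an additive shift the absolute deltas are constant across benchmarks while the ratios vary with baseline duration, whereas a real multiplicative regression shows the opposite. The numbers below are synthetic, not taken from the actual perf data:

```python
# Synthetic check distinguishing an additive shift (new = old + c) from a
# multiplicative one (new = k * old). All durations here are hypothetical.

baselines_ns = [2.0, 10.0, 100.0, 1000.0]
OVERHEAD_NS = 8.0  # constant additive overhead, e.g. unsubtracted call cost

measured = [b + OVERHEAD_NS for b in baselines_ns]

deltas = [m - b for m, b in zip(measured, baselines_ns)]
ratios = [m / b for m, b in zip(measured, baselines_ns)]

print("deltas:", deltas)  # identical for every benchmark: additive signature
print("ratios:", ratios)  # shrink as the baseline grows, as observed here
```

If the 2,300+ reported regressions had instead come from a real runtime slowdown, the ratios, not the deltas, would be roughly constant across benchmarks.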
Fix
One-line change in `RecommendedConfig.cs` to explicitly set `.WithEvaluateOverhead(true)` in the default Job configuration, restoring the previous measurement methodology for all platforms.

This is arguably the right default for a performance lab anyway: overhead subtraction gives more accurate results for the actual workload being measured, especially for sub-microsecond benchmarks.
/cc @AaronRobinsonMSFT