
Re-enable EvaluateOverhead to subtract method-call overhead from benchmark results#5142

Closed
lewing wants to merge 1 commit into dotnet:main from lewing:evaluate-overhead-fix

Conversation

@lewing (Member) commented Mar 5, 2026

Summary

Re-enable EvaluateOverhead in the default benchmark job configuration. BenchmarkDotNet PR #3007 (merged Feb 16, 2026) changed the EvaluateOverhead default from true to false, which means BDN no longer runs idle/overhead iterations to measure and subtract the cost of the benchmark method call itself from workload measurements.

Problem

This one-line default change, combined with a 2.5-month WASM perf data gap (Dec 5, 2025 – Feb 26, 2026), caused the auto-filer to report 2,300+ false regressions.

On native JIT platforms the method-call overhead is <1 ns (imperceptible), but on the WASM interpreter it is ~8–10 ns and on AOT WASM ~1–3 ns. Without overhead subtraction, every WASM microbenchmark reports raw time including this call overhead, creating an additive bias that makes short-baseline benchmarks appear dramatically regressed:

| Baseline | +8 ns overhead | Apparent regression |
| --- | --- | --- |
| 10 ns | 18 ns | 1.80x |
| 50 ns | 58 ns | 1.16x |
| 150 ns | 158 ns | 1.05x (at detection threshold) |
| 500 ns | 508 ns | 1.02x (below threshold) |
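The arithmetic behind the table is easy to check directly. A quick sketch (using the ~8 ns interpreter overhead figure cited above):

```python
# With overhead subtraction disabled, a benchmark's reported time becomes
# baseline + overhead, so a constant additive overhead inflates short
# baselines the most.

def apparent_regression(baseline_ns: float, overhead_ns: float = 8.0) -> float:
    """Ratio of (baseline + overhead) to baseline."""
    return (baseline_ns + overhead_ns) / baseline_ns

for baseline in (10, 50, 150, 500):
    ratio = apparent_regression(baseline)
    print(f"{baseline:>4} ns -> {baseline + 8:>4.0f} ns  ({ratio:.2f}x)")
```

Note how the apparent regression ratio falls off inversely with baseline duration, which is exactly the signature distinguishing an additive measurement bias from a real multiplicative slowdown.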

Evidence

The regression distribution across all 2,300+ benchmarks perfectly matches an additive constant, not a multiplicative factor. Time-series data from the perf portal confirms a step function coinciding exactly with the BDN methodology change — no gradual drift, and no runtime code changes in the window. See the detailed analysis in the issue comments.

Fix

A one-line change in RecommendedConfig.cs explicitly sets .WithEvaluateOverhead(true) on the default Job configuration, restoring the previous measurement methodology for all platforms.
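A sketch of the shape of the change (assumption: the surrounding code in RecommendedConfig.cs will differ; `WithEvaluateOverhead` is the standard BenchmarkDotNet `Job` extension):

```csharp
using BenchmarkDotNet.Jobs;

// Sketch only — the actual default job in RecommendedConfig.cs carries
// additional settings not shown here.
Job job = Job.Default
    .WithEvaluateOverhead(true); // re-enable idle/overhead iterations so the
                                 // method-call cost is measured and subtracted
```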

This is arguably the right default for a performance lab anyway — overhead subtraction gives more accurate results for the actual workload being measured, especially for sub-microsecond benchmarks.

/cc @AaronRobinsonMSFT

Re-enable EvaluateOverhead to subtract method-call overhead from benchmark results

BenchmarkDotNet PR dotnet/BenchmarkDotNet#3007 (merged Feb 16, 2026) changed
the EvaluateOverhead default from true to false. This means BDN no longer
runs idle/overhead iterations to measure and subtract the cost of the
benchmark method call itself.

On native JIT platforms the call overhead is <1ns and imperceptible, but on
interpreted WASM it is ~8-10ns and on AOT WASM ~1-3ns. Combined with a
2.5-month WASM perf data gap (Dec 5, 2025 – Feb 26, 2026), this caused the
auto-filer to report 2300+ false regressions:

- dotnet/perf-autofiling-issues#69444: 1416 interpreted WASM regressions
- dotnet/perf-autofiling-issues#69430: 864 AOT WASM regressions

The regression ratio inversely correlates with baseline duration — a
constant additive overhead, not a real runtime change. Re-enabling
EvaluateOverhead restores the previous measurement methodology for all
platforms.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@timcassell

> This is arguably the right default for a performance lab anyway — overhead subtraction gives more accurate results for the actual workload being measured, especially for sub-microsecond benchmarks.

That was the exact discussion (dotnet/BenchmarkDotNet#1802) we had before the change was made. cc @tannergooding

@lewing lewing requested a review from tannergooding March 5, 2026 22:25
@DrewScoggins
Member

Yeah, from the experience of triaging, we often run into spurious regressions where we end up at 0 ns time spent, as well as what is mentioned in the linked issue about fast tests being super noisy. That said, if we have larger, variable overheads in the WASM tests, we should try to turn overhead evaluation on for just that scenario. I don't have a lot of experience looking at the WASM tests, so I trust your judgement @lewing. I imagine that should be fairly easy, and I'm happy to add it to the PR when I get back from a walk.

@tannergooding
Member

> That was the exact discussion (dotnet/BenchmarkDotNet#1802) we had before the change was made. cc @tannergooding

Right. While the intuitive thing would be that measuring and subtracting overhead is "better", it's often actually quite the opposite in practice due to the precision of the hardware timers and other factors.

Here it sounds rather like WASM interpreter is simply slow enough, in contrast to other scenarios, that no longer subtracting the overhead is showing up as measurable.

What would be interesting to know is whether the WASM results are overall more stable now that the overhead isn't being subtracted. If they are more stable, then while there is a "spike", we would be at a better and probably more representative baseline for what user code actually sees. This would also help reduce noise in future triage. However, if they stay at the same overall stability levels, then it doesn't really matter, and we can toggle subtraction back on for WASM and revisit that in the future if the overhead is ever reduced.

@lewing
Member Author

lewing commented Mar 6, 2026

closing in favor of #5143

@lewing lewing closed this Mar 6, 2026
