Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Number of warm-up runs should not be the same every time #1565

Open
twharmon opened this issue Jan 12, 2024 · 3 comments
Open

Number of warm-up runs should not be the same every time #1565

twharmon opened this issue Jan 12, 2024 · 3 comments

Comments

@twharmon
Copy link
Contributor

I found an irregularity on a handful of benchmarks where the duration was 10 times the typical duration, just because the assigned number of warm-up runs led to garbage collection on the timed run (or something like that).

In my example, select row took ~6ms when the number of warm-up runs was 0, 10, or 20. But it took 60ms with 5 warm-up runs.

I was using tinygo to compile to wasm, so it could be that specific allocator / runtime that leads to this issue.

The difference is easy to see in the screenshot below. goui-alt is the same app compiled with the standard go compiler. Tinygo performs better than Go for the most part, but the warm-up runs in a few instances can cause tinygo's runtime to be slow in a consistent way.
results

Just changing the number of warm-up runs to 8 completely changed the results:
8warmups

Suggestion

Instead of 5 warm-ups for everything, why don't we make it rand(5), rand(10), or nth-run % 5?

@twharmon twharmon changed the title Number of warm-up runs should be random within a range Number of warm-up runs should not be the same every time Jan 12, 2024
@krausest
Copy link
Owner

krausest commented Jan 12, 2024

Originally I wanted to include GC in the results (and thus punish high GC pressure), but it turned out that this conflicted with the goal to reduce variance. Thus some time ago I decided to run GC before each benchmark iteration:
https://github.com/krausest/js-framework-benchmark/blob/master/webdriver-ts/src/forkedBenchmarkRunnerPuppeteer.ts#L127

So I think the problem you describe occurs only for WASM frameworks since they managed the memory on their own.
I'm not sure what's the best way to handle that.

  • Keeping GC for JS might penalize WASM frameworks
  • Removing GC from JS introduces noise that makes comparison between close contenders worthless

I think I'll check the next few days how big the impact is when including GC again. Maybe then randomizing the warmup count will make sense.

@Erithax
Copy link

Erithax commented Jun 23, 2024

Dioxus has wild variation (7.1 to 13.8, +94%) in select row between the Chrome 124 and 126 bench. Which use the same source for Dioxus. Perhaps due to this issue.
Clear rows also had ca. +45%.

@krausest
Copy link
Owner

Chrome 124 was really a challenging release and it forced me to rework the full GC before running the benchmark.
Nevertheless it was necessary to produce results with low variance.

I never spent much thoughts on what the impact of that GC for webasm implementations was. I guess it might help regarding DOM nodes, but of course not regarding the WASM heap. All in all it should help JS implementations way more than WASM implementations (at least those that must perform a memory cleanup during the benchmark, which isn't triggered by a chrome GC).
Does anyone know whether such expensive cleanup of the WASM heap (compaction / "internal" GC) happens for one of the implementations?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants