Benchmarking #123
Yeah, I've seen interest in having benchmarks with a fully LTO system. Unfortunately, no one has suggested any benchmarks to actually run. I was looking into "system responsiveness" benchmarks a bit, but found no existing suites.
I would also be really interested in general system benchmarks. I would totally try this, but I'm not sure if it's worth the time investment, and some benchmarks would be really helpful. Even general system benchmarks would be useful, like those used by Phoronix (e.g. here). The tests should be available in the Phoronix Test Suite.
I was curious, so I decided to run some of the benchmarks: https://openbenchmarking.org/result/1807107-AR-MERGE406580 I only messed around with O3, Graphite, LTO, and some PGO for gcc+python. I didn't try more aggressive optimizations. I used gcc version 7.3.0. Explanation of configuration names:
For PyBench, we have two more configurations:
Some analysis of the results:
So all in all, more benchmarks benefit from O3+LTO+Graphite than don't (4 vs 1). That difference becomes 6 vs 2 if you also include results where the benchmark itself was further optimized, which for Gentoo is a fairer comparison, since almost everything is built from source anyway. I would be interested in trying more aggressive flags; I'm open to suggestions. In my experience, rebuilding world with O3+LTO+Graphite was actually pretty easy, thanks to this overlay. But having to recompile world with every gcc update doesn't sound very fun. Thankfully, gcc updates are few and far between in Gentoo unstable (which is what I use). Compiling the kernel with more aggressive optimizations was more difficult. I ended up replacing all instances of -O2 in the Makefiles with the desired flags (does anyone know how to do this more easily?).
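For the kernel question above, here is a minimal Python sketch of the manual approach: swap each standalone `-O2` token in a Makefile's text for other flags, leaving look-alikes such as `-O2s` untouched. The helper name `replace_o2`, the sample fragment, and the example flags are hypothetical; it only rewrites a string, so reading and writing the actual Makefiles (and checking the kernel still builds) is left to you.

```python
import re

# Matches "-O2" only when it is a whole whitespace-delimited token.
_O2_TOKEN = re.compile(r"(?<!\S)-O2(?!\S)")


def replace_o2(makefile_text: str, flags: str) -> str:
    """Return makefile_text with every standalone -O2 replaced by flags (hypothetical helper)."""
    # A function replacement keeps any backslashes in flags from being
    # interpreted as regex group references.
    return _O2_TOKEN.sub(lambda _match: flags, makefile_text)


if __name__ == "__main__":
    # Made-up Makefile fragment, not taken from the kernel tree.
    sample = "CFLAGS += -O2\nHOSTCFLAGS := -Wall -O2 -fomit-frame-pointer\n"
    print(replace_o2(sample, "-O3 -fgraphite-identity -floop-nest-optimize"), end="")
```

Matching whole tokens rather than the bare substring avoids rewriting unrelated options that merely contain "-O2".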
Actually, thinking about the results a bit more, I realize that the benefit seen in compilebench and Timed Linux Kernel Compilation could have come from PGO in gcc. So I re-ran those two tests using gcc without PGO: https://openbenchmarking.org/result/1807119-AR-1807110AR51 Surprisingly, it turns out that compilebench runs even faster after disabling PGO in gcc. Less surprisingly, Timed Linux Kernel Compilation is about the same speed as -O2 without LTO or Graphite.
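To make "faster" and "about the same speed" concrete, one hedged way to compare two configurations is a percent change that respects each test's direction (elapsed-time results improve when they shrink, throughput results when they grow), plus a noise band before calling something a win or a loss. `Result`, `speedup_pct`, `classify`, and the 1% default threshold are illustrative assumptions, not anything OpenBenchmarking.org or the Phoronix Test Suite defines.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One benchmark measurement (hypothetical structure for illustration)."""
    name: str
    value: float
    higher_is_better: bool  # True for throughput, False for elapsed time


def speedup_pct(baseline: Result, candidate: Result) -> float:
    """Percent improvement of candidate over baseline; positive means better."""
    if baseline.name != candidate.name:
        raise ValueError("results come from different benchmarks")
    if baseline.value <= 0 or candidate.value <= 0:
        raise ValueError("benchmark values must be positive")
    if baseline.higher_is_better:
        # Throughput-style metric: a bigger candidate value is an improvement.
        return (candidate.value / baseline.value - 1.0) * 100.0
    # Time-style metric: a smaller candidate value is an improvement.
    return (baseline.value / candidate.value - 1.0) * 100.0


def classify(pct: float, noise_pct: float = 1.0) -> str:
    """Label a change 'better', 'worse' or 'same' given an assumed noise band."""
    if pct > noise_pct:
        return "better"
    if pct < -noise_pct:
        return "worse"
    return "same"


if __name__ == "__main__":
    # Made-up numbers purely to show the call pattern.
    o2 = Result("kernel-build", 100.0, higher_is_better=False)
    lto = Result("kernel-build", 99.5, higher_is_better=False)
    pct = speedup_pct(o2, lto)
    print(f"{pct:.2f}% -> {classify(pct)}")
```

Counting "better" versus "worse" labels across tests gives the kind of win/loss tally used earlier in the thread; the noise band keeps run-to-run jitter from being counted as a win.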
Wow, this is excellent! So we're at least not seeing any real detrimental effects from having aggressive compiler optimizations on, and in some cases we're seeing a noticeable improvement. I could run the tests on my system, but I don't have an
@asaparov by the way, forgot to mention: if you want to inject your own build flags into the kernel, you can use the
I was able to build my kernel with LTO, but unfortunately the proprietary modules I require don't seem to play nicely with LTO, so I'm stuck with the above flags.
Oh, that works well, thanks! I had tried CFLAGS_KERNEL and KBUILD_CFLAGS earlier and they weren't working.
I try to keep my nbench results moving in the right direction. I should have kept older results for reference; right now, with my current set of flags (-autopar) on a mostly idle system, I get the following (bear in mind it has quite slow memory, because I'm on a budget and reused my existing DDR3 RAM when I built this machine):
What's quite interesting is how sensitive some benchmarks are to L1 cache residency and others to high optimisation. An old Athlon X2:
In combination with my set of flags, it gets a much better NUMERIC SORT result than the above AMD FX when compiled with -Os, while some other benchmarks suffer:
While with -Ofast it's quite different:
At least it's faster than baseline :-) ... and yes, I really do need to update the kernel on that machine!! For reference, the old nbench results page is here: http://web.archive.org/web/20160706230749/http://www.tux.org:80/~mayer/linux/results2.html
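Since some tests improve with -Os and others with -Ofast, a single summary number has to combine per-test ratios somehow. A common choice, and the kind of baseline-relative index nbench-style suites print, is the geometric mean of each test's score divided by a baseline machine's score. This sketch assumes every score is higher-is-better (e.g. iterations per second); the function name, the test names, and the numbers are hypothetical.

```python
import math
from typing import Mapping


def geometric_mean_index(scores: Mapping[str, float],
                         baseline: Mapping[str, float]) -> float:
    """Geometric mean of scores[t] / baseline[t] over the tests in baseline.

    Assumes higher is better for every test (e.g. iterations per second).
    """
    if not baseline:
        raise ValueError("baseline is empty")
    log_sum = 0.0
    for test, base in baseline.items():
        if base <= 0 or scores[test] <= 0:
            raise ValueError(f"non-positive score for {test!r}")
        # Summing logs avoids overflow when multiplying many ratios.
        log_sum += math.log(scores[test] / base)
    return math.exp(log_sum / len(baseline))


if __name__ == "__main__":
    # Hypothetical scores: one test doubles, another halves.
    print(geometric_mean_index({"a": 2.0, "b": 0.5}, {"a": 1.0, "b": 1.0}))
```

With this summary, a 2x gain on one test and a 2x loss on another cancel to an index of 1.0, whereas an arithmetic mean of the same ratios would report 1.25 and overstate the gain.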
I have a machine I want to LTO and was wondering if anyone would be interested in some before and after benchmarks. If so what would be useful to run?