CFLAGS Performance Evaluation #431
Comments
N.B. -O2 actually means… Also, here is a script for identifying which benchmarks take the most time, by parsing your results; these are my worst offenders:
It's easier to look at when I check 'normalize results'.
Next week might be an online week for my school, so I might have enough time to look into this, though only benchmarking my current system.
@wolfwood Sooo..... LTO+GRAPHITE is both worst and best, am I reading that right?
That is one way to read it, certainly. What that framing misses is that in some cases it's last place by 0.02%, and sometimes it's faster by double-digit percentages. The harmonic means attempt to account for that, but mostly they say it's a wash. I think winnowing the tests down to remove the noisiest ones should make the harmonic means (and the number of wins/losses) less arbitrary. Removing tests that show little variation across results would also magnify the variation in the remaining ones, though those are good canaries for detecting regressions. I'm not exactly sure how to analyze the raw data and make those kinds of determinations, though.
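One way to sketch that winnowing in shell/awk (the results.csv name and its test,value layout are assumptions for illustration, not a real PTS export format): compute the coefficient of variation (stddev/mean) per test, and flag nearly-flat tests as drop candidates.

```shell
#!/bin/sh
# Classify tests by coefficient of variation across flag configurations.
# Sample data stands in for real benchmark results.
cat > results.csv <<'EOF'
test,value
flat-test,100
flat-test,100
flat-test,100
noisy-test,100
noisy-test,150
EOF

out=$(awk -F, '
NR > 1 {                        # skip the header row
  sum[$1]   += $2
  sumsq[$1] += $2 * $2
  n[$1]++
}
END {
  for (t in n) {
    mean = sum[t] / n[t]
    var  = sumsq[t] / n[t] - mean * mean
    if (var < 0) var = 0        # guard against FP rounding
    cv = (mean != 0) ? sqrt(var) / mean : 0
    printf "%s cv=%.4f %s\n", t, cv, (cv < 0.01 ? "drop(flat)" : "keep")
  }
}' results.csv)
printf '%s\n' "$out"
```

A threshold like 0.01 is arbitrary; the point is only to separate flat canaries from tests with real spread.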
Would it be possible to create prefixes instead of rebuilding the system and run the tests from there, or would the host system influence the prefix during testing? If you do get a chance to post a lighter set pared down to what you think is essential, or with the benchmark tests separated out, I'd really appreciate it.
I don't have experience using prefixes, but my phoronix set plus PTS itself should be all you need. For me the testing takes much longer than the world rebuild, but I guess you're right that it's not all necessary.
Right. My main concern is that my processor would likely take 3+ days compared to your beefy CPU for the full bench, and I'm sure there are others who'd be much more willing to run the benchmark if it were time-feasible. The issue is that if the tests aren't comprehensive enough, the results may end up being pointless.
I wouldn't doubt it, since a lot of tests are repeated to measure deviation. The problem is that for some systems a full world build would take >24hr by itself. Perhaps it would be ideal to set up a stage 3 tarball with the bare essentials on a separate partition, to reduce the need for a full @world rebuild with all the fun stuff that benchmarking doesn't use. If it wouldn't be too much work, can you explain why these tests were chosen? What about these particular tests made them important to include? Edit: sorry, I know I'm asking a lot of simple questions trying to figure this out.
@jiblime I started with every phoronix test that supported Linux and would install, then dropped the ones that depended on X (somewhat arbitrarily, as I said), and then dropped a few that took a very long time. This is not a curated set of tests; rather, I'm looking for help in producing one.
Also, I'm trying out setting up a prefix. If nothing else, I'll feel a lot better not having to install php and mongodb in my host system :P
Good to know! I'll come up with my own (small) set, and hopefully someone here can chime in on what needs to be fixed for coverage and consistency. About the prefix: you can do a bash bootstrap, but it's needless since stage 2, IIRC, will rebuild the whole system anyway, so bootstrap-prefix.sh would be preferred. It also works with… Notes about setting up the prefix:
I got hung up in my work on #288 with two issues: what are representative benchmarks, and how do I evaluate the flag selections that iRace outputs, compare them to some baseline, and decide whether it's 'improving' things.
I decided to take a break from the flag-learning experiments and simply try running phoronix-test-suite tests after rebuilding my system with various CFLAGS configurations. The results were a bit indecipherable, unfortunately, but you can see them uploaded here:
my results
I got an ebuild for PTS from @bobwya's overlay, and made a package set of my own for various test dependencies (still haven't gotten the tensorflow stuff to build right, though):
phoronix-set.txt, which you should copy to /etc/portage/sets/phoronix and then emerge -av @phoronix to get all the necessary deps for the phoronix tests on Gentoo. I started out trying to test with a virtual suite of all valid Linux tests, which was estimated at 1 month of runtime. I then cut this down to about a 7-day test set. Then I got sick and tired of watching it test every single combination of resolution and track on SuperTuxKart for days on end and ejected all graphical testing completely (yes, I regret this a tad).
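For reference, a Portage set file is just a newline-separated list of package atoms. A hypothetical /etc/portage/sets/phoronix could look like this (the atoms below are illustrative stand-ins, not the real dependency list; use the phoronix-set.txt above for that):

```
# /etc/portage/sets/phoronix -- illustrative only
dev-lang/php
dev-db/mongodb
sci-libs/fftw
```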
I ended up with a set that runs in a bit under a day for me. You can try it at home as 1910276-HV-LTOIZE49411.
[Skipping GPU testing but keeping a lot of disk tests is non-ideal, as I'd expect compile flags to have less effect on storage workloads than on compute- or GPU-bound ones, but the results aren't flat in all cases and I'm having trouble telling whether the disk testing is just very noisy or whether it's actually meaningful data.]
I then wrote a tool to force-install and run tests with my current *FLAGS, taken from emerge --info, and MPI environment variables; I called it pts. (Make sure to add FCFLAGS="${CFLAGS}" and FFLAGS="${CFLAGS}" to your make.conf, because some tests are Fortran.) pts.txt
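The Fortran point matters: without those variables, Fortran tests build with default flags and skew comparisons. A make.conf fragment along these lines keeps C, C++, and Fortran builds consistent (the exact CFLAGS string here is just an example, not a recommendation):

```shell
# /etc/portage/make.conf (fragment) -- example flags only
CFLAGS="-O3 -flto -fgraphite-identity -floop-nest-optimize"
CXXFLAGS="${CFLAGS}"
# Some PTS benchmarks are Fortran; these keep them on the same flags:
FCFLAGS="${CFLAGS}"
FFLAGS="${CFLAGS}"
```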
I then wrote a wrapper script that takes a password, runs sed on my make.conf (this overwrites your CFLAGS! comment them out to save a copy), re-emerges my system, and then runs my pts script:
eval.sh.txt
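A minimal sketch of that kind of wrapper (the function names and structure here are assumptions for illustration, not the contents of eval.sh.txt): swap the CFLAGS line with sed, keep a backup, then rebuild and benchmark.

```shell
#!/bin/sh
set -eu

# Replace the CFLAGS line in a make.conf, keeping a .bak copy first.
swap_cflags() {
  conf=$1; flags=$2
  cp "$conf" "$conf.bak"
  sed -i "s|^CFLAGS=.*|CFLAGS=\"$flags\"|" "$conf"
}

# Rebuild everything with the new flags, then run the pts helper.
# (Deliberately not invoked here -- emerge -e @world takes hours.)
rebuild_and_bench() {
  emerge --emptytree @world
  ./pts
}
```

Usage would be swap_cflags /etc/portage/make.conf "-O3 -flto" followed by rebuild_and_bench, once per flag configuration.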
Finally, I wrote a wrapper to try all combinations of O2/O3, LTO, and GRAPHITE:
harness.pl.txt
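The combination grid is small enough to enumerate directly. A shell rendering of the idea (the author's harness is Perl, and the exact GRAPHITE flag spelling below is an assumption): each emitted line would be handed to the eval.sh-style wrapper in turn.

```shell
#!/bin/sh
# Enumerate the 2x2x2 grid of O-level x LTO x GRAPHITE combinations.
combos=$(
  for opt in -O2 -O3; do
    for lto in "" "-flto"; do
      for graphite in "" "-fgraphite-identity -floop-nest-optimize"; do
        echo "$opt $lto $graphite" | xargs   # xargs squeezes the spacing
      done
    done
  done
)
printf '%s\n' "$combos"
```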
I think the next steps in this work are to move to the newest 9.x line of PTS, further remove tests that don't give meaningful results, and then try again with ${DEVIRTLTO} ${IPAPTA} ${SEMINTERPOS} and -falign-functions=24/32/64/xxx.
Mostly I'm just throwing this up here to see if anyone else can make sense of the PTS graphs / is interested in running it on their own hardware.