ARM64 - sorry to open this as an issue #738
Comments
Sure, but you might consider cross compiling instead of building on the RPi itself. You can use |
Unfortunately, I don't think you get any speedup from distcc when LTO is enabled. The linking step seems to run locally, and it probably has to; otherwise I assume every computer in the distcc cluster would need exactly the same libraries. With LTO, most of the compilation time is spent in the linking step. I had been wondering for a long time why distcc wasn't helping my systems anymore, and this is probably the reason.

Most of my packages actually seem to compile a bit slower and distribute very badly with an older version of distcc that doesn't have that PR merged. Once a distcc version containing this PR is released on Gentoo, build times should at least not increase, but they will have no chance of improving either, since distcc won't even try to distribute that work. I have LTO enabled system-wide except for the kernel, so the kernel is the only thing I see big speed improvements on when using distcc.

This is a pretty sad situation, because I think LTO matters even more on smaller and embedded systems: it can increase performance and always results in smaller or at least equally sized binaries. So yes, it does make a lot of sense to enable it on a Raspberry Pi, except that it increases compilation time by a lot and you cannot distribute the work. What you would have to do instead is compile the packages with LTO on a stronger machine, in a build environment for the Raspberry Pi, and then distribute the binaries to the Pi. Either that, or accept that building all the packages on the RPi itself will take a really long time.

Also, for smaller systems I suggest -O2 or -Os instead of -O3, because -O3 tends to increase binary size quite a lot, often without actually improving performance. In my experience with embedded development, -Os sometimes gives both the best performance and the smallest binaries (the latter is the goal of -Os, while the former is supposed to be the goal of -O3), perhaps because the smaller code fits more easily into the small caches on such processors. As you can see here, some of the more advanced optimization flags enabled by this overlay sometimes decrease performance.

Apart from taking less space (which you might not have a lot of on smaller systems), smaller binaries also start up faster, and startup time is often the dominant cost for programs with short runtimes. This is especially true if the backing storage is slow (like an SD card) and there isn't much RAM available to act as a filesystem cache.
Yes, the linking is done locally, but it also depends on the source code; bigger packages certainly compile faster using distcc. EDIT:
@shelterx Are you sure you actually built llvm with -flto though? My output from "emerge --info llvm" also confirms that the CFLAGS & CXXFLAGS don't have -flto present. When -flto is enabled, I think all the code optimization is skipped in the compile step and done during the linking step instead. This is why the linking step takes so much longer, and why the distcc helpers can't really help much with the actual compilation steps either. Sending the source code out and the results back over the network is likely just slowing down the whole process.
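For anyone who wants to repeat that check without reading the whole dump by eye, here is a rough Python sketch. It assumes the "emerge --info" output contains lines of the form VAR="value", and that a later line for the same variable overrides an earlier one. The function name lto_flags and the list of variables it looks at are made up for illustration; they are not part of Portage.

```python
import re

# Assumed line format: CFLAGS="-O2 -pipe -flto"
_ASSIGNMENT = re.compile(r'^(\w+)="(.*)"$')


def lto_flags(info_text, variables=("CFLAGS", "CXXFLAGS", "LDFLAGS")):
    """Map each variable found in info_text to True if -flto is among its flags."""
    values = {}
    for line in info_text.splitlines():
        match = _ASSIGNMENT.match(line.strip())
        if match and match.group(1) in variables:
            # Assumption: the last assignment seen for a variable is the effective one.
            values[match.group(1)] = match.group(2).split()
    return {
        name: any(flag == "-flto" or flag.startswith("-flto=") for flag in flags)
        for name, flags in values.items()
    }


# Hand-written sample text, not real output from any machine.
sample = 'CFLAGS="-O2 -pipe"\nCXXFLAGS="-O2 -pipe -flto"\n'
print(lto_flags(sample))
```

To use it on a real system you would pass in the text captured from "emerge --info llvm" (for example via subprocess.run) instead of the sample string. Note that it only reports what the flags say, not whether a given build actually did its optimization at link time.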
@AnonymousRetard Oops, you are correct, and that would explain why I don't see some stuff getting passed to the distcc server. Here's another example:
@shelterx This is quite interesting. I might do some of my own tests on these packages later. I have a weak 4-core AMD system used as a server and a strong 16-core 5950X. I don't have specific examples, since it's been a long time since I last tried this, but I remember being very disappointed in distcc performance and actually seeing slowdowns from it on quite a few packages. Very few jobs were being distributed to the 5950X, and the majority of the build time was spent compiling things locally. These issues completely disappeared when building packages without -flto, but I decided that I'd rather build stuff locally with LTO than try to speed up the jobs with distcc.

This discussion should perhaps continue somewhere else though. The issue tracker on distcc is probably a better place: https://github.com/distcc/distcc

As I mentioned in my original reply, a PR has been merged that looks like it will soon make distcc stop trying to distribute -flto jobs completely, so we'll have to raise an issue there if we want to change that behavior in a future release. Perhaps distribution helps in some cases but not others, but I'm not sure whether that can be detected automatically.
Sorry for opening this, but I didn't know how else to address it.
Does it make sense to use LTO for an ARM64 Raspberry Pi on Gentoo?