Add support for ARM hardware intrinsics #63
Comments
Yeah, I've been waiting for these to show up, but I'll probably just have to PR them myself, like I'm doing for the F16C stabilization.
Did some digging. It looks like the holdup for ARM conversions is that the ARM code all relies on … So it would be a completely different implementation that accessed the LLVM hardware intrinsics directly, without going through the easier …
I think the only reasonable option in the short term is to bypass the intrinsics and use inline assembly to get to these instructions. Inline assembly on ARM has been stable since Rust 1.59, so this will not even require a nightly compiler.
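A minimal sketch of the inline-assembly route for the widening direction (f16 to f32), assuming an AArch64 target where the scalar FCVT instruction handles half precision as part of the base ARMv8-A floating-point unit. The function name f16_bits_to_f32 is hypothetical, not the crate's API. Other targets get a portable software fallback so the sketch compiles everywhere.

```rust
/// Widens an IEEE 754 binary16 value, passed as raw bits, to f32.
/// AArch64 path: a single FCVT instruction via inline assembly (stable since 1.59).
#[cfg(target_arch = "aarch64")]
pub fn f16_bits_to_f32(bits: u16) -> f32 {
    use std::arch::asm;
    let result: f32;
    // SAFETY (assumption): FCVT from half to single precision is available on
    // every AArch64 target; the asm touches only the two declared registers.
    unsafe {
        asm!(
            "fcvt {0:s}, {1:h}",
            out(vreg) result,
            in(vreg) bits,
            options(pure, nomem, nostack, preserves_flags),
        );
    }
    result
}

/// Portable fallback for every other target. The conversion is exact because
/// every binary16 value is representable in f32.
#[cfg(not(target_arch = "aarch64"))]
pub fn f16_bits_to_f32(bits: u16) -> f32 {
    let negative = (bits & 0x8000) != 0;
    let exp = (bits >> 10) & 0x1F;
    let man = (bits & 0x03FF) as u32;
    let magnitude = match exp {
        // Zero or subnormal: man * 2^-24, exact in f32.
        0 => man as f32 * f32::from_bits((127 - 24) << 23),
        // Infinity or NaN: move the payload into the top f32 mantissa bits.
        0x1F => f32::from_bits(0x7F80_0000 | (man << 13)),
        // Normal: re-bias the exponent from 15 to 127 and widen the mantissa.
        _ => f32::from_bits(((exp as u32 + 127 - 15) << 23) | (man << 13)),
    };
    if negative { -magnitude } else { magnitude }
}
```

For example, f16_bits_to_f32(0x3C00) yields 1.0 on either path. Narrowing (f32 to f16) works the same way with the operands swapped in the template ("fcvt {0:h}, {1:s}"), but its software fallback has to handle rounding.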
is_aarch64_feature_detected! has been stable since Rust 1.60, so it looks like, with inline assembly, this could be implemented on stable today without waiting on any std or compiler features.
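Runtime detection lets one binary use the hardware path only when the CPU supports it. A minimal sketch of that dispatch decision, assuming the aarch64 "fp16" feature (half-precision arithmetic) is what gates the hardware path, and also covering the x86 "f16c" feature, which is stable to detect with is_x86_feature_detected!. The Backend enum and detect_backend function are hypothetical names for illustration.

```rust
/// Hypothetical labels for the conversion backends a crate might choose between.
#[allow(dead_code)] // not every variant is constructed on every target
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Aarch64Fp16,
    X86F16c,
    Software,
}

/// Picks a backend at runtime; each check is compiled only on its own architecture.
pub fn detect_backend() -> Backend {
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("fp16") {
            return Backend::Aarch64Fp16;
        }
    }
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("f16c") {
            return Backend::X86F16c;
        }
    }
    // No usable hardware support detected: fall back to portable code.
    Backend::Software
}
```

A conversion function would match on detect_backend() and call the corresponding implementation, with the software path always available as the fallback.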
Interesting, I did not realize that. Though 32-bit ARM detection is NOT stable (so we could not detect …
I've implemented the assembly conversions in the …
Amazing! That was quick! As for benchmarking, a number of public clouds provide ARM machines and free compute credits upon signup. For example, Google Cloud (full disclosure: they're my employer, so I may be biased) provides $300 in free credit, and they have ARM machines. That should be more than enough to test and benchmark this on real hardware.
Also, I have access to an ARM machine, and I could run the benchmarks you specify. Note that the criterion benchmarks in the repository lack black-boxing, which may make them unrepresentative of real-world performance. Both the input and the output should be wrapped in …
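To illustrate the black-boxing point: criterion ships its own black_box, and std::hint::black_box has been stable since Rust 1.66. The sketch below uses the std version in a plain timing loop rather than the repository's criterion benches; time_conversion is a hypothetical helper, not part of the crate.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Times `iters` calls of `convert`, hiding both the input and the output
/// from the optimizer so the loop cannot be constant-folded or removed.
pub fn time_conversion<T: Copy, R>(input: T, iters: u32, convert: impl Fn(T) -> R) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box on the input keeps the compiler from precomputing the result.
        let output = convert(black_box(input));
        // black_box on the output keeps the call from being eliminated as dead code.
        black_box(output);
    }
    start.elapsed()
}
```

For example, time_conversion(1.5f32, 1_000_000, |x| x.to_bits()) times a trivial stand-in; substitute the conversion under test for the closure.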
Merged the aarch64 branch, so the main branch now has AArch64 hardware support on stable Rust. It also includes arithmetic hardware operations on … Leaving the issue open for possible future 32-bit ARM support.
I assume the upcoming release also enabled the …
The feature is still there until the F16C support gets stabilized. I'll probably remove it after that, yes.
This is unfortunate because all dependents that re-export the type or expose it in their public API at all, such as the … Worse, the major version bumps would also have to be carried out in sync across all dependents, or they would lose interoperability with each other. The ecosystem already went through this with futures crate versions 0.1 and 0.3, and it was Not Fun. Since no breaking changes are actually made other than the MSRV, may I instead suggest keeping the …
The MSRV policy is due to a previous minor version bump with an MSRV bump that caused compile failures in downstream crates. It's tough either way, since either choice causes downstream issues. I'll see what I can do about phasing out use-intrinsics in the way you mentioned; that may work in this case. I also want to see if I can finagle a deprecation warning on the …
Now that 1.70 is out with the x86 …
Not sure if this is part of this issue, but aarch64 has bf16/svebf16 features, which are currently not used by this crate? |
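For context, without hardware support a bf16 conversion has to be done in software. A minimal sketch of f32 to bf16 narrowing with round-to-nearest-even, which keeps the top 16 bits of the f32 encoding and rounds on the 16 bits it discards. The function names are hypothetical, and the crate's own implementation may differ in details such as NaN handling.

```rust
/// Narrows an f32 to bf16 (returned as raw bits), rounding to nearest, ties to even.
pub fn f32_to_bf16_bits(value: f32) -> u16 {
    let bits = value.to_bits();
    if value.is_nan() {
        // Keep the sign and top payload bits, and force the quiet bit so that
        // truncating a low-payload NaN cannot turn it into infinity.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Adding 0x7FFF plus the lowest kept bit rounds the discarded half to even.
    // Cannot overflow: the largest non-NaN encoding is 0xFF80_0000 (-inf).
    let lowest_kept_bit = (bits >> 16) & 1;
    ((bits + 0x7FFF + lowest_kept_bit) >> 16) as u16
}

/// Widens bf16 bits back to f32; exact, since bf16 is a truncated f32.
pub fn bf16_bits_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}
```

For example, 1.0f32 narrows to 0x3F80, and f32::MAX rounds up to bf16 infinity (0x7F80).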
Hmm, may I attach a comment here? I'm doing my best to maintain the …
ARM has provided intrinsics to convert from f32 to f16 since ARMv8; see, e.g., VCVT-F16-F32.
Unfortunately, the Rust standard library does not implement this intrinsic yet, even though it does implement lots of similar ones, e.g. vcvt_f64_f32.
Adding support for this intrinsic in the standard library should be fairly trivial, since all the groundwork is already laid out.
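Until such an intrinsic is available, the narrowing has to happen in software (or through inline assembly). A minimal sketch of what an f32 to f16 conversion must handle, assuming IEEE 754 round-to-nearest-even (the default rounding mode): overflow to infinity, subnormal results, and NaN preservation. The function name f32_to_f16_bits is hypothetical and not the crate's or the standard library's API.

```rust
/// Narrows an f32 to IEEE 754 binary16 (returned as raw bits), rounding to nearest, ties to even.
pub fn f32_to_f16_bits(value: f32) -> u16 {
    let x = value.to_bits();
    let sign = ((x >> 16) & 0x8000) as u16;
    let exp = ((x >> 23) & 0xFF) as i32;
    let man = x & 0x007F_FFFF;

    // Infinity stays infinity; NaN stays NaN (quiet bit forced, top payload bits kept).
    if exp == 0xFF {
        return if man == 0 { sign | 0x7C00 } else { sign | 0x7E00 | (man >> 13) as u16 };
    }

    // Re-bias the exponent from f32 (bias 127) to f16 (bias 15).
    let half_exp = exp - 127 + 15;

    // Too large for f16: overflow to infinity.
    if half_exp >= 0x1F {
        return sign | 0x7C00;
    }

    // Below the f16 normal range: the result is subnormal or zero.
    if half_exp <= 0 {
        // Magnitudes below 2^-25 round to zero (this also covers f32 zeros and subnormals).
        if half_exp < -10 {
            return sign;
        }
        let full_man = man | 0x0080_0000; // restore the implicit leading 1
        let shift = (14 - half_exp) as u32; // between 14 and 24 bits are discarded
        let kept = full_man >> shift;
        let rem = full_man & ((1u32 << shift) - 1);
        let halfway = 1u32 << (shift - 1);
        let round_up = rem > halfway || (rem == halfway && (kept & 1) == 1);
        // A carry out of the 10-bit field yields the smallest normal, which is correct.
        return sign | (kept + round_up as u32) as u16;
    }

    // Normal range: keep the top 10 mantissa bits and round on the 13 discarded bits.
    let mut h = ((half_exp as u32) << 10) | (man >> 13);
    let rem = man & 0x1FFF;
    if rem > 0x1000 || (rem == 0x1000 && (h & 1) == 1) {
        h += 1; // a carry into the exponent correctly rounds up to infinity
    }
    sign | h as u16
}
```

For example, 0.1f32 narrows to 0x2E66 (about 0.09998), and 65520.0, exactly halfway between the largest finite f16 and 2^16, rounds to infinity under ties-to-even.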