New AVX512F implementation #760

Merged Mar 30, 2024 (3 commits)

Conversation

@Ka-zam Ka-zam commented Feb 27, 2024

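I think this is a better implementation of the reciprocal kernel, as it uses the newer _mm512_rcp14_ps intrinsic, which handles exceptions correctly. It is accurate to a relative tolerance below 6.2e-5. On a 7950X3D there is a speedup of about 30% over the generic kernel (20.78 ms vs. 15.99 ms for u_avx512 in the run below).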

magnus@r7950x3d:~/src/kazam/volk/build$ volk_profile -R reci
RUN_VOLK_TESTS: volk_32f_reciprocal_32f(131071,1987)
generic completed in 20.7839 ms
a_sse completed in 41.2548 ms
a_avx completed in 20.6385 ms
a_avx512 completed in 16.861 ms
u_sse completed in 41.301 ms
u_avx completed in 20.7819 ms
u_avx512 completed in 15.9916 ms
Best aligned arch: u_avx512
Best unaligned arch: u_avx512
Writing /home/magnus/.volk/volk_config...
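
For readers who have not used the intrinsic, here is a minimal sketch of how a reciprocal loop built on _mm512_rcp14_ps could look. It is not the kernel from this PR: the function name `reciprocal_sketch` is hypothetical, and the sketch assumes unaligned loads and a scalar tail. When the compiler is not targeting AVX512F, only the scalar loop is compiled.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical illustration, not the VOLK kernel itself.
 * Writes an approximation of 1/in[i] to out[i] for i in [0, n). */
static void reciprocal_sketch(float* out, const float* in, size_t n)
{
    assert(n == 0 || (out != NULL && in != NULL));
    size_t i = 0;
#if defined(__AVX512F__)
    /* 16 floats per iteration. _mm512_rcp14_ps is an approximation with
     * relative error below 2^-14 (about 6.1e-5). */
    for (; i + 16 <= n; i += 16) {
        __m512 x = _mm512_loadu_ps(in + i);
        _mm512_storeu_ps(out + i, _mm512_rcp14_ps(x));
    }
#endif
    /* Scalar tail, and the full fallback when AVX512F is not enabled. */
    for (; i < n; i++) {
        out[i] = 1.0f / in[i];
    }
}
```

Because the vector path is approximate, callers comparing it against the generic result would need a relative tolerance rather than exact equality.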

Signed-off-by: Magnus Lundmark <[email protected]>
Signed-off-by: Magnus Lundmark <[email protected]>

@jdemel jdemel left a comment

Thanks for your contribution. It looks really good. We just need to figure out and document whether the division handling is the same across all implementations. If it were not, that would be a very nasty bug to debug.

Two resolved review threads on kernels/volk/volk_32f_reciprocal_32f.h
Signed-off-by: Magnus Lundmark <[email protected]>

Ka-zam commented Mar 1, 2024

I ran all kernels for special values:

magnus@r7950x3d:~/src/kazam/scratch$ ./a.out 
x:
 -0.0000e+00   0.0000e+00          inf         -inf          nan         -nan   1.0000e-30   1.0000e+30 
generic:
        -inf          inf   0.0000e+00  -0.0000e+00          nan         -nan   1.0000e+30   1.0000e-30 
a_sse:
        -inf          inf   0.0000e+00  -0.0000e+00          nan         -nan   1.0000e+30   1.0000e-30 
a_avx:
        -inf          inf   0.0000e+00  -0.0000e+00          nan         -nan   1.0000e+30   1.0000e-30 
a_avx512:
        -inf          inf   0.0000e+00  -0.0000e+00          nan         -nan   9.9999e+29   9.9999e-31 

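Signed zero, inf and NaN are handled properly by all kernels. The only difference is that a_avx512 returns approximate values (9.9999e+29 and 9.9999e-31) for the two finite inputs.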
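
The source of the a.out program is not part of this thread. A minimal sketch of such a check, assuming plain scalar division as the reference, could look like the following. The names `special_inputs`, `special_reciprocals` and `matches_reference` are hypothetical, and the sketch only covers the reference path, not the VOLK kernels.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* The eight inputs from the table above. */
static const float special_inputs[8] = { -0.0f,    0.0f, INFINITY, -INFINITY,
                                         NAN,      -NAN, 1.0e-30f, 1.0e30f };

/* Hypothetical reference: out[i] = 1 / special_inputs[i] using scalar division. */
static void special_reciprocals(float out[8])
{
    assert(out != NULL);
    for (size_t i = 0; i < 8; i++) {
        out[i] = 1.0f / special_inputs[i];
    }
}

/* Returns 1 if `got` agrees with `ref`: NaN matches NaN, zeros and infinities
 * must match including sign, finite values must agree within rel_tol. */
static int matches_reference(float got, float ref, float rel_tol)
{
    if (isnan(ref)) {
        return isnan(got) ? 1 : 0;
    }
    if (isnan(got)) {
        return 0;
    }
    if (isinf(ref) || ref == 0.0f) {
        return got == ref && (signbit(got) != 0) == (signbit(ref) != 0);
    }
    return fabsf(got - ref) <= rel_tol * fabsf(ref);
}
```

A kernel's output could then be compared element by element against the reference with a tolerance such as 6.2e-5, which accepts the approximate a_avx512 values while still rejecting a wrong sign on zero or inf.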

@jdemel jdemel left a comment

LGTM

jdemel commented Mar 30, 2024

No objections. The broken build should be fixed now with #761. Merging...

@jdemel jdemel merged commit 8a015bb into gnuradio:main Mar 30, 2024
32 of 34 checks passed