perf: improve rmul/rdiv/rsqrt operations #42

Merged
merged 4 commits into from
Dec 30, 2024

@loyd (Owner) commented Dec 29, 2024

Replace the custom i256/u256 implementation with the i256 crate.
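
For context on why these operations need a wider integer at all: the product of two scaled values has to be formed in a type twice as wide before it is divided by the scale and rounded. The sketch below shows that shape for a 64-bit value with 9 fractional digits, where a native i128 is wide enough; the 128-bit case needs a 256-bit intermediate instead. Everything here (`rmul_sketch`, `RoundMode`, `COEF`, and the tie-breaking rule for Nearest) is a hypothetical illustration, not the code in this PR and not the i256 crate's API.

```rust
/// Rounding modes named after the benchmark labels (assumed semantics).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RoundMode {
    Floor,
    Ceil,
    Nearest,
}

/// Assumed scale: 9 fractional decimal digits.
const COEF: i128 = 1_000_000_000;

/// Hypothetical helper: rounded multiplication of two scaled i64 values.
/// The product is computed in i128, so it cannot overflow before rescaling.
/// Returns None if the rescaled result does not fit in i64.
pub fn rmul_sketch(a: i64, b: i64, mode: RoundMode) -> Option<i64> {
    let wide = i128::from(a) * i128::from(b);
    // With a positive divisor, Euclidean division is floor division
    // and the remainder satisfies 0 <= r < COEF.
    let q = wide.div_euclid(COEF);
    let r = wide.rem_euclid(COEF);
    let rounded = match mode {
        RoundMode::Floor => q,
        RoundMode::Ceil => q + i128::from(r != 0),
        // Assumption: exact halves round toward positive infinity.
        RoundMode::Nearest => q + i128::from(2 * r >= COEF),
    };
    if rounded >= i128::from(i64::MIN) && rounded <= i128::from(i64::MAX) {
        Some(rounded as i64)
    } else {
        None
    }
}
```

For example, `rmul_sketch(1_500_000_000, 2_500_000_000, RoundMode::Floor)` multiplies 1.5 by 2.5 and yields the scaled representation of 3.75. The cost of the wide multiply and divide is what the rmul/rdiv rows in the benchmark measure.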

Benchmark results on an i9-14900K:

group                                      new                                    master
-----                                      ---                                    ------
F128p18/rdiv (~1e5/~1e4, Ceil)             1.00     10.7±0.15ns        ? ?/sec    6.02     64.3±0.15ns        ? ?/sec
F128p18/rdiv (~1e5/~1e4, Floor)            1.00     10.4±0.15ns        ? ?/sec    6.17     64.2±0.22ns        ? ?/sec
F128p18/rdiv (~1e5/~1e4, Nearest)          1.00     11.2±0.16ns        ? ?/sec    5.75     64.6±0.16ns        ? ?/sec
F128p18/rmul (~1e4, Ceil)                  1.00      7.0±0.04ns        ? ?/sec    7.71     54.4±0.36ns        ? ?/sec
F128p18/rmul (~1e4, Floor)                 1.00      7.0±0.02ns        ? ?/sec    7.70     54.3±0.34ns        ? ?/sec
F128p18/rmul (~1e4, Nearest)               1.00      7.2±0.06ns        ? ?/sec    7.60     54.4±0.19ns        ? ?/sec
F128p18/rsqrt (MAX, Ceil)                  1.00     40.2±0.28ns        ? ?/sec    13.43   540.0±9.43ns        ? ?/sec
F128p18/rsqrt (MAX, Floor)                 1.00     39.3±0.27ns        ? ?/sec    13.35   524.8±1.47ns        ? ?/sec
F128p18/rsqrt (MAX, Nearest)               1.00     41.4±0.38ns        ? ?/sec    13.17  545.4±10.95ns        ? ?/sec
F128p18/rsqrt (adaptive, Ceil)             1.00     50.0±0.42ns        ? ?/sec    10.61   530.8±2.11ns        ? ?/sec
F128p18/rsqrt (adaptive, Floor)            1.00     49.2±0.42ns        ? ?/sec    10.55   519.0±1.83ns        ? ?/sec
F128p18/rsqrt (adaptive, Nearest)          1.00     50.6±0.38ns        ? ?/sec    10.83   547.8±2.17ns        ? ?/sec
F128p18/rsqrt (~1e4, Ceil)                 1.00     40.0±0.24ns        ? ?/sec    5.91    236.3±0.59ns        ? ?/sec
F128p18/rsqrt (~1e4, Floor)                1.00     39.4±0.28ns        ? ?/sec    5.68    223.9±3.09ns        ? ?/sec
F128p18/rsqrt (~1e4, Nearest)              1.00     41.2±0.28ns        ? ?/sec    5.84    240.5±0.77ns        ? ?/sec
F64p9/rsqrt (MAX, Ceil)                    1.00      1.0±0.01ns        ? ?/sec    31.48    31.6±0.09ns        ? ?/sec
F64p9/rsqrt (MAX, Floor)                   1.00      1.0±0.01ns        ? ?/sec    30.12    30.2±0.29ns        ? ?/sec
F64p9/rsqrt (MAX, Nearest)                 1.00      1.0±0.01ns        ? ?/sec    33.35    33.6±0.24ns        ? ?/sec
F64p9/rsqrt (adaptive, Ceil)               1.00      5.4±0.02ns        ? ?/sec    5.69     30.5±0.10ns        ? ?/sec
F64p9/rsqrt (adaptive, Floor)              1.00      4.9±0.01ns        ? ?/sec    5.96     29.1±0.67ns        ? ?/sec
F64p9/rsqrt (adaptive, Nearest)            1.00      5.5±0.02ns        ? ?/sec    5.92     32.6±0.58ns        ? ?/sec
F64p9/rsqrt (~1e4, Ceil)                   1.00      1.0±0.02ns        ? ?/sec    13.55    13.8±0.05ns        ? ?/sec
F64p9/rsqrt (~1e4, Floor)                  1.00      1.0±0.02ns        ? ?/sec    12.47    12.7±0.03ns        ? ?/sec
F64p9/rsqrt (~1e4, Nearest)                1.00      1.0±0.03ns        ? ?/sec    15.04    15.5±0.05ns        ? ?/sec

@loyd (Owner, Author) commented Dec 29, 2024

This needs to be updated and re-checked after Alexhuszagh/i256-rs#40.

@loyd loyd force-pushed the perf/i256 branch 4 times, most recently from cbc690c to ca9ea48 Compare December 30, 2024 16:52
@loyd loyd merged commit 2c4ab7c into master Dec 30, 2024
5 checks passed
@loyd loyd deleted the perf/i256 branch December 30, 2024 17:06