perf: improve rmul/rdiv/rsqrt
loyd committed Dec 30, 2024
1 parent 8117b77 commit cbc690c
Showing 9 changed files with 565 additions and 1,376 deletions.
3 changes: 2 additions & 1 deletion Cargo.toml
@@ -36,7 +36,7 @@ std = ["derive_more/error"]
i16 = []
i32 = []
i64 = []
-i128 = []
+i128 = ["dep:i256"]
serde = ["dep:serde"]
schemars = ["dep:schemars"]
parity = ["parity-scale-codec"]
@@ -50,6 +50,7 @@ derive_more = { version = "0.99.9", default-features = false }
parity-scale-codec = { version = "3", default-features = false, optional = true }
static_assertions = "1.1.0"
itoa = "1.0.1"
+i256 = { version = "=0.1.1", default-features = false, optional = true }

[dev-dependencies]
anyhow = { version = "1.0.38", default-features = false }
136 changes: 68 additions & 68 deletions benches/README.md
@@ -1,92 +1,92 @@
# Benchmarks

-Benchmarks were performed on an [AMD Ryzen 7 4800HS CPU](https://en.wikichip.org/wiki/amd/ryzen_9/3900).
+Benchmarks were performed on an Intel Core i9-14900K CPU.

```sh
$ cargo bench --bench <name> --features <int>
-$ critcmp new | tail +3 | sort | sed 's# ? ?/sec##'
+$ critcmp new | tail +3 | sort | sed 's# ? ?/sec##' | sed 's# 1.00##'
```

## ops
64-bit FP with precision = 9:
```
-F64p9/cadd (~1e4) 1.00 1.9±0.01ns
-F64p9/from_decimal(12345, -3) 1.00 1.6±0.00ns
-F64p9/next_power_of_ten 1.00 3.6±0.01ns
-F64p9/rdiv (~1e5/~1e4, Ceil) 1.00 1.9±0.01ns
-F64p9/rdiv (~1e5/~1e4, Floor) 1.00 1.9±0.01ns
-F64p9/rdiv (~1e5/~1e4, Nearest) 1.00 1.9±0.00ns
-F64p9/rmul (~1e4, Ceil) 1.00 1.9±0.01ns
-F64p9/rmul (~1e4, Floor) 1.00 1.9±0.03ns
-F64p9/rmul (~1e4, Nearest) 1.00 1.9±0.00ns
-F64p9/rsqrt (~1e4, Ceil) 1.00 43.7±0.29ns
-F64p9/rsqrt (~1e4, Floor) 1.00 42.5±0.17ns
-F64p9/rsqrt (~1e4, Nearest) 1.00 47.0±0.19ns
-F64p9/rsqrt (adaptive, Ceil) 1.00 98.0±0.33ns
-F64p9/rsqrt (adaptive, Floor) 1.00 94.4±1.45ns
-F64p9/rsqrt (adaptive, Nearest) 1.00 99.6±0.67ns
-F64p9/rsqrt (MAX, Ceil) 1.00 102.3±0.50ns
-F64p9/rsqrt (MAX, Floor) 1.00 100.2±0.50ns
-F64p9/rsqrt (MAX, Nearest) 1.00 102.7±0.80ns
-F64p9/to_decimal(0) (12.345) 1.00 9.1±0.02ns
-F64p9/to_decimal(i32::MAX) (12.345) 1.00 9.1±0.01ns
-F64p9/try_from(f64) (~0.1) 1.00 64.8±0.33ns
-F64p9/try_from(f64) (~1e-12) 1.00 132.5±0.46ns
-F64p9/try_from(f64) (~1e6) 1.00 24.9±0.14ns
-F64p9/try_from(f64) (MAX) 1.00 5.9±0.01µs
-F64p9/try_from(f64) (MIN_POSITIVE) 1.00 1872.9±4.12ns
+F64p9/cadd (~1e4) 1.0±0.03ns
+F64p9/from_decimal(12345, -3) 1.0±0.01ns
+F64p9/next_power_of_ten 1.6±0.03ns
+F64p9/rdiv (~1e5/~1e4, Ceil) 1.0±0.03ns
+F64p9/rdiv (~1e5/~1e4, Floor) 1.0±0.04ns
+F64p9/rdiv (~1e5/~1e4, Nearest) 1.0±0.04ns
+F64p9/rmul (~1e4, Ceil) 1.0±0.03ns
+F64p9/rmul (~1e4, Floor) 1.0±0.04ns
+F64p9/rmul (~1e4, Nearest) 1.0±0.05ns
+F64p9/rsqrt (~1e4, Ceil) 1.0±0.02ns
+F64p9/rsqrt (~1e4, Floor) 1.0±0.02ns
+F64p9/rsqrt (~1e4, Nearest) 1.0±0.03ns
+F64p9/rsqrt (adaptive, Ceil) 5.4±0.02ns
+F64p9/rsqrt (adaptive, Floor) 4.9±0.01ns
+F64p9/rsqrt (adaptive, Nearest) 5.5±0.02ns
+F64p9/rsqrt (MAX, Ceil) 1.0±0.01ns
+F64p9/rsqrt (MAX, Floor) 1.0±0.01ns
+F64p9/rsqrt (MAX, Nearest) 1.0±0.01ns
+F64p9/to_decimal(0) (12.345) 5.0±0.01ns
+F64p9/to_decimal(i32::MAX) (12.345) 5.0±0.02ns
+F64p9/try_from(f64) (~0.1) 33.2±0.08ns
+F64p9/try_from(f64) (~1e-12) 61.9±0.20ns
+F64p9/try_from(f64) (~1e6) 16.2±0.05ns
+F64p9/try_from(f64) (MAX) 1263.8±2.26ns
+F64p9/try_from(f64) (MIN_POSITIVE) 693.4±2.38ns
```

128-bit FP with precision = 18:
```
-F128p18/cadd (~1e4) 1.00 2.8±0.00ns
-F128p18/from_decimal(12345, -3) 1.00 9.1±0.03ns
-F128p18/next_power_of_ten 1.00 6.3±0.03ns
-F128p18/rdiv (~1e5/~1e4, Ceil) 1.00 157.3±0.51ns
-F128p18/rdiv (~1e5/~1e4, Floor) 1.00 154.2±1.19ns
-F128p18/rdiv (~1e5/~1e4, Nearest) 1.00 159.4±1.05ns
-F128p18/rmul (~1e4, Ceil) 1.00 132.5±0.61ns
-F128p18/rmul (~1e4, Floor) 1.00 132.3±0.79ns
-F128p18/rmul (~1e4, Nearest) 1.00 134.1±0.79ns
-F128p18/rsqrt (~1e4, Ceil) 1.00 428.3±7.08ns
-F128p18/rsqrt (~1e4, Floor) 1.00 403.9±1.24ns
-F128p18/rsqrt (~1e4, Nearest) 1.00 475.3±1.03ns
-F128p18/rsqrt (adaptive, Ceil) 1.00 1469.3±3.05ns
-F128p18/rsqrt (adaptive, Floor) 1.00 1436.2±1.98ns
-F128p18/rsqrt (adaptive, Nearest) 1.00 1530.6±1.97ns
-F128p18/rsqrt (MAX, Ceil) 1.00 1393.2±9.68ns
-F128p18/rsqrt (MAX, Floor) 1.00 1335.9±10.01ns
-F128p18/rsqrt (MAX, Nearest) 1.00 1441.7±11.63ns
-F128p18/to_decimal(0) (12.345) 1.00 263.8±25.35ns
-F128p18/to_decimal(i32::MAX) (12.345) 1.00 263.2±0.13ns
-F128p18/try_from(f64) (~0.1) 1.00 59.3±0.36ns
-F128p18/try_from(f64) (~1e-12) 1.00 133.0±0.14ns
-F128p18/try_from(f64) (~1e6) 1.00 27.8±0.25ns
-F128p18/try_from(f64) (MAX) 1.00 5.9±0.00µs
-F128p18/try_from(f64) (MIN_POSITIVE) 1.00 1842.6±1.86ns
+F128p18/cadd (~1e4) 1.9±0.05ns
+F128p18/from_decimal(12345, -3) 4.8±0.02ns
+F128p18/next_power_of_ten 3.1±0.04ns
+F128p18/rdiv (~1e5/~1e4, Ceil) 10.7±0.15ns
+F128p18/rdiv (~1e5/~1e4, Floor) 10.4±0.15ns
+F128p18/rdiv (~1e5/~1e4, Nearest) 11.2±0.16ns
+F128p18/rmul (~1e4, Ceil) 7.0±0.04ns
+F128p18/rmul (~1e4, Floor) 7.0±0.02ns
+F128p18/rmul (~1e4, Nearest) 7.2±0.06ns
+F128p18/rsqrt (~1e4, Ceil) 40.0±0.24ns
+F128p18/rsqrt (~1e4, Floor) 39.4±0.28ns
+F128p18/rsqrt (~1e4, Nearest) 41.2±0.28ns
+F128p18/rsqrt (adaptive, Ceil) 50.0±0.42ns
+F128p18/rsqrt (adaptive, Floor) 49.2±0.42ns
+F128p18/rsqrt (adaptive, Nearest) 50.6±0.38ns
+F128p18/rsqrt (MAX, Ceil) 40.2±0.28ns
+F128p18/rsqrt (MAX, Floor) 39.3±0.27ns
+F128p18/rsqrt (MAX, Nearest) 41.4±0.38ns
+F128p18/to_decimal(0) (12.345) 59.1±0.19ns
+F128p18/to_decimal(i32::MAX) (12.345) 59.1±0.28ns
+F128p18/try_from(f64) (~0.1) 28.5±1.51ns
+F128p18/try_from(f64) (~1e-12) 62.1±0.20ns
+F128p18/try_from(f64) (~1e6) 15.2±0.04ns
+F128p18/try_from(f64) (MAX) 1264.6±4.34ns
+F128p18/try_from(f64) (MIN_POSITIVE) 693.6±2.45ns
```

## serde
64-bit FP with precision = 9:
```
-F64p9/deserialize 123.456 from f64 1.00 103.7±0.24ns
-F64p9/deserialize 123.456 from string 1.00 54.8±0.18ns
-F64p9/deserialize MAX from f64 1.00 59.8±0.24ns
-F64p9/deserialize MAX from string 1.00 86.3±0.79ns
-F64p9/serialize 123.456 to f64 1.00 48.2±0.46ns
-F64p9/serialize 123.456 to string 1.00 27.5±0.29ns
-F64p9/serialize MAX to f64 1.00 41.3±0.95ns
-F64p9/serialize MAX to string 1.00 35.3±2.63ns
+F64p9/deserialize 123.456 from f64 55.4±0.17ns
+F64p9/deserialize 123.456 from string 27.1±0.34ns
+F64p9/deserialize MAX from f64 44.4±0.03ns
+F64p9/deserialize MAX from string 39.3±0.61ns
+F64p9/serialize 123.456 to f64 27.0±0.33ns
+F64p9/serialize 123.456 to string 13.1±0.21ns
+F64p9/serialize MAX to f64 38.6±0.01ns
+F64p9/serialize MAX to string 14.8±0.19ns
```

128-bit FP with precision = 18:
```
-F128p18/deserialize 123.456 from f64 1.00 103.3±0.24ns
-F128p18/deserialize 123.456 from string 1.00 70.8±0.09ns
-F128p18/deserialize MAX from f64 1.00 56.6±0.19ns
-F128p18/deserialize MAX from string 1.00 147.3±0.51ns
-F128p18/serialize 123.456 to f64 1.00 67.7±0.38ns
-F128p18/serialize 123.456 to string 1.00 51.7±0.64ns
-F128p18/serialize MAX to f64 1.00 63.6±0.74ns
-F128p18/serialize MAX to string 1.00 80.6±1.00ns
+F128p18/deserialize 123.456 from f64 55.9±0.07ns
+F128p18/deserialize 123.456 from string 31.5±0.74ns
+F128p18/deserialize MAX from f64 40.8±0.20ns
+F128p18/deserialize MAX from string 60.1±0.75ns
+F128p18/serialize 123.456 to f64 30.4±0.15ns
+F128p18/serialize 123.456 to string 23.6±0.29ns
+F128p18/serialize MAX to f64 23.4±0.02ns
+F128p18/serialize MAX to string 37.3±0.04ns
```