perf: improve rmul/rdiv/rsqrt operations #42

Merged 4 commits on Dec 30, 2024
13 changes: 7 additions & 6 deletions .github/workflows/ci.yml
@@ -35,6 +35,7 @@ jobs:
  - run: cargo clippy --version
  - run: cargo clippy --features i64
  - run: cargo clippy --all-targets --features i64
+ - run: cargo clippy --all-targets --features i64,i128
  - run: cargo clippy --all-targets --all-features

  test:
@@ -44,12 +45,12 @@ jobs:
  - run: rustc --version
  - run: cargo test --features i64
  - run: cargo test --features i128
- - run: cargo test --no-default-features --lib --features i64
- - run: cargo test --no-default-features --lib --features i128
- - run: cargo test --no-default-features --lib --features std,i64
- - run: cargo test --no-default-features --lib --features serde,i64
- - run: cargo test --no-default-features --lib --features i64,parity
- - run: cargo test --no-default-features --lib --features i128,parity
+ - run: cargo test --no-default-features --lib --test it --features i64
+ - run: cargo test --no-default-features --lib --test it --features i128
+ - run: cargo test --no-default-features --lib --test it --features std,i64
+ - run: cargo test --no-default-features --lib --test it --features serde,i64
+ - run: cargo test --no-default-features --lib --test it --features i64,parity
+ - run: cargo test --no-default-features --lib --test it --features i128,parity
  - run: cargo test --all-features

  run-example:
6 changes: 3 additions & 3 deletions Cargo.toml
@@ -36,7 +36,7 @@ std = ["derive_more/error"]
  i16 = []
  i32 = []
  i64 = []
- i128 = []
+ i128 = ["dep:i256"]
  serde = ["dep:serde"]
  schemars = ["dep:schemars"]
  parity = ["parity-scale-codec"]
@@ -46,16 +46,16 @@ quick-xml = ["serde?/derive", "serde?/alloc"] # FIXME: quick-xml#473
  serde = { version = "1.0", default-features = false, optional = true }
  schemars = { version = "0.8", default-features = false, optional = true }
  typenum = "1.12.0"
- derive_more = { version = "0.99.9", default-features = false }
  parity-scale-codec = { version = "3", default-features = false, optional = true }
  static_assertions = "1.1.0"
  itoa = "1.0.1"
+ i256 = { version = "=0.1.1", default-features = false, optional = true }

  [dev-dependencies]
  anyhow = { version = "1.0.38", default-features = false }
  colored = "2.0.0"
  criterion = "0.5"
- derive_more = "0.99.9"
+ derive_more = { version = "1.0.0", features = ["full"] }
  trybuild = "1.0.85"
  serde_json = "1"
  proptest = "1.0.0"
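The `i128` feature now pulls in `i256`, presumably so that the 128-bit `rmul`/`rdiv` can hold their intermediate results in a wider integer. The diff does not show that code, so the following is only a sketch of the same idea one size down: a 64-bit fixed-point multiply with precision 9 that widens to `i128`. The names `RoundMode` and `rmul_i64_p9` are hypothetical, and the tie-breaking rule for `Nearest` is an arbitrary choice for this sketch, not necessarily what the crate does.

```rust
/// Rounding modes named after the benchmark labels (hypothetical enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RoundMode {
    Ceil,
    Nearest,
    Floor,
}

/// Scale factor for precision = 9, i.e. 10^9.
const COEF: i128 = 1_000_000_000;

/// Multiplies two fixed-point values stored as raw `i64` (value * 10^9).
/// Returns `None` if the rounded result does not fit into `i64`.
pub fn rmul_i64_p9(a: i64, b: i64, mode: RoundMode) -> Option<i64> {
    // The product of two i64 values always fits into i128.
    let product = i128::from(a) * i128::from(b);

    // With a positive divisor, div_euclid is floor division
    // and rem_euclid lies in 0..COEF.
    let mut quotient = product.div_euclid(COEF);
    let remainder = product.rem_euclid(COEF);

    match mode {
        RoundMode::Floor => {}
        RoundMode::Ceil => {
            if remainder != 0 {
                quotient += 1;
            }
        }
        // Assumption of this sketch: exact ties round toward +infinity.
        RoundMode::Nearest => {
            if 2 * remainder >= COEF {
                quotient += 1;
            }
        }
    }

    // Narrow back to the storage type, reporting overflow as None.
    i64::try_from(quotient).ok()
}
```

For example, `rmul_i64_p9(1_500_000_000, 2_500_000_000, RoundMode::Floor)` multiplies 1.5 by 2.5 and yields the raw value for 3.75. For a 128-bit storage type the same scheme needs a 256-bit intermediate, which is where a crate such as `i256` would come in.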
136 changes: 68 additions & 68 deletions benches/README.md
@@ -1,92 +1,92 @@
  # Benchmarks

- Benchmarks were performed on an [AMD Ryzen 7 4800HS CPU](https://en.wikichip.org/wiki/amd/ryzen_9/3900).
+ Benchmarks were performed on an Intel Core i9-14900K CPU.

  ```sh
  $ cargo bench --bench <name> --features <int>
- $ critcmp new | tail +3 | sort | sed 's# ? ?/sec##'
+ $ critcmp new | tail +3 | sort | sed 's# ? ?/sec##' | sed 's# 1.00##'
  ```

  ## ops
  64-bit FP with precision = 9:
  ```
- F64p9/cadd (~1e4) 1.00 1.9±0.01ns
- F64p9/from_decimal(12345, -3) 1.00 1.6±0.00ns
- F64p9/next_power_of_ten 1.00 3.6±0.01ns
- F64p9/rdiv (~1e5/~1e4, Ceil) 1.00 1.9±0.01ns
- F64p9/rdiv (~1e5/~1e4, Floor) 1.00 1.9±0.01ns
- F64p9/rdiv (~1e5/~1e4, Nearest) 1.00 1.9±0.00ns
- F64p9/rmul (~1e4, Ceil) 1.00 1.9±0.01ns
- F64p9/rmul (~1e4, Floor) 1.00 1.9±0.03ns
- F64p9/rmul (~1e4, Nearest) 1.00 1.9±0.00ns
- F64p9/rsqrt (~1e4, Ceil) 1.00 43.7±0.29ns
- F64p9/rsqrt (~1e4, Floor) 1.00 42.5±0.17ns
- F64p9/rsqrt (~1e4, Nearest) 1.00 47.0±0.19ns
- F64p9/rsqrt (adaptive, Ceil) 1.00 98.0±0.33ns
- F64p9/rsqrt (adaptive, Floor) 1.00 94.4±1.45ns
- F64p9/rsqrt (adaptive, Nearest) 1.00 99.6±0.67ns
- F64p9/rsqrt (MAX, Ceil) 1.00 102.3±0.50ns
- F64p9/rsqrt (MAX, Floor) 1.00 100.2±0.50ns
- F64p9/rsqrt (MAX, Nearest) 1.00 102.7±0.80ns
- F64p9/to_decimal(0) (12.345) 1.00 9.1±0.02ns
- F64p9/to_decimal(i32::MAX) (12.345) 1.00 9.1±0.01ns
- F64p9/try_from(f64) (~0.1) 1.00 64.8±0.33ns
- F64p9/try_from(f64) (~1e-12) 1.00 132.5±0.46ns
- F64p9/try_from(f64) (~1e6) 1.00 24.9±0.14ns
- F64p9/try_from(f64) (MAX) 1.00 5.9±0.01µs
- F64p9/try_from(f64) (MIN_POSITIVE) 1.00 1872.9±4.12ns
+ F64p9/cadd (~1e4) 1.0±0.03ns
+ F64p9/from_decimal(12345, -3) 1.0±0.01ns
+ F64p9/next_power_of_ten 1.6±0.03ns
+ F64p9/rdiv (~1e5/~1e4, Ceil) 1.0±0.03ns
+ F64p9/rdiv (~1e5/~1e4, Floor) 1.0±0.04ns
+ F64p9/rdiv (~1e5/~1e4, Nearest) 1.0±0.04ns
+ F64p9/rmul (~1e4, Ceil) 1.0±0.03ns
+ F64p9/rmul (~1e4, Floor) 1.0±0.04ns
+ F64p9/rmul (~1e4, Nearest) 1.0±0.05ns
+ F64p9/rsqrt (~1e4, Ceil) 1.0±0.02ns
+ F64p9/rsqrt (~1e4, Floor) 1.0±0.02ns
+ F64p9/rsqrt (~1e4, Nearest) 1.0±0.03ns
+ F64p9/rsqrt (adaptive, Ceil) 5.4±0.02ns
+ F64p9/rsqrt (adaptive, Floor) 4.9±0.01ns
+ F64p9/rsqrt (adaptive, Nearest) 5.5±0.02ns
+ F64p9/rsqrt (MAX, Ceil) 1.0±0.01ns
+ F64p9/rsqrt (MAX, Floor) 1.0±0.01ns
+ F64p9/rsqrt (MAX, Nearest) 1.0±0.01ns
+ F64p9/to_decimal(0) (12.345) 5.0±0.01ns
+ F64p9/to_decimal(i32::MAX) (12.345) 5.0±0.02ns
+ F64p9/try_from(f64) (~0.1) 33.2±0.08ns
+ F64p9/try_from(f64) (~1e-12) 61.9±0.20ns
+ F64p9/try_from(f64) (~1e6) 16.2±0.05ns
+ F64p9/try_from(f64) (MAX) 1263.8±2.26ns
+ F64p9/try_from(f64) (MIN_POSITIVE) 693.4±2.38ns
  ```

  128-bit FP with precision = 18:
  ```
- F128p18/cadd (~1e4) 1.00 2.8±0.00ns
- F128p18/from_decimal(12345, -3) 1.00 9.1±0.03ns
- F128p18/next_power_of_ten 1.00 6.3±0.03ns
- F128p18/rdiv (~1e5/~1e4, Ceil) 1.00 157.3±0.51ns
- F128p18/rdiv (~1e5/~1e4, Floor) 1.00 154.2±1.19ns
- F128p18/rdiv (~1e5/~1e4, Nearest) 1.00 159.4±1.05ns
- F128p18/rmul (~1e4, Ceil) 1.00 132.5±0.61ns
- F128p18/rmul (~1e4, Floor) 1.00 132.3±0.79ns
- F128p18/rmul (~1e4, Nearest) 1.00 134.1±0.79ns
- F128p18/rsqrt (~1e4, Ceil) 1.00 428.3±7.08ns
- F128p18/rsqrt (~1e4, Floor) 1.00 403.9±1.24ns
- F128p18/rsqrt (~1e4, Nearest) 1.00 475.3±1.03ns
- F128p18/rsqrt (adaptive, Ceil) 1.00 1469.3±3.05ns
- F128p18/rsqrt (adaptive, Floor) 1.00 1436.2±1.98ns
- F128p18/rsqrt (adaptive, Nearest) 1.00 1530.6±1.97ns
- F128p18/rsqrt (MAX, Ceil) 1.00 1393.2±9.68ns
- F128p18/rsqrt (MAX, Floor) 1.00 1335.9±10.01ns
- F128p18/rsqrt (MAX, Nearest) 1.00 1441.7±11.63ns
- F128p18/to_decimal(0) (12.345) 1.00 263.8±25.35ns
- F128p18/to_decimal(i32::MAX) (12.345) 1.00 263.2±0.13ns
- F128p18/try_from(f64) (~0.1) 1.00 59.3±0.36ns
- F128p18/try_from(f64) (~1e-12) 1.00 133.0±0.14ns
- F128p18/try_from(f64) (~1e6) 1.00 27.8±0.25ns
- F128p18/try_from(f64) (MAX) 1.00 5.9±0.00µs
- F128p18/try_from(f64) (MIN_POSITIVE) 1.00 1842.6±1.86ns
+ F128p18/cadd (~1e4) 1.9±0.05ns
+ F128p18/from_decimal(12345, -3) 4.8±0.02ns
+ F128p18/next_power_of_ten 3.1±0.04ns
+ F128p18/rdiv (~1e5/~1e4, Ceil) 10.7±0.15ns
+ F128p18/rdiv (~1e5/~1e4, Floor) 10.4±0.15ns
+ F128p18/rdiv (~1e5/~1e4, Nearest) 11.2±0.16ns
+ F128p18/rmul (~1e4, Ceil) 7.0±0.04ns
+ F128p18/rmul (~1e4, Floor) 7.0±0.02ns
+ F128p18/rmul (~1e4, Nearest) 7.2±0.06ns
+ F128p18/rsqrt (~1e4, Ceil) 40.0±0.24ns
+ F128p18/rsqrt (~1e4, Floor) 39.4±0.28ns
+ F128p18/rsqrt (~1e4, Nearest) 41.2±0.28ns
+ F128p18/rsqrt (adaptive, Ceil) 50.0±0.42ns
+ F128p18/rsqrt (adaptive, Floor) 49.2±0.42ns
+ F128p18/rsqrt (adaptive, Nearest) 50.6±0.38ns
+ F128p18/rsqrt (MAX, Ceil) 40.2±0.28ns
+ F128p18/rsqrt (MAX, Floor) 39.3±0.27ns
+ F128p18/rsqrt (MAX, Nearest) 41.4±0.38ns
+ F128p18/to_decimal(0) (12.345) 59.1±0.19ns
+ F128p18/to_decimal(i32::MAX) (12.345) 59.1±0.28ns
+ F128p18/try_from(f64) (~0.1) 28.5±1.51ns
+ F128p18/try_from(f64) (~1e-12) 62.1±0.20ns
+ F128p18/try_from(f64) (~1e6) 15.2±0.04ns
+ F128p18/try_from(f64) (MAX) 1264.6±4.34ns
+ F128p18/try_from(f64) (MIN_POSITIVE) 693.6±2.45ns
  ```

  ## serde
  64-bit FP with precision = 9:
  ```
- F64p9/deserialize 123.456 from f64 1.00 103.7±0.24ns
- F64p9/deserialize 123.456 from string 1.00 54.8±0.18ns
- F64p9/deserialize MAX from f64 1.00 59.8±0.24ns
- F64p9/deserialize MAX from string 1.00 86.3±0.79ns
- F64p9/serialize 123.456 to f64 1.00 48.2±0.46ns
- F64p9/serialize 123.456 to string 1.00 27.5±0.29ns
- F64p9/serialize MAX to f64 1.00 41.3±0.95ns
- F64p9/serialize MAX to string 1.00 35.3±2.63ns
+ F64p9/deserialize 123.456 from f64 55.4±0.17ns
+ F64p9/deserialize 123.456 from string 27.1±0.34ns
+ F64p9/deserialize MAX from f64 44.4±0.03ns
+ F64p9/deserialize MAX from string 39.3±0.61ns
+ F64p9/serialize 123.456 to f64 27.0±0.33ns
+ F64p9/serialize 123.456 to string 13.1±0.21ns
+ F64p9/serialize MAX to f64 38.6±0.01ns
+ F64p9/serialize MAX to string 14.8±0.19ns
  ```

  128-bit FP with precision = 18:
  ```
- F128p18/deserialize 123.456 from f64 1.00 103.3±0.24ns
- F128p18/deserialize 123.456 from string 1.00 70.8±0.09ns
- F128p18/deserialize MAX from f64 1.00 56.6±0.19ns
- F128p18/deserialize MAX from string 1.00 147.3±0.51ns
- F128p18/serialize 123.456 to f64 1.00 67.7±0.38ns
- F128p18/serialize 123.456 to string 1.00 51.7±0.64ns
- F128p18/serialize MAX to f64 1.00 63.6±0.74ns
- F128p18/serialize MAX to string 1.00 80.6±1.00ns
+ F128p18/deserialize 123.456 from f64 55.9±0.07ns
+ F128p18/deserialize 123.456 from string 31.5±0.74ns
+ F128p18/deserialize MAX from f64 40.8±0.20ns
+ F128p18/deserialize MAX from string 60.1±0.75ns
+ F128p18/serialize 123.456 to f64 30.4±0.15ns
+ F128p18/serialize 123.456 to string 23.6±0.29ns
+ F128p18/serialize MAX to f64 23.4±0.02ns
+ F128p18/serialize MAX to string 37.3±0.04ns
  ```
12 changes: 9 additions & 3 deletions src/errors.rs
@@ -1,10 +1,11 @@
  use core::fmt::{Display, Formatter, Result};

+ // TODO: once MSRV becomes 1.81, use `core::error::Error` instead.
+ // Also, enable doctests in CI checks even for no-std.
  #[cfg(feature = "std")]
- use derive_more::Error;
+ use std::error::Error;

  /// Represents errors during arithmetic operations.
- #[cfg_attr(feature = "std", derive(Error))]
  #[derive(Clone, Debug, PartialEq, Eq)]
  #[non_exhaustive]
  pub enum ArithmeticError {
@@ -34,8 +35,10 @@ impl Display for ArithmeticError {
      }
  }

+ #[cfg(feature = "std")]
+ impl Error for ArithmeticError {}
+
  /// Represents errors during conversions.
- #[cfg_attr(feature = "std", derive(Error))]
  #[derive(Clone, Debug, PartialEq, Eq)]
  pub struct ConvertError {
      reason: &'static str,
@@ -57,3 +60,6 @@ impl Display for ConvertError {
          f.write_str(self.as_str())
      }
  }
+
+ #[cfg(feature = "std")]
+ impl Error for ConvertError {}
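
The change above replaces the `derive_more` derive with a hand-written, empty `impl Error`. As a minimal standalone sketch of why an empty impl is enough: `std::error::Error` only requires `Debug` and `Display`, and all of its methods have defaults. The `new` constructor and the `describe` helper below are assumptions added for illustration; only `reason`, `as_str`, and the `Display` body appear in the diff, and the `#[cfg(feature = "std")]` gate is omitted so the snippet compiles on its own.

```rust
use std::error::Error;
use std::fmt;

/// Stand-in for the crate's `ConvertError`, reduced to what the diff shows.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ConvertError {
    reason: &'static str,
}

impl ConvertError {
    /// Hypothetical constructor, added only so the example can build a value.
    pub fn new(reason: &'static str) -> Self {
        Self { reason }
    }

    pub fn as_str(&self) -> &'static str {
        self.reason
    }
}

impl fmt::Display for ConvertError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

// Debug + Display are already implemented, so the trait body can stay empty.
impl Error for ConvertError {}

/// Hypothetical helper: accepts any error as a trait object.
pub fn describe(err: &dyn Error) -> String {
    err.to_string()
}
```

With this in place a `ConvertError` can be boxed as `Box<dyn Error>` or propagated with `?` in functions that return a boxed error, without a proc-macro dependency in the library itself.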