It seems like Aes256Gcm's performance is extremely low in debug builds; specifically, whenever anything other than opt-level = 3 is used. The benchmarks below show ~14 s to encrypt a 64 MB plaintext block when the bench profile is overridden to opt-level = 0, versus ~62 ms at opt-level = 3 (roughly a 240x gap).
With opt-level = 0 set:
$ cargo bench -p crypto --bench encrypt_decrypt
Benchmarking encrypt_64mb_direct/encrypt_64MB: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 147.6s.
encrypt_64mb_direct/encrypt_64MB
time: [14.733 s 14.773 s 14.817 s]
With opt-level = 3 set:
encrypt_64mb_direct/encrypt_64MB
time: [61.542 ms 61.875 ms 62.530 ms]
change: [-99.584% -99.581% -99.579%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
2 (20.00%) high severe
Analysis
VTune profiles suggested that AES-NI was being used in the slow case, so it's probably not a software-implementation fallback issue [1]. The stacks are mostly unintelligible, though, so I can't say for sure (the crates use a lot of metaprogramming). I haven't attempted a Godbolt analysis yet, because it seems like I must be doing something trivially wrong for performance to be this bad.
Is this just expected in debug / non-opt builds? Is there a known reason for this behavior?
[1] Older versions of the cipher crate used to require special RUSTFLAGS to enable AES-NI, but newer versions (I'm using 0.4.4) auto-detect its presence at runtime.
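As a quick sanity check that the hardware path is available at all, std can be queried directly. This is a minimal standalone sketch (not part of the cipher crate's API; the cpufeatures crate that RustCrypto uses performs a similar CPUID-based check internally):

fn main() {
    // Runtime CPU-feature probe; prints whether the AES-NI instructions
    // are available on this machine.
    #[cfg(target_arch = "x86_64")]
    println!("aes-ni: {}", std::arch::is_x86_feature_detected!("aes"));
}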
Benchmark code
use aead::stream::EncryptorLE31;
use aead::stream::Nonce as StreamNonce;
use aead::stream::StreamLE31;
use aead::AeadCore;
use aead::KeyInit;
use aead::OsRng;
use aes_gcm::Aes256Gcm;
use criterion::criterion_group;
use criterion::criterion_main;
use criterion::Criterion;

// Nonce type for the LE31 STREAM construction: the AEAD nonce size minus the
// 4-byte counter/flag word, i.e. 12 - 4 = 8 bytes for AES-GCM.
type Aes256StreamNonce = StreamNonce<Aes256Gcm, StreamLE31<Aes256Gcm>>;

pub fn encrypt_64mb_direct(c: &mut Criterion) {
    // Derive the 8-byte streaming nonce from a freshly generated AEAD nonce.
    let aes256_nonce = Aes256Gcm::generate_nonce(OsRng);
    let streaming_nonce = Aes256StreamNonce::from_slice(&aes256_nonce.as_slice()[0..8]);
    let key = Aes256Gcm::generate_key(OsRng);
    let aes = Aes256Gcm::new(&key);
    let mut encryptor = EncryptorLE31::from_aead(aes, streaming_nonce);
    let mut plaintext = vec![0u8; 1024 * 1024 * 64];
    // Set a low sample size because 64MB chunks take multiple *seconds* if optimizations are off.
    let mut bench_group = c.benchmark_group("encrypt_64mb_direct");
    bench_group.sample_size(10);
    bench_group.bench_function("encrypt_64MB", |b| {
        b.iter(|| {
            // Encrypt the buffer in place with empty associated data; each
            // call advances the stream counter and appends a 16-byte GCM tag.
            encryptor
                .encrypt_next_in_place(&[], &mut plaintext)
                .unwrap();
        });
    });
}

criterion_group!(benches, encrypt_64mb_direct);
criterion_main!(benches);
Cargo.toml adjustments for opt-level
[profile.bench]
opt-level = 0
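For day-to-day debug builds, a common workaround is Cargo's per-package profile overrides, which keep just the crypto crates optimized while the rest of the workspace stays at opt-level 0. A sketch (the crate names below are the usual RustCrypto dependencies of aes-gcm; adjust to match your dependency tree):

[profile.dev.package.aes]
opt-level = 3

[profile.dev.package.aes-gcm]
opt-level = 3

[profile.dev.package.ghash]
opt-level = 3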
Yes, this is expected behavior. Our code relies heavily on cross-crate inlining for optimal performance; without inlining, every intrinsic becomes a separate function call instead of compiling down to a single instruction.
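To make that concrete, here is an illustrative sketch (the wrapper below is hypothetical, not actual code from the aes or cipher crates) of why a non-inlined intrinsic is so costly:

#[cfg(target_arch = "x86_64")]
use core::arch::x86_64::{__m128i, _mm_aesenc_si128};

// Hypothetical stand-in for one of the many generic layers in cipher/aes.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "aes")]
unsafe fn aes_round(state: __m128i, round_key: __m128i) -> __m128i {
    // When inlined (opt-level >= 1), this collapses to a single AESENC
    // instruction. At opt-level = 0 it remains a real function call, with
    // argument and return-value traffic through the stack, repeated for
    // every round of every 16-byte block.
    _mm_aesenc_si128(state, round_key)
}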