
Extremely poor performance on AES256Gcm with anything but opt-level=3 #581

Closed

0xngold opened this issue on Feb 23, 2024 · 2 comments

0xngold commented Feb 23, 2024

Problem description

It seems like AES256Gcm's performance is extremely low in debug builds; specifically, whenever anything other than opt-level = 3 is used. The benchmarks below show ~14 s to encrypt a 64 MB plaintext block when the bench profile is overridden to opt-level = 0.

With opt-level = 0 set:

$ cargo bench -p crypto --bench encrypt_decrypt

Benchmarking encrypt_64mb_direct/encrypt_64MB: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 147.6s.
encrypt_64mb_direct/encrypt_64MB                                                                          
                        time:   [14.733 s 14.773 s 14.817 s]

With opt-level = 3 set:

encrypt_64mb_direct/encrypt_64MB                                                                           
                        time:   [61.542 ms 61.875 ms 62.530 ms]
                        change: [-99.584% -99.581% -99.579%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high severe

Analysis

VTune profiles suggested that AES-NI was being used even in the slow case, so it's probably not a software-implementation fallback issue [1]. The stacks are mostly unintelligible, though, so I can't say for sure (the crates have a ton of metaprogramming). I haven't attempted a Godbolt analysis yet, because it seems like I must be doing something trivially wrong for performance to be this bad.

Is this just expected in debug / non-opt builds? Is there a known reason for this behavior?

[1] It seems like older versions of the cipher crate used to require special RUSTFLAGS to use AES-NI, but newer versions of the crates (I'm using 0.4.4) auto-detect AES-NI presence.
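
(For reference, the old compile-time opt-in looked roughly like the snippet below. This is a sketch from memory, not something the 0.4.x crates require, since they detect AES-NI at runtime.)

# .cargo/config.toml — historical sketch (assumption): older `aes`/`cipher`
# versions only used AES-NI when these target features were enabled at compile time.
[build]
rustflags = ["-C", "target-feature=+aes,+ssse3"]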

Benchmark code

use aead::stream::EncryptorLE31;
use aead::stream::StreamLE31;
use aead::AeadCore;
use aead::KeyInit;
use aead::OsRng;
use aes_gcm::Aes256Gcm;
use criterion::criterion_group;
use criterion::criterion_main;
use criterion::Criterion;

// Assumed alias: the original snippet uses `Aes256StreamNonce` without showing its
// definition. The LE31 STREAM construction reserves 4 bytes of the 12-byte GCM nonce
// for its counter/flag word, leaving an 8-byte per-stream nonce.
type Aes256StreamNonce = aead::stream::Nonce<Aes256Gcm, StreamLE31<Aes256Gcm>>;

pub fn encrypt_64mb_direct(c: &mut Criterion) {
    let aes256_nonce = Aes256Gcm::generate_nonce(OsRng);
    let streaming_nonce = Aes256StreamNonce::from_slice(&aes256_nonce.as_slice()[0..8]);
    let key = Aes256Gcm::generate_key(OsRng);
    let aes = Aes256Gcm::new(&key);
    let mut encryptor = EncryptorLE31::from_aead(aes, streaming_nonce);
    let mut plaintext = vec![0u8; 1024 * 1024 * 64];

    // Set a low sample size because 64MB chunks take multiple *seconds* if optimizations are off.
    let mut bench_group = c.benchmark_group("encrypt_64mb_direct");
    bench_group.sample_size(10);

    bench_group.bench_function("encrypt_64MB", |b| {
        b.iter(|| {
            // Encrypts the buffer in place; each call also appends the 16-byte GCM tag,
            // so `plaintext` grows slightly with every iteration.
            encryptor
                .encrypt_next_in_place(&[], &mut plaintext)
                .unwrap();
        });
    });
}

criterion_group!(benches, encrypt_64mb_direct);
criterion_main!(benches);

Cargo.toml adjustments for opt-level

[profile.bench]
opt-level = 0
@newpavlov (Member) commented:

Yes, this is expected behavior. Our code relies heavily on inlining across crates for optimal performance. For example, without inlining, every intrinsic becomes a separate function call instead of being compiled down to a single instruction.
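
(Not mentioned in this thread, but a common mitigation is to keep only the dependency crates optimized in dev/bench profiles. The hot loops live inside the RustCrypto crates, so this recovers most of the throughput even while your own crate stays at opt-level 0. A sketch, using the crate names from the benchmark above:)

# Cargo.toml — optimize all dependencies even in debug builds:
[profile.dev.package."*"]
opt-level = 3

# Or, more narrowly, only the crates on the hot path:
[profile.dev.package.aes]
opt-level = 3
[profile.dev.package.aes-gcm]
opt-level = 3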

newpavlov closed this as not planned on Feb 24, 2024
0xngold (Author) commented Feb 26, 2024

Ah, okay. Looking at the call stacks, that makes sense. Thanks!
