manual struct target feature #292

Draft: wants to merge 3 commits into base: inflate-fast-help-target-feature

folkertdev
Collaborator

This essentially inlines all AVX2 code paths.

On my machine, this gives basically the same performance as building with `-Ctarget-cpu=native`.

It's not the prettiest thing, though. I'm open to suggestions.
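For readers unfamiliar with the pattern, here is a minimal sketch of what "inlining into an AVX2 code path" can look like in Rust. It is not the code from this PR: `copy_chunks`, `copy_avx2`, and `copy_dispatch` are hypothetical names made up for illustration. The idea is that a helper marked `#[inline(always)]` gets inlined into a `#[target_feature(enable = "avx2")]` function, so its body can be compiled with AVX2 enabled, and a runtime check picks that path only when the CPU supports it.

```rust
/// Hypothetical helper (not from the PR): copies `src` into `dst` in 32-byte chunks.
/// `#[inline(always)]` lets the body be inlined into callers, including
/// `#[target_feature]` functions, where it can be compiled with that feature enabled.
#[inline(always)]
fn copy_chunks(dst: &mut [u8], src: &[u8]) {
    for (d, s) in dst.chunks_mut(32).zip(src.chunks(32)) {
        d.copy_from_slice(s);
    }
}

/// AVX2-specialized entry point. The helper above is inlined here, so the
/// compiler is free to use AVX2 instructions for the copy.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn copy_avx2(dst: &mut [u8], src: &[u8]) {
    copy_chunks(dst, src)
}

/// Runtime dispatch: use the AVX2 version when the CPU supports it,
/// otherwise fall back to the baseline build of the same helper.
fn copy_dispatch(dst: &mut [u8], src: &[u8]) {
    assert_eq!(dst.len(), src.len());

    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified at runtime just above.
            unsafe { copy_avx2(dst, src) };
            return;
        }
    }

    copy_chunks(dst, src)
}
```

Calling `copy_dispatch(&mut dst, &src)` behaves the same on every machine; only the instructions chosen for the copy differ. With `-Ctarget-cpu=native` the compiler can make that choice everywhere at build time, which is why a manual version like this can end up with similar performance while still producing a portable binary.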


codecov bot commented Jan 30, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

| Files with missing lines | Coverage Δ | |
|---|---|---|
| zlib-rs/src/cpu_features.rs | 59.64% <ø> (ø) | |
| zlib-rs/src/inflate.rs | 91.17% <100.00%> (+0.04%) | ⬆️ |
| zlib-rs/src/inflate/writer.rs | 89.17% <100.00%> (+1.23%) | ⬆️ |

... and 1 file with indirect coverage changes

@folkertdev
Collaborator Author

I tried specializing for AVX-512, but `target_feature(enable = "avx512f")` and the related AVX-512 features are not stable yet, so the AVX-512 code would just use SSE loads, and that turns out to be worse. So AVX2 is the best we can do right now (until we start looking at AVX-512 seriously).
