
Training a Mamba model with the same number of parameters is way slower than its CNN counterpart. #678

Open
pooya-mohammadi opened this issue Jan 24, 2025 · 0 comments


@pooya-mohammadi

When I train a CNN model with, say, 40,000,000 parameters, it is way faster than a Mamba model with only 721,079 parameters.
Is this usual, or is something wrong on my side?
I tried it on a desktop 4090 and on a server with an A100, and the result is the same on both.
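Parameter count is a poor proxy for wall-clock speed here: a CNN's convolutions map to large, highly parallel GEMM-style kernels, while a Mamba block's selective scan runs along the sequence dimension and tends to be far less arithmetic-dense per parameter, so a tiny Mamba model can still be slower than a much larger CNN. Below is a minimal timing sketch for comparing the two fairly, assuming PyTorch on a CUDA device and the `mamba_ssm` package; the shapes (`dim`, `seqlen`, the two-layer conv stack) are illustrative stand-ins, not the reporter's actual models.

```python
import time

import torch
from mamba_ssm import Mamba  # block from the state-spaces/mamba package

device = "cuda"
batch, seqlen, dim = 8, 1024, 256  # illustrative sizes, not the reporter's

# A single Mamba block vs. a small 1-D conv stack as stand-ins.
mamba = Mamba(d_model=dim, d_state=16, d_conv=4, expand=2).to(device)
cnn = torch.nn.Sequential(
    torch.nn.Conv1d(dim, dim, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv1d(dim, dim, kernel_size=3, padding=1),
).to(device)

def n_params(m: torch.nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

@torch.no_grad()
def bench_ms(fn, x, iters=50, warmup=10):
    # Warm up first (kernel compilation, cuDNN autotuning), then time
    # with explicit synchronization so asynchronous launches are counted.
    for _ in range(warmup):
        fn(x)
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(x)
    torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters * 1e3

x_seq = torch.randn(batch, seqlen, dim, device=device)  # (B, L, D) for Mamba
x_chw = x_seq.transpose(1, 2).contiguous()              # (B, D, L) for Conv1d

# Forward-only timing; a training benchmark would also time backward.
print(f"Mamba: {n_params(mamba):,} params, {bench_ms(mamba, x_seq):.2f} ms/iter")
print(f"CNN:   {n_params(cnn):,} params, {bench_ms(cnn, x_chw):.2f} ms/iter")
```

If a sketch like this shows the Mamba block unexpectedly slow even at small sizes, one thing worth ruling out is that `mamba_ssm` fell back to its slower pure-PyTorch paths; as far as I know, the fast path requires the optional `causal_conv1d` package to import successfully, so confirming the compiled extensions built against your CUDA toolkit is a cheap first check.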
