AssertionError: Errors of the kernel fp8_gemm in the profiling table #9

Open
vlluvia opened this issue Feb 26, 2025 · 3 comments

@vlluvia
vlluvia commented Feb 26, 2025

Python 3.12.3
CUDA 12.6
PyTorch 2.6
CUTLASS 3.8

@LyricZhao
Collaborator

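This assertion fails when the kernel name does not appear exactly once in the profiling table. Here it means the bench_kineto function did not detect any kernel running.

For every bench_kineto call, you can set suppress_kineto_output=False, which may print more error information. Please share that output with us.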
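To see why a missing kernel trips this check, here is a minimal, hypothetical re-creation of the exactly-once condition in bench_kineto's assertion (reported later in this thread at deep_gemm/utils.py, line 119). It assumes `prof_lines` holds the lines of the profiler's text table. The helpers `count_kernel_rows` and `check_kernel_once` and both sample tables are made up for illustration and are not DeepGEMM's API.

```python
# Hypothetical sketch of the condition checked by bench_kineto; not DeepGEMM code.
# Assumption: the profiler output is a plain-text table, one row per line.


def count_kernel_rows(prof_table: str, name: str) -> int:
    """Count the table lines that contain the kernel name (booleans sum as 0/1)."""
    prof_lines = prof_table.split('\n')
    return sum(name in line for line in prof_lines)


def check_kernel_once(prof_table: str, name: str) -> None:
    """Same exactly-once rule as the original assert, with more specific messages."""
    hits = count_kernel_rows(prof_table, name)
    if hits == 0:
        raise AssertionError(f'kernel {name} not found; were CUDA activities recorded?')
    if hits > 1:
        raise AssertionError(f'kernel {name} matched {hits} rows; the name is ambiguous')


# Made-up tables: a run with CUDA activity vs. a CPU-only run (CUPTI not initialized).
healthy = 'Name                    Self CUDA\nfp8_gemm_kernel<...>    12us\n'
cpu_only = 'Name                    Self CPU\naten::empty             5us\n'

check_kernel_once(healthy, 'fp8_gemm')  # passes: exactly one matching row
try:
    check_kernel_once(cpu_only, 'fp8_gemm')
except AssertionError as err:
    print(err)  # kernel fp8_gemm not found; were CUDA activities recorded?
```

Separating the 0-match case from the multi-match case makes the failure easier to read: a count of 0 points at the profiling setup rather than at the GEMM kernel itself.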

@vlluvia
Author

vlluvia commented Feb 26, 2025

Library path:

['/root/DeepGEMM/deep_gemm']

Testing GEMM:
WARNING:2025-02-26 09:43:38 68350:68350 init.cpp:178] function cbapi->getCuptiStatus() failed with error CUPTI_ERROR_NOT_INITIALIZED (15)
WARNING:2025-02-26 09:43:38 68350:68350 init.cpp:179] CUPTI initialization failed - CUDA profiler activities will be missing
INFO:2025-02-26 09:43:38 68350:68350 init.cpp:181] If you see CUPTI_ERROR_INSUFFICIENT_PRIVILEGES, refer to https://developer.nvidia.com/nvidia-development-tools-solutions-err-nvgpuctrperm-cupti
Traceback (most recent call last):
  File "/root/DeepGEMM/tests/test_core.py", line 156, in
    test_gemm()
  File "/root/DeepGEMM/tests/test_core.py", line 80, in test_gemm
    t = bench_kineto(test_func, 'fp8_gemm', suppress_kineto_output=False)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/DeepGEMM/deep_gemm/utils.py", line 119, in bench_kineto
    assert sum([name in line for line in prof_lines]) == 1, f'Errors of the kernel {name} in the profiling table'
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Errors of the kernel fp8_gemm in the profiling table

@LyricZhao
Collaborator

It seems you don't have sufficient privileges for CUPTI profiling, which bench_kineto relies on for microsecond-level timing accuracy.

Try following the instructions in the PyTorch Kineto profiler warnings above to resolve the issue. Thanks!
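The NVIDIA page linked in the Kineto log describes a driver setting that limits GPU profiling to admin users. As a quick first check on Linux, the sketch below looks for that setting in the driver's parameter file. It assumes the driver reports it as an `RmProfilingAdminOnly: <0|1>` entry in /proc/driver/nvidia/params (verify this against your driver version); `profiling_admin_only` and `main` are hypothetical helpers, and the file may be absent or hidden inside a container.

```python
from pathlib import Path
from typing import Optional

# Assumed location of the NVIDIA driver's module parameters on Linux.
PARAMS_PATH = Path('/proc/driver/nvidia/params')


def profiling_admin_only(params_text: str) -> Optional[bool]:
    """Parse an 'RmProfilingAdminOnly: N' line; None means the entry was not found."""
    for line in params_text.splitlines():
        key, sep, value = line.partition(':')
        if sep and key.strip() == 'RmProfilingAdminOnly':
            return value.strip() == '1'
    return None


def main() -> None:
    if not PARAMS_PATH.exists():
        print(f'{PARAMS_PATH} not found (no NVIDIA driver, or not visible here)')
        return
    status = profiling_admin_only(PARAMS_PATH.read_text())
    if status is None:
        print('RmProfilingAdminOnly not reported by this driver')
    elif status:
        print('GPU profiling is restricted to admin users')
    else:
        print('GPU profiling is allowed for all users')


if __name__ == '__main__':
    main()
```

If the restriction is reported, the NVIDIA page linked in the log explains how to lift it.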
