How many blocks or threads per block are used by the cuQuantum API? #20
-
As I understand it, the cuStateVec API is implemented with CUDA kernel functions under the hood. However, none of the cuQuantum APIs let me set the block count and threads-per-block parameters the way an explicit kernel launch does, e.g. <<<blocks_count, threads_per_block>>>. How can I find out exactly how many threads a cuQuantum API call uses?
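For context, the contrast being drawn looks roughly like this; the kernel and launch configuration below are purely illustrative and are not cuStateVec internals:

```cuda
#include <cuComplex.h>

// Illustrative only: with a hand-written kernel, the caller chooses the
// launch configuration via <<<blocks_count, threads_per_block>>>.
__global__ void scaleStateVector(cuDoubleComplex* sv, double factor, size_t n)
{
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        sv[i].x *= factor;
        sv[i].y *= factor;
    }
}

void launchExample(cuDoubleComplex* d_sv, size_t n)
{
    int threads_per_block = 256;  // chosen by the caller
    int blocks_count = (int)((n + threads_per_block - 1) / threads_per_block);
    scaleStateVector<<<blocks_count, threads_per_block>>>(d_sv, 0.5, n);
}

// A cuStateVec call, by contrast, exposes no such parameters; the library
// decides the configuration of whatever kernels it launches internally:
//   custatevecApplyMatrix(handle, d_sv, /* remaining arguments elided */ ...);
```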
Replies: 1 comment 6 replies
-
As in any CUDA program (and CUDA libraries in particular), the grid/block/shared-memory sizes depend on many factors: algorithm, implementation, hardware, driver, and so on. On top of that, a single API call may launch multiple kernels, so there is no single answer to this question. And even if you had this information, there is not much you could do with it.

If you're interested, one way to check is to run your workload under nsys (the Nsight Systems command-line profiler) and then open the generated report in the Nsight Systems GUI. In the GPU timeline you can inspect each kernel's launch configuration.
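If it helps, here is a minimal sketch of that workflow; the NVTX range name and application/report names are assumptions for illustration, not part of the cuStateVec API:

```cuda
// Profile with:  nsys profile -o my_report ./my_app
// Then open the generated report in the Nsight Systems GUI and inspect the
// kernel rows in the GPU timeline for each kernel's grid and block dimensions.

#include <nvtx3/nvToolsExt.h>   // NVTX markers (header-only in NVTX3)
#include <custatevec.h>

void profiledSection(custatevecHandle_t handle /* , state vector args ... */)
{
    // Wrap the API call of interest in an NVTX range so the kernels it
    // launches line up under a named region in the Nsight Systems timeline.
    nvtxRangePushA("cuStateVec gate application");

    // ... cuStateVec call(s) under inspection, e.g. custatevecApplyMatrix(...)

    nvtxRangePop();
}
```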