Confusion about block_tile_j in cudaTensorCoreGemm.cu #288

Open
Peter-Cao89 opened this issue Jul 27, 2024 · 0 comments
Peter-Cao89 commented Jul 27, 2024

I am a little confused about the formula used to compute block_tile_j in cudaTensorCoreGemm.cu at line 230:
const unsigned int block_tile_j = (block_pos * BLOCK_COL_TILES) % N_TILES;

As I understand it, block_tile_i and block_tile_j are the row index and column index, respectively, of a tile in matrix C or D.
block_tile_i is the block position (block_pos) multiplied by the number of tiles per row in each thread block (BLOCK_ROW_TILES), divided by the total number of tiles along the N direction (N_TILES), and then multiplied by the number of tiles per column (BLOCK_COL_TILES), i.e.
const unsigned int block_tile_i = ((block_pos * BLOCK_ROW_TILES) / N_TILES) * (BLOCK_COL_TILES);

Given that, why is block_tile_j equal to (block_pos * BLOCK_COL_TILES) % N_TILES rather than (block_pos * BLOCK_ROW_TILES) % N_TILES?
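One way to probe this is to evaluate both candidate formulas on the host for a range of block positions. The sketch below is plain C++ rather than the kernel itself. The three formulas are copied from the lines quoted above, but the TileConfig struct, the function names, and every numeric value are hypothetical stand-ins, not the sample's actual macro definitions. It only illustrates that the two expressions for block_tile_j agree whenever BLOCK_ROW_TILES and BLOCK_COL_TILES have the same value, and can disagree when they do not. Whether the sample's macros are in fact equal should be checked against the #define lines in the file.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical tile geometry for illustration only. The real values come
// from the macros in cudaTensorCoreGemm.cu and should be checked there.
struct TileConfig {
    unsigned block_row_tiles;  // stand-in for BLOCK_ROW_TILES
    unsigned block_col_tiles;  // stand-in for BLOCK_COL_TILES
    unsigned n_tiles;          // stand-in for N_TILES
};

// block_tile_i exactly as quoted in this issue.
unsigned block_tile_i(unsigned block_pos, TileConfig c) {
    assert(c.n_tiles != 0);  // guard against division by zero
    return ((block_pos * c.block_row_tiles) / c.n_tiles) * c.block_col_tiles;
}

// block_tile_j as written in the sample (uses BLOCK_COL_TILES).
unsigned block_tile_j_sample(unsigned block_pos, TileConfig c) {
    assert(c.n_tiles != 0);
    return (block_pos * c.block_col_tiles) % c.n_tiles;
}

// block_tile_j as proposed in this issue (uses BLOCK_ROW_TILES).
unsigned block_tile_j_proposed(unsigned block_pos, TileConfig c) {
    assert(c.n_tiles != 0);
    return (block_pos * c.block_row_tiles) % c.n_tiles;
}

// Counts the block positions in [0, count) where the two formulas differ.
unsigned count_mismatches(TileConfig c, unsigned count) {
    unsigned mismatches = 0;
    for (unsigned p = 0; p < count; ++p) {
        if (block_tile_j_sample(p, c) != block_tile_j_proposed(p, c)) {
            ++mismatches;
        }
    }
    return mismatches;
}

// Prints i and both j candidates for each block position in [0, count).
void print_tiles(TileConfig c, unsigned count) {
    for (unsigned p = 0; p < count; ++p) {
        std::printf("block_pos=%u i=%u j_sample=%u j_proposed=%u\n",
                    p, block_tile_i(p, c),
                    block_tile_j_sample(p, c),
                    block_tile_j_proposed(p, c));
    }
}
```

With the made-up configuration {8, 8, 64}, count_mismatches over the first 16 block positions returns 0, because the two expressions are textually identical once the macros are equal. With {4, 8, 64} it returns 15, since only block_pos = 0 gives the same column under both formulas. So the choice of macro only matters for configurations where the block is not square in tiles.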

I hope someone can clear up my confusion. Thanks.
