Confusion about `block_tile_j` in cudaTensorCoreGemm.cu #288

Peter-Cao89 · 2024-07-27T04:55:48Z

I'm a little confused by the formula used to compute `block_tile_j` in cudaTensorCoreGemm.cu (line 230):
`const unsigned int block_tile_j = (block_pos * BLOCK_COL_TILES) % N_TILES;`

As I understand it, `block_tile_i` and `block_tile_j` are, respectively, the row index and the column index (counted in tiles) of the thread block's tile in matrix C (or D).
`block_tile_i` is computed by multiplying the block position (`block_pos`) by the number of tiles per row in a thread block (`BLOCK_ROW_TILES`), dividing by the total number of tiles along the N dimension (`N_TILES`), and then multiplying by the number of tiles per column in a thread block (`BLOCK_COL_TILES`), i.e.:
`const unsigned int block_tile_i = ((block_pos * BLOCK_ROW_TILES) / N_TILES) * (BLOCK_COL_TILES);`

So why is `block_tile_j` computed as `(block_pos * BLOCK_COL_TILES) % N_TILES` rather than `(block_pos * BLOCK_ROW_TILES) % N_TILES`?

I'd appreciate it if someone could clear up my confusion. Thanks!
