I am a little confused by the formula for block_tile_j in cudaTensorCoreGemm.cu at line 230: const unsigned int block_tile_j = (block_pos * BLOCK_COL_TILES) % N_TILES;
As I understand it, block_tile_i and block_tile_j are the row index and column index, respectively, of a tile in matrix C or D.
block_tile_i is computed as the block position (block_pos) multiplied by the number of logical tiles per row in a thread block (BLOCK_ROW_TILES), divided by the total number of tiles along the N direction (N_TILES), and then multiplied by the number of logical tiles per column (BLOCK_COL_TILES), i.e. const unsigned int block_tile_i = ((block_pos * BLOCK_ROW_TILES) / N_TILES) * (BLOCK_COL_TILES);
So why is block_tile_j equal to (block_pos * BLOCK_COL_TILES) % N_TILES rather than (block_pos * BLOCK_ROW_TILES) % N_TILES?
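To make the question concrete, here is a minimal host-side sketch that evaluates both candidate formulas for a range of block positions. It is not code from the sample: `TileCoord`, `tile_as_written`, `tile_proposed`, and `formulas_agree` are hypothetical names, and the constants are passed in as parameters rather than taken from the sample's `#define`s, so any values plugged in are assumptions.

```cpp
#include <cassert>

// Hypothetical helper type: tile coordinates (row i, column j) of a block.
struct TileCoord {
    unsigned int i;  // tile row index
    unsigned int j;  // tile column index
};

// The two expressions as quoted in the question (j uses BLOCK_COL_TILES).
constexpr TileCoord tile_as_written(unsigned int block_pos,
                                    unsigned int block_row_tiles,
                                    unsigned int block_col_tiles,
                                    unsigned int n_tiles) {
    return TileCoord{((block_pos * block_row_tiles) / n_tiles) * block_col_tiles,
                     (block_pos * block_col_tiles) % n_tiles};
}

// The alternative asked about in the question (j uses BLOCK_ROW_TILES).
constexpr TileCoord tile_proposed(unsigned int block_pos,
                                  unsigned int block_row_tiles,
                                  unsigned int block_col_tiles,
                                  unsigned int n_tiles) {
    return TileCoord{((block_pos * block_row_tiles) / n_tiles) * block_col_tiles,
                     (block_pos * block_row_tiles) % n_tiles};
}

// Returns true if both variants give the same column index j for every
// block_pos in [0, num_blocks).
inline bool formulas_agree(unsigned int num_blocks,
                           unsigned int block_row_tiles,
                           unsigned int block_col_tiles,
                           unsigned int n_tiles) {
    for (unsigned int block_pos = 0; block_pos < num_blocks; ++block_pos) {
        const TileCoord a =
            tile_as_written(block_pos, block_row_tiles, block_col_tiles, n_tiles);
        const TileCoord b =
            tile_proposed(block_pos, block_row_tiles, block_col_tiles, n_tiles);
        if (a.j != b.j) {
            return false;
        }
    }
    return true;
}
```

With assumed values such as `formulas_agree(64, 8, 8, 32)`, the two variants give the same column index for every block, since the two multipliers are equal. With unequal values such as `formulas_agree(64, 4, 8, 32)`, they already differ at `block_pos = 1`. So the two formulas only produce different tile positions when BLOCK_ROW_TILES and BLOCK_COL_TILES differ, which may be worth checking against the constants defined in the sample.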
I look forward to someone resolving my confusion. Thanks.