[ROCm] fix test_softmax_forward_64bit_indexing_cuda OOM (pytorch#113093)
TestNNDeviceTypeCUDA.test_softmax_forward_64bit_indexing_cuda started failing for ROCm after pytorch#112096 with the message

torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 13.35 GiB. GPU 0 has a total capacity of 31.98 GiB of which 3.89 GiB is free. Of the allocated memory 26.69 GiB is allocated by PyTorch, and 18.91 MiB is reserved by PyTorch but unallocated.

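This amounts to roughly 40 GiB (13.35 GiB requested plus 26.69 GiB already allocated by PyTorch), so the test needs about 41GB. The test is currently decorated with `largeTensorTest("30GB", "cuda")`, which is not sufficient for ROCm. This change raises the requirement to 41GB when `TEST_WITH_ROCM` is set and keeps 30GB otherwise.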
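The arithmetic behind the new threshold can be checked with a short sketch. It only adds up the figures printed in the HIP out-of-memory message. It assumes that a size string such as `"41GB"` means N * 1024**3 bytes, which is how we understand `largeTensorTest` to read its argument. It also ignores memory held outside PyTorch's allocator, such as the HIP context. The helper names are illustrative and do not come from PyTorch.

```python
GIB = 1024 ** 3

# Figures copied from the HIP out-of-memory message (GiB unless noted).
TRIED_TO_ALLOCATE_GIB = 13.35
ALLOCATED_BY_PYTORCH_GIB = 26.69
RESERVED_UNALLOCATED_MIB = 18.91


def peak_gib() -> float:
    """Memory PyTorch would have held had the failing allocation succeeded."""
    return (TRIED_TO_ALLOCATE_GIB + ALLOCATED_BY_PYTORCH_GIB
            + RESERVED_UNALLOCATED_MIB / 1024)


def budget_bytes(size: str) -> int:
    # Assumption: "NGB" is read as N * 1024**3 bytes (binary gigabytes).
    if not size.upper().endswith("GB"):
        raise ValueError(f"expected a size like '30GB', got {size!r}")
    return int(size[:-2]) * GIB


def is_sufficient(size: str) -> bool:
    """True if the decorator budget covers the estimated peak."""
    return budget_bytes(size) >= peak_gib() * GIB


if __name__ == "__main__":
    print(f"estimated peak: {peak_gib():.2f} GiB")
    for size in ("30GB", "41GB"):
        print(size, "sufficient" if is_sufficient(size) else "insufficient")
```

Under these assumptions the estimated peak is about 40.06 GiB. That exceeds a 30GB budget and fits within 41GB. The 31.98 GiB device in the log is smaller than either figure, so with the higher requirement the test should be skipped on that GPU instead of running out of memory.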

Pull Request resolved: pytorch#113093
Approved by: https://github.com/malfet
jeffdaily authored and pytorchmergebot committed Nov 7, 2023
1 parent 8768b87 commit 4c04ae2
Showing 1 changed file with 1 addition and 1 deletion: test/test_nn.py
Expand Up @@ -12754,7 +12754,7 @@ def compare_scaling(grads):

# reference issue: https://github.com/pytorch/pytorch/issues/111484
@onlyCUDA
@largeTensorTest("30GB", "cuda")
@largeTensorTest("41GB" if TEST_WITH_ROCM else "30GB", "cuda")
def test_softmax_forward_64bit_indexing(self, device):
batch_size = 70
seq_len = 2048
Expand Down
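The one-line fix works because the decorator's size argument is an ordinary Python expression. It is evaluated once, when the class body runs. The sketch below illustrates this with a hypothetical, simplified `large_tensor_test` decorator. It is not PyTorch's `largeTensorTest`, whose internals differ. `free_bytes_fn`, `make_test_case`, and the use of the log's 31.98 GiB capacity as the "free" figure are all illustrative assumptions.

```python
import functools
import unittest

GIB = 1024 ** 3


def parse_gb(size: str) -> int:
    # Assumption: "NGB" is read as N * 1024**3 bytes.
    if not size.upper().endswith("GB"):
        raise ValueError(f"expected a size like '30GB', got {size!r}")
    return int(size[:-2]) * GIB


def large_tensor_test(size: str, free_bytes_fn):
    """Hypothetical, simplified stand-in for largeTensorTest: skip the
    decorated test when free_bytes_fn() reports less than `size`."""
    required = parse_gb(size)

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(self, *args, **kwargs):
            available = free_bytes_fn()
            if available < required:
                raise unittest.SkipTest(
                    f"needs {required / GIB:.0f} GiB, "
                    f"only {available / GIB:.2f} GiB free")
            return fn(self, *args, **kwargs)
        return wrapper
    return decorator


def make_test_case(test_with_rocm: bool, free_bytes: int):
    # The conditional size mirrors the patched decorator; it is evaluated
    # once, when this class body executes, not on every test run.
    class SoftmaxMemoryDemo(unittest.TestCase):
        @large_tensor_test("41GB" if test_with_rocm else "30GB",
                           lambda: free_bytes)
        def test_softmax_forward_64bit_indexing(self):
            pass  # the real test would allocate its large tensors here
    return SoftmaxMemoryDemo


def run(case_cls) -> unittest.TestResult:
    """Run every test in case_cls and return the collected result."""
    result = unittest.TestResult()
    unittest.defaultTestLoader.loadTestsFromTestCase(case_cls).run(result)
    return result


if __name__ == "__main__":
    free = int(31.98 * GIB)  # illustrative: the device capacity from the log
    for rocm in (False, True):
        res = run(make_test_case(rocm, free))
        print(f"ROCm={rocm}: ran={res.testsRun}, skipped={len(res.skipped)}")
```

With 31.98 GiB reported free, the non-ROCm variant runs, because 30 GiB fits. The ROCm variant is skipped, because 41 GiB does not fit. An inline conditional keeps a single decorated test rather than two platform-specific copies.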
