
[tensor wrapper subclass] Add support for torchao.float8 mlp #1585

Draft · wants to merge 1 commit into base: tensor_subclass_2 from tensor_subclass_3

Conversation

crcrpar (Collaborator) commented on Dec 23, 2024

What does this PR do?

Multiple changes to thunder.jit to support a torchao.float8 MLP (see the test; a sketch of the setup follows the list):

  • Add support for torch._scaled_mm
  • Update _general_jit_torch_autograd_function_apply_lookaside
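A minimal sketch of the kind of setup this targets, assuming torchao's `convert_to_float8_training` API and an fp8-capable GPU; the module, shapes, and names below are illustrative, not the PR's actual test:

```python
import torch
import torch.nn as nn
import thunder
from torchao.float8 import convert_to_float8_training

# Illustrative MLP; the PR's test may use different shapes/modules.
# Dimensions are kept multiples of 16 to satisfy fp8 matmul constraints.
model = nn.Sequential(
    nn.Linear(64, 128, bias=False),
    nn.GELU(),
    nn.Linear(128, 64, bias=False),
).to("cuda")

# Swap nn.Linear for Float8Linear so matmuls lower to torch._scaled_mm.
convert_to_float8_training(model)

jitted = thunder.jit(model)
x = torch.randn(32, 64, device="cuda", requires_grad=True)
out = jitted(x)  # the float8 linears now route through torch._scaled_mm
```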

- Add `scaled_mm`
- Change how the lookaside of `torch.autograd.Function.apply` applies DCE,
  taking the failure of the APEX fused RMSNorm into consideration.

```python
@torch.no_grad()
@no_autocast
def FusedRMSNormAffineMixedDtypesFunction(t_0, t_1, tup11, f12, b13):
  # t_0: "cuda:0 f32[4, 5, 3, 2]"
  # t_1: "cuda:0 f32[3, 2]"

  # /usr/local/lib/python3.12/dist-packages/apex/normalization/fused_layer_norm.py:127:                 input_ = input.contiguous()
  t5 = ltorch.contiguous(t_0, memory_format=_torch_memory_format_0)  # t5: "cuda:0 f32[4, 5, 3, 2]"
    # t5 = prims.stride_order(t_0, (3, 2, 1, 0))  # t5: "cuda:0 f32[4, 5, 3, 2]"

  # /usr/local/lib/python3.12/dist-packages/apex/normalization/fused_layer_norm.py:128:                 weight_ = weight.contiguous()
  t6 = ltorch.contiguous(t_1, memory_format=_torch_memory_format_0)  # t6: "cuda:0 f32[3, 2]"
    # t6 = prims.stride_order(t_1, (1, 0))  # t6: "cuda:0 f32[3, 2]"
  (t10, t9) = apex_fused_rms_norm_forward_affine_mixed_dtypes(t5, (3, 2), t6, 1e-05)
  return t10
```
For this trace, `thunder.core.transforms.dce` replaces `t9` with `_`, so the
augmented forward trace loses access to it. By reusing the augmented forward
trace as the basic forward trace, `dce` no longer drops `t9`.
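To make the hazard concrete, here is a self-contained toy illustration (hypothetical names, not Thunder's actual trace machinery): when DCE runs over the basic forward trace alone, a residual that only the backward needs looks dead; reusing the augmented forward trace keeps it alive.

```python
def fused_op(x):
    # Stand-in for apex_fused_rms_norm_forward_affine_mixed_dtypes:
    # returns the result plus a residual that only the backward needs.
    out = x * 2.0
    saved = x + 1.0
    return out, saved

def basic_forward(x):
    # DCE over this trace alone sees `saved` as unused and rewrites it
    # to `_` (as happened to `t9` above), so the backward loses it.
    out, _ = fused_op(x)
    return out

def augmented_forward(x):
    # Returning the residual alongside the output keeps it live under
    # DCE; reusing this trace as the basic forward avoids the problem.
    out, saved = fused_op(x)
    return out, (saved,)
```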

Signed-off-by: Masaki Kozuki <[email protected]>
crcrpar force-pushed the tensor_subclass_3 branch from d9ed305 to 8435406 on Jan 2, 2025
github-actions bot removed the documentation label on Jan 2, 2025
crcrpar (Collaborator, Author) commented on Jan 14, 2025

The backward of torchao.float8 in the tests still needs fixing. The cause seems to be a mismatch between the row-major and column-major layouts of the inputs to torch._scaled_mm. This could be sidestepped if we had a decomposition and let nvFuser or another fusion executor take care of it.
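For reference, a hedged sketch of the layout constraint (assuming the PyTorch ≥ 2.5 signature of `torch._scaled_mm` and an fp8-capable GPU): the first operand must be row-major and the second column-major, which is where a mismatch can creep in.

```python
import torch

# torch._scaled_mm expects mat1 row-major and mat2 column-major, both fp8.
a = torch.randn(16, 32, device="cuda").to(torch.float8_e4m3fn)       # row-major
b = torch.randn(64, 32, device="cuda").to(torch.float8_e4m3fn).t()   # column-major view
scale_a = torch.tensor(1.0, device="cuda")
scale_b = torch.tensor(1.0, device="cuda")

out = torch._scaled_mm(a, b, scale_a=scale_a, scale_b=scale_b,
                       out_dtype=torch.bfloat16)  # errors if layouts are swapped
```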
