We have recently been working on a new PR to enable Conv-Bias-Elu fusion in XLA, which takes advantage of cuDNN's runtime-compiled fusion kernels. This will be the first PR to exploit this new cuDNN feature in XLA; more patterns will follow after it gets merged.
We have run some simple benchmarks with synthetic data, and in most cases the results show roughly 5% to 40% performance improvement over the current behavior, where XLA fuses only the Conv-Bias part and then runs the Elu separately.
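For illustration, here is a minimal sketch of the kind of Conv-Bias-Elu computation this fusion targets, together with a rough synthetic-data timing loop. The shapes, dtype, and timing approach are assumptions made for this example and are not the actual benchmark behind the numbers above:

```python
# Minimal sketch of a Conv-Bias-Elu computation that the proposed fusion targets,
# plus a rough synthetic-data timing loop. Shapes, dtype, and the timing approach
# are illustrative assumptions, not the benchmark used for the numbers above.
import time
import numpy as np
import tensorflow as tf

@tf.function(jit_compile=True)  # compile with XLA so the fusion passes can kick in
def conv_bias_elu(x, w, b):
    y = tf.nn.conv2d(x, w, strides=1, padding="SAME")
    y = tf.nn.bias_add(y, b)
    return tf.nn.elu(y)

# Synthetic NHWC input, filter, and bias (hypothetical shapes).
x = tf.constant(np.random.rand(32, 56, 56, 64).astype(np.float16))
w = tf.constant(np.random.rand(3, 3, 64, 64).astype(np.float16))
b = tf.constant(np.random.rand(64).astype(np.float16))

conv_bias_elu(x, w, b)  # warm-up: triggers XLA compilation and cuDNN autotuning

start = time.time()
for _ in range(100):
    out = conv_bias_elu(x, w, b)
_ = out.numpy()  # force execution to finish before stopping the clock
print("avg step time:", (time.time() - start) / 100)
```

The warm-up call matters when measuring, since the first execution includes XLA compilation and, with runtime-fused engines, the per-engine cuDNN compilation discussed below.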
We also plan to turn this feature on by default, though it may lead to longer compilation times, since cuDNN needs to compile every kernel/engine during autotuning; this overhead is currently ~1.5 s per engine on Ampere.
Any thoughts and feedback are welcome. Thanks. For the detailed implementation, please refer to this PR.
As a side note, in native TF we have already enabled this feature by setting `TF_CUDNN_USE_RUNTIME_FUSION`; the current list of supported patterns is:

More supported patterns from cuDNN can be found in the online cuDNN guide.
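For completeness, a minimal sketch of how the native-TF path can be enabled; setting the variable before importing TensorFlow is an assumption on my part so that it is picked up at initialization, and `1` is the usual value for such boolean TF switches:

```python
# Enable cuDNN runtime-compiled fusion kernels in native (non-XLA) TF.
# The variable is set before importing TensorFlow so it is visible when
# TF initializes its GPU/cuDNN state.
import os
os.environ["TF_CUDNN_USE_RUNTIME_FUSION"] = "1"

import tensorflow as tf  # imported after the switch is set
```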