Extending Liger-Kernel Optimizations to Encoder Models Like BERT #500

pengzhangzhi · 2024-12-26T15:57:50Z

🚀 The feature, motivation and pitch

Hey team,

I’ve been exploring Liger-Kernel’s optimizations for decoder models like GPT, and I’m curious about extending these benefits to encoder models such as BERT.

BERT-style encoders are the go-to architecture in areas such as discrete diffusion models, a promising research direction for next-generation LLMs.
In AI for biology, the BERT architecture is exemplified by ESM (Lin et al., Science 2023, https://www.science.org/doi/10.1126/science.ade2574), which enables significant scientific applications such as protein structure prediction, the problem recognized by the 2024 Nobel Prize in Chemistry.

Given Liger-Kernel’s success in boosting training throughput and reducing GPU memory usage for decoder models, applying similar optimizations to encoder architectures seems promising. I’m interested in discussing the feasibility of adapting Liger-Kernel’s techniques for encoder models and would appreciate any insights or considerations from the community.

Alternatives

No response

Additional context

No response
