🚀 The feature, motivation and pitch
Hey team,
I’ve been exploring Liger-Kernel’s optimizations for decoder models like GPT, and I’m curious about extending these benefits to encoder models such as BERT.
BERT is the go-to architecture in areas like discrete diffusion models, a promising research direction for next-generation LLMs.
In AI for biology, BERT-style encoders are exemplified by ESM (Lin et al., Science 2023, https://www.science.org/doi/10.1126/science.ade2574), which enables significant scientific applications such as protein structure prediction, the subject of the 2024 Nobel Prize in Chemistry.
Given Liger-Kernel’s success in boosting training throughput and reducing GPU memory usage for decoder models, applying similar optimizations to encoder architectures seems promising. I’m interested in discussing the feasibility of adapting Liger-Kernel’s techniques for encoder models and would appreciate any insights or considerations from the community.
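To make the idea concrete, here is a minimal sketch of the in-place module-swap pattern that Liger-Kernel uses for decoder models, applied to a BERT-like encoder. `LigerFusedLayerNorm` and `apply_liger_kernel_to_bert` are hypothetical names I made up for illustration, not real Liger-Kernel APIs; a real implementation would override `forward` with a fused Triton kernel launch.

```python
# Hedged sketch of Liger-style kernel patching for an encoder model.
# LigerFusedLayerNorm is a stand-in (assumption): on CPU it behaves
# exactly like nn.LayerNorm; a real fused kernel would replace forward.
import torch
import torch.nn as nn

class LigerFusedLayerNorm(nn.LayerNorm):
    """Placeholder for a fused Triton LayerNorm (hypothetical)."""
    pass  # a real kernel would override forward with a Triton launch

def apply_liger_kernel_to_bert(model: nn.Module) -> int:
    """Recursively swap nn.LayerNorm modules in-place; returns count."""
    replaced = 0
    for name, child in model.named_children():
        if type(child) is nn.LayerNorm:
            fused = LigerFusedLayerNorm(child.normalized_shape, eps=child.eps)
            fused.load_state_dict(child.state_dict())  # keep the weights
            setattr(model, name, fused)
            replaced += 1
        else:
            replaced += apply_liger_kernel_to_bert(child)
    return replaced

# Tiny BERT-like encoder stack to demonstrate the patch.
encoder = nn.Sequential(
    nn.Linear(16, 16),
    nn.LayerNorm(16),
    nn.Linear(16, 16),
    nn.LayerNorm(16),
)
x = torch.randn(2, 16)
before = encoder(x)
n = apply_liger_kernel_to_bert(encoder)
after = encoder(x)
print(n)                             # 2 modules swapped
print(torch.allclose(before, after))  # True: numerics preserved
```

Since encoders like BERT and ESM build on the same primitives (LayerNorm, GELU, cross-entropy) that Liger already fuses for decoders, the patching surface should be similar; the main new work would be the kernels themselves and a per-architecture `apply_liger_kernel_to_*` entry point.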
Alternatives
No response
Additional context
No response