Repositories list
7 repositories
- CUDA and Triton implementations of Flash Attention with SoftmaxN. (Public)
- llama2.c-tinystories (Public)
- MosaicBERT-Softmax1 (Public)
- EsperBERTo (Public)
- nanoGPT_softmax1 (Public)
- nanoGPT_softmax1_reddit (Public)
- quietGPT (Public)