1 min read, from Machine Learning

I released a softmax-free attention model at GPT-2 Medium scale (~354M params, 11.5B tokens): structural sparsity + tile-skipping kernels for long-context VRAM savings. Open weights + custom Triton kernels [R]

Tagged with

#softmax-free attention
#structural sparsity
#tile-skipping kernels
#VRAM savings
#long-context
#GPT-2
#Triton kernels
#open weights
#machine learning
#attention model
#parameters
#tokens
#kernels
#sparsity
#model
#Hugging Face