June 21, 2026 • 1 min read • from Machine Learning

I released a softmax-free attention model at GPT-2 Medium scale (~354M params, 11.5B tokens): structural sparsity + tile-skipping kernels for long-context VRAM savings. Open weights + custom Triton kernels [R]

#softmax-free attention

#structural sparsity

#tile-skipping kernels

#VRAM savings

#long-context

#GPT-2

#Triton kernels

#open weights

#machine learning

#attention model

#parameters

#tokens

#kernels

#sparsity

#model

#Hugging Face