From Machine Learning
I released a softmax-free attention model at GPT-2 Medium scale (~354M params, 11.5B tokens): structural sparsity + tile-skipping kernels for long-context VRAM savings. Open weights + custom Triton kernels [R]
Submitted by /u/NonGameCatharsis
Tagged with
#softmax-free attention
#structural sparsity
#tile-skipping kernels
#VRAM savings
#long-context
#GPT-2
#Triton kernels
#open weights
#machine learning
#attention model
#parameters
#tokens
#kernels
#sparsity
#model
#Hugging Face