
[P] Built a portable GPU ISA after reading too many architecture manuals

I’ve been reading GPU architecture docs in my free time: NVIDIA PTX, AMD ISA reference guides, Intel Xe, and the reverse-engineered Apple GPU material. That adds up to over 5,000 pages across 16 microarchitectures.

After a while you notice all four vendors are doing the same 11 things with different names. So I wrote a spec that covers all of them and built a toolchain around it. It’s called WAVE. You write a kernel once, it compiles to a portable binary, then thin backends translate it to Metal, PTX, HIP, or SYCL.
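To make the "same things, different names" point concrete, here is a minimal Python sketch of three well-known overlaps in vendor terminology. The concept keys (`lockstep_thread_group` and so on) and the helper functions are made up for illustration. They are not taken from the WAVE spec and are not necessarily among the 11 primitives it defines.

```python
# Illustrative only: concept names and helpers are hypothetical,
# not WAVE's actual primitive list or API.
VENDOR_TERMS = {
    # Threads that execute one instruction in lockstep.
    "lockstep_thread_group": {
        "nvidia": "warp",
        "amd": "wavefront",
        "intel": "sub-group",
        "apple": "SIMD-group",
    },
    # Threads that can synchronize and share on-chip memory.
    "cooperating_thread_group": {
        "nvidia": "thread block",
        "amd": "work-group",
        "intel": "work-group",
        "apple": "threadgroup",
    },
    # Fast scratch memory shared within one cooperating group.
    "on_chip_scratch_memory": {
        "nvidia": "shared memory",
        "amd": "LDS (local data share)",
        "intel": "SLM (shared local memory)",
        "apple": "threadgroup memory",
    },
}


def vendor_term(concept, vendor):
    """Return the vendor's name for a portable concept."""
    return VENDOR_TERMS[concept][vendor]


def concept_for(term):
    """Reverse lookup: the portable concept a vendor term refers to, or None."""
    for concept, names in VENDOR_TERMS.items():
        if term in names.values():
            return concept
    return None


if __name__ == "__main__":
    for concept, names in VENDOR_TERMS.items():
        print(concept, "->", ", ".join(sorted(set(names.values()))))
```

A portable spec only has to name each row once. A thin backend then amounts to picking the right column for the target.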

The same binary has been verified on Apple M4 Pro, NVIDIA T4, and AMD MI300X. My co-author Onyinye built the PyTorch integration and got identical training results across all backends.

Please star on GitHub: https://github.com/Oabraham1/wave
Preprint: https://arxiv.org/abs/2603.28793
Read full docs and how I built everything: https://wave.ojima.me

pip install wave-gpu

submitted by /u/not-your-typical-cs