[P] Built a portable GPU ISA after reading too many architecture manuals [P]

I’ve been reading GPU architecture docs in my free time. NVIDIA PTX, AMD ISA reference guides, Intel Xe, reverse-engineered Apple GPU stuff. Over 5,000 pages across 16 microarchitectures.

After a while you notice all four vendors are doing the same 11 things with different names. So I wrote a spec that covers all of them and built a toolchain around it. It’s called WAVE. You write a kernel once, it compiles to a portable binary, then thin backends translate it to Metal, PTX, HIP, or SYCL.

Same binary verified on Apple M4 Pro, NVIDIA T4, and AMD MI300X. My co-author Onyinye built PyTorch integration and got identical training results across all backends.

Please star on GitHub: https://github.com/Oabraham1/wave
Preprint: https://arxiv.org/abs/2603.28793
Read full docs and how I built everything: https://wave.ojima.me

pip install wave-gpu

submitted by /u/not-your-typical-cs
[link] [comments]

[P] Built a portable GPU ISA after reading too many architecture manuals [P]

Want to read more?

Tagged with