From Machine Learning

Kuma: compiling PyTorch models into self-contained WebGPU executables [P]

I've been experimenting with a compiler/runtime project that I'm not entirely sure is a good idea, so I'd love some feedback from people who've worked on deployment systems.

The idea is to compile an exported PyTorch model into a self-contained package that contains:

  • graph
  • binary weights
  • backend kernels (currently WGSL)
  • runtime metadata
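To make the question concrete, here is a minimal sketch of what a single-file package with those four parts could look like. This is not Kuma's actual format: the `PKG0` magic tag, the `pack`/`unpack` names, and the layout (length-prefixed JSON manifest followed by a raw weight blob) are all assumptions chosen for illustration.

```python
import json
import struct

MAGIC = b"PKG0"  # hypothetical 4-byte tag, not Kuma's real file signature


def pack(graph, kernels, metadata, weights):
    """Bundle graph, WGSL kernel sources, metadata, and raw weights into bytes.

    `weights` maps tensor name -> already-serialized bytes
    (for example little-endian float32).
    """
    blob = bytearray()
    index = {}
    for name, data in weights.items():
        # Record where each tensor lives inside the weight blob.
        index[name] = {"offset": len(blob), "length": len(data)}
        blob += data
    manifest = json.dumps(
        {"graph": graph, "kernels": kernels, "metadata": metadata, "weights": index}
    ).encode("utf-8")
    # Layout: magic | u32 manifest length (little-endian) | manifest | weights
    return MAGIC + struct.pack("<I", len(manifest)) + manifest + bytes(blob)


def unpack(package):
    """Inverse of pack: returns (manifest dict, weights dict)."""
    if package[:4] != MAGIC:
        raise ValueError("not a recognized package")
    (manifest_len,) = struct.unpack_from("<I", package, 4)
    start = 8  # 4 bytes of magic + 4 bytes of length
    manifest = json.loads(package[start:start + manifest_len].decode("utf-8"))
    blob = memoryview(package)[start + manifest_len:]
    weights = {
        name: bytes(blob[entry["offset"]:entry["offset"] + entry["length"]])
        for name, entry in manifest["weights"].items()
    }
    return manifest, weights
```

Keeping the kernels as plain WGSL strings in the manifest means a loader can hand them to the browser's shader compiler without any extra decoding step, at the cost of tying the artifact to one backend.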

A lightweight runtime loads that package and executes it directly in the browser with WebGPU. No Python, no server inference, and no dependency on a heavyweight runtime.
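As a rough illustration of how small such a runtime loop can be, here is a sketch in Python. The node format (`op`, `inputs`, `output`) and the `run` function are hypothetical, and the plain Python functions stand in for compiled WGSL compute pipelines; in the browser each call would correspond to a WebGPU dispatch instead.

```python
def run(graph, weights, kernels, inputs):
    """Execute nodes in listed order.

    Assumes `graph` is already topologically sorted, so every node's
    inputs are available by the time it runs.
    """
    values = dict(weights)
    values.update(inputs)
    for node in graph:
        kernel = kernels[node["op"]]  # look up the kernel shipped with the package
        args = [values[name] for name in node["inputs"]]
        values[node["output"]] = kernel(*args)
    return values


# CPU stand-ins for backend kernels (illustrative only).
def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def relu(x):
    return [max(0.0, v) for v in x]


GRAPH = [
    {"op": "matvec", "inputs": ["w", "x"], "output": "h"},
    {"op": "relu", "inputs": ["h"], "output": "y"},
]
KERNELS = {"matvec": matvec, "relu": relu}

result = run(GRAPH, {"w": [[1.0, 1.0], [-1.0, 0.0]]}, KERNELS, {"x": [1.0, 2.0]})
print(result["y"])
```

The loop itself knows nothing about individual operators; everything op-specific comes from the kernel table, which is the part the package would carry.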

The attached demos are neural video representations only because they were easy to test. The real motivation is operator networks and scientific ML, where I like the idea of distributing a single portable artifact.

The repo is here:
https://github.com/Slater-Victoroff/Kuma

I'm mostly looking for architectural feedback.

Some questions I'm wrestling with:

  • Is embedding backend kernels in the artifact a terrible idea?
  • Is this solving a real deployment problem or just reinventing ONNX Runtime?
  • Are there existing systems I should study that take a similar approach?
  • If you were designing a deployment format today, what would you change?

I'd especially appreciate thoughts from people who've worked on ONNX, IREE, TVM, ExecuTorch, MLIR, or similar compiler/runtime projects.

submitted by /u/svictoroff

