Working on a cgo-free CUDA binding in Go for ML stuff Week 3 - open source [P]

At work we use CUDA from Rust since the company switched to it recently. Rust has pretty good Driver API bindings, but it made me wonder why the hell we can't have something decent in Go without cgo.

I've mostly been building ML tools this past month, and Go is my main language for pretty much everything.

The problem is that most Go CUDA projects still need cgo and the full CUDA toolkit at build time. That breaks cross-compilation and makes Docker images huge, which sucks when working on machine learning projects.

So last month I started messing around with a proof of concept that loads libcuda.so at runtime using purego. No cgo at all.

The biggest pain was thread affinity. CUDA tracks the current context per OS thread, so goroutines migrating between threads kept breaking things. I built a simple executor that locks an OS thread with runtime.LockOSThread and funnels all CUDA calls through a channel.
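The executor idea can be sketched in plain Go with no GPU involved. This is only a minimal illustration, not the code from the repo: the names Executor, NewExecutor, Do, Close and errDemo are made up for the example. One goroutine pins itself to an OS thread and runs every submitted function in order, so anything that depends on per-thread state always sees the same thread.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
)

// errDemo is a hypothetical sentinel error used only to show error propagation.
var errDemo = errors.New("demo failure")

// Executor (hypothetical name) runs every submitted function on one
// dedicated OS thread, one at a time, in submission order.
type Executor struct {
	tasks chan func()
	done  chan struct{}
}

// NewExecutor starts the worker goroutine and pins it to its OS thread.
func NewExecutor() *Executor {
	e := &Executor{tasks: make(chan func()), done: make(chan struct{})}
	go func() {
		// Pin this goroutine to its current OS thread. It is never
		// unlocked, so the runtime terminates the thread when the
		// goroutine exits instead of reusing it for other goroutines.
		runtime.LockOSThread()
		defer close(e.done)
		for task := range e.tasks {
			task()
		}
	}()
	return e
}

// Do runs fn on the pinned thread and blocks until it returns.
func (e *Executor) Do(fn func() error) error {
	errc := make(chan error, 1)
	e.tasks <- func() { errc <- fn() }
	return <-errc
}

// Close stops the worker and waits for it to exit.
// Do must not be called after Close.
func (e *Executor) Close() {
	close(e.tasks)
	<-e.done
}

func main() {
	e := NewExecutor()
	defer e.Close()

	calls := 0
	for i := 0; i < 3; i++ {
		// In a real binding, this closure would wrap a driver call.
		_ = e.Do(func() error { calls++; return nil })
	}
	fmt.Println("calls run on the pinned thread:", calls)
	fmt.Println("propagated error:", e.Do(func() error { return errDemo }))
}
```

Because Do blocks until the function has run, callers on any goroutine get ordinary synchronous semantics while the per-thread state stays on a single thread. The cost is that all calls are serialized through one channel.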

Here's roughly what using it looks like right now:

    // Errors are ignored for brevity. fn, bg and cfg are not defined in this snippet.
    func run() error {
        cuda.Init()
        dev, _ := cuda.GetDevice(0)
        ctx, _ := dev.Primary()
        defer ctx.Close()

        a, _ := cuda.Alloc[float32](ctx, 1024)
        b, _ := cuda.Alloc[float32](ctx, 1024)
        c, _ := cuda.Alloc[float32](ctx, 1024)

        stream, _ := ctx.NewStream()
        start, _ := ctx.NewEvent()
        stop, _ := ctx.NewEvent()

        // Bracket the launch with events so the GPU itself times the kernel.
        start.Record(stream)
        fn.LaunchOn(bg, stream, cfg,
            cuda.Arg(a),
            cuda.Arg(b),
            cuda.Arg(c),
            cuda.ArgValue(int32(1024)),
        )
        stop.Record(stream)
        stop.Synchronize()

        duration, _ := start.Elapsed(stop)
        fmt.Printf("GPU time: %v\n", duration)
        return nil
    }

On my 4070 Ti, a 10M-element vector add showed about 160 µs on a CPU timer, but the actual GPU event timing was 434 µs. That difference surprised me.
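A rough sanity check on those two numbers, as a sketch under stated assumptions: the elements are float32 (as in the snippet), the kernel reads two arrays and writes one, and the card's peak memory bandwidth is about 504 GB/s (the published spec for the 4070 Ti, not something measured here). Kernel launches are asynchronous, so a CPU timer stopped without a synchronize can end before the kernel does. The arithmetic shows the CPU figure would imply more bandwidth than the card has.

```go
package main

import "fmt"

// assumedPeakGBs is an assumption: the published peak memory bandwidth
// of an RTX 4070 Ti in GB/s (1 GB = 1e9 bytes).
const assumedPeakGBs = 504.0

// impliedBandwidthGBs returns the memory traffic rate in GB/s that a kernel
// would need in order to move elems*bytesPerElem*arrays bytes in the given
// number of microseconds.
func impliedBandwidthGBs(elems, bytesPerElem, arrays int, micros float64) float64 {
	totalBytes := float64(elems * bytesPerElem * arrays)
	seconds := micros * 1e-6
	return totalBytes / seconds / 1e9
}

func main() {
	const (
		elems        = 10_000_000 // 10M-element vector add
		bytesPerElem = 4          // float32, assumed
		arrays       = 3          // read a, read b, write c
	)
	for _, us := range []float64{160, 434} {
		bw := impliedBandwidthGBs(elems, bytesPerElem, arrays, us)
		fmt.Printf("%.0f us -> %.1f GB/s (above assumed peak: %v)\n",
			us, bw, bw > assumedPeakGBs)
	}
}
```

Under these assumptions the 160 µs reading cannot cover the whole kernel, while the 434 µs event timing sits below the peak and is physically plausible for a memory-bound vector add.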

The project is still super early and moves slowly because I only code on weekends and I'm a total noob with CUDA. I'm slowly adding Graphs and multi-GPU support.

This is SO early, so treat it more like a learning-CUDA repo, but I'm having fun learning CUDA. Thought some of you might find it interesting too.

The repo is github.com/eitamring/gocudrv if you wanna take a look.

Would be cool if anyone with a 5xxx-series card wants to try it, wink wink.

submitted by /u/Eitamr

Tagged with

#CUDA
#Go
#ML
#cgo