Hacker News

Llama.cpp just added CUDA GPU acceleration yesterday, so this would be very interesting for the emerging space of running local LLMs on commodity hardware.

Running CUDA on an AMD RDNA3 APU is what I'd like to see, as it's probably the cheapest 16GB shared-VRAM solution (via the UMA Frame Buffer BIOS setting) and creates the possibility of running a 13B LLM locally on an underutilized iGPU.

Aaand it's been dead for years, shame.



- llama.cpp already has OpenCL acceleration. It has had it for some time.

- AMD already has a CUDA translation path: HIP, part of ROCm. In principle it should work with llama.cpp's CUDA backend, but in practice... shrug

- The host-to-device copies the CUDA/OpenCL code makes (which are unavoidable for discrete GPUs) are problematic for iGPUs. Right now acceleration actually regresses performance on iGPUs.

Llama.cpp would need tailor-made iGPU acceleration. And I'm not even sure which API has the most appropriate zero-copy mechanism. Vulkan? oneAPI? Something inside ROCm?
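For what it's worth, CUDA itself does expose a zero-copy mechanism that fits this case: mapped pinned host memory, which an integrated GPU can read directly over the shared memory bus instead of through an explicit copy. A minimal sketch, assuming a device that supports host-memory mapping (the kernel and buffer size are purely illustrative):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: scales a buffer in place.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;

    // Allocate pinned host memory that is also mapped into the
    // device address space -- no cudaMemcpy needed.
    float *h_buf = nullptr;
    cudaHostAlloc(&h_buf, n * sizeof(float), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) h_buf[i] = 1.0f;

    // Get the device-side pointer aliasing the same physical memory.
    float *d_buf = nullptr;
    cudaHostGetDevicePointer(&d_buf, h_buf, 0);

    scale<<<(n + 255) / 256, 256>>>(d_buf, n, 2.0f);
    cudaDeviceSynchronize();

    // The GPU wrote straight into host-visible memory.
    printf("h_buf[0] = %f\n", h_buf[0]);
    cudaFreeHost(h_buf);
    return 0;
}
```

HIP mirrors this API almost one-to-one (hipHostMalloc with hipHostMallocMapped, hipHostGetDevicePointer), so presumably the same pattern would be the starting point for a ROCm iGPU path. The catch is that on a discrete GPU, mapped host memory is read over PCIe and is usually slower than copying up front; on an iGPU it's the same DRAM either way, which is exactly why the copy-based code paths regress there.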


Apple has a way to do zero copy, since there is one memory pool shared by both the GPU and the CPU.

But… I don't know if it's possible for iGPUs that partition memory in the BIOS. I'm curious about the answer.




