Llama.cpp just added CUDA GPU acceleration yesterday, so this would be very interesting for the emerging space of running local LLMs on commodity hardware.
Running CUDA on an AMD RDNA3 APU is what I'd like to see, as it's probably the cheapest 16GB shared-VRAM solution (via the UMA Frame Buffer BIOS setting) and would make it possible to run a 13B LLM locally on an underutilized iGPU.
- llama.cpp already has OpenCL acceleration, and has had it for some time.
- AMD already has a CUDA translation layer: HIP, part of ROCm. In theory it should work with llama.cpp's CUDA backend, but in practice... shrug
- The host-to-device copies the CUDA/OpenCL code makes (which are unavoidable for discrete GPUs) are problematic for iGPUs, since the CPU and GPU already share the same physical memory. Right now acceleration actually regresses performance on iGPUs.
Llama.cpp would need tailor-made iGPU acceleration. And I'm not even sure which API has the most appropriate zero-copy mechanism. Vulkan? oneAPI? Something inside ROCm?
> Running CUDA on an AMD RDNA3 APU is what I'd like to see as its probably the cheapest 16GB shared VRAM solution (UMA Frame Buffer BIOS setting) and creates the possibility of running 13b LLM locally on an underutilized iGPU.
Aaand it's been dead for years, shame.