Llama.cpp just added CUDA GPU acceleration yesterday, so this would be very interesting for the emerging space of running local LLMs on commodity hardware.
Running CUDA on an AMD RDNA3 APU is what I'd like to see, as it's probably the cheapest 16GB shared-VRAM solution (via the UMA Frame Buffer BIOS setting) and would make it possible to run a 13B LLM locally on an underutilized iGPU.
- llama.cpp already has OpenCL acceleration, and has had it for some time.
- AMD already has a CUDA translation layer: HIP, part of ROCm. In theory it should work with llama.cpp's CUDA backend, but in practice... shrug
- The host-to-device copies the CUDA/OpenCL code makes (which are unavoidable for discrete GPUs) are problematic for iGPUs, since the CPU and GPU already share the same physical memory. Right now acceleration actually regresses performance on iGPUs.
Llama.cpp would need tailor-made iGPU acceleration. And I'm not even sure which API has the most appropriate zero-copy mechanism. Vulkan? oneAPI? Something inside ROCm?
> Running CUDA on an AMD RDNA3 APU is what I'd like to see as its probably the cheapest 16GB shared VRAM solution (UMA Frame Buffer BIOS setting) and creates the possibility of running 13b LLM locally on an underutilized iGPU.
Aaand it's been dead for years, shame.