I've been playing with running some models on the free tier Oracle VM machines w...

brucethemoose2 · on July 23, 2023

Prompt ingestion is too slow on the Oracle VMs.

Also, it's really tricky to even build llama.cpp with a BLAS library to make prompt ingestion less slow. The Oracle Linux OpenBLAS build isn't detected out of the box, and it doesn't perform well compared to x86 for some reason.

LLVM/GCC have some kind of issue identifying the Ampere ARM architecture (`-march=native` doesn't really work), so maybe this could be improved with the right compiler flags?
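
One way to sidestep the detection problem is to name the core explicitly instead of relying on `native`. The sketch below is only an illustration: it assumes the instance is an Ampere Altra (whose cores are Neoverse N1 based), reads the implementer and part IDs in the format Linux exposes in `/proc/cpuinfo` on ARM64, and maps them to an `-mcpu` value. The `KNOWN_CORES` table and the `suggest_mcpu` helper are hypothetical names, and whether the resulting flag actually speeds up llama.cpp is untested here.

```python
import re
from typing import Optional

# Hypothetical lookup table: (CPU implementer, CPU part) -> value for -mcpu.
# 0x41 / 0xd0c is Arm Neoverse N1, the core assumed for Ampere Altra here.
KNOWN_CORES = {
    ("0x41", "0xd0c"): "neoverse-n1",
}


def suggest_mcpu(cpuinfo_text: str) -> Optional[str]:
    """Return an -mcpu value for the first core listed, or None if unknown."""
    impl = re.search(r"^CPU implementer\s*:\s*(0x[0-9a-fA-F]+)",
                     cpuinfo_text, re.MULTILINE)
    part = re.search(r"^CPU part\s*:\s*(0x[0-9a-fA-F]+)",
                     cpuinfo_text, re.MULTILINE)
    if impl is None or part is None:
        # Not an ARM-style cpuinfo (x86 has no such fields).
        return None
    key = (impl.group(1).lower(), part.group(1).lower())
    return KNOWN_CORES.get(key)
```

On the VM you would pass in the contents of `/proc/cpuinfo` and, if a value comes back, try it as `-mcpu=<value>` in the compiler flags in place of `-march=native`.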

pedrovhb · on July 24, 2023

Not sure if that's still the case. I remember having trouble building it a couple of months ago, had to tweak the Makefile because iirc it assumed ARM64 <=> Mac, but I recently re-cloned the repo and started from scratch and it was as simple as `make DLLAMA_BLAS=1`. I don't think I have any special setup other than having installed the apt openblas dev package.

brucethemoose2 · on July 24, 2023

IDK. A bunch of basic development packages like git were missing from my Ubuntu image when I tried last week, and I just gave up because it seemed like a big rabbit hole to go down.

I can see the ARM64 versions on the Ubuntu web package list, so... IDK what was going on?

On Oracle Linux, until I changed some env variables and lines in the Makefile, the OpenBLAS build would "work," but it was actually silently failing and not using OpenBLAS.
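
A silent fallback like this can be caught with a quick check after the build. The sketch below is a rough illustration with two hypothetical helpers: `links_openblas` scans the text output of `ldd` for a `libopenblas` shared object, and `reports_blas` looks for a `BLAS = 0` or `BLAS = 1` field, assuming the binary prints one in its startup `system_info` line as llama.cpp builds from around that time did. A statically linked OpenBLAS would not show up in `ldd`, so treat a negative result as a hint rather than proof.

```python
import re
from typing import Optional


def links_openblas(ldd_output: str) -> bool:
    """True if any line of `ldd` output names a libopenblas shared object."""
    return any("libopenblas" in line for line in ldd_output.splitlines())


def reports_blas(startup_log: str) -> Optional[bool]:
    """Parse a `BLAS = 0|1` field from startup output; None if it is absent."""
    # \b keeps this from matching inside longer names such as CUBLAS.
    match = re.search(r"\bBLAS\s*=\s*([01])\b", startup_log)
    if match is None:
        return None
    return match.group(1) == "1"
```

To use it, capture the output of `ldd` on the built binary and the first lines the binary prints on startup, then pass each string to the matching helper. The binary path depends on your build and is not assumed here.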

jvickers · on July 23, 2023

Is it any easier when using Ubuntu on ARM Oracle servers?

brucethemoose2 · on July 24, 2023

Nah, I tried Ubuntu too.

The OpenBLAS package was missing on ARM, along with some other dependencies I needed for compilation.

At the end of the day, even with many tweaks and custom compilation flags, the instance was averaging below 1 token/sec as a Kobold Horde host, which is below the threshold to even be allowed as an LLM host.

summarity · on July 24, 2023

If you're running on Ampere, using llama.cpp is probably not ideal. While it's optimized for ARM, Ampere has native acceleration for workloads like this: https://cloudmarketplace.oracle.com/marketplace/en_US/adf.ta...