For most people who just want to play around and are using MacOS or Windows, I'd...

serf · on July 25, 2023

>I'd just recommend lmstudio.ai.

on windows it's an unsigned binary , backed by a website that only indicates a twitter/discord/github as far as explaining their organization, and the github doesn't include source on the client itself, only models.

this must throw up some red flags for others, no?

trevor-e · on July 26, 2023

Does this not work with Stable Diffusion models? Not super familiar with all of this yet but I can't find any from HuggingFace that are compatible.

yreg · on July 25, 2023

This seems interesting. Does anyone know of an iOS app compatible with OpenAI API that you could use to talk to LM Studio over local network?

dividedbyzero · on July 25, 2023

Does it make any sense to try this on a lower-end Mac (like a M2 Air)?

mchiang · on July 25, 2023

Yeah! How much memory do you have?

If by lower-end Macbook air, you mean with 8GB of memory, try the smaller models (Such as Orca Mini 3B). You can do this via LM Studio, Oogabooga/text-generation-webui, KoboldCPP, GPT4all, ctransformers, and more.

I'm biased since I work on Ollama, and if you want to try it out:

1. Download https://ollama.ai/download

2. `ollama run orca`

3. Enter your input to prompt

Note Ollama is open source, and you can compile it too from https://github.com/jmorganca/ollama

bdavbdav · on July 25, 2023

I’m deliberating on how much RAM to get on my new MBP. Is 32gb going to stand me in good stead?

rootusrootus · on July 25, 2023

32GB should be fine. I went a little overboard and got a new MBP with M2 MAX and 96GB, but the hardware is really best suited at this point to a 30B model. I can and do play around with 65B models, but at that point you're making a fairly big tradeoff in generation speed for an incremental increase in quality.

As a datapoint, I have a 30B model [0] loaded right now and it's using 23.44GB of RAM. Getting around 9 tokens/sec, which is very usable. I also have the 65B version of the same model [1] and it's good for around 3.6 tokens/second, but it uses 44GB of RAM. Not unusably slow, but more often than not I opt for the 30B because it's good enough and a lot faster.

Haven't tried the llama2 70B yet.

[0] https://huggingface.co/TheBloke/upstage-llama-30b-instruct-2... [1] https://huggingface.co/TheBloke/Upstage-Llama1-65B-Instruct-...

swader999 · on July 25, 2023

What's your use case for local if you don't mind?

bdavbdav · on July 26, 2023

Thankyou that’s really helpful! The CTO lead times on Mac are huge here so it’s either the pro with 16 or the max with 32. Ideally I’d go pro with 64.

mchiang · on July 25, 2023

Local memory management will definitely get better in the future.

For now:

You should have at least 8 GB of RAM to run the 3B models, 16 GB to run the 7B models, and 32 GB to run the 13B models.

My personal recommendation is to get as much memory as you can if you want to work with local models [including VRAM if you are planning to be executing on GPU]

sourcecodeplz · on July 26, 2023

I run the original llama 7b model just fine on 8GB of Ram. It is best to give advice from experience not only what you read from others.

mchiang · on July 26, 2023

This is from us manually testing it on macbooks that we have available. It might run, but it's probably using swap.

bdavbdav · on July 26, 2023

Thanks - the issue I’m facing is the CTO lead times on Macs here!

dividedbyzero · on July 25, 2023

By lower-end I meant that the Airs are quite low-end in general (compared to Pro/Studio). I have the maxed-out 24gb, but 16gb may be more common among people who might use an Air for this kind of thing.

column · on July 26, 2023

Heads up for anyone else: clicking that link automatically starts the download (92MB)

moneywoes · on July 26, 2023

what about a m1 with 16gb ram?

robotnikman · on July 25, 2023

Was taking a look into this. Is the source code open for lmstudio.ai?