Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

For most people who just want to play around and are using MacOS or Windows, I'd just recommend lmstudio.ai. Nice interface, with super easy searching and downloading of new models.


>I'd just recommend lmstudio.ai.

on windows it's an unsigned binary , backed by a website that only indicates a twitter/discord/github as far as explaining their organization, and the github doesn't include source on the client itself, only models.

this must throw up some red flags for others, no?


Does this not work with Stable Diffusion models? Not super familiar with all of this yet but I can't find any from HuggingFace that are compatible.


This seems interesting. Does anyone know of an iOS app compatible with OpenAI API that you could use to talk to LM Studio over local network?


Does it make any sense to try this on a lower-end Mac (like a M2 Air)?


Yeah! How much memory do you have?

If by lower-end Macbook air, you mean with 8GB of memory, try the smaller models (Such as Orca Mini 3B). You can do this via LM Studio, Oogabooga/text-generation-webui, KoboldCPP, GPT4all, ctransformers, and more.

I'm biased since I work on Ollama, and if you want to try it out:

1. Download https://ollama.ai/download

2. `ollama run orca`

3. Enter your input to prompt

Note Ollama is open source, and you can compile it too from https://github.com/jmorganca/ollama


I’m deliberating on how much RAM to get on my new MBP. Is 32gb going to stand me in good stead?


32GB should be fine. I went a little overboard and got a new MBP with M2 MAX and 96GB, but the hardware is really best suited at this point to a 30B model. I can and do play around with 65B models, but at that point you're making a fairly big tradeoff in generation speed for an incremental increase in quality.

As a datapoint, I have a 30B model [0] loaded right now and it's using 23.44GB of RAM. Getting around 9 tokens/sec, which is very usable. I also have the 65B version of the same model [1] and it's good for around 3.6 tokens/second, but it uses 44GB of RAM. Not unusably slow, but more often than not I opt for the 30B because it's good enough and a lot faster.

Haven't tried the llama2 70B yet.

[0] https://huggingface.co/TheBloke/upstage-llama-30b-instruct-2... [1] https://huggingface.co/TheBloke/Upstage-Llama1-65B-Instruct-...


What's your use case for local if you don't mind?


Thankyou that’s really helpful! The CTO lead times on Mac are huge here so it’s either the pro with 16 or the max with 32. Ideally I’d go pro with 64.


Local memory management will definitely get better in the future.

For now:

You should have at least 8 GB of RAM to run the 3B models, 16 GB to run the 7B models, and 32 GB to run the 13B models.

My personal recommendation is to get as much memory as you can if you want to work with local models [including VRAM if you are planning to be executing on GPU]


I run the original llama 7b model just fine on 8GB of Ram. It is best to give advice from experience not only what you read from others.


This is from us manually testing it on macbooks that we have available. It might run, but it's probably using swap.


Thanks - the issue I’m facing is the CTO lead times on Macs here!


By lower-end I meant that the Airs are quite low-end in general (compared to Pro/Studio). I have the maxed-out 24gb, but 16gb may be more common among people who might use an Air for this kind of thing.


Heads up for anyone else: clicking that link automatically starts the download (92MB)


what about a m1 with 16gb ram?


Was taking a look into this. Is the source code open for lmstudio.ai?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: