
Any particular models you can recommend for someone trying out local models for the first time?


You need ollama[1][2] and hardware that can run 20-70B models at Q4 quantization or better to get an experience comparable to commercially hosted models. I use codestral:22b, gemma2:27b, gemma2:27b-instruct, and aya:35b; rough commands to get started follow the links below.

Smaller models are useless for me because my native language is Ukrainian (it's easier to spot a model's mistakes in a language with more complex grammar rules).

As a GUI, I use the Page Assist[3] plugin for Firefox, or aichat[4], a command-line and WebUI tool.

[1]: https://github.com/ollama/ollama/releases

[2]: https://ollama.com/

[3]: https://github.com/n4ze3m/page-assist

[4]: https://github.com/sigoden/aichat
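
If it helps, this is roughly what getting started looks like once ollama is installed (just a sketch; the model tag is one of the ones I listed above):

  # download a model (the default ollama tags are usually ~4-bit quantized builds)
  ollama pull gemma2:27b

  # one-shot prompt straight from the shell
  ollama run gemma2:27b "Explain quantization in two sentences."

  # or start an interactive chat session
  ollama run gemma2:27b

Page Assist and aichat can then talk to the same local ollama server, so you only download each model once.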


What's the hardware needed to make it run reasonably fast?


I have no idea what "reasonably fast" means for you. Performance is good when the model fits inside the memory of the graphics card. An Nvidia 4090 with 24 GB will be more than enough to start learning. I use a gaming notebook with an Nvidia 3080 Ti that has 16 GB of video memory.
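
If you want a quick look at how much headroom you have before pulling a big model, this works on any machine with an Nvidia card (a sketch; nvidia-smi ships with the driver):

  # total and currently used VRAM, one line per GPU
  nvidia-smi --query-gpu=name,memory.total,memory.used --format=csv

If the quantized model file is comfortably smaller than the free VRAM, ollama usually keeps the whole thing on the GPU; otherwise it spills layers to the CPU and everything slows down.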


I have no issues using just the CPU on smaller (<= 13B) models, and it's quite fast enough for me. Even 70B models still work if you have the RAM; they're just much slower.


Llama and its variants are popular for language tasks, https://huggingface.co/meta-llama/Meta-Llama-3.1-8B

However, as far as I can tell, it's never actually clear what the hardware requirements are to get these to run without fussing around. Am I wrong about this?


In my experience the hardware requirement is whatever the file size is, plus a bit more. CPU works; a GPU is a lot faster but needs enough VRAM.
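
As a back-of-the-envelope rule (my own estimate, nothing official): memory needed is roughly parameter count times bits per weight divided by 8, plus a little for the context. So an 8B model lands around 4-5 GB at q4, ~8-9 GB at q8, and ~16 GB at fp16, which matches the download sizes pretty closely.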

I was playing with them some more yesterday and found that 4-bit ("q4") quantization is much worse than q8 or fp16. Llama 3.1 8B is OK; internlm2 7B is more precise. And they all hallucinate a lot.

I also found this page, which has some rankings: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_...

In my opinion they are not all that useful: good for translation, for summarizing texts, and for jogging your memory when you've forgotten something. But they lie, so for anything serious you have to do your own research. And they're no good at all for precise or obscure topics.

If someone wants to play around, there are GPT4All, Msty, and LM Studio. You can give them some of your documents to process and use as "knowledge stacks". Msty has web search; GPT4All should get it in time.

Got more opinions, but this is long enough already.


I agree on the translation part. Llama 3.1 8B, even at 4-bit, does a great job translating JP to EN as far as I can tell, and is often better than dedicated translation models like Argos in my experience.


I had an underwhelming experience with Llama translation; it's not comparable to Claude or GPT-3.5+, which are very good. It's kind of like Google Translate but worse. I was using them through Perplexity.


Training is resource intensive in time, RAM, and VRAM, so it takes fairly top-end hardware. For the moment, Nvidia's stuff works best if cost is no object.

For running them, you want a GPU. The limitation is that the model has to fit in VRAM, or performance will be slow.

But if you don't care about speed, there are more options.


Yeah, Llama 3.1 is really impressive even at the small 8B size. Just don't rely on its built-in knowledge; make it interact with Google instead (really easy to do with OpenWebUI).

I personally use an uncensored version, which is another huge benefit of a local model. Mainly because I have many kinky hobbies that piss off cloud models.


The moment Google gets infiltrated by rogue AI content it will cease to be as useful, and then you get to train the model with more knowledge yourself.

It's slowly getting there.


It's been infiltrated by rogue SEO content for at least a decade.


Maybe, but given how good Gemma is for a 2B model, I think Google has hedged its bets nicely.


Yeah, I mean obviously Gen X followed a similar track to the Boomers, and now Millennials are next in the generational line. Except this time the Boomers are in their 80s and Gen X are in their 50s, so when the Boomer grandparents die, they'll probably leave their inheritance to their Millennial grandchildren, who need it more than their grown-up Gen X children.


free.99 if you know where to look


I dunno. I've been using Full Self Driving v12.3 and it's gotten sooooo much better than v11. I used to rarely use FSD v11 but now I use v12.3 every day driving around town and on the highway. Looking forward to getting v12.4 in my car next week hopefully too.


+1


If Musk wasn't CEO of Tesla, it would turn into Rivian.


Which would be great.


Why the downvotes? Rivian looks like it's run by adults.


Couldn’t have said it better myself.


I've definitely been trying to find a good alternative for a while. Still haven't found one though…


YES. I completely, 100% agree. Try to minimize the number of messages you're going to send. Saying "Hi" and then separately saying what you want to ask makes my phone ring twice instead of once, and it only gets more annoying with each additional message you send before I have a chance to respond.


I'd always make one for my work dev folder so I could quickly switch to it as soon as I restarted my terminal, and I'd always keep it to a two-letter abbreviation. For example, my last one was an alias "s6" that took me to the correct dev folder.
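
It's just a one-liner in the shell rc file, something like this (the path here is made up):

  # hypothetical path; the point is the two-letter shortcut
  alias s6='cd ~/work/dev/sprint6'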


Check out zsh-z, autojump (or any derivative of these)
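
For anyone who hasn't tried them, day-to-day use is minimal. With zsh-z it's roughly (a sketch, assuming you've cloned the plugin and source it from ~/.zshrc):

  # in ~/.zshrc -- path is wherever you cloned agkozak/zsh-z
  source ~/src/zsh-z/zsh-z.plugin.zsh

  # after you've cd'd somewhere at least once, jump back from anywhere
  # by typing a fragment of the directory name:
  z dev

autojump works much the same way, with j instead of z.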


It's a fraction of the cost of the Apple Studio Display and also has more ports available. They shared all the files needed to make one yourself.

