My man, we now have LLMs anywhere from 130 million to 1 trillion parameters available to run locally; I can guarantee there's a model out there that even your toaster can run. I have an RTX 4090, but for most of my fiddling I use small models like Qwen 3 4B and they work amazingly well, so there's no excuse :P.
well, i got some gemini models running on my phone, but if i switch apps, android kills it, so the call to the server always hangs... and then the screen goes black
the new laptop only has 16GB of memory total, 7 of which are reserved for the NPU.
i tried loading Qwen 3 4B on it, but the max context i can get is about 12k tokens before the laptop crashes.
my next attempt is gonna be a 0.5B model, but i think i'll still end up having to compress the context on every call, which is the real challenge
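fwiw the "compress on every call" part doesn't have to be fancy. here's a rough sketch of what i mean: keep the last few turns verbatim and collapse everything older into one summary message. the summarize() step below is just a placeholder stub (it keeps each old turn's first sentence); in a real setup you'd hand that text back to the model itself to summarize:

```python
# Sketch: rolling context compression, assuming a chat-style message list
# of {"role": ..., "content": ...} dicts. All names here are illustrative.

def summarize(messages):
    # Placeholder for a real LLM summarization call: here we just keep
    # the first sentence of each old turn and join them together.
    lines = [m["content"].split(".")[0] for m in messages]
    return "Earlier conversation (compressed): " + "; ".join(lines)

def compress_context(messages, keep_recent=4):
    """Collapse all but the last `keep_recent` messages into one summary."""
    if len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {"role": "system", "content": summarize(old)}
    return [summary] + recent
```

run compress_context() on the history before every request and the prompt never grows past a handful of turns plus one summary.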
I recommend using quantized models first, for example anywhere between Q4 and Q8 GGUF models. You also don't need high context to fiddle around and learn the ins and outs: 4k context is more than enough to figure out what you need for agentic solutions. In fact, that's a good limit to impose on yourself, so you start developing decent automatic context management internally, since that will be very important when building robust agentic solutions. With all that you should be able to load an LLM with no issues on many devices.
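A minimal version of that self-imposed 4k budget could look like the sketch below. It estimates tokens as characters / 4 (just a heuristic; a real system would use the model's actual tokenizer) and drops the oldest non-system turns until the prompt fits:

```python
# Sketch of a hard context budget, assuming chat-style {"role", "content"}
# messages. The chars/4 token estimate is a rough rule of thumb only.

def estimate_tokens(text):
    return max(1, len(text) // 4)

def fit_to_budget(messages, budget=4096):
    """Drop the oldest non-system messages until the estimated total fits."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    total = sum(estimate_tokens(m["content"]) for m in system + rest)
    while rest and total > budget:
        dropped = rest.pop(0)  # oldest turn is evicted first
        total -= estimate_tokens(dropped["content"])
    return system + rest
```

Keeping the system prompt pinned and evicting from the oldest end is the simplest policy; once this works you can swap the eviction step for summarization instead of deletion.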
i'll be trying again once i've written my own agent, but i don't expect results anywhere near as useful as just spending some Claude or Gemini tokens