Is anyone working on software that lets you run local LLMs in the browser?
In theory, it should be possible, shouldn't it?
The page could hold only the software in JavaScript that uses WebGL to run the neural net. And offer an "upload" button that the user can click to select a model from their file system. The button would not upload the model to a server - it would just let the JS code access it to convert it into WebGL and move it into the GPU.
This way, one could download models from HuggingFace, store them locally and use them as needed. Nicely sandboxed and independent of the operating system.
This might be a misunderstanding. Did you see the "button that the user can click to select a model from their file system" part of my comment?
I tried some of the demos of transformers.js but they all seem to load the model from a server. Which is super slow. I would like to have a page the lets me use any model I have on my disk.
> Modify this application such that it offers the user a file browse button for selecting their own local copy of the model file instead of loading it over the network. Provide a "download model" option too.
Then later:
> Build the production app and then make it available on localhost somehow
And:
> Uncaught (in promise) Error: Invalid configuration detected: both local and remote models are disabled. Fix by setting `env.allowLocalModels` or `env.allowRemoteModels` to `true`.
Yeah, something like that, but without the WebGPU requirement.
Neither FireFox nor Chromium support WebGPU on Linux. Maybe behind flags. But before using a technology, I would wait until it is available in the default config.
Lets see when browsers will bring WebGPU to Linux.
This one is pretty cool. Compile the gguf of an OSS LLM directly into an executable. Will open an interface in the browser to chat. Can also launch an OpenAI API style interface hosted locally.
Doesn't work quite as well on Windows due to the executable file size limit but seems great for Mac/Linux flavors.
Container or VM is a bigger commitment. VMs need root and containers need Docker group and something like docker-compose or a shell script or something.
idk it's just like, do I want to run to the store and buy a 24-pack of water bottles, and stash them somewhere, or do I want to open the tap and have clean drinking water
Neither of requirements are true on recent OS versions. Users have had the ability to make containers or VMs without special privileges for a very long time now.
Beyond all the wasm/webgpu approaches other folks have linked (mostly in the transformers.js ecosystem), there's been a standardized API brewing since 2019: https://webmachinelearning.github.io/webnn-intro/
No, I'm looking for an html page with a button "Select LLM". After pressing that button and selecting a local LLM from disk, it would show an input field where you can type your question and then it would use the given LLM to create the answer.
I'm not sure what OpenWebUI is, but if it was what I mean, they would surely have the page live and not ask users to install Docker etc.
It's both what you want and not; the chat/question interface is as you describe, lack-of-installation is not. The LLM work is offloaded to other software, not the browser.
I would like to skip maintaining all this crap, though: I like your approach
You should install it, because it's exactly what you just described.
Edit: From a UI perspective, it's exactly what you described. There's a dropdown where you select the LLM, and there's a ChatGPT-style chatbox. You just docker-up and go to town.
Maybe I don't understand the rest of the request, but I can't imagine a software where a webpage exists and it just magically has LLMs available in the browser with no installation?
It doesn't seem exactly like what they are describing. The end-user interface is what they are describing but it sounds like they want the actual LLM to run in the browser (perhaps via webgpu compute shaders). Open WebUI seems to rely on some external executor like ollama/llama.cpp, which naturally can still be self-hosted but they are not executing INSIDE the browser.
Does that even exist? It's basically what they described but with some additional installation? Once you install it, you can select the LLM on disk and run it? That's what they asked for.
Apparently it does, though I'm learning about it for the first time in this thread also. Personally, I just run llama.cpp locally in docker-compose with anythingllm for the UI but I can see the appeal of having it all just run in the browser.
Not browser but Electron. For the browser you would have to run a local nodejs server and point the browser app to use the local API. I use electron with nodejs and react for UI. Yes I can switch models.
In theory, it should be possible, shouldn't it?
The page could hold only the software in JavaScript that uses WebGL to run the neural net. And offer an "upload" button that the user can click to select a model from their file system. The button would not upload the model to a server - it would just let the JS code access it to convert it into WebGL and move it into the GPU.
This way, one could download models from HuggingFace, store them locally and use them as needed. Nicely sandboxed and independent of the operating system.