Is anyone working on software that lets you run local LLMs in the browser? In th...

simonw · 2025-09-08T16:41:34 1757349694

Transformers.js (https://huggingface.co/docs/transformers.js/en/index) is this. Some demos (should work in Chrome and Firefox on Windows, or Firefox Nightly on macOS and Linux):

https://huggingface.co/spaces/webml-community/llama-3.2-webg... loads a 1.24GB Llama 3.2 q4f16 ONNX build

https://huggingface.co/spaces/webml-community/janus-pro-webg... loads a 2.24 GB DeepSeek Janus Pro model which is multi-modal for output - it can respond with generated images in addition to text.

https://huggingface.co/blog/embeddinggemma#transformersjs loads 400MB for an EmbeddingGemma demo (embeddings, not LLMs)

I've collected a few more of these demos here: https://simonwillison.net/tags/transformers-js/

You can also get this working with web-llm - https://github.com/mlc-ai/web-llm - here's my write-up of a demo that uses that: https://simonwillison.net/2024/Nov/29/structured-generation-...

mg · 2025-09-08T17:44:36 1757353476

This might be a misunderstanding. Did you see the "button that the user can click to select a model from their file system" part of my comment?

I tried some of the transformers.js demos, but they all seem to load the model from a server, which is super slow. I would like to have a page that lets me use any model I have on my disk.

simonw · 2025-09-08T20:09:20 1757362160

Oh sorry, I missed that bit.

I got Codex + GPT-5 to modify that Llama chat example to implement the "load from local directory" pattern. It appears to work.

First you'll need to grab a local checkout of the model (~1.3GB):

  git lfs install
  git clone https://huggingface.co/onnx-community/Llama-3.2-1B-Instruct-q4f16

Then visit this page: https://static.simonwillison.net/static/2025/llama-3.2-webgp... - in Chrome or Firefox Nightly.

Now click "Browse folder" and select the folder you just checked out with Git.

Click the confusing "Upload" confirmation (it doesn't upload anything, just opens those files in the current browser session).

Now click "Load local model" - and you should get a full working chat interface.

Code is here: https://github.com/simonw/transformers.js-examples/commit/cd...

Here's the full Codex session that I used to build this: https://gist.github.com/simonw/3c46c9e609f6ee77367a760b5ca01...
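For anyone who wants the gist of the "load from local directory" pattern without opening the commit, here is a minimal sketch. It is not taken from the linked code. It assumes the folder is picked with an `<input type="file" webkitdirectory>`, whose files expose a `webkitRelativePath`, and that the app then needs to answer each model file request from that selection. The names `PickedFile`, `indexByRelativePath` and `findFileForRequest` are hypothetical, and hooking the lookup into the library's loader is left out.

```typescript
// Minimal shape of the File objects a directory picker yields.
interface PickedFile {
  name: string;
  // Path including the selected folder, e.g. "my-model/onnx/weights.onnx" (example layout).
  webkitRelativePath: string;
}

// Index the picked files by their path relative to the selected folder.
function indexByRelativePath<T extends PickedFile>(files: ArrayLike<T>): Map<string, T> {
  const index = new Map<string, T>();
  for (let i = 0; i < files.length; i++) {
    const file = files[i];
    const parts = file.webkitRelativePath.split("/");
    // Drop the top-level folder name; fall back to the bare name if no path is set.
    const key = parts.length > 1 ? parts.slice(1).join("/") : file.name;
    index.set(key, file);
  }
  return index;
}

// Find the picked file a requested URL refers to by matching the URL's path suffix.
// The longest matching relative path wins, so "onnx/a.onnx" beats a top-level "a.onnx".
function findFileForRequest<T extends PickedFile>(
  index: Map<string, T>,
  requestUrl: string,
): T | undefined {
  const path = requestUrl.split(/[?#]/)[0]; // ignore query string and fragment
  let best: string | undefined;
  index.forEach((_file, key) => {
    if (path.endsWith("/" + key) && (best === undefined || key.length > best.length)) {
      best = key;
    }
  });
  return best === undefined ? undefined : index.get(best);
}
```

Matching on the path suffix avoids hard-coding any particular URL template, at the cost of being ambiguous if two different models were ever indexed together.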

I ran Codex against the https://github.com/huggingface/transformers.js-examples/tree... folder and prompted:

> Modify this application such that it offers the user a file browse button for selecting their own local copy of the model file instead of loading it over the network. Provide a "download model" option too.

Then later:

> Build the production app and then make it available on localhost somehow

And:

> Uncaught (in promise) Error: Invalid configuration detected: both local and remote models are disabled. Fix by setting `env.allowLocalModels` or `env.allowRemoteModels` to `true`.

And:

> Add a bash script which will build the application such that I can upload a folder called llama-3.2-webgpu to http://static.simonwillison.net/static/2025/llama-3.2-webgpu... and http://static.simonwillison.net/static/2025/llama-3.2-webgpu... will serve the app

(Note that this doesn't allow you to use any model on your machine, but it proves that it's possible.)
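On the `env.allowLocalModels` / `env.allowRemoteModels` error quoted above: the message itself says at least one of the two flags has to be `true`. A small sketch of a guard that makes the both-disabled state impossible is below. `ModelSourceFlags`, `SourceMode` and `setModelSources` are hypothetical names for illustration, and the object passed in is only a stand-in for the library's `env`. Only the two flag names come from the error message.

```typescript
// Structural stand-in for the two flags named in the error message.
interface ModelSourceFlags {
  allowLocalModels: boolean;
  allowRemoteModels: boolean;
}

type SourceMode = "local-only" | "remote-only" | "both";

// Hypothetical helper: derive both flags from one mode so that
// at least one model source is always enabled.
function setModelSources(env: ModelSourceFlags, mode: SourceMode): void {
  env.allowLocalModels = mode !== "remote-only";
  env.allowRemoteModels = mode !== "local-only";
}
```

For the local-folder case the call would presumably be `setModelSources(env, "local-only")` before any model is loaded.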

simonw · 2025-09-08T21:11:42 1757365902

Wrote this all up on my blog here, including a GIF demo showing how to use it: https://simonwillison.net/2025/Sep/8/webgpu-local-folder/

mg · 2025-09-09T05:09:53 1757394593

Awesome!

Bookmarked. I will surely try it out once Firefox or Chromium on Linux supports WebGPU in its default config.

SparkyMcUnicorn · 2025-09-08T14:56:04 1757343364

Yes. MLC's inference engine runs on WebGPU/WASM.

https://github.com/mlc-ai/web-llm-chat

https://github.com/mlc-ai/mlc-llm

https://github.com/mlc-ai/web-llm

mg · 2025-09-08T15:10:21 1757344221

Yeah, something like that, but without the WebGPU requirement.

Neither Firefox nor Chromium supports WebGPU on Linux, except maybe behind flags. But before using a technology, I would wait until it is available in the default config.

Let's see when browsers bring WebGPU to Linux.
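Since WebGPU availability differs by browser and platform, a page can detect it at runtime and fall back to a CPU/WASM path instead of requiring it. A minimal sketch, assuming the standard `navigator.gpu.requestAdapter()` entry point: `NavigatorLike`, `GpuLike` and `pickDevice` are hypothetical names, and the structural types are there only so the snippet compiles without WebGPU type definitions.

```typescript
// Minimal structural types standing in for the browser's navigator.gpu.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

type Device = "webgpu" | "wasm";

// Prefer WebGPU only when the API exists AND an adapter is actually returned;
// requestAdapter() can resolve to null even when navigator.gpu is defined.
async function pickDevice(nav: NavigatorLike): Promise<Device> {
  if (!nav.gpu) return "wasm";
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    // Treat any failure while probing as "no usable GPU".
    return "wasm";
  }
}
```

In a real page this would be called with the global `navigator`, and the result handed to whichever runtime is in use as its backend choice.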

SparkyMcUnicorn · 2025-09-08T15:24:43 1757345083

This should be what you're looking for. It doesn't utilize the GPU, but WebGL support is in the TODOs.

https://github.com/ngxson/wllama

https://huggingface.co/spaces/ngxson/wllama

simonw · 2025-09-08T21:12:31 1757365951

Firefox Nightly on macOS now supports WebGPU, and the documentation says the Linux build supports it too.

generalizations · 2025-09-08T15:47:29 1757346449

This is an in-browser llama.cpp implementation: https://github.com/ngxson/wllama

And related is the whisper implementation: https://ggml.ai/whisper.cpp/

vonneumannstan · 2025-09-08T15:39:44 1757345984

This one is pretty cool. Compile the gguf of an OSS LLM directly into an executable. Will open an interface in the browser to chat. Can also launch an OpenAI API style interface hosted locally.

Doesn't work quite as well on Windows due to the executable file size limit but seems great for Mac/Linux flavors.

https://github.com/Mozilla-Ocho/llamafile
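To illustrate what talking to a locally hosted "OpenAI API style" interface could look like from a web page, here is a minimal sketch that only builds the request. The `/v1/chat/completions` path and the `{ model, messages }` body follow the OpenAI chat convention. The base URL, the model name and the helper name `buildChatRequest` are assumptions for illustration, so check what your local server actually exposes.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build (but do not send) an OpenAI-style chat completion request.
function buildChatRequest(baseUrl: string, model: string, messages: ChatMessage[]): ChatRequest {
  return {
    // Strip trailing slashes so the joined URL has exactly one separator.
    url: baseUrl.replace(/\/+$/, "") + "/v1/chat/completions",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, messages }),
    },
  };
}
```

The result can be passed to `fetch(req.url, req.init)`. With an OpenAI-compatible server the reply text would normally be under `choices[0].message.content` in the JSON response.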

adastra22 · 2025-09-08T15:00:36 1757343636

You don’t need a browser to sandbox something. It’s easier and more performant to do GPU passthrough to a container or VM.

01HNNWZ0MV43FF · 2025-09-08T15:39:51 1757345991

Container or VM is a bigger commitment. VMs need root and containers need Docker group and something like docker-compose or a shell script or something.

idk it's just like, do I want to run to the store and buy a 24-pack of water bottles, and stash them somewhere, or do I want to open the tap and have clean drinking water

adastra22 · 2025-09-10T02:49:48 1757472588

Neither of those requirements holds on recent OS versions. Users have had the ability to make containers or VMs without special privileges for a very long time now.

paulirish · 2025-09-08T17:47:16 1757353636

Beyond all the wasm/webgpu approaches other folks have linked (mostly in the transformers.js ecosystem), there's been a standardized API brewing since 2019: https://webmachinelearning.github.io/webnn-intro/

Demos here: https://webmachinelearning.github.io/webnn-samples/ I'm not sure any of them allow you to select a model file from disk, but that should be entirely straightforward.

samsolomon · 2025-09-08T14:58:48 1757343528

Is Open WebUI something like what you are looking for? The design has some awkwardness, but overall it has incorporated a ton of great features.

https://openwebui.com/

mg · 2025-09-08T15:04:25 1757343865

No, I'm looking for an html page with a button "Select LLM". After pressing that button and selecting a local LLM from disk, it would show an input field where you can type your question and then it would use the given LLM to create the answer.

I'm not sure what OpenWebUI is, but if it was what I mean, they would surely have the page live and not ask users to install Docker etc.

tmdetect · 2025-09-08T15:44:22 1757346262

I think what you want is this: https://github.com/mlc-ai/web-llm

bravetraveler · 2025-09-08T15:24:53 1757345093

It's both what you want and not; the chat/question interface is as you describe, lack-of-installation is not. The LLM work is offloaded to other software, not the browser.

I would like to skip maintaining all this crap, though: I like your approach

Jemaclus · 2025-09-08T15:36:00 1757345760

You should install it, because it's exactly what you just described.

Edit: From a UI perspective, it's exactly what you described. There's a dropdown where you select the LLM, and there's a ChatGPT-style chatbox. You just docker-up and go to town.

Maybe I don't understand the rest of the request, but I can't imagine software where a webpage exists and it just magically has LLMs available in the browser with no installation?

craftkiller · 2025-09-08T15:42:57 1757346177

It doesn't seem exactly like what they are describing. The end-user interface is what they are describing but it sounds like they want the actual LLM to run in the browser (perhaps via webgpu compute shaders). Open WebUI seems to rely on some external executor like ollama/llama.cpp, which naturally can still be self-hosted but they are not executing INSIDE the browser.

Jemaclus · 2025-09-08T15:53:24 1757346804

Does that even exist? It's basically what they described but with some additional installation? Once you install it, you can select the LLM on disk and run it? That's what they asked for.

Maybe I'm misunderstanding something.

craftkiller · 2025-09-08T16:04:52 1757347492

Apparently it does, though I'm learning about it for the first time in this thread also. Personally, I just run llama.cpp locally in docker-compose with anythingllm for the UI but I can see the appeal of having it all just run in the browser.

  https://github.com/mlc-ai/web-llm
  https://github.com/ngxson/wllama

Jemaclus · 2025-09-08T16:07:33 1757347653

Oh, interesting. Well, TIL.

andsoitis · 2025-09-08T15:43:45 1757346225

> You should install it, because it's exactly what you just described.

Not OP, but it really isn't what they're looking for. Needing to install stuff vs. simply going to a web page are two very different things.

coip · 2025-09-08T14:56:57 1757343417

Have you seen/used the WebGPU spaces?

https://huggingface.co/docs/transformers.js/en/guides/webgpu

eta: its predecessor used WebGL

mg · 2025-09-08T15:16:30 1757344590

WebGPU is not yet available in the default config of Linux browsers, so WebGL would have been perfect :)

mudkipdev · 2025-09-08T15:04:54 1757343894

It was done with gemma-3-270m. I hope someone will post a link to it below.

vavikk · 2025-09-08T15:07:55 1757344075

Not browser but Electron. For the browser you would have to run a local Node.js server and point the browser app at the local API. I use Electron with Node.js and React for the UI. Yes, I can switch models.