moffkalast on Dec 20, 2023 | on: LLM in a Flash: Efficient LLM Inference with Limit...

That's what the Google TPU is in a nutshell, as I understand it: loading weights into memory cells between FPUs.
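The "weights parked between the compute units" idea is usually described as a weight-stationary systolic array. Below is a toy cycle-level Python sketch of that idea. It rests on several assumptions. Each processing element (PE) holds one preloaded weight. Activations shift one PE to the right per cycle, and partial sums shift one PE down. The function name `systolic_matmul` and the timing model are hypothetical and only for illustration. This is not Google's actual hardware design.

```python
def systolic_matmul(X, W):
    """Toy weight-stationary systolic array computing Y = X @ W.

    Assumption: PE(i, j) permanently holds W[i][j]; activations move right,
    partial sums move down, one hop per cycle. X is M x K, W is K x N.
    """
    M, K, N = len(X), len(W), len(W[0])
    act = [[0] * N for _ in range(K)]   # activation register in each PE
    psum = [[0] * N for _ in range(K)]  # partial-sum register in each PE
    Y = [[0] * N for _ in range(M)]

    # PE(i, j) works on batch row m = t - i - j, so the last output
    # (m = M-1, j = N-1) leaves the bottom row at t = M + K + N - 3.
    for t in range(M + K + N - 2):
        new_act = [[0] * N for _ in range(K)]
        new_psum = [[0] * N for _ in range(K)]
        for i in range(K):
            for j in range(N):
                if j == 0:
                    # Skewed input: X[m][i] enters row i at cycle m + i.
                    m = t - i
                    a_in = X[m][i] if 0 <= m < M else 0
                else:
                    a_in = act[i][j - 1]  # from the left neighbour
                p_in = psum[i - 1][j] if i > 0 else 0  # from the PE above
                new_act[i][j] = a_in
                # Multiply by the stationary weight and accumulate.
                new_psum[i][j] = p_in + a_in * W[i][j]
        act, psum = new_act, new_psum  # all registers update together

        # The bottom row now holds finished dot products for some batch rows.
        for j in range(N):
            m = t - (K - 1) - j
            if 0 <= m < M:
                Y[m][j] = psum[K - 1][j]
    return Y


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The weights are loaded once and never move, so only activations and partial sums travel between neighbouring cells. That avoids repeated weight fetches from memory, which is the point the comment is making.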