Is that the point? I ask because the "weird" in the TPU is mostly its scale. It's not like you can't do matrix multiplies with the vector units on a CPU or with a GPU. It's really the scale: it has more elements than you get with existing hardware, but it's also lower precision, appears less flexible, and is bolted to a heavyweight memory subsystem.

So, in that regard, it's no more "weird" than other common accelerators/coprocessors for things like compression.

So, in the end, what would show up in a phone doesn't really look anything like a TPU. I would maybe expect a lightweight piece of matrix acceleration hardware which, due to power constraints, isn't going to be able to match what a "desktop" level FPGA or GPU is capable of, much less a full-blown TPU.