Specialized AI chips don't seem like a very good business idea to me.
The way we do things in AI today may be completely different tomorrow. It's not like there's a standard everyone has agreed on.
There is a very real risk these specialized, expensive devices will go the way of the Bitcoin ASIC miner (which saturated secondary markets at a fraction of its original cost).
Making a matrix multiplication accelerator seems like a pretty safe bet to me. I am less sure about sparsity optimization, but I guess it still works for dense matrices even in the worst case.
Sure, but Cerebras isn't just multiplying two large matrices; they are multiplying two large, very sparse matrices, relying on the ReLU activation to maintain sparsity across all of the layers. BERT, XLNet, and other transformer models have already moved away from ReLU to GELU, which does not produce sparse activations. "Traditional" activations (tanh, sigmoid, softmax) are not sparse either.
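To make the sparsity point concrete, here's a quick NumPy sketch (using the common tanh approximation of GELU) showing that ReLU zeroes out roughly half of standard-normal activations, while GELU leaves essentially everything nonzero:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pre-activation values, roughly standard-normal as in a typical layer.
x = rng.standard_normal(100_000)

# ReLU zeroes all negative inputs -> about half the outputs are exactly 0.
relu = np.maximum(x, 0.0)

# GELU (tanh approximation) is smooth and nonzero almost everywhere.
gelu = 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

print("ReLU zero fraction:", np.mean(relu == 0.0))  # roughly 0.5
print("GELU zero fraction:", np.mean(gelu == 0.0))  # essentially 0
```

A hardware design that banks on skipping the zeros loses its advantage as soon as the zeros go away.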
While the theoretical innovations have mostly been incremental, there has been a lot of progress in the development of "light" deep learning frameworks - so the tasks that previously required massive GPUs can now run on your phone. And this trend will continue.
I see your point. Fundamentally, the same multiplications.
However, if we look at TF Lite, for example: its internal operators were tuned for mobile devices, and its new model file format is much more compact and does not need to be parsed before use. My point is that hardware requirements aren't growing; instead, the frameworks are getting optimized to use less power.
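One standard trick in this family of optimizations is post-training quantization. As a rough illustration (naive symmetric int8 quantization over made-up weights, not TF Lite's actual implementation):

```python
import numpy as np

# Hypothetical layer weights in float32, standing in for a real model.
weights = np.random.default_rng(1).standard_normal((256, 256)).astype(np.float32)

# Naive symmetric int8 quantization: scale so the max magnitude maps to 127.
scale = np.abs(weights).max() / 127.0
q = np.round(weights / scale).astype(np.int8)

print("float32 bytes:", weights.nbytes)  # 262144
print("int8 bytes:   ", q.nbytes)        # 65536 -> 4x smaller

# Dequantized weights stay close to the originals.
err = np.abs(q.astype(np.float32) * scale - weights).max()
print("max abs error:", err)
```

A 4x smaller model means less memory traffic, which on mobile translates fairly directly into less power.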
I wish this were the case. Five years ago I could train the most advanced, largest DL models in a reasonable time (a few weeks) on my 4-GPU workstation. Today, something like GPT-2 would probably take years to train on 4 GPUs, even though the GPUs I have now are 10 times faster than the ones I had 5 years ago.
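The arithmetic behind this is simple. With made-up but order-of-magnitude numbers (all figures below are hypothetical, chosen only to show the shape of the problem):

```python
# Hypothetical numbers: hardware got ~10x faster per GPU,
# while training compute budgets grew ~100x over the same period.
old_model_flops = 1e19   # total training compute of a "large" model then
new_model_flops = 1e21   # ~100x more total compute for a large model now
old_gpu_flops_s = 5e12   # per-GPU throughput then
new_gpu_flops_s = 5e13   # 10x faster per-GPU throughput now

def days(model_flops, gpu_flops_s, n_gpus=4):
    """Ideal training time on n_gpus, ignoring communication overhead."""
    return model_flops / (gpu_flops_s * n_gpus) / 86_400

print(f"then: {days(old_model_flops, old_gpu_flops_s):.1f} days")
print(f"now:  {days(new_model_flops, new_gpu_flops_s):.1f} days")
```

A 10x hardware speedup simply cannot keep up with a 100x growth in training compute, so wall-clock time on a fixed workstation keeps getting worse.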
This seems targeted at training, not inference. Compute needs for training definitely seem to be growing to me. (Is TF Lite even relevant for training at all?)
Source: I do ML consulting and build AI hardware.