
Specialized AI chips don't seem like a very good business idea to me.

The way we do things in AI today may be completely different tomorrow. It's not like there's a standard everyone has agreed on.

There is a very real risk these specialized, expensive devices will go the way of the Bitcoin ASIC miner (which saturated secondary markets at a fraction of its original cost).

Source: I do ML consulting and build AI hardware.



Isn't BLAS a standard everyone has agreed on?

Making a matrix multiplication accelerator seems a pretty safe bet to me. I am less sure about sparsity optimization, but I guess it still works for dense matrices even in the worst case.
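To make the point concrete: the operation in question is just GEMM, the core BLAS routine. A minimal NumPy sketch (NumPy dispatches `@` to the linked BLAS implementation's sgemm/dgemm under the hood):

```python
import numpy as np

# A dense matrix multiply -- exactly the workload a matmul accelerator targets.
# NumPy hands this off to BLAS (OpenBLAS, MKL, etc.) via dgemm/sgemm.
A = np.arange(6, dtype=np.float32).reshape(2, 3)
B = np.ones((3, 2), dtype=np.float32)
C = A @ B  # 2x2 result; each entry is a dot product of a row of A and a column of B
```

Any hardware that does this one operation fast covers the bulk of both training and inference FLOPs.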


The way we do things in AI today is multiplication of two large matrices. Just like we did it 30 years ago: http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf


Sure, but Cerebras isn't just multiplying two large matrices; they are multiplying two large, very sparse matrices, relying on ReLU activation to maintain sparsity across all of the layers. BERT/XLNet/other transformer models have already moved away from ReLU to GELU, which does not produce sparse matrices. "Traditional" activations (tanh, sigmoid, softmax) are not sparse either.
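Easy to check numerically. A quick sketch (using the tanh approximation of GELU, as in BERT/GPT-2): ReLU zeroes out roughly half of Gaussian-distributed activations, while GELU leaves them small but nonzero.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gelu(x):
    # tanh approximation of GELU, as used in BERT/GPT-2
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)  # pre-activations, roughly Gaussian

relu_sparsity = np.mean(relu(x) == 0.0)  # ~0.5: exact zeros, skippable in hardware
gelu_sparsity = np.mean(gelu(x) == 0.0)  # ~0.0: negative inputs map to small nonzeros
```

Hardware that skips exact zeros gets nothing from GELU outputs unless you threshold them first.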


Good point. I think it's a safe bet to focus on dense dot products in hardware for the foreseeable future. However, in their defense:

1. It's not clear that supporting sparse operands in hardware would result in significant overhead.

2. DL models are still pretty sparse (I bet even those models with GELU still have lots of very small values that could be safely rounded to zero).

3. Sparsity might have some benefits (e.g. https://arxiv.org/abs/1903.11257).
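On point 2, a rough sketch of what "safely rounded to zero" means in practice: magnitude pruning, i.e. thresholding small weights (or activations) to exact zeros. The layer shape, scale, and threshold below are made-up illustration values, not from any particular model:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical trained layer: weights roughly Gaussian with small std dev
w = rng.standard_normal((256, 256)) * 0.02

# Round everything below a small magnitude threshold to exactly zero
threshold = 0.01
pruned = np.where(np.abs(w) < threshold, 0.0, w)

sparsity = np.mean(pruned == 0.0)  # a sizable fraction of entries become zero
```

Even GELU-based models have plenty of near-zero values a sparse engine could exploit after a pass like this, at some (model-dependent) cost in accuracy.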


While the theoretical innovations have mostly been incremental, there has been a lot of progress in the development of "light" deep learning frameworks - so the tasks that previously required massive GPUs can now run on your phone. And this trend will continue.


Last I checked all those light frameworks still have to do good old matrix multiplications. What's changed?


I see your point. Fundamentally, the same multiplications.

However, if we look at TF Lite, for example - its internal operators were tuned for mobile devices, its new model file format is much more compact, and does not need to be parsed before usage. My point is - the hardware requirements aren't growing; instead, the frameworks are getting optimized to use less power.


I wish this were the case. 5 years ago I could train the most advanced, largest DL models in reasonable time (a few weeks) on my 4-GPU workstation. Today something like GPT-2 would probably take years to train on 4 GPUs, despite the fact that the GPUs I have now are 10 times faster than the GPUs I had 5 years ago.


This seems targeted at training, not inference. Compute needs for training definitely seem to be growing. (Is TF Lite even relevant to training at all?)



