
Specialized AI chips don't seem like a very good business idea to me.

The way we do things in AI today may be completely different tomorrow. It's not like there's a standard everyone has agreed on.

There is a very real risk these specialized, expensive devices will go the way of the Bitcoin ASIC miner (which saturated secondary markets at a fraction of its original cost).

Source: I do ML consulting and build AI hardware.



Isn't BLAS a standard everyone has agreed on?

Making a matrix multiplication accelerator seems a pretty safe bet to me. I am less sure about sparsity optimization, but I guess it still works for dense matrices even in the worst case.
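To make the point concrete: the operation in question is just GEMM, the core BLAS routine. A minimal NumPy sketch (NumPy dispatches `@` to the linked BLAS implementation's sgemm/dgemm under the hood):

```python
import numpy as np

# A dense matrix multiply -- exactly the workload a matmul accelerator targets.
# NumPy hands this off to BLAS (OpenBLAS, MKL, etc.) via dgemm/sgemm.
A = np.arange(6, dtype=np.float32).reshape(2, 3)
B = np.ones((3, 2), dtype=np.float32)
C = A @ B  # 2x2 result; each entry is a dot product of a row of A and a column of B
```

Any hardware that does this one operation fast covers the bulk of both training and inference FLOPs.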


The way we do things in AI today is multiplication of two large matrices. Just like we did it 30 years ago: http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf


Sure, but Cerebras isn't just multiplying two large matrices; they are multiplying two large, very sparse matrices, relying on ReLU activation to maintain sparsity across all of the layers. BERT/XLNet/other transformer models have already moved away from ReLU to GELU, which does not produce sparse matrices. "Traditional" activations (tanh, sigmoid, softmax) are not sparse either.
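Easy to check numerically. A quick sketch (using the tanh approximation of GELU, as in BERT/GPT-2): ReLU zeroes out roughly half of Gaussian-distributed activations, while GELU leaves them small but nonzero.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gelu(x):
    # tanh approximation of GELU, as used in BERT/GPT-2
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)  # pre-activations, roughly Gaussian

relu_sparsity = np.mean(relu(x) == 0.0)  # ~0.5: exact zeros, skippable in hardware
gelu_sparsity = np.mean(gelu(x) == 0.0)  # ~0.0: negative inputs map to small nonzeros
```

Hardware that skips exact zeros gets nothing from GELU outputs unless you threshold them first.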


Good point. I think it's a safe bet to focus on dense dot products in hardware for the foreseeable future. However, in their defense:

1. It's not clear that supporting sparse operands in hardware would result in significant overhead.

2. DL models are still pretty sparse (I bet even those models with GELU still have lots of very small values that could be safely rounded to zero).

3. Sparsity might have some benefits (e.g. https://arxiv.org/abs/1903.11257).
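On point 2, a rough sketch of what "safely rounded to zero" means in practice: magnitude pruning, i.e. thresholding small weights (or activations) to exact zeros. The layer shape, scale, and threshold below are made-up illustration values, not from any particular model:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical trained layer: weights roughly Gaussian with small std dev
w = rng.standard_normal((256, 256)) * 0.02

# Round everything below a small magnitude threshold to exactly zero
threshold = 0.01
pruned = np.where(np.abs(w) < threshold, 0.0, w)

sparsity = np.mean(pruned == 0.0)  # a sizable fraction of entries become zero
```

Even GELU-based models have plenty of near-zero values a sparse engine could exploit after a pass like this, at some (model-dependent) cost in accuracy.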


While the theoretical innovations have mostly been incremental, there has been a lot of progress in the development of "light" deep learning frameworks - so the tasks that previously required massive GPUs can now run on your phone. And this trend will continue.


Last I checked all those light frameworks still have to do good old matrix multiplications. What's changed?


I see your point. Fundamentally, the same multiplications.

However, if we look at TF Lite, for example - its internal operators were tuned for mobile devices, its new model file format is much more compact, and does not need to be parsed before usage. My point is - the hardware requirements aren't growing; instead, the frameworks are getting optimized to use less power.


I wish this were the case. 5 years ago I could train the most advanced, largest DL models in reasonable time (a few weeks) on my 4-GPU workstation. Today something like GPT-2 would probably take years to train on 4 GPUs, despite the fact that the GPUs I have now are 10 times faster than the GPUs I had 5 years ago.


This seems targeted at training, not inference. Compute needs for training definitely seem to be growing. (Is TF Lite even relevant to training at all?)



