RNNs (LSTM/GRU) tend to have issues with scaling: each hidden state depends on the previous one, so training cannot be parallelized across sequence positions. Attention-based models like the Transformer, on the other hand, scale extremely well because every position can be computed in parallel.
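A minimal NumPy sketch of the difference (toy sizes and randomly initialized weights, just to show the dependency structure): the RNN needs a loop with T dependent steps, while self-attention computes all positions in a few batched matrix multiplies.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 4                      # toy sequence length and model dimension
x = rng.normal(size=(T, d))

# RNN: each hidden state feeds the next -> an inherently sequential loop.
W_h = rng.normal(size=(d, d)) * 0.1
W_x = rng.normal(size=(d, d)) * 0.1
h = np.zeros(d)
for t in range(T):               # T dependent steps; cannot run in parallel
    h = np.tanh(h @ W_h + x[t] @ W_x)

# Self-attention: every position attends to every other in one shot.
W_q, W_k, W_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v
scores = Q @ K.T / np.sqrt(d)                               # (T, T) matrix
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
out = weights @ V                # (T, d): all timesteps computed at once
```

The loop's runtime grows with T no matter how much hardware you have; the attention path is a handful of matmuls, which GPUs parallelize trivially.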