RNNs (LSTM/GRU) tend to have issues with scaling: each hidden state depends on the previous one, so training cannot be parallelized across sequence positions. Attention-based models like the Transformer, on the other hand, scale extremely well because every position can be computed in parallel.
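A minimal NumPy sketch of the difference (toy sizes and randomly initialized weights, just to show the dependency structure): the RNN needs a loop with T dependent steps, while self-attention computes all positions in a few batched matrix multiplies.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 4                      # toy sequence length and model dimension
x = rng.normal(size=(T, d))

# RNN: each hidden state feeds the next -> an inherently sequential loop.
W_h = rng.normal(size=(d, d)) * 0.1
W_x = rng.normal(size=(d, d)) * 0.1
h = np.zeros(d)
for t in range(T):               # T dependent steps; cannot run in parallel
    h = np.tanh(h @ W_h + x[t] @ W_x)

# Self-attention: every position attends to every other in one shot.
W_q, W_k, W_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v
scores = Q @ K.T / np.sqrt(d)                               # (T, T) matrix
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
out = weights @ V                # (T, d): all timesteps computed at once
```

The loop's runtime grows with T no matter how much hardware you have; the attention path is a handful of matmuls, which GPUs parallelize trivially.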