
Google seemed to make a genuine effort with BERT to build a model that is useful rather than record-breaking. But I think it's wrong to consider it the "final" model upon which everything else will be built.


BERT is already outdated, but it's still useful: you need only one Titan RTX to retrain the BERT-large model via transfer learning.
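
For the curious, here is a minimal sketch of what that single-GPU fine-tuning might look like. The Hugging Face transformers library, the model id, and the hyperparameters are my assumptions, not something the comment specifies:

    # Hypothetical sketch: fine-tuning BERT-large for binary classification
    # on a single GPU (e.g. a 24 GB Titan RTX). Dataset, hyperparameters,
    # and the transformers library are assumed choices, not from the comment.
    import torch
    from transformers import BertTokenizer, BertForSequenceClassification

    device = torch.device("cuda")
    tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
    model = BertForSequenceClassification.from_pretrained(
        "bert-large-uncased", num_labels=2
    ).to(device)

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    # Toy batch; in practice you would iterate over a DataLoader.
    texts = ["a great movie", "a terrible movie"]
    labels = torch.tensor([1, 0]).to(device)
    batch = tokenizer(
        texts, padding=True, truncation=True, return_tensors="pt"
    ).to(device)

    model.train()
    outputs = model(**batch, labels=labels)  # loss computed internally
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

Only the classification head is new here; the pretrained encoder weights are reused, which is why one consumer GPU suffices.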


What methods make BERT outdated? Do you have pointers to other options?



XLNet is BERT with a bunch of additional training tricks.


BERT is a Transformer with a bunch of additional training tricks. The Transformer is self-attention with a bunch of additional training tricks...
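
And that self-attention core is genuinely small. A rough sketch of scaled dot-product self-attention in plain PyTorch (single head, no masking or dropout; all names and shapes are illustrative):

    # Minimal scaled dot-product self-attention, the operation at the
    # bottom of the stack the comment describes. Illustrative only.
    import math
    import torch

    def self_attention(x, w_q, w_k, w_v):
        # x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_k) projections
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        scores = q @ k.T / math.sqrt(k.shape[-1])  # (seq_len, seq_len)
        weights = torch.softmax(scores, dim=-1)    # each row sums to 1
        return weights @ v                         # (seq_len, d_k)

    d_model, d_k, seq_len = 16, 8, 4
    x = torch.randn(seq_len, d_model)
    w_q, w_k, w_v = (torch.randn(d_model, d_k) for _ in range(3))
    print(self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([4, 8])

Everything else (multiple heads, positional encodings, layer norm, the masked-LM objective) is the accumulation of "training tricks" layered on top.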



