It's become increasingly obvious that the basic structure of a NN for most NLP tasks is the same across multiple tasks.
And people have had good results using a secondary task as an auxiliary loss function.
But trying to formally measure results across this many tasks with the same network is pretty new. The recent OpenAI work heads this way too: https://blog.openai.com/language-unsupervised/
It's become increasingly obvious that the basic structure of a NN for most NLP tasks is the same across multiple tasks.
And people have had good results using a secondary task as an auxiliary loss function.
But trying to formally measure results across this many tasks with the same network is pretty new. The recent OpenAI work heads this way too: https://blog.openai.com/language-unsupervised/