Very nicely written post. I particularly like how you attached a link to your codebase on repl.it so anyone who is interested can tinker with the code.
One thing I have been wondering for some time is whether a vanilla RNN can learn negations (i.e. 'not good' == 'bad') and valence shifts (e.g. modifier words like 'very', which carry no sentiment themselves but can amplify or dampen the sentiment of the words they modify; negation words like 'not' can be seen as a special case of valence shifter that inverts the sentiment of the following word).
My suspicion is that vanilla RNNs cannot model negations and valence shifters, since they infer the sentiment of a sentence by 'adding up' the sentiment connotations of its constituent words, whereas negations and valence shifts work more like multiplications than additions.
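To make the additive argument concrete, here is a toy bag-of-words scorer (a hypothetical stand-in for what I suspect the RNN effectively learns, with made-up per-word scores). Because 'not' can only contribute a fixed offset, no choice of scores lets it flip the sign of whatever word follows it:

```python
# Toy additive sentiment scorer; the per-word scores are illustrative assumptions.
scores = {'good': 1.0, 'bad': -1.0, 'not': 0.0, 'very': 0.0}

def additive_sentiment(sentence):
    # Sum the fixed score of each word -- the purely additive scheme.
    return sum(scores.get(w, 0.0) for w in sentence.split())

print(additive_sentiment('good'))      # 1.0
print(additive_sentiment('not good'))  # 1.0 -- same as 'good', which is wrong
```

Any nonzero score for 'not' merely shifts every sentence containing it by a constant, so 'not good' and 'not bad' move in the same direction; handling negation needs something multiplicative (or gated).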
I see you already have such examples in your dataset so I thought I'd do some experiments. I simplified your original dataset to the following:
train_data = {
    'good': True,
    'bad': False,
    'not good': False,
    'not bad': True,
    'very good': True,
    'very bad': False,
    'not very good': False,
    'not very bad': True
}
test_data = {
    'very not bad': True,
    'very not good': False
}
While the test cases do not reflect how people actually speak, the hope is that the model can transfer what it learned to infer their sentiment. In my runs, however, training failed to converge with the default parameter settings (hidden_size=64).
It would be interesting to see how other architectures (e.g. LSTMs, or non-recurrent models like Transformers) fare with negations and valence shifters.
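For anyone who wants to try the LSTM comparison, a minimal PyTorch sketch on this toy vocabulary might look like the following (the vocabulary, embedding size, and class names are my assumptions, not the post's actual code):

```python
import torch
import torch.nn as nn

# Hypothetical vocabulary for the toy dataset above.
vocab = {'good': 0, 'bad': 1, 'not': 2, 'very': 3}

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size=4, embed_dim=8, hidden_size=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, 2)  # two classes: negative / positive

    def forward(self, token_ids):
        # token_ids: (batch, seq_len); classify from the last time step's hidden state.
        h, _ = self.lstm(self.embed(token_ids))
        return self.out(h[:, -1])

model = LSTMClassifier()
ids = torch.tensor([[vocab[w] for w in 'not very good'.split()]])
logits = model(ids)  # shape (1, 2); feed to cross-entropy loss during training
```

The LSTM's multiplicative gates are exactly the kind of mechanism that could, in principle, implement a sign flip conditioned on having seen 'not'.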
P.S.: When computing softmax, it is better to use a built-in implementation or at least apply the log-sum-exp (max-subtraction) trick to prevent numerical overflow/underflow.
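Concretely, the trick is to subtract the maximum logit before exponentiating, which leaves the result mathematically unchanged but keeps exp() in a safe range:

```python
import numpy as np

def softmax(x):
    # Subtracting the max means the largest exponent is exp(0) = 1,
    # so exp() never overflows; the softmax value is unchanged because
    # the common factor exp(-max) cancels in the ratio.
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

softmax(np.array([1000.0, 1000.0]))  # stable; the naive version returns nan here
```

A naive `np.exp(x) / np.exp(x).sum()` overflows to inf for logits around 1000 and yields nan after the division.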