Hi, I'm the author. I appreciate the time you've taken to read and provide const...

nanis · on Feb 11, 2016

The point I am making is simple: you can calculate whatever you want to calculate, but there is no room for statistical testing, because you do not have a probability sample and so have no sampling variation.

Yes, there will be future episodes, but you are not claiming that you are predicting what these characters will say in those future episodes (in which case your whole setup is rather inappropriate).

Also, I suggest you think very hard about this statement:

> The log likelihood value of 101.7 is significant far beyond even the 0.01% level, so we can reject the null hypothesis that Cartman and the remaining text are one and the same.

Even if the statistical test you employed were appropriate, this is not the conclusion you would draw from it.

Also, did you write 0.01% when you meant p = 0.01 (that is, 1%), or did you really choose p = 0.0001 as the significance level for your test?
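To make the percent-versus-probability arithmetic concrete, here is a rough sketch. It sets aside whether the test is appropriate at all, and it assumes the quoted log-likelihood statistic is compared against a chi-squared distribution with 1 degree of freedom; the quoted passage does not say so, and the helper names are mine.

```python
import math

def chi2_sf_1dof(x: float) -> float:
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom.

    For 1 degree of freedom, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))

def percent_to_p(percent: float) -> float:
    """Convert a significance level stated in percent to a probability."""
    return percent / 100.0

g2 = 101.7  # the statistic from the quoted passage
p_value = chi2_sf_1dof(g2)  # ASSUMPTION: 1 degree of freedom

# "1%" and "0.01%" are very different thresholds.
for percent in (1.0, 0.01):
    alpha = percent_to_p(percent)
    print(f"{percent}% level -> alpha = {alpha:g}, p < alpha: {p_value < alpha}")
```

Under that assumption the statistic clears either threshold, so the question is only about what was meant, not about which side of the cutoff the value falls on.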

wodenokoto · on Feb 11, 2016

A simple tf-idf would get you similar results without a t-test.

I think that is what parent is implying.
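For what it's worth, a minimal tf-idf sketch along those lines, treating each speaker's combined lines as one document. The speaker names and token lists are made-up placeholders, not data from the article, and the weighting (relative term frequency times log of inverse document frequency) is just one common variant.

```python
import math
from collections import Counter

def tf_idf(docs: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Score each term per document as (relative frequency) * log(N / document frequency)."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return scores

# Hypothetical toy data: one token list per speaker.
docs = {
    "speaker_a": "you guys seriously you guys".split(),
    "speaker_b": "oh my god you killed him".split(),
    "speaker_c": "oh you are such a jerk".split(),
}

scores = tf_idf(docs)
for name, term_scores in scores.items():
    top = sorted(term_scores, key=term_scores.get, reverse=True)[:3]
    print(name, top)
```

Terms that every speaker uses score zero, so what floats to the top is whatever is distinctive to one speaker, with no hypothesis test involved.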