>You will be able to detect very small effects, but if there is no difference, you are more likely to draw the correct conclusion with 1,000,000 samples than with 100 samples.
I believe this is the tweet author's point: they stated that in the real world a null hypothesis never exactly holds true, so with a large enough sample you can detect even the tiniest of differences.
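A quick simulation shows the idea (a minimal sketch; the 0.01-standard-deviation shift is my arbitrary stand-in for a "real but practically meaningless" difference):

```python
# Sketch: a tiny true difference (0.01 SD) is invisible at n=100
# but reliably "statistically significant" at n=1,000,000.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
for n in (100, 1_000_000):
    a = rng.normal(0.00, 1.0, n)  # control group
    b = rng.normal(0.01, 1.0, n)  # tiny, practically meaningless shift
    result = stats.ttest_ind(a, b)
    print(f"n={n:>9,}  p={result.pvalue:.4g}")
# Typical output: p well above 0.05 at n=100, p far below 0.05 at n=1,000,000.
```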
>given no actual difference
I think the error here is assuming a priori knowledge of no difference, whereas the author is stating that in real-life scenarios there will always be some difference, even if it's too minute to be of practical significance. We can fabricate "no difference" in simulations, but in real experiments there will almost always be some variance, even if it's just an artifact of the measurement process rather than caused by the independent variable(s). Whether or not the differences are statistically significant depends on the experimental design, including sample size.
So while I understand your valid point, I think the author's claim was more about the practical application of statistics than about the mathematical precision of simulated examples. But I could be misinterpreting.
If you're performing a rigorous experiment, then you have a control group, and your null hypothesis is that the experimental group will be the same. In actuality you will find that many null hypotheses hold true and the treatment has no effect on the outcome at all. Of course in some fields, like psych, most everything has some effect. But it's absolutely not correct to blame p-values for helping you distinguish between no effect and astonishingly small effects. They're functioning correctly. Separating trivial effects from meaningful ones is a secondary problem to solve, usually by stating a minimum effect size of interest.
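A sketch of that "functioning correctly" claim: when the null is exactly true, the false-positive rate stays pinned near the chosen alpha no matter how large the sample gets (assuming the test's assumptions hold; the sample sizes here are arbitrary):

```python
# Sketch: under a true null, larger samples do NOT inflate the
# false-positive rate; it stays near alpha (5%) by construction.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, trials = 0.05, 1000
for n in (100, 10_000):
    rejections = 0
    for _ in range(trials):
        a = rng.normal(0, 1, n)
        b = rng.normal(0, 1, n)  # same distribution: the null is true
        if stats.ttest_ind(a, b).pvalue < alpha:
            rejections += 1
    print(f"n={n:>6}  false-positive rate ~ {rejections / trials:.3f}")
# Both rates land near 0.05 regardless of n.
```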
A far greater problem is the false discovery rate: you test 20 different things at once and by chance identify one of them as significant even though the true effect size is zero. This is another area where increasing your sample size can help, but even then you need to acknowledge that your tools are imperfect.
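The 20-tests problem is easy to demonstrate, and corrections like Benjamini-Hochberg exist precisely for it (a sketch using statsmodels' multipletests; all 20 simulated metrics are true nulls):

```python
# Sketch: run 20 tests where every null is true. Uncorrected, the chance
# of at least one p < 0.05 is about 1 - 0.95**20, roughly 64%. An FDR
# correction (Benjamini-Hochberg) pulls the family-wide error back down.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(2)
pvals = []
for _ in range(20):  # 20 metrics, none actually affected
    a = rng.normal(0, 1, 200)
    b = rng.normal(0, 1, 200)
    pvals.append(stats.ttest_ind(a, b).pvalue)

naive_hits = sum(p < 0.05 for p in pvals)
reject, _, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(f"uncorrected 'significant' results: {naive_hits}")
print(f"after Benjamini-Hochberg:          {reject.sum()}")
```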
>In actuality you will find that many null hypotheses hold true
I'm assuming you mean within the confines of the experiment, correct? I agree. The tweet author was alluding to the fact that "IRL" the null hypothesis is almost never true at the population level, meaning that if you grab a large enough sample you will detect very, very small differences. (This was her Lucky Charms ~ blood type example in the tweet.) I also agree with that. I don't think the two claims are mutually exclusive, and the fact that they can coexist is (I believe) precisely her point about sample size.
I mean, there are many real-world examples where A obviously has no effect on B. The number I'm thinking of has, truly, no effect on the time since you last blinked. No sample size will change that.
Lucky Charms probably does relate to blood type in some impossibly small way; it makes sense that a biological trait has some relation to dietary consumption. I don't think any non-garbage-tier peer-reviewed journal would publish it, but good on p-values for helping us detect it, and good on sample size for making it possible to discern that effect with a high degree of confidence.
It's worth noting that we also have many other tools to help us. For example, you can compute, given an expected effect size and a sample size, the probability of getting a statistically significant result, a non-significant result, or a significant result that erroneously goes in the wrong direction. Or the range of likely true effect sizes given an observed significant sample difference.
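For instance, a standard prospective power analysis answers the "probability of a significant result" question directly (a sketch with statsmodels; the effect size and alpha here are illustrative, not from the thread):

```python
# Sketch: power analysis for a two-sample t-test. Given an expected
# effect size (Cohen's d) and alpha, how does the probability of
# detecting the effect change with sample size?
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
d, alpha = 0.2, 0.05  # illustrative "small" effect, conventional alpha
for n in (50, 200, 1000):
    power = analysis.power(effect_size=d, nobs1=n, alpha=alpha)
    print(f"n={n:>4} per group  power={power:.2f}")

# Or invert it: how many samples per group for 80% power?
n_needed = analysis.solve_power(effect_size=d, alpha=alpha, power=0.8)
print(f"n per group for 80% power: {n_needed:.0f}")
```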
We want large samples; they enhance confidence in findings. The author's premise seems to be that it's better not to know small rocks exist if we're only looking for big rocks, but that ignores the fact that the tools which find small rocks also help us identify big rocks with more clarity.