Imagine testing 1000 potential cancer drugs. Only 100 of them actually work, but you don't know that; you have to do the trial. So you get out some petri dishes and start testing. You look for a 5% chance of a false positive (p < 0.05), which is the statistical significance level usually used in medical trials.
Of the 100 real drugs, you detect all of them. Of the 900 fake drugs, 5% falsely appear to work. So you have 145 drugs you think work, only two thirds of which actually work. The chance that any individual drug with a positive result got that result by chance is 31%, not 5%.
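Here's that arithmetic as a quick Python sketch, if it helps; the only assumption beyond the numbers above is that the trial detects every drug that really works (100% power):

    # False discovery arithmetic for the hypothetical drug trial above.
    # Assumes every working drug is detected (100% statistical power).
    n_drugs = 1000
    n_real = 100                      # drugs that actually work
    n_fake = n_drugs - n_real         # 900 duds
    alpha = 0.05                      # significance threshold

    true_positives = n_real                   # all real drugs detected
    false_positives = alpha * n_fake          # 5% of the 900 duds look like they work
    total_positives = true_positives + false_positives

    print(total_positives)                        # 145.0
    print(false_positives / total_positives)      # ~0.31: chance a "significant" drug is a fluke
    print(true_positives / total_positives)       # ~0.69: roughly two thirds really work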
I have a guide to this here, since it's so common:
Sorry, I wasn't very clear. Usually people quote the p value as the chance that the result is a fluke; the p value in CERN's case is p = 1/1,740,000. But it's actually the chance that an effect at least this large would be produced if the Higgs did not exist, which is a different thing.
By analogy, in the medical case p = 0.05. The incorrect interpretation is that only 5% of the drugs with statistically significant benefits got those benefits through luck; the correct interpretation is that 5% of the non-functional drugs appear to work through luck.
You could also imagine testing 200,000,000 hypotheses which were all completely false. Even if you used CERN's level of statistical significance, you'd almost certainly find at least one hypothesis which appears to be true, simply by chance -- on average, more than a hundred of them would. The chance of any such hypothesis being false is 100%, despite the significance level of 1 in 1,740,000.
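To put rough numbers on that (a back-of-the-envelope sketch; it assumes the 200,000,000 tests are independent):

    # Expected flukes when testing 200 million false hypotheses at CERN's threshold.
    n_tests = 200_000_000
    p_threshold = 1 / 1_740_000

    expected_flukes = n_tests * p_threshold
    p_at_least_one = 1 - (1 - p_threshold) ** n_tests

    print(expected_flukes)     # ~115 false hypotheses expected to clear the bar by luck
    print(p_at_least_one)      # ~1.0: essentially certain to see at least one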
So yes, 31% is exactly the chance that randomness produced the effect in the trial. But people will try to tell you that it's actually 5%, and they're wrong.
This thread started with the claim that "the chance that the results occurred by chance" was different from "the chance that randomness could produce [the result]".
But you're saying that in your example both are 31%. So again, I ask, are we talking about two separate things? And if so, can you give an example where the two things have different values?
In my medical example, "the chance that the results occurred by chance" is 31%. "The chance that randomness could produce [the result]" was only 5%.
For CERN, the chance that randomness could produce this result is 1 in 1.74 million; the chance that the results occurred by chance is larger, but not computable with the information we have.
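To illustrate why it isn't computable, here's a Bayes' theorem sketch; the priors and the assumption that the experiment would certainly see the effect if the Higgs exists are made up purely to show how much the answer depends on them:

    # How "the chance the result is a fluke" depends on a prior we don't have.
    # The p value is the one quoted above; everything else is invented for illustration.
    p_value = 1 / 1_740_000      # P(result at least this extreme | no Higgs)
    p_detect = 1.0               # assume the effect is certainly seen if the Higgs exists

    for prior_higgs in (0.1, 1e-6):  # two hypothetical prior beliefs
        p_fluke = p_value * (1 - prior_higgs) / (
            p_value * (1 - prior_higgs) + p_detect * prior_higgs
        )
        print(prior_higgs, p_fluke)
    # prior 0.1  -> chance of a fluke ~ 1 in 190,000
    # prior 1e-6 -> chance of a fluke ~ 36%
    # Both are larger than 1 in 1.74 million, and they differ enormously.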
The guide I linked to above gives a much better explanation than this. I rushed my first post here, and I think I was unclear.
Imagine flipping a perfectly fair coin 100 times. You'd expect to see 50 heads, but you don't always -- it's just an average. Suppose you see 75 heads. What is the chance that you'd see 75 heads with a fair coin? Very very small. The chance that randomness could produce such a result is small.
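If you want the actual number, a one-line check with scipy (assuming a fair coin and counting 75 or more heads out of 100):

    from scipy.stats import binom

    # P(at least 75 heads in 100 flips of a fair coin)
    print(binom.sf(74, 100, 0.5))   # ~2.8e-07, i.e. well under one in a million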
Now, imagine you test a few million perfectly fair coins. A few of them give 75 or more heads, just by luck. You conclude those coins are unfair, since the result is unlikely otherwise. The chance that randomness produced the effects you saw is actually 100%, because all the coins are fair.
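A quick simulation of that multiple-testing effect; the ten million coins here are arbitrary, chosen because 75-plus heads is roughly a 1-in-3.5-million event per coin, so you need a lot of coins before luck flags any:

    import numpy as np

    # Flip millions of perfectly fair coins 100 times each and flag any coin
    # showing 75 or more heads as "unfair".
    rng = np.random.default_rng(0)
    heads = rng.binomial(n=100, p=0.5, size=10_000_000)
    print(np.count_nonzero(heads >= 75))   # usually a few coins get flagged; all are fair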
There's a difference between the question "How likely is this outcome to happen if the coin is fair?" and "Given that this outcome happened, how likely is it that the coin is fair?" Statistical significance addresses the first question, not the second.
I suspect Almaviva is talking about the biases inherent to publication / talking about results.
Consider, by analogy, the event "rolling a six-sided die and getting a 6 and announcing that fact to the world".
"What is the probability that random events could produce a result that large?": one in six, per die roll. The question excludes the whole "announce it to the world" filter.
"What is the probability that these results [getting a six and announcing it] occurred by chance, rather than being a signal?": We have no idea. If the person announced "I'm going to roll one die and announce the results, regardless of the outcome", then it's one in six. If they kept rolling dice until they got a six, then the probability is 1. If they rolled 3 dice, then the probability is 91/216.
The point is that the scientific method has all sorts of biases (publication bias, confirmation bias, etc.) and p-values are rarely "probability that the result is wrong".
Standard null hypothesis testing -- including what they're talking about here -- focuses on
P(results | random noise is active)
You're talking about
P(the thing we care about | results)
The former is more like a simple sanity check. If random noise could have produced what you see, you shouldn't take the results too seriously.
The former tells you a little bit about the latter -- which is good, because the latter is what we actually care about. But you can't explicitly compute the latter without much stronger assumptions, such as a prior. That's why this last step of reasoning is often performed qualitatively.
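Making that last step explicit for the drug example earlier in the thread (a sketch via Bayes' theorem; the 10% prior and the assumption of 100% power are the ones from that example):

    # P(drug actually works | statistically significant result), via Bayes' theorem.
    prior_works = 100 / 1000   # 10% of the tested drugs really work
    power = 1.0                # every working drug shows a significant result
    alpha = 0.05               # false positive rate among the duds

    p_significant = power * prior_works + alpha * (1 - prior_works)
    p_works_given_significant = power * prior_works / p_significant

    print(p_works_given_significant)        # ~0.69
    print(1 - p_works_given_significant)    # ~0.31: the "31%, not 5%" from above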
1. I'm rolling two dice, trying to get the highest total. I get two sixes. What is the chance randomness produces this? 1/36, about 0.028. That's more than a two sigma result. What is the chance that this is caused by random chance? 2.8%? Nope, 100%.
2. I study the same thing in a million situations in parallel. I take the most extreme result and find that random chance can produce this one time in 1.7 million. It's a five sigma result! What are the chances this result is caused by random chance?
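For case 2, here's a rough feel for the numbers, assuming the million situations are independent and nothing real is going on in any of them:

    # Case 2: the look-elsewhere effect. How often does the most extreme of a
    # million null results clear a 1-in-1.7-million (five sigma) bar by chance?
    n_tests = 1_000_000
    p_threshold = 1 / 1_700_000

    print(1 - (1 - p_threshold) ** n_tests)   # ~0.44: close to a coin flip, despite the five sigma label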