Out of curiosity, with a regular normal distribution I wonder what the probability is that the most recent point finds a problem. I guess you could calculate these things separately for an approximation, but I'd probably just want to simulate it...
For a first cut:
- Rule 1: 0.3% of samples are more than 3 standard deviations from the mean.
- Rule 2: 1/2^8 = 0.4% chance the previous 8 points were on the same side of the mean as the most recent one.
- Rule 5: roughly 2.5% chance of being beyond 2sd on each side, 3 choose 2 is 3, 2 sides, so 3 x 2 x 0.025^2 = about 0.375% for exactly 2. "2 or 3" is not much higher.
- Rule 6: More than 0.55%, if I've done my maths right.
- Rule 7: 0.3%
I guess you're going to get a lot of false positives if you're sampling reasonably frequently -- maybe one in 50?
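For anyone who wants to check the first-cut numbers above, here is a minimal sketch that computes them from the standard normal CDF. It assumes independent, normally distributed points and uses the same window counting as the list (it is not an exact "most recent point triggers" calculation). The helper names `tail`, `at_least` and `rule_probabilities` are made up for illustration. Because it uses the exact 2 sd tail (about 2.3% per side rather than 2.5%), the Rule 5 figure comes out a bit lower than the hand estimate.

```python
from math import comb
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, sd 1


def tail(k):
    """One-sided probability of a point lying more than k sd above the mean."""
    return 1.0 - Z.cdf(k)


def at_least(m, n, p):
    """Probability of at least m successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(m, n + 1))


def rule_probabilities():
    """First-cut per-window probabilities, assuming i.i.d. normal samples."""
    return {
        "rule1": 2 * tail(3),                  # one point beyond 3 sd, either side
        "rule2": 0.5 ** 8,                     # previous 8 on the same side as the latest
        "rule5": 2 * at_least(2, 3, tail(2)),  # 2 or 3 of 3 beyond 2 sd, same side
        "rule6": 2 * at_least(4, 5, tail(1)),  # 4 or 5 of 5 beyond 1 sd, same side
        "rule7": (1 - 2 * tail(1)) ** 15,      # 15 in a row within 1 sd
    }


probs = rule_probabilities()
for name, p in probs.items():
    print(f"{name}: {p:.3%}")

# Summing ignores overlap between rules, so this is an upper bound.
print(f"any rule (upper bound): {sum(probs.values()):.2%}")
```

The sum lands a little under 2%, which is in line with the "one in 50" guess.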
I got interested in this so whipped up a notebook for my interpretation of the rules against random sequences of different lengths. Failure rates are the number of sequences that contained an error, not the number of errors in all the sequences (so a sequence that had 3 errors counts as one failure).
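To make the "failure rate per sequence" idea concrete, here is a rough sketch of that kind of simulation. This is not the notebook itself, just an illustration using the standard library, and it only covers rules 1 and 2 since those are the unambiguous ones. The function names and the trial count are assumptions of mine; the mean and sd are taken as known (0 and 1) rather than estimated from the data.

```python
import random


def violates_rule1(seq, mean=0.0, sd=1.0):
    """Rule 1: any point more than 3 sd from the mean."""
    return any(abs(x - mean) > 3 * sd for x in seq)


def violates_rule2(seq, mean=0.0, run=9):
    """Rule 2: `run` or more consecutive points on the same side of the mean."""
    streak, last_side = 0, 0
    for x in seq:
        side = 1 if x > mean else -1 if x < mean else 0
        if side == 0:
            streak = 0              # a point exactly on the mean breaks the run
        elif side == last_side:
            streak += 1
        else:
            streak = 1
        last_side = side
        if streak >= run:
            return True
    return False


def failure_rate(check, length, trials=500, seed=0):
    """Fraction of random sequences with at least one violation of `check`."""
    rng = random.Random(seed)
    failures = sum(
        check([rng.gauss(0.0, 1.0) for _ in range(length)])
        for _ in range(trials)
    )
    return failures / trials


for length in (10, 100, 1000):
    r1 = failure_rate(violates_rule1, length)
    r2 = failure_rate(violates_rule2, length)
    print(f"length {length}: rule 1 fails {r1:.1%}, rule 2 fails {r2:.1%}")
```

A sequence with three violations still counts once here, matching the definition of failure rate above.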
A couple of rule descriptions were ambiguous, rules 5 & 6 (at least to me).
I'll upload a PDF output when I get LaTeX installed again (downloading the HTML file is probably the easiest way of quickly seeing the output).
Well the assumption here is that we've got something we're measuring with a steady mean and either inherently noisy variation or some measurement error on top.
Lots of real-world samples follow the normal distribution, and anything that does should look roughly like that sim.
So there's no real need to use random numbers, but it's a very quick way for me to get data that looks like real data, where I know I've got the standard deviation & mean correct and that there should be no anomalies.
My sim can only show one side of the story though: it can't show how often real issues are picked up. For that, we'd probably want to look at real-world data and investigate each reported issue to see what proportion are important (and then possibly try and see how many were missed).
A small random variance or a bunch of uncorrelated errors ought to produce something very similar to a normal distribution, which we can model with random generation.
The Nelson rules are basically an attempt to determine whether relatively 'healthy' data is actually coming from some specific forcing events (e.g. oscillation instead of random variance), so they look for departures from a normal distribution.
A lot of this is the context... You're not trying to prove something is mathematically perfect. You're trying to figure out if the variation is common cause [0] or a special cause. It's ok if you get this wrong from time to time because it's decision support, and it's better to act on imperfect information than to wait too long for perfection.
[0] https://en.wikipedia.org/wiki/Common_cause_and_special_cause... Common causes are addressed by fixing the system as a whole, while special causes are addressed by fixing one-offs. For example if you are addressing variations in delivery times, you address small delivery variance by improving the maps, which help every delivery. You address the one-off variation by firing the guy who takes 3 hour lunches mid-delivery every few weeks.
Can I reverse your numbers and say that Rule 1 has a 99.7% confidence that a datapoint is out of control? That seems a little on the low side. 0.3% means about 3 in every 1000 data points, and 1000 data points might be only an hour and a half of samples, so the system would generate several such outliers in that time. That's too many for my sysadmins to generate an alert on anyway.
When do you generate an alert? I'd say a false positive once a month would be acceptable. That'd be around 6-sigma confidence if you measure every 5 seconds.
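As a back-of-the-envelope check on that, here is a small sketch that turns "one false positive per month at one sample every 5 seconds" into a sigma threshold. It assumes independent samples from a plain normal distribution, a two-sided test, and a 30-day month; `sigma_threshold` is a hypothetical helper name.

```python
from statistics import NormalDist


def sigma_threshold(sample_interval_s, false_alarm_interval_s):
    """Two-sided sigma level giving about one false alarm per interval.

    Assumes i.i.d. normal samples; real monitoring data is often
    autocorrelated, which would change the answer.
    """
    samples = false_alarm_interval_s / sample_interval_s
    p_false_alarm = 1.0 / samples           # allowed per-sample false alarm rate
    # Split the allowed rate across both tails and invert the normal CDF.
    return NormalDist().inv_cdf(1.0 - p_false_alarm / 2.0)


MONTH_S = 30 * 24 * 3600                     # assumed 30-day month
k = sigma_threshold(5, MONTH_S)
print(f"samples per month: {MONTH_S // 5}")
print(f"threshold: {k:.2f} sigma")
```

Under these assumptions the threshold lands a little below 5 sigma, so a 6-sigma limit would be more conservative than one false alarm a month strictly requires.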
You're missing a crucial parameter, which is how often you think your system goes out of control.
You want a high Pr(problem|alert) but also a high Pr(alert|problem); the trade-off you choose between them depends on how often you expect problems to occur. If they are rare then you want false positives to be rare. If problems are frequent then false positives can happen more often without affecting Pr(problem|alert) so much.
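The dependence on the base rate is just Bayes' rule, which a quick sketch makes visible. The numbers below (a 0.3% false positive rate, 90% detection, and the two priors) are made up purely for illustration, and `prob_problem_given_alert` is a hypothetical name.

```python
def prob_problem_given_alert(prior, sensitivity, false_positive_rate):
    """Pr(problem | alert) via Bayes' rule.

    prior:               Pr(problem), how often the system is out of control
    sensitivity:         Pr(alert | problem)
    false_positive_rate: Pr(alert | no problem)
    """
    true_alerts = sensitivity * prior
    false_alerts = false_positive_rate * (1.0 - prior)
    return true_alerts / (true_alerts + false_alerts)


# Illustrative values only, not measured from any real system.
FPR = 0.003          # e.g. a rule that fires on 0.3% of healthy samples
SENSITIVITY = 0.9

rare = prob_problem_given_alert(0.0001, SENSITIVITY, FPR)
frequent = prob_problem_given_alert(0.05, SENSITIVITY, FPR)
print(f"problems rare (0.01% of samples): Pr(problem|alert) = {rare:.1%}")
print(f"problems frequent (5% of samples): Pr(problem|alert) = {frequent:.1%}")
```

With the same rule and the same false positive rate, most alerts are noise when problems are rare and most are real when problems are frequent.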
Thanks for this. Rule 5 immediately stood out to me as a high error source, and I came to the comments hoping someone had done the actual math.
My guess is that positives from these rules would be logged rather than immediately reacted to. If over 1000 detections you break each rule at about the expected rate, all is well. If you break a few rules well outside of expectation, it raises some questions.
I guess that's the idea. If you are manually sampling a process, or looking into a graph manually created by somebody else, you shouldn't see two rules broken on the same day.
Of course, nowadays we can get so much data that we can create a procedure where an event with 10^-3 chance is seen several times a day.
This is not about simulation tests, but more likely about industrial processes.
If you are already paying $60K per year to your QA technicians, it does not matter if the rule fires once per hour or once per week. You want those guys to investigate every incident. Most of the time these will be false positives, but every other month QA will catch enough minor defects to pay for itself. The real benefit, though, is catching the bigger multi-million-dollar issues that come every other year.