Out of curiosity, with a regular normal distribution I wonder what the probabili...

IanCal · on July 31, 2015

I got interested in this so whipped up a notebook for my interpretation of the rules against random sequences of different lengths. Failure rates are the number of sequences that contained an error, not the number of errors in all the sequences (so a sequence that had 3 errors counts as one failure).
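For readers who want to try this without the notebook, here is a minimal sketch of the same kind of experiment. It is not the notebook's code: the function names, the trial count, and the choice to test only Rule 1 (a point more than 3 standard deviations from the mean) are assumptions made for illustration.

```python
import random


def rule1_violated(seq, mean=0.0, sd=1.0):
    """Nelson Rule 1: some point lies more than 3 standard deviations from the mean."""
    return any(abs(x - mean) > 3 * sd for x in seq)


def failure_rate(length, trials=2000, seed=0):
    """Fraction of random normal sequences of the given length that trip Rule 1.

    A sequence with several violations still counts as a single failure,
    matching the counting convention described above.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    failures = 0
    for _ in range(trials):
        seq = [rng.gauss(0.0, 1.0) for _ in range(length)]
        if rule1_violated(seq):
            failures += 1
    return failures / trials


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, failure_rate(n))
```

Because the mean and standard deviation are known exactly (0 and 1), every reported violation in this sketch is a false positive by construction.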

A couple of rule descriptions were ambiguous, rules 5 & 6 (at least to me).

I'll upload a PDF output when I get LaTeX installed again (downloading the HTML file is probably the easiest way of quickly seeing the output).

Edit - updated.

PDF:

http://files.figshare.com/2196604/Analysis_of_the_false_posi...

PDF/HTML/notebook

http://dx.doi.org/10.6084/m9.figshare.1499204

IanCal · on July 31, 2015

Damn, missed the edit window. Fixed rule #5

http://files.figshare.com/2196656/Analysis_of_the_false_posi...

Same DOI, new version.

jacquesm · on July 31, 2015

I'm probably missing something obvious, but why random sequences when the Nelson rules appear to be aimed at measurements of real world properties?

IanCal · on July 31, 2015

Well the assumption here is that we've got something we're measuring with a steady mean and either inherently noisy variation or some measurement error on top.

Lots of real-world samples follow the normal distribution, and anything that does should look roughly like that sim.

So there's no real need to use random numbers, but it's a very quick way for me to get data that looks like real data, where I know I've got the standard deviation & mean correct and that there should be no anomalies.

My sim can only show one side of the story though; it can't show how often real issues are picked up. For that, we'd probably want to look at real-world data and investigate each reported issue to see what proportion are important (and then possibly try and see how many were missed).

Bartweiss · on July 31, 2015

A small random variance or a bunch of uncorrelated errors ought to produce something very similar to a normal distribution, which we can model with random generation.

The Nelson rules are basically an attempt to determine whether relatively 'healthy' data is actually coming from some specific forcing events (e.g. oscillation instead of random variance), so it looks for breaks with normal distributions.

mathattack · on July 31, 2015

A lot of this is the context... You're not trying to prove something is mathematically perfect. You're trying to figure out if the variation is common cause [0] or a special cause. It's ok if you get this wrong from time to time because it's decision support, and it's better to act on imperfect information than to wait too long for perfection.

[0] https://en.wikipedia.org/wiki/Common_cause_and_special_cause... Common causes are addressed by fixing the system as a whole, while special causes are addressed by fixing one-offs. For example if you are addressing variations in delivery times, you address small delivery variance by improving the maps, which help every delivery. You address the one-off variation by firing the guy who takes 3 hour lunches mid-delivery every few weeks.

tinco · on July 31, 2015

Can I reverse your numbers and say that Rule 1 has a 99.7% confidence that a datapoint is out of control? That seems a little on the low side: 0.3% means roughly 3 in every 1000 data points, and 1000 data points might be just an hour and a half of measurements, so the system would generate such outliers regularly. That's too many for my sysadmins to generate an alert on anyway.

When do you generate an alert? I'd say a false positive once a month would be acceptable. That'd be around 6-sigma confidence if you measure every 5 seconds.
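The threshold for a given false-alarm budget can be worked out directly. The sketch below assumes independent, normally distributed samples, a two-sided test, and a 30-day month; the function name is made up for illustration.

```python
from statistics import NormalDist


def sigma_for_false_alarm_interval(sample_period_s, interval_s):
    """Two-sided sigma threshold that yields, on average, one false alarm per interval.

    Assumes independent samples drawn from a normal distribution.
    """
    samples = interval_s / sample_period_s   # number of measurements in the interval
    p_per_sample = 1.0 / samples             # allowed false-alarm probability per sample
    # Split the probability across both tails and invert the normal CDF.
    return -NormalDist().inv_cdf(p_per_sample / 2)


if __name__ == "__main__":
    month_s = 30 * 24 * 3600  # assumption: a 30-day month
    print(round(sigma_for_false_alarm_interval(5, month_s), 2))
```

Under these assumptions the result lands a little under 5 sigma; real monitoring data is often autocorrelated or heavy-tailed, which would change the answer.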

eterm · on July 31, 2015

You're missing a crucial parameter, which is how often you think your system goes out of control.

You want a high Pr(problem|alert) but also a high Pr(alert|problem), and the trade-off you choose between them depends on how often you expect problems to occur. If they are rare then you want false positives to be rare. If problems are frequent then false positives can happen more often without affecting Pr(problem|alert) so much.
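This is Bayes' rule, and a small sketch makes the dependence on the base rate concrete. The detection rate (0.9) and the base rates below are hypothetical numbers chosen for illustration; the 0.003 false positive rate is the Rule 1 figure discussed above.

```python
def prob_problem_given_alert(base_rate, detection_rate, false_positive_rate):
    """Bayes' rule: Pr(problem | alert).

    base_rate           -- Pr(problem), how often the system is really out of control
    detection_rate      -- Pr(alert | problem)
    false_positive_rate -- Pr(alert | no problem)
    """
    true_alerts = base_rate * detection_rate
    false_alerts = (1.0 - base_rate) * false_positive_rate
    return true_alerts / (true_alerts + false_alerts)


if __name__ == "__main__":
    # Hypothetical base rates: very rare, occasional, and frequent problems.
    for base_rate in (0.0001, 0.01, 0.2):
        print(base_rate, prob_problem_given_alert(base_rate, 0.9, 0.003))
```

With the same alert rule, rare problems leave most alerts false, while frequent problems make nearly every alert genuine.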

Bartweiss · on July 31, 2015

Thanks for this. Rule 5 immediately stood out to me as a high error source, and I came to the comments hoping someone had done the actual math.

My guess is that positives from these rules would be logged rather than immediately reacted to. If over 1000 detections you break each rule at about the expected rate, all is well. If you break a few rules well outside of expectation, it raises some questions.
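One rough way to judge "about the expected rate" is to compare the logged count with a binomial expectation. This sketch reads "1000 detections" as 1000 checked points, which is an interpretation, and it assumes independent points; the function name and the example counts are hypothetical.

```python
import math


def count_z_score(observed, n, p):
    """How many binomial standard deviations the observed count is from its expectation.

    observed -- number of times the rule fired
    n        -- number of points checked
    p        -- expected per-point firing probability under pure noise
    """
    expected = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    return (observed - expected) / sd


if __name__ == "__main__":
    # Rule 1 is expected to fire on about 0.3% of points under pure noise.
    print(count_z_score(3, 1000, 0.003))   # count matches expectation
    print(count_z_score(12, 1000, 0.003))  # count well above expectation
```

For counts this small the normal approximation is crude, so a large score is a prompt to look closer rather than a formal test.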

hebdo · on July 31, 2015

By error source you mean false positive source? Because according to the analysis doc, rules 1 and 6 have way higher false positive rates.

tarblog · on Aug 3, 2015

Small communities become smaller when certain people share similar interests. -Tarblog

data_spy · on July 31, 2015

At my work in ad tech, we see about 1.5% instead of 2.5% for our publishers

marcosdumay · on July 31, 2015

I guess that's the idea. If you are manually sampling a process, or looking into a graph manually created by somebody else, you shouldn't see two rules broken on the same day.

Of course, nowadays we can get so much data that we can create a procedure where an event with 10^-3 chance is seen several times a day.

haddr · on July 31, 2015

How big is your simulation test set? For a small number of elements, that outstanding Rule 5 result could be just a coincidence... ;)

(I'm taking a rigorous approach here. By intuition it seems to me that Rule 5 should generate more false positives than other rules.)

crpatino · on July 31, 2015

This is not about simulation tests, but more likely about industrial processes.

If you are already paying $60K per year to your QA technicians, it does not matter if the rule fires once per hour or once per week. You want those guys to investigate every incident. Most of the time these will be false positives, but every other month QA will catch enough minor defects to pay for itself. The real benefit, though, is to catch the bigger multi-million-dollar issues that come every other year.