Predicting Failure Almost Never Pays

Predictive maintenance doesn't fail because the sensors are immature, but because of a chain of economic conditions that rarely all line up at once. It's a problem of decision theory, not of technology.

SCIENCE & TECHNOLOGY

Alessandro

6/21/2026 · 4 min read

The promise of Industry 4.0 is seductive in its simplicity: cover the plant in sensors, feed the vibrations and temperatures to a model, and the algorithm will warn you before the bearing gives way. No more surprise stoppages. Maintenance at exactly the right moment, neither an hour early nor an hour late.

Then you look at the books of the people who have actually installed those programs, and the picture turns murky: many deliver less than they promise, and not because the sensors don't work. They work beautifully. The point is that prediction is sold as a problem of detection, when it is almost always a problem of decision. And the two obey different laws.

The intuition that misleads

Let's start with the idea on which calendar-based maintenance rests: things wear out, so the older they are, the more likely they are to break. Replace them before they get too old.

It's an intuition that is almost always false. The foundational study by Nowlan and Heap for United Airlines (1978), the conceptual basis of Reliability-Centred Maintenance, classified failures into six profiles of risk over time and found a result that upended the discipline: only about 11% of components show an age-related wear-out zone, the one in which "older" really does mean "more at risk." The remaining 89% show no such zone, so replacing them at a fixed age does not lower their risk. The single most common profile, roughly two thirds of the total, is infant mortality: very high risk at the start, which then drops and levels off. Later studies on naval and industrial fleets found similar profiles.

Statistics condenses this into a single letter, the Weibull shape parameter β, which sets whether the failure rate rises, stays flat, or falls with age. Only with β greater than 1 is there a rising wear-out that justifies time-based preventive replacement. With β equal to 1 the failure is "memoryless": age contains no information, and any calendar maintenance is wasted. With β less than 1, the infant-mortality case, replacing on schedule is actually counterproductive, because it resets the clock to zero, that is, to the most dangerous phase. This is where predictive maintenance comes from: no longer "look at the age," but "look at the condition." Inventing the alternative, however, does not automatically make it worthwhile. For it to pay, three gates must open, one after another.
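To see why the sign of β − 1 decides the matter, here is a minimal Python sketch, assuming a two-parameter Weibull life with a hypothetical characteristic life `eta` of 1,000 hours. It compares the probability of failing within the next 100 hours for a fresh replacement (age 0) and for a part that has already run 2,000 hours. The function names are illustrative, not taken from any library.

```python
import math


def cumulative_hazard(t: float, beta: float, eta: float) -> float:
    """Weibull cumulative hazard H(t) = (t / eta) ** beta."""
    return (t / eta) ** beta


def prob_fail_next(age: float, window: float, beta: float, eta: float) -> float:
    """P(failure within the next `window` hours | the part has survived to `age`).

    Conditional survival is R(t + w) / R(t) = exp(-(H(t + w) - H(t))).
    """
    delta_h = cumulative_hazard(age + window, beta, eta) - cumulative_hazard(age, beta, eta)
    return 1.0 - math.exp(-delta_h)


# Hypothetical values, chosen only for illustration.
ETA = 1000.0      # characteristic life, hours
WINDOW = 100.0    # look-ahead horizon, hours
OLD_AGE = 2000.0  # hours already run by the in-service part

for beta in (0.5, 1.0, 3.0):
    p_new = prob_fail_next(0.0, WINDOW, beta, ETA)
    p_old = prob_fail_next(OLD_AGE, WINDOW, beta, ETA)
    if math.isclose(p_new, p_old):
        verdict = "age carries no information"
    elif p_new > p_old:
        verdict = "replacing raises the risk"
    else:
        verdict = "replacing lowers the risk"
    print(f"beta={beta}: new {p_new:.3f}, aged {p_old:.3f} -> {verdict}")
```

With β = 0.5 the fresh part is several times more likely to fail in the window than the aged one; with β = 1 the two probabilities coincide; with β = 3 the aged part is the risky one. Only the last case rewards replacement by the calendar.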

First gate: the warning window

A failure is predictable only if there is an interval between the moment it becomes detectable and the moment the component stops working. John Moubray called it the P-F interval: the distance on the curve between the point where a sensor can pick up the symptom (P) and functional failure (F).

That window must exist, be reasonably consistent, and be long enough to allow a response: ordering the spare, scheduling the stoppage, intervening. Subtract the response time, and what remains is the net usable warning. If the P-F interval is short or erratic, as with a brittle fracture that goes from first symptom to failure within hours, even a perfect sensor is worthless: you detect the fault, but there is no time left to act on it. The first gate is purely physical, and many failure modes leave it shut.
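The arithmetic of the first gate fits in a few lines. The Python sketch below computes the net usable warning and the share of observed failures that still leave time to act. The optional `inspection_interval_h` parameter is an assumption added here (with periodic rather than continuous monitoring, a symptom can go unseen for up to one interval), and both sets of P-F intervals are hypothetical.

```python
def net_warning(pf_interval_h: float, response_time_h: float,
                inspection_interval_h: float = 0.0) -> float:
    """Worst-case usable warning: P-F interval minus detection delay and response time.

    inspection_interval_h is an assumption: with periodic checks, the symptom may
    appear just after an inspection and stay unseen for a whole interval.
    """
    return pf_interval_h - inspection_interval_h - response_time_h


def actionable_share(pf_samples_h: list[float], response_time_h: float,
                     inspection_interval_h: float = 0.0) -> float:
    """Fraction of observed failures whose P-F interval left a positive net warning."""
    ok = sum(1 for pf in pf_samples_h
             if net_warning(pf, response_time_h, inspection_interval_h) > 0)
    return ok / len(pf_samples_h)


# Hypothetical P-F intervals (hours) observed for two failure modes.
consistent = [300.0, 280.0, 320.0, 310.0, 290.0]  # e.g. slow bearing wear
erratic = [400.0, 6.0, 250.0, 2.0, 30.0]           # e.g. brittle fracture
RESPONSE_H = 48.0  # spare lead time + scheduling + intervention (assumed)

print(actionable_share(consistent, RESPONSE_H))         # → 1.0 (continuous monitoring)
print(actionable_share(erratic, RESPONSE_H))            # → 0.4
print(actionable_share(consistent, RESPONSE_H, 168.0))  # → 1.0 (weekly checks)
```

The consistent mode stays actionable even with weekly checks; the erratic one leaves most of its failures inside the response time, which is the shut first gate in numbers.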

Second gate: the decision that changes

Suppose the warning is there. The question almost no one asks remains: change what? The value of a prediction is bounded by the value of the decision it alters — decision theory formalizes this as the expected value of information. Even an infallible oracle is worth only as much as the choices it lets you make better. If the optimal action is the same whether you know or not, the information is worth zero, and no sensor can change that.

For a cheap component that sits off the constraint and causes no secondary damage when it fails, the rational decision is the same either way: let it run and replace it when it breaks. There, prediction is worth nothing regardless of sensor quality. This is where run-to-failure deserves rehabilitation: it is not negligence, it is a designed choice. RCM lists it among the legitimate strategies, to be adopted deliberately where the consequences are tolerable and no predictive task is effective.
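The second gate can be checked with the textbook expected value of perfect information (EVPI): the expected payoff when you learn the state before choosing, minus the best expected payoff when you must choose blind. The Python sketch below is a minimal version; the payoff figures, the 10% monthly failure probability, and the two-action framing (replace now vs. run to failure) are all hypothetical.

```python
def evpi(p_states: list[float], payoff: list[list[float]]) -> float:
    """Expected value of perfect information.

    p_states[s]: probability of state s.
    payoff[a][s]: value of action a if state s occurs (costs are negative).
    """
    # Without the oracle: commit to the single action with the best expected value.
    blind = max(sum(p * v for p, v in zip(p_states, row)) for row in payoff)
    # With the oracle: learn the state, then take the best action for that state.
    informed = sum(p * max(row[s] for row in payoff) for s, p in enumerate(p_states))
    return informed - blind


P = [0.1, 0.9]  # states: [fails this month, survives the month] (assumed)

# Cheap, off-constraint part: a breakdown costs about what a planned swap costs.
cheap = [[-50.0, -50.0],   # replace now
         [-50.0, 0.0]]     # run to failure
# Same part, but a breakdown now drags secondary damage and downtime with it.
critical = [[-50.0, -50.0],
            [-2000.0, 0.0]]

print(evpi(P, cheap))     # run to failure is optimal in every state
print(evpi(P, critical))  # knowing the state lets you replace only when needed
```

For the cheap part the EVPI is zero: run to failure is the best action whether or not the failure comes, so no sensor can add value. In the critical variant the same forecast is worth 45 per month, and that figure is the ceiling on what monitoring this failure mode could ever be worth.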

The asymmetry that changes the decision is supplied by the constraint. A stoppage on the bottleneck costs the hourly margin of the entire line; a stoppage on a machine buffered by downstream inventory costs almost nothing. The pragmatic rule is sharp: predict where stopping is expensive, let it break where stopping is cheap. The sensor goes on the constraint, not everywhere.
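The constraint asymmetry is easy to put in numbers. The Python sketch below is a simplification under stated assumptions: downtime on a machine costs line margin only for the hours that downstream inventory (`buffer_h`, a hypothetical parameter) cannot cover, and the bottleneck is modelled as having no cover at all. All figures are invented for illustration.

```python
def stoppage_cost(downtime_h: float, hourly_margin: float, buffer_h: float = 0.0) -> float:
    """Line margin lost while a machine is down, net of hours covered by buffer stock."""
    uncovered_h = max(0.0, downtime_h - buffer_h)
    return uncovered_h * hourly_margin


def warning_value(unplanned_h: float, planned_h: float, hourly_margin: float,
                  buffer_h: float = 0.0) -> float:
    """Margin saved by turning an unplanned stop into a shorter, planned one."""
    return (stoppage_cost(unplanned_h, hourly_margin, buffer_h)
            - stoppage_cost(planned_h, hourly_margin, buffer_h))


# Hypothetical failure: 8 h down if it breaks unexpectedly, 3 h if the stop is planned.
print(warning_value(8, 3, hourly_margin=2000))               # bottleneck, no buffer
print(warning_value(8, 3, hourly_margin=2000, buffer_h=12))  # buffered machine
```

On the bottleneck, the warning saves five hours of line margin; on the buffered machine both stops are absorbed by inventory and the same warning saves nothing.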

Third gate: the rare numbers

The subtlest obstacle remains, and it is pure arithmetic. The failures that matter are rare. And on rare events even an excellent detector produces mostly false alarms.

It's Bayes' theorem applied to diagnostics: with a low base rate, the positive predictive value (the share of alarms that correspond to a real failure) collapses, even for a very accurate model. A classifier boasting 95% accuracy on a balanced test set can prove useless on an event that happens once in five hundred: the number that matters is not accuracy on the test bench, but precision at the real rate. A balanced test set hides the base rate, and so it flatters the model.
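The collapse follows directly from Bayes' theorem. The Python sketch below assumes that "95% accuracy on a balanced test set" means 95% sensitivity and 95% specificity (on a balanced set, accuracy is the average of the two, so this is one consistent reading), and treats "once in five hundred" as a base rate of 0.002 per monitored interval.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              base_rate: float) -> float:
    """P(real failure | alarm), by Bayes' theorem."""
    true_alarms = sensitivity * base_rate                   # real failures flagged
    false_alarms = (1.0 - specificity) * (1.0 - base_rate)  # healthy intervals flagged
    return true_alarms / (true_alarms + false_alarms)


balanced = positive_predictive_value(0.95, 0.95, base_rate=0.5)
realistic = positive_predictive_value(0.95, 0.95, base_rate=1 / 500)
print(f"PPV on the balanced test set: {balanced:.1%}")   # 95.0%
print(f"PPV at 1 failure in 500:      {realistic:.1%}")  # about 3.7%
```

In other words, roughly 26 of every 27 alarms would be false, even though nothing about the model changed between the two lines.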

There are two consequences, and both are corrosive. The first is the erosion of trust: after a few empty alarms, the team learns to ignore them, like the villagers in the tale of the boy who cried wolf. The second is crueler. Every false alarm acted upon means an unnecessary maintenance job, and every intrusive intervention on a working system can return it to its infancy, that is, to the infant-mortality zone we started from. A poorly calibrated predictive program ends up manufacturing the very failures it was meant to prevent.
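The second consequence can be tallied as a ledger. The Python sketch below takes a PPV of 0.037 (about what Bayes' theorem gives for a 95%-sensitive, 95%-specific detector at one failure in 500) and a purely hypothetical 5% chance, `p_induced`, that a needless intrusive intervention leaves the machine in its infant-mortality zone and leads to a failure. That probability is an illustrative assumption, not a measured figure.

```python
def alarm_ledger(alarms_acted_on: int, ppv: float, p_induced: float) -> dict[str, float]:
    """Split acted-on alarms into real catches, needless jobs, and failures induced.

    p_induced is a hypothetical probability that intrusive work on a healthy
    machine sends it back into its infant-mortality zone and it then fails.
    """
    caught = alarms_acted_on * ppv
    needless = alarms_acted_on - caught
    induced = needless * p_induced
    return {"caught": caught, "needless": needless, "induced": induced}


poor = alarm_ledger(100, ppv=0.037, p_induced=0.05)
good = alarm_ledger(100, ppv=0.5, p_induced=0.05)
for name, ledger in (("poorly calibrated", poor), ("well calibrated", good)):
    print(f"{name}: caught {ledger['caught']:.1f}, "
          f"needless {ledger['needless']:.1f}, induced {ledger['induced']:.1f}")
```

Under these assumptions, per hundred alarms the poorly calibrated program induces more failures (about 4.8) than it catches (3.7), while the well-calibrated one catches 50 and induces 2.5.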

Three gates, one conclusion

A prediction pays only when the three gates are open together: there is an actionable warning window, the decision genuinely changes, and failures are common enough, relative to the detector's false-alarm rate, for an alarm to be credible. Detectability, decision, discriminability. Predictive maintenance is sold on the first (the sensor, the data, the algorithm) but almost always dies on the second or the third. That is why technically flawless programs return so little: the limit was never where we were looking for it.
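The three gates can be read as a screening checklist applied to each failure mode. The Python sketch below is one way to encode it, under assumptions: the class and field names are hypothetical, gate two compares the yearly value of the changed decision with the yearly cost of the monitoring program, and the PPV threshold `min_ppv` is a policy choice, not a standard value.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    pf_interval_h: float    # typical P-F interval
    response_time_h: float  # spare lead time + scheduling + intervention
    decision_value: float   # yearly value of the decision the prediction changes
    program_cost: float     # yearly cost of sensors, analytics, and follow-up
    ppv: float              # share of alarms that are real, at the true base rate


def first_closed_gate(mode: FailureMode, min_ppv: float = 0.5) -> str:
    """Return the first gate that stays shut, or 'all open' if prediction pays."""
    if mode.pf_interval_h - mode.response_time_h <= 0:
        return "detectability"     # no usable warning window
    if mode.decision_value <= mode.program_cost:
        return "decision"          # the changed decision does not cover the program
    if mode.ppv < min_ppv:
        return "discriminability"  # too many alarms are false to be acted on
    return "all open"


# Hypothetical failure modes, for illustration only.
modes = [
    FailureMode("brittle shaft fracture", 4, 48, 30_000, 8_000, 0.6),
    FailureMode("cheap buffered fan bearing", 500, 48, 0, 8_000, 0.6),
    FailureMode("rare seal leak", 400, 48, 30_000, 8_000, 0.04),
    FailureMode("bottleneck spindle bearing", 500, 48, 30_000, 8_000, 0.6),
]
for m in modes:
    print(f"{m.name}: {first_closed_gate(m)}")
```

The checks run in the same order as the gates, so a mode is rejected at the first one that stays shut and no effort goes into tuning a detector for a failure that leaves no time to act.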

The maturity of a maintenance function, then, is not measured by the number of sensors installed, but by the rigor with which it decides where prediction earns its cost — and, above all, where it does not.

There is an elegant inversion at the bottom of all this. The good maintenance engineer sometimes chooses, deliberately, not to know. Predicting everything is not sophistication: it is having failed to do the math. The right question is never "is this failure predictable?", but "what decision would change if I could predict it, and is that change worth its price?". Letting a machine break, chosen with eyes open, is not the sign of a primitive system. It is the sign of a system that has understood the economics of what it does.