Problem Statement
For any arbitrary decision function $D$, there exists a real-valued function $f : \mathcal{X} \to \mathbb{R}$ such that $D(x)$ is determined entirely by the scalar value $f(x)$.
The codomain of $D$ is not binary, but ternary:

$$D : \mathcal{X} \to \{0,\, 1,\, U\},$$

where $U$ denotes an explicit region of uncertainty rather than a misclassification.
We define two scalars $\tau_- < \tau_+$ such that

$$D(x) = \begin{cases} 1 & \text{if } f(x) \ge \tau_+, \\ 0 & \text{if } f(x) \le \tau_-, \\ U & \text{if } \tau_- < f(x) < \tau_+. \end{cases}$$

The interval $(\tau_-, \tau_+)$ constitutes the threshold of uncertainty. Importantly, this region is not a modeling failure but a structural feature of the decision process: it represents inputs for which the evidence contained in $f(x)$ is insufficient to justify either $D(x) = 1$ or $D(x) = 0$.
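As a concrete illustration of the rule above, here is a minimal Python sketch of the ternary decision, assuming the score $f(x)$ has already been computed; the names `ternary_decision`, `tau_lo`, and `tau_hi` are placeholders introduced for this example only.

```python
from enum import Enum

class Decision(Enum):
    NEGATIVE = 0
    POSITIVE = 1
    UNCERTAIN = "U"

def ternary_decision(score: float, tau_lo: float, tau_hi: float) -> Decision:
    """Map a scalar evidence score f(x) to the ternary codomain {0, 1, U}.

    Inputs whose score falls strictly inside (tau_lo, tau_hi) are routed to
    the explicit uncertainty region U rather than being forced into a class.
    """
    if score >= tau_hi:
        return Decision.POSITIVE
    if score <= tau_lo:
        return Decision.NEGATIVE
    return Decision.UNCERTAIN

# Example: a score of 0.45 with thresholds (0.4, 0.6) is abstained on, not misclassified.
print(ternary_decision(0.45, tau_lo=0.4, tau_hi=0.6))  # Decision.UNCERTAIN
```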
The core problem is therefore not the thresholds $\tau_-$ and $\tau_+$, nor the ternary decision rule itself, but the nature of the function $f$.
What exactly is $f$?
Core Difficulty
Let $x \in \mathcal{X}$ be a high-dimensional input, and let $D(x) = 1$ be determined by the presence of a (possibly unknown) set of latent characteristics

$$C = \{c_1, c_2, \ldots, c_k\},$$

such that

$$c_1(x) \wedge c_2(x) \wedge \cdots \wedge c_k(x) \;\Longrightarrow\; D(x) = 1,$$

where the implication is causal, or at least causally correlated, not merely a statistical coincidence.
The challenge is that:
- The characteristics $c_i$ are not explicitly labeled.
- Their relevance is contextual and non-linear.
- Only a sufficient subset of $C$ is required for $D(x) = 1$ to hold.
Thus, $f$ must be a function that:
- Extracts these arbitrary and latent characteristics from $x$,
- Aggregates them into a scalar measure of evidence,
- And orders inputs such that proximity to the decision boundary reflects genuine epistemic uncertainty rather than noise.
Formally, the problem reduces to constructing or learning a function $f$ such that

$$f(x) = g\big(c_1(x), c_2(x), \ldots, c_k(x)\big),$$

where $g$ is unknown, the $c_i$ are implicit, and supervision is partial or weak.
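To make the intended structure concrete, the following sketch hard-codes a few hypothetical characteristic extractors and a stand-in aggregator `g`; in the actual problem the $c_i$ are implicit and $g$ is unknown, so every name here is an assumption for illustration, not part of the framework.

```python
from typing import Callable, Sequence

# Hypothetical latent-characteristic extractors c_i: each maps an input x to a
# degree of presence in [0, 1]. In the real setting these are implicit and must
# be discovered; they are hard-coded here only to show the shape of
# f(x) = g(c_1(x), ..., c_k(x)).
def characteristic_extractors() -> Sequence[Callable[[dict], float]]:
    return [
        lambda x: x.get("edge_density", 0.0),
        lambda x: x.get("texture_consistency", 0.0),
        lambda x: x.get("shape_regularity", 0.0),
    ]

def g(evidence: Sequence[float]) -> float:
    """Stand-in aggregator g, stubbed as a mean of characteristic strengths.

    Any monotone aggregation would serve the illustration; the point is that a
    sufficient subset of strong characteristics pushes f(x) past the upper threshold.
    """
    return sum(evidence) / len(evidence)

def f(x: dict) -> float:
    """Scalar evidence function: extract latent characteristics, then aggregate."""
    return g([c(x) for c in characteristic_extractors()])

# Example input described by a few (hypothetical) measured characteristics.
x = {"edge_density": 0.9, "texture_consistency": 0.8, "shape_regularity": 0.2}
print(f(x))  # ~0.633: a single scalar that orders inputs by evidence
```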
The Central Question
If such an $f$ can be learned, then for all $x$ where $D(x)$ is determined by a real causal structure in the input space, the ternary decision

$$D(x) \in \{0,\, 1,\, U\}$$

is no longer an artifact of probabilistic calibration, but a faithful representation of epistemic structure.
The problem, therefore, is not how to classify better, but how to define and learn the scalar ordering induced by $f$ such that uncertainty is explicit, meaningful, and minimized in measure rather than hidden behind confidence scores.
The Threshold of Uncertainty
The objective of the framework is not, fundamentally, to distinguish between $D(x) = 1$ and $D(x) = 0$. That problem is comparatively trivial: given sufficient capacity, most models can learn a separating boundary.
The real problem lies in the threshold of uncertainty $(\tau_-, \tau_+)$.
This region corresponds to inputs for which the extracted evidence $f(x)$ is insufficient, incomplete, or internally inconsistent. Formally, these are cases where the latent characteristics $c_i$ that causally support $D(x) = 1$ are either:
- weakly present,
- only partially extracted,
- or entangled with contradictory features.
As a consequence, the uncertainty region is not primarily a function of poorly chosen thresholds, but of the quality of the representation learned by $f$.
Reducing the threshold of uncertainty therefore means improving the model’s ability to:
- Extract a richer and more faithful set of latent characteristics from $x$,
- Disentangle these characteristics so that their contribution to $f(x)$ is explicit and stable,
- Aggregate them coherently, such that the scalar ordering induced by $f$ reflects genuine epistemic confidence.
In this sense, uncertainty is not noise to be eliminated, but a signal indicating insufficient characteristic extraction.
Thus, progress is measured not by sharper decision boundaries between $1$ and $0$, but by the contraction of the uncertainty band as representation quality improves:

$$\mu\big(\{\, x \in \mathcal{X} : \tau_- < f(x) < \tau_+ \,\}\big) \;\downarrow\; \text{as } f \text{ better captures the } c_i.$$
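One way to operationalize this contraction, under the assumption of a fixed held-out evaluation set, is to track the empirical fraction of inputs whose score falls inside $(\tau_-, \tau_+)$; the helper below is a hypothetical diagnostic sketch, not a prescribed metric of the framework.

```python
from typing import Callable, Iterable

def uncertainty_mass(f: Callable[[object], float],
                     inputs: Iterable[object],
                     tau_lo: float,
                     tau_hi: float) -> float:
    """Empirical measure of the uncertainty band: the fraction of inputs x
    whose evidence score f(x) falls strictly inside (tau_lo, tau_hi)."""
    scores = [f(x) for x in inputs]
    undecided = sum(1 for s in scores if tau_lo < s < tau_hi)
    return undecided / len(scores) if scores else 0.0

# As representation quality improves, this diagnostic should shrink on a fixed
# evaluation set: fewer inputs remain genuinely undecidable.
```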
The threshold of uncertainty is therefore a diagnostic of understanding.
As the model learns to identify and organize the true causal characteristics implicit in $x$, fewer inputs remain genuinely undecidable.