When SSL Stops Working: Characteristics, Not Pixels

date: 2026-02-06


0. Context

This iteration was a stress test of a simple thesis:

“SSL + fine-tuning should generalize across segmentation problems.”

It did for fire segmentation.

It did not for this new segmentation task (IoU plateau ~0.37–0.38 despite many interventions).

This post documents what we changed, what we observed, and what it implies for the next model iteration.


1. What the whiteboard formalizes

Let X be the input (image). The decision is not triggered by the raw input directly, but by latent characteristics inside it.

$$F(X) \approx \mathrm{Decision}(f(X))$$

The key realization is: X itself does not imply the problem.

Rather:

$$C(X) \Rightarrow P_r$$

So the model must be decomposed into two functional roles:

  1. It extracts the relevant characteristics from the input.
  2. It decodes/interprets those characteristics into the task output (mask, label, boxes, etc.).

So the minimal abstraction is:

$$X \xrightarrow{c} \hat{C}(X) \xrightarrow{f} \hat{Y}$$

This iteration’s lesson: different problems require extracting different kinds of characteristics.
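
The two-role abstraction can be sketched in a few lines of Python. This is a toy illustration, not the model used in this run: `make_model`, `toy_extractor`, `toy_decoder`, and the two characteristics (`brightness`, `edge_strength`) are hypothetical names chosen to show that the decoder only ever sees extracted characteristics, never raw pixels.

```python
from typing import Callable, Dict, List

Image = List[List[float]]            # grayscale image as rows of pixel values
Characteristics = Dict[str, float]   # hypothetical named characteristics


def make_model(
    c: Callable[[Image], Characteristics],
    f: Callable[[Characteristics], int],
) -> Callable[[Image], int]:
    """Compose extractor c and decoder f: X -> C_hat(X) -> Y_hat."""
    def model(x: Image) -> int:
        return f(c(x))
    return model


def toy_extractor(x: Image) -> Characteristics:
    """Hypothetical extractor: mean brightness and mean horizontal gradient."""
    pixels = [p for row in x for p in row]
    grads = [abs(row[i + 1] - row[i]) for row in x for i in range(len(row) - 1)]
    return {
        "brightness": sum(pixels) / len(pixels),
        "edge_strength": sum(grads) / len(grads) if grads else 0.0,
    }


def toy_decoder(ch: Characteristics) -> int:
    """Decide from characteristics only; thresholds are arbitrary assumptions."""
    return int(ch["brightness"] > 0.5 and ch["edge_strength"] > 0.1)


model = make_model(toy_extractor, toy_decoder)
bright_edgy = [[0.0, 1.0, 1.0], [1.0, 1.0, 0.0]]
flat_dark = [[0.1, 0.1, 0.1], [0.1, 0.1, 0.1]]
print(model(bright_edgy), model(flat_dark))
```

If the label depended on something `toy_extractor` does not compute, no choice of `toy_decoder` could recover it, which is the point of separating the two roles.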


2. The characteristic taxonomy (why SSL “works” sometimes)

Not all characteristics are equal. At least three families show up:

  1. Geometric (edges, shapes, spatial continuity)
  2. Morphological / textural (local patterns, gradients, blobs)
  3. Semantic / contextual (what the object is, not just how it looks locally)

Most SSL pretraining (especially vision-only, augmentation-driven) is disproportionately strong on (1) + (2).

Fire segmentation is dominated by (1) + (2): high-contrast, consistent textures, strong local cues.

This harder task leaks into (3): boundaries are ambiguous, cues are contextual, and “what counts” can depend on scene semantics.

So the failure is not “segmentation is hard”.

The failure is: the needed characteristics are not primarily geometric/morphological anymore.


3. What we actually did in this run (engineering log)

3.1 Pipeline

3.2 Class imbalance control

3.3 Loss shaping

We trained with a compound objective:
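
For readers unfamiliar with the term, a compound objective is a weighted sum of several loss terms. The sketch below assumes binary cross-entropy plus soft Dice with equal weights; the choice of terms, the weights, and the function names are illustrative assumptions, not the configuration of this run.

```python
import math
from typing import Sequence


def bce(probs: Sequence[float], target: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over a non-empty flat mask."""
    total = 0.0
    for p, t in zip(probs, target):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def soft_dice_loss(probs: Sequence[float], target: Sequence[int],
                   smooth: float = 1.0) -> float:
    """1 - soft Dice coefficient; `smooth` keeps empty masks well-defined."""
    inter = sum(p * t for p, t in zip(probs, target))
    denom = sum(probs) + sum(target)
    return 1.0 - (2.0 * inter + smooth) / (denom + smooth)


def compound_loss(probs: Sequence[float], target: Sequence[int],
                  w_bce: float = 0.5, w_dice: float = 0.5) -> float:
    """Weighted sum of the two terms (weights are assumed, not tuned)."""
    return w_bce * bce(probs, target) + w_dice * soft_dice_loss(probs, target)


perfect = compound_loss([1.0, 0.0], [1, 0])
inverted = compound_loss([0.0, 1.0], [1, 0])
print(perfect < inverted)
```

The per-pixel term (BCE) and the overlap term (Dice) penalize different failure modes, which is the usual motivation for combining them on imbalanced masks.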

3.4 Sampling / optimization tricks


4. The observed behavior

4.1 IoU ceiling (the main fact)

Across many epochs, IoU oscillates and improves slightly, but does not break out:

This is visible directly in logs like:

So it’s not stable convergence to a better representation — it’s threshold/coverage instability.
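
One way to probe threshold/coverage instability is to sweep the binarization threshold on a fixed set of predictions and see how far IoU moves. The sketch below is a minimal version of that check; the probabilities and target mask are made-up values, not taken from this run's logs, and `iou_by_threshold` is a hypothetical helper.

```python
from typing import Dict, List, Sequence


def iou(pred: Sequence[int], target: Sequence[int]) -> float:
    """IoU of two flat binary masks; defined as 1.0 when both are empty."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return inter / union if union else 1.0


def iou_by_threshold(probs: Sequence[float], target: Sequence[int],
                     thresholds: Sequence[float]) -> Dict[float, float]:
    """Binarize the same probabilities at each threshold and score IoU."""
    return {
        th: iou([int(p >= th) for p in probs], target)
        for th in thresholds
    }


# Hypothetical per-pixel probabilities and ground truth (illustration only).
probs: List[float] = [0.9, 0.6, 0.55, 0.45, 0.4, 0.1]
target: List[int] = [1, 1, 0, 1, 0, 0]

sweep = iou_by_threshold(probs, target, [0.3, 0.5, 0.7])
spread = max(sweep.values()) - min(sweep.values())
print(sweep, spread)
```

A large `spread` on a frozen checkpoint suggests that epoch-to-epoch IoU movement may come from where predictions sit relative to the threshold rather than from a change in the representation.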

4.2 Why this is not “just train longer”

Training longer helps when:

Here, the behavior is different:


5. “Is it scale?” — How we can tell

When I say “scale”, I mean one or more of these:

  1. Data scale / diversity: not enough examples of the rare boundary cases
  2. Label scale / quality: mask ambiguity or noisy annotation caps IoU
  3. Model scale / capacity: backbone can’t represent the needed characteristic family
  4. Pretraining scale / modality: SSL didn’t include the semantic signals required
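
Item 2 can be made concrete with a toy calculation. Assume two equally defensible annotations of the same object that differ by a one-pixel shift; their mutual IoU gives a rough sense of how much score is lost to ambiguity alone. The shapes and sizes below are invented for illustration, and `rect` is a hypothetical helper.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]


def rect(top: int, left: int, height: int, width: int) -> Set[Pixel]:
    """Set of (row, col) pixels covered by an axis-aligned rectangle."""
    return {
        (r, c)
        for r in range(top, top + height)
        for c in range(left, left + width)
    }


def iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    """IoU of two pixel sets; defined as 1.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Two annotations of the same object, offset by one pixel row.
blob_iou = iou(rect(0, 0, 10, 10), rect(1, 0, 10, 10))  # compact 10x10 object
thin_iou = iou(rect(0, 0, 2, 10), rect(1, 0, 2, 10))    # thin 2x10 object

print(round(blob_iou, 3), round(thin_iou, 3))  # 0.818 0.333
```

The same one-pixel disagreement costs a compact object little but cuts a thin object's IoU to about a third, so datasets dominated by thin or ambiguous boundaries can show a low ceiling even with a good extractor.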

What would count as evidence it’s scale (and not a bug)

You’re looking for this pattern:

That’s exactly what happened.

So: yes, the ceiling looks structural.


6. Why fire segmentation was easy (scientific, concise)

Fire segmentation succeeds with the vanilla SSL→FT pipeline because the task is dominated by low-level, local, and consistent cues:

This new task appears to require higher-order / semantic disambiguation:

So the difference is not “segmentation vs segmentation”.

It’s which characteristic family determines the label.


7. The implication for the next iteration

7.1 The new backbone question

How do we build a backbone that extracts all characteristic types?

A single SSL-trained vision backbone is not guaranteed to encode semantic/contextual characteristics.

So the next iteration likely needs one of:

7.2 The stable abstraction

Keep the two-piece model:

But stop assuming that SSL alone defines $c$ for every domain.


8. Takeaway

This iteration produced a clean conclusion:

SSL is great at extracting geometric/morphological characteristics, but some problems $P_r$ are governed by semantic/contextual characteristics, and those require scale, modality, or architecture changes in the extractor.

Fire segmentation was a geometry problem.

This one is drifting into semantics.

So the next step is not “more loss hacks”.

It’s upgrading the characteristic extractor.