
Refactoring P¬P: Introducing SEN (Self-Evolving Networks)

Note: $\hat{Y} := P_r = \set{P, \neg P}$

On a broader note (before the technical one)

I just wanted to explore further the implications of this iteration, how it connects to the previous blog post, and what I expect for the future of this development. The main thing that left me intrigued is this line:

$\mathcal{X} \xrightarrow{c} \hat{C}(X) \xrightarrow{f} \hat{Y}$

The P¬P problem can be decomposed into two phases: feature extraction and feature decoding / interpretation. This decomposition implies two things. First, there exists a set of extractable characteristics from $\mathcal{X}$ that correlate with $\hat{Y}$; we denote this set as $\hat{C}$, the characteristic category, understood as a collection of characteristic types together with their realizations under $c$. Second, there exist characteristic types $c_t \in \hat{C}$ such that each type admits multiple realizations (i.e., $|\mathrm{Inst}(c_t)| > 1$). These types may be semantic, geometric, morphological, or otherwise.
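The decomposition can be sketched as a literal composition of two functions. This is a toy sketch: the bodies of `c` and `f`, and the characteristics they use, are invented here purely to make the structure concrete.

```python
import numpy as np

# Toy stand-ins for the two phases: `c` extracts a dictionary of
# characteristic realizations from raw input X, and `f` decodes them
# into Y_hat = {P, not-P}. Both bodies are invented for illustration.
def c(x: np.ndarray) -> dict:
    """Feature extraction: X -> C_hat(X)."""
    return {"mean_intensity": x.mean(axis=-1), "variance": x.var(axis=-1)}

def f(characteristics: dict) -> np.ndarray:
    """Feature decoding: C_hat(X) -> Y_hat (True = P, False = not-P)."""
    return characteristics["mean_intensity"] > 0.5

def predict(x: np.ndarray) -> np.ndarray:
    # The full P/not-P pipeline is the composition f after c.
    return f(c(x))
```

The point is only structural: everything `f` can decide is bounded by what `c` chose to represent.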

If such characteristic types exist, then different realizations of the same type—possibly across modalities—must map to overlapping regions under the decoding or interpretation process. For example, the semantic characteristic corresponding to the concept “person” may be realized both by a word embedding and by visual features extracted from an image of a person, even though their raw representations live in different input spaces.

The reason I say this is that if this holds, then $c$ has to be able to represent said overlap in such a way that the feature decoder $f$ can then map it to $\hat{Y}$ correctly. But the question is: how?

Well, here’s why I believe SEN must be the way forward.

SSL is fairly good at extracting EVERY feature; the weakness is exactly that. It maps everything, notices everything, and thus, with many neurons devoted to background / non-usable characteristics, a semantic segmentation or classification task becomes very challenging or compute-heavy. That is, there must be a way to scale the model or prune it based on its needs. The idea of applying SEN is that it cuts out the neurons that don't correlate with $\hat{Y}$, so $f$'s job is easier: it no longer listens to inactive or noisy neurons.
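As a toy version of that pruning criterion, one could rank neurons by how much their activations correlate with $\hat{Y}$ and mask out the rest. The function name and the threshold `tau` are hypothetical, not part of the actual SEN implementation:

```python
import numpy as np

def select_informative_neurons(acts: np.ndarray, y: np.ndarray,
                               tau: float = 0.2) -> np.ndarray:
    """Return a boolean keep-mask over neurons whose activations correlate with y.

    acts: (n_samples, n_neurons) activations; y: (n_samples,) binary labels.
    Neurons that are inactive or uncorrelated with Y_hat get marked for pruning.
    Illustrative only; `tau` is a made-up hyperparameter.
    """
    acts_c = acts - acts.mean(axis=0)
    y_c = y - y.mean()
    # Pearson correlation per neuron; constant (dead) neurons get corr = 0.
    denom = acts_c.std(axis=0) * y_c.std() * len(y)
    corr = np.where(denom > 0,
                    (acts_c * y_c[:, None]).sum(axis=0) / np.maximum(denom, 1e-12),
                    0.0)
    return np.abs(corr) > tau
```

A dead or label-irrelevant neuron scores near zero and falls below `tau`, which is exactly the population SEN wants to cut.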

What was interesting, though, was that in every iteration it reached the computational limit for growth. That is, it pruned at the start (cutting out noisy or inactive neurons), and then grew as it needed new capabilities. This says that it simply needs more capability until (and here's the sub-problem kicker) it can extract a sufficiently rich subset of $\hat{C}$ such that $\forall c_i \in \hat{C},\; f(c_i) \implies \hat{Y}$.

In other words, just as there is a need to cut out noisy neurons, there must exist cases where the model actually needs more neurons to raise its level of abstraction.

The next step is simple to state though mathematically complex: extending SEN to both $f$ and $c$. If we can do that, a model that evolves entirely, then I think it would be the biggest step toward solving the P¬P problem. Anyways, back to the whiteboard!

[Image: IMG_7330.jpg]

Technical section of iteration

This iteration focused on restructuring the P¬P segmentation pipeline to remove the assumption of fixed internal capacity. The core change was the integration of SEN (Self-Evolving Networks) into the P¬P backbone, enabling the model to dynamically prune and grow internal neuron groups during training.

Architectural changes

The original P¬P segmentation model used a fixed-width SSL backbone with a segmentation head. While effective, this design imposed a hard representational ceiling: once saturated, further optimization only produced diminishing returns.

The refactor introduced three key components:

1. Gated group structure

Selected internal layers (notably TokenProjection) were partitioned into groups of neurons, each controlled by a learnable scalar gate. Each group can be:

• active, contributing normally to the forward pass
• pruned, with its gate closed so it no longer contributes
• regrown later, reopening capacity when the model needs it

This transforms internal capacity into a controllable, discrete resource rather than a static design choice.
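A minimal sketch of such a gated group structure, assuming a plain linear projection. The class name, shapes, and initialization are illustrative; the actual TokenProjection layer is not shown here:

```python
import numpy as np

class GatedGroupLinear:
    """A linear layer whose output neurons are partitioned into gated groups.

    Each group has a scalar gate (a learnable parameter in the real model);
    a closed gate (0) effectively prunes the whole group, and reopening it
    regrows that capacity.
    """

    def __init__(self, d_in: int, d_out: int, n_groups: int, seed: int = 0):
        assert d_out % n_groups == 0
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_in, d_out)) * 0.02
        self.group_size = d_out // n_groups
        self.gates = np.ones(n_groups)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        h = x @ self.W
        # Broadcast each group's gate across its slice of output neurons.
        g = np.repeat(self.gates, self.group_size)
        return h * g

layer = GatedGroupLinear(d_in=8, d_out=12, n_groups=3)
layer.gates[1] = 0.0                # "prune" the middle group: its 4 neurons go silent
out = layer(np.ones((2, 8)))        # shape (2, 12); columns 4..7 are exactly zero
```

Because the gate is a single scalar per group, pruning and regrowing are discrete edits on a small vector rather than surgery on the weight matrices.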

2. GateManager + edit loop

A centralized GateManager tracks:

• the gate state of every group
• per-group utility statistics, pooled over validation batches

At fixed edit intervals, the model evaluates whether to:

• prune the lowest-utility groups
• grow new groups
• leave the current configuration unchanged

Edits are triggered by plateau detection on validation metrics, not by step-level gradients.
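A minimal sketch of that trigger logic, assuming plateau detection over a sliding window of validation metrics. `patience` and `min_delta` are invented hyperparameters, and the real GateManager also tracks per-group state and utilities, which this sketch omits:

```python
from collections import deque

class GateManager:
    """Fires a structural edit when the validation metric plateaus."""

    def __init__(self, patience: int = 3, min_delta: float = 1e-3):
        self.history = deque(maxlen=patience)  # sliding window of val metrics
        self.patience = patience
        self.min_delta = min_delta

    def should_edit(self, val_metric: float) -> bool:
        """True when improvement over the window is below min_delta."""
        self.history.append(val_metric)
        if len(self.history) < self.patience:
            return False
        return (self.history[-1] - self.history[0]) < self.min_delta

mgr = GateManager(patience=3)
metrics = [0.50, 0.60, 0.70, 0.7003, 0.7006]
decisions = [mgr.should_edit(m) for m in metrics]
# -> [False, False, False, False, True]: the edit fires only once
#    the metric stops improving, never on step-level gradient noise.
```

When `should_edit` returns True, the edit loop would then apply one of the decisions above (prune, grow, or hold) based on the pooled utility ranking.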

3. Semantic utility (SEN mode)

A new semantic utility mode was added alongside the original gradient-based utility.

For each gated group, semantic utility is computed as:

$$u = \mathbb{E}[A \mid \text{foreground}] - \lambda_{bg}\,\mathbb{E}[A \mid \text{background}]$$

where:

• $A$ is the activation magnitude of the group
• foreground / background are taken from the ground-truth segmentation mask
• $\lambda_{bg}$ is a penalty weight on background activation

This explicitly favors neurons whose activation implies the target predicate $P_r$, not merely those that reduce loss indirectly.

Utility is computed over multiple validation batches, pooled, normalized, and used to rank groups for prune/grow decisions.
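A minimal sketch of the semantic utility computation for one gated group, assuming per-pixel activation magnitudes and a ground-truth foreground mask. The function name and default for `lambda_bg` are illustrative:

```python
import numpy as np

def semantic_utility(acts: np.ndarray, fg_mask: np.ndarray,
                     lambda_bg: float = 0.5) -> float:
    """u = E[A | foreground] - lambda_bg * E[A | background].

    acts:    (H, W) activation magnitudes of one gated group
    fg_mask: (H, W) boolean foreground mask from the ground-truth segmentation
    """
    fg = acts[fg_mask]
    bg = acts[~fg_mask]
    e_fg = fg.mean() if fg.size else 0.0   # E[A | foreground]
    e_bg = bg.mean() if bg.size else 0.0   # E[A | background]
    return float(e_fg - lambda_bg * e_bg)
```

In the full pipeline, these scores would be pooled over validation batches and normalized before ranking groups: low scores mark prune candidates, sustained high scores justify keeping or growing capacity.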

Training behavior and observations

Edit dynamics

With SEN enabled:

• noisy or inactive groups were pruned early in training
• new groups were grown once the pruned configuration plateaued

This behavior was stable and repeatable.

Capacity saturation

Across runs, the model consistently expanded until reaching k_max, after which no further structural edits were possible and improvement stalled.

This indicates that, under the current objective and dataset, the model was capacity-limited, not optimization-limited.

Compute ceiling

Two constraints became apparent:

  1. Architectural ceiling

    Even with dynamic pruning, the model converged to maximal internal capacity, implying that the representational needs exceeded the original backbone width.

  2. Compute ceiling

    Kaggle-scale compute restricted:

    • input resolution
    • batch size
    • edit horizon
    • depth of gated layers

The system showed clear signs that additional compute and width would be productively used, not wasted.

What this refactor achieved (technically)

• internal capacity became a controllable, discrete resource via gated neuron groups
• a centralized GateManager turned prune/grow decisions into an explicit edit loop driven by validation plateaus
• a semantic utility mode ranked groups by how strongly their activation implies $P_r$
• capacity saturation became directly observable instead of hidden inside a fixed-width backbone

Technical takeaway

The experiments show that P¬P segmentation is not bottlenecked by loss design or optimization stability. It is bottlenecked by semantic capacity.

SEN does not merely prune unused neurons; it exposes when the model has exhausted all semantically meaningful capacity. When given the option, the network consistently chose to grow until hitting its structural and compute limits.

This confirms that the correct next step is scaling combined with self-evolution, not static architecture tuning.