Yihong Chen1, Xiangxiang Xu2, Pontus Stenetorp3
Sebastian Riedel3, Luca Franceschi4
ICLR 2026
1 OATML, University of Oxford, UK | 2 University of Rochester, USA | 3 AI Centre, University College London, UK | 4 Independent Researcher, Berlin, Germany
We treat transformers as recursive residual networks and apply jet operators to expand their computation.
This yields explicit jet paths plus a remainder, turning one opaque computation into a decomposition we can inspect.
In the linear residual case, the model can be rewritten as an explicit sum over paths.
This makes the core intuition concrete: rather than treating the network as one monolithic function, we isolate pathways that can be inspected individually.
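As a toy sketch of the linear case, a stack of residual blocks x ← x + A_l x expands exactly into a sum over subsets of layers, one term per path (the layer maps and dimensions below are illustrative, not from the paper):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
L, d = 3, 4
A = [rng.normal(scale=0.1, size=(d, d)) for _ in range(L)]  # per-layer linear maps
x0 = rng.normal(size=d)

# Full forward pass through the residual stack: x_L = (I+A_L)...(I+A_1) x0.
x = x0
for Al in A:
    x = x + Al @ x

# Equivalent explicit sum over all 2^L paths: each path chooses a subset
# of layers whose maps it traverses, in network order.
paths = np.zeros(d)
for r in range(L + 1):
    for subset in itertools.combinations(range(L), r):
        v = x0
        for l in subset:  # earlier layers applied first
            v = A[l] @ v
        paths += v

assert np.allclose(x, paths)
```

Each of the 2^L terms is an individually inspectable pathway, which is the property the jet expansion generalizes beyond the purely linear setting.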
Local readout: jet lens exposes token-level component contributions, with Logit Lens as a special low-order case.
Global readout: jet n-grams extract symbolic n-gram tables directly from model computations.
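A minimal sketch of the local readout idea, assuming a toy unembedding matrix `W_U` and per-layer residual updates `layer_outputs` (both hypothetical stand-ins, not the paper's implementation). Because the residual stream is a running sum, projecting each layer's update through `W_U` alone gives an additive, Logit-Lens-style 0th-order component readout:

```python
import numpy as np

rng = np.random.default_rng(1)
d, vocab = 8, 16
W_U = rng.normal(size=(d, vocab))                        # toy unembedding matrix
layer_outputs = [rng.normal(size=d) for _ in range(4)]   # toy per-layer updates

# Residual stream = running sum of layer updates; projecting each partial
# sum through W_U is the Logit-Lens readout at that depth.
stream = np.zeros(d)
for i, out in enumerate(layer_outputs):
    stream = stream + out
    logits = stream @ W_U
    print(f"after layer {i}: top token = {int(np.argmax(logits))}")

# Per-component contribution: each update projected alone; by linearity
# the contributions sum to the final logits.
contribs = [out @ W_U for out in layer_outputs]
assert np.allclose(sum(contribs), stream @ W_U)
```

Higher-order jet terms would additionally account for interactions between components, which this 0th-order view ignores.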
No curated dataset is required. A 0th- or 1st-order decomposition already suffices for verifying finetuning effects, tracing training dynamics, and potentially quantifying knowledge more broadly.
Jet n-gram mass lets us inspect knowledge shifts directly in model space.
Here, refusal-related mass rises far more sharply than toxic mass falls.
Takeaway. This suggests alignment often looks more like masking harmful continuations than deeply removing the underlying knowledge.
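The masking-vs-removal comparison above can be illustrated with a toy computation (all continuation sets and probability tables below are hypothetical numbers, not measurements from the paper): compare the n-gram mass on toxic vs. refusal continuations before and after alignment.

```python
# Hypothetical continuation sets.
toxic = {"insult_a", "insult_b"}
refusal = {"i_cannot", "as_an_ai"}

# Hypothetical n-gram tables: continuation -> probability mass.
before = {"insult_a": 0.20, "insult_b": 0.15, "i_cannot": 0.02, "as_an_ai": 0.01}
after  = {"insult_a": 0.15, "insult_b": 0.12, "i_cannot": 0.30, "as_an_ai": 0.18}

def mass(table, tokens):
    """Total probability mass a table assigns to a set of continuations."""
    return sum(table.get(t, 0.0) for t in tokens)

d_toxic = mass(after, toxic) - mass(before, toxic)        # modest drop
d_refusal = mass(after, refusal) - mass(before, refusal)  # large rise
print(f"toxic mass shift: {d_toxic:+.2f}, refusal mass shift: {d_refusal:+.2f}")
```

With numbers shaped like these, the refusal mass gained dwarfs the toxic mass lost, matching the "masking rather than removal" reading.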
1) Interface for inspecting LLM knowledge structure.
To audit LLMs, we need a symbolic interface that allows us to inspect their internal, entangled knowledge.
2) Functional decomposition.
LLM analysis should move from heuristic probing toward functional decomposition of computation.
3) Lower orders suffice.
Even zeroth-order pathway readouts, such as n-gram mass, can reveal latent knowledge and test whether alignment changed knowledge or only visible behavior.