A chronological list of known learned representations that were explicitly identified, named, and evidenced in a paper or post with reproducible analysis.
The representation basis answers “what algebra the model chooses to live in.” The circuit answers “how the transformer computes in that algebra.”
| First reported (approx) | Representation (what it is) | Where it shows up | Canonical reference | Importance & generality (researcher comment) |
|---|---|---|---|---|
| 1996 | Sparse / wavelet-like (Gabor-like) receptive-field bases | Unsupervised vision models learning efficient codes for natural images | Olshausen & Field, Nature 1996 (Courses at Washington University) | This is one of the earliest clean demonstrations that optimizing a simple objective (sparsity/efficient coding) yields structured bases resembling classical signal representations. It is highly general for natural-image statistics and still conceptually underlies why “edge-like” first-layer features are so universal. |
| 2013 (Jan) | Linear semantic substructure in word-vector spaces (directions encode relations; analogies ≈ parallelograms) | Word embeddings from neural objectives | Mikolov et al. 2013 (word2vec) (arXiv) and Pennington et al. 2014 (GloVe explicitly discusses the analogy geometry) (Stanford NLP) | This made “distributed representations” operational: relations become approximately linear operators/directions. Generality is high across corpora and embedding methods, though the reliability of specific analogies varies and is not guaranteed by training. A small vector-arithmetic sketch follows the table. |
| 2013–2014 (Nov → ECCV) | Early CNN layers learn oriented edge / color-opponency filters (Gabor-like) | Supervised convnets on natural images | Zeiler & Fergus, Visualizing and Understanding Convolutional Networks (arXiv) | Important because it empirically tied deep vision features to classical linear-systems intuition: even with end-to-end supervision, the network “chooses” a near-optimal front-end basis for images. Very general across CNN families trained on natural images. |
| 2014 (Oct) | Differentiable addressing representations (content- and location-based “attention” over external memory) | Memory-augmented networks | Graves et al., Neural Turing Machines (arXiv) | This is a representation of state and retrieval rather than of sensory input: key/value-like addressing emerges as a learnable interface between computation and storage. Generality is moderate: powerful, but most mainstream models replaced explicit external memory with transformer attention over context. |
| 2015 (Nov) | Convolutional algorithmic state representations (Neural GPU learns internal states that generalize addition/multiplication to long lengths) | Algorithm learning on sequences | Kaiser & Sutskever, Neural GPUs Learn Algorithms (arXiv) | This is a landmark for “nets can learn algorithmic latent states,” not just pattern matching. Generality is medium: it works well for certain algorithmic tasks with the right inductive bias, but is not a universal recipe for systematic generalization. |
| 2017 (Oct) | Capsule pose-vector representations (entity presence + instantiation parameters; routing groups parts into wholes) | Vision architectures emphasizing part–whole structure | Sabour et al., Dynamic Routing Between Capsules (arXiv) | Conceptually important: it proposes a factorized internal code (pose/part structure) rather than “bags of features.” Generality is debated in mainstream practice, but the representational idea is crisp and has influenced later equivariant and compositional approaches. |
| 2018 (Mar) | Grid-like spatial codes (grid/border/band-cell-like units) | RNNs trained for path integration / navigation | Cueva & Wei 2018 (arXiv) | Very important scientifically: it shows a strong convergence between trained artificial networks and biological coding hypotheses. Generality is high within navigation/path-integration objectives; less directly portable to arbitrary domains. |
| 2018 (Aug) | Explicit arithmetic representations via specialized units (linear codes + gated primitive ops) | Neural arithmetic modules | Trask et al., NALU (arXiv) | This line is important because it cleanly separates “representation of quantity” from “operators on quantities,” targeting extrapolation. Generality is medium: works best when the task truly factors into arithmetic primitives and the architecture is used appropriately. |
| 2020 (Jun) | Fourier-feature positional encodings / spectral reparameterizations (map inputs through sinusoidal features to defeat spectral bias) | Implicit neural representations; MLPs for signals/scenes | Tancik et al., Fourier Features… (NeurIPS Papers) | Important as a unifying explanation for why plain MLPs underfit high frequencies and how a spectral basis fixes it. Generality is high for continuous regression/INR tasks; it is partly “designed,” but it formalizes the representational need very clearly. A short sketch of the sinusoidal mapping follows the table. |
| 2022 (Sep) | Induction-head representations (“copy-from-previous-match” algorithm; pointer-like behavior) | Transformers doing in-context learning / pattern completion | Olsson et al., In-context Learning and Induction Heads (arXiv) | This is one of the most important circuit-level representational discoveries in transformers: it identifies a reusable mechanism that looks like learned algorithmic pointer-chasing. Generality is high across autoregressive transformers and many ICL-like behaviors. |
| 2022 (Sep) | Superposition of features (many sparse features packed into fewer dimensions; polysemanticity as a geometric tradeoff) | ReLU nets and plausibly large models | Elhage et al., Toy Models of Superposition (arXiv) | Foundational for interpretability: it reframes “neurons are messy” as “the representation is compressed and distributed by necessity.” Generality is extremely high—this is an architectural/optimization-level phenomenon, not a task-specific trick. |
| 2023 (Jan) | Discrete Fourier Transform (DFT) / trig-identity representation for modular addition | Small transformers that grok modular arithmetic | Nanda et al., Progress measures for grokking via mechanistic interpretability (arXiv) (plus walkthrough (Neel Nanda)) | The model represents elements in a Fourier basis where modular addition becomes phase addition/rotation. Importance is high as a proof-of-mechanism (nets rediscover classic algebraic representations). Generality is moderate: strongest for tasks with group structure (cyclic groups, convolutions, periodicity). A numeric check of the phase-addition identity follows the table. |
| 2023 (Mar–Sep) | Linear “world-state” representations in sequence models (latent state corresponds to board state; controllable by vector arithmetic) | Othello-GPT-style models | Nanda’s exposition (Neel Nanda) and the associated paper on emergent linear representations (arXiv) | Important because it shows a model trained only to predict tokens can learn an explicit internal state (a “world model”) that is linearly recoverable and causally editable. Generality is promising but not universal; it likely emerges when the task forces consistent latent state tracking. |
| 2023 (Oct) | Feature dictionaries / “monosemantic” features via sparse autoencoders (dictionary learning on activations) | Mechanistic interpretability for transformers | Anthropic’s “Towards Monosemanticity” line (Anthropic) | This is less “the model’s native representation” and more “a recovered basis that better matches it,” but it’s crucial: it suggests models are organized around a large set of sparse features even when neurons are polysemantic mixtures. Generality is likely high, and it directly shapes practical interpretability workflows. |
| 2024 (Feb, community analysis) | Chess/Othello-like linear world representations (extensions/replications) | Board-game GPTs; “world model” probing and interventions | Example community writeup (LessWrong) | This is a continuation/expansion of the 2023 world-representation finding. Importance depends on replication rigor, but it is part of the emerging picture that “latent-state tracking” is a common representational strategy in sequence models under the right data/task constraints. |
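To make the “analogies ≈ parallelograms” geometry in the 2013 word-vector row concrete, here is a minimal sketch. The toy 3-d vectors are invented for illustration (they are not real word2vec embeddings); the point is only that a relation shows up as an approximately shared offset direction.

```python
import numpy as np

# Toy 3-d "embeddings", invented for illustration only -- not real word2vec vectors.
# They are constructed so that the gender relation is (roughly) a shared offset.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.2, 0.8]),
    "man":   np.array([0.5, 0.9, 0.0]),
    "woman": np.array([0.5, 0.3, 0.7]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Parallelogram / offset arithmetic: king - man + woman should land near queen.
target = emb["king"] - emb["man"] + emb["woman"]
best = max(emb, key=lambda w: cosine(emb[w], target))
print(best)  # "queen" for these toy vectors
```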
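The Fourier-feature row reduces to the mapping γ(v) = [cos(2πBv), sin(2πBv)] with B drawn from a Gaussian. A minimal numpy sketch; the bandwidth sigma and the dimensions below are placeholder choices, not values from Tancik et al.

```python
import numpy as np

def fourier_features(v, B):
    """Map low-dimensional inputs v to [cos(2*pi*Bv), sin(2*pi*Bv)].

    v: (n, d) array of input coordinates, B: (m, d) random frequency matrix.
    The resulting (n, 2m) features let a plain MLP fit high frequencies that
    it would otherwise underfit (spectral bias).
    """
    proj = 2.0 * np.pi * v @ B.T            # (n, m) projected phases
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

rng = np.random.default_rng(0)
sigma = 10.0                                # bandwidth; a placeholder hyperparameter
B = sigma * rng.standard_normal((256, 2))   # m = 256 random frequencies for 2-d inputs
coords = rng.uniform(0.0, 1.0, size=(1024, 2))
phi = fourier_features(coords, B)           # features fed to the MLP instead of raw coords
print(phi.shape)                            # (1024, 512)
```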
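For the modular-addition row, “modular addition becomes phase addition/rotation” is just the angle-addition identity applied at a Fourier frequency of the cyclic group. A minimal numeric check; the modulus p, frequency k, and inputs are arbitrary choices.

```python
import numpy as np

p, k = 113, 5                      # modulus and one Fourier frequency (arbitrary choices)
a, b = 47, 92
w = 2.0 * np.pi * k / p

# Represent a residue x by the phase pair (cos(w*x), sin(w*x)).
# Adding residues mod p is then a rotation: the phase of (a + b) mod p equals the
# sum of the phases of a and b, which is exactly the angle-addition identity.
cos_sum = np.cos(w * a) * np.cos(w * b) - np.sin(w * a) * np.sin(w * b)
sin_sum = np.sin(w * a) * np.cos(w * b) + np.cos(w * a) * np.sin(w * b)

c = (a + b) % p                    # the wrap-around costs an integer multiple of 2*pi*k
assert np.isclose(cos_sum, np.cos(w * c))
assert np.isclose(sin_sum, np.sin(w * c))
```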
Update: Some more interesting representations
1) Finite-state / automaton-like representations (regular languages)
Transformers trained on formal languages can end up simulating automata, and recent work explicitly extracts finite state machines from trained transformers to characterize what they learned. This is close to “boolean/bitmap logic” in that the latent state is discrete and transitions are rule-like. https://arxiv.org/pdf/2410.06045
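For intuition about what an extracted finite state machine looks like, here is a hand-written two-state DFA for a regular language (parity of 1s). It is purely illustrative, not an automaton extracted from any trained model, but it is the kind of discrete-state, rule-like object the extraction targets.

```python
# A two-state DFA for "even number of 1s" -- discrete states, rule-like transitions.
# Illustrative only; not extracted from a real model.
TRANSITIONS = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}

def run_dfa(string, start="even"):
    state = start
    for ch in string:
        state = TRANSITIONS[(state, ch)]
    return state  # accepting iff "even"

print(run_dfa("1101"))  # "odd"  -> three 1s
print(run_dfa("1001"))  # "even" -> two 1s
```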
2) Stack-like representations for parentheses / Dyck-style tasks
Balanced bracket classification tasks are widely used in mech-interp pedagogy because they pressure the model toward a latent “depth” or stack surrogate. In practice, small transformers often learn a distributed state that tracks nesting structure, sometimes in a way that can be probed linearly. https://arena-chapter1-transformer-interp.streamlit.app/%5B1.5.1%5D_Balanced_Bracket_Classifier
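As a reference point for what the latent “depth” is, here is the ground-truth computation the model has to approximate: running nesting depth, with balance meaning the depth never dips below zero and ends at zero. This is just the task semantics, not a claim about the internal mechanism; probing work asks whether something affine to this depth is readable from the residual stream.

```python
def bracket_depths(seq):
    """Return the running nesting depth after each token of a ()-string."""
    depth, depths = 0, []
    for ch in seq:
        depth += 1 if ch == "(" else -1
        depths.append(depth)
    return depths

def is_balanced(seq):
    depths = bracket_depths(seq)
    return all(d >= 0 for d in depths) and (not depths or depths[-1] == 0)

print(bracket_depths("(()())"))             # [1, 2, 1, 2, 1, 0]
print(is_balanced("(()())"), is_balanced("())("))  # True False
```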
3) “World-state bitmaps” (board-state as a linear code)
In Othello-GPT-style settings, the residual stream contains a linearly recoverable encoding of the board. This is arguably a learned bitmap-like representation (one direction per square / feature), embedded in a continuous space. https://www.neelnanda.io/mechanistic-interpretability/othello
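A minimal sketch of the probing setup, with synthetic placeholder data standing in for cached residual-stream activations and simulator board states (all array names, shapes, and the synthetic labels are assumptions for illustration): fit one linear direction per square and measure held-out accuracy. The Othello-GPT finding is that such probes succeed, and that moving activations along the learned directions causally edits the model's belief about the board.

```python
import numpy as np

# Placeholder data standing in for cached activations and game states:
# resid[i] -- residual-stream vector at some layer/position for example i
# board[i] -- 64 binary labels (one per square) for the same example
rng = np.random.default_rng(0)
n, d_model, n_squares = 2000, 512, 64
true_dirs = rng.standard_normal((n_squares, d_model))   # pretend "board directions"
resid = rng.standard_normal((n, d_model))
board = (resid @ true_dirs.T > 0).astype(float)         # synthetic, linearly encoded labels

# One linear probe per square via least squares (logistic regression is also common).
W, *_ = np.linalg.lstsq(resid[:1500], board[:1500], rcond=None)
pred = resid[1500:] @ W > 0.5
acc = (pred == board[1500:].astype(bool)).mean()
print(f"held-out per-square accuracy: {acc:.3f}")       # high only if the code is linear
```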
4) Group-operation representations beyond modular addition
A closely related line studies how small nets learn group composition more broadly (a “universality” testbed). This generalizes the “DFT for cyclic groups” story into a broader family of algebraic representations and circuits. https://openreview.net/pdf?id=jCOrkuUpss
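To see why this generalizes the modular-addition story: the cyclic group is only one choice of finite group, and the training data for a group-composition task is just a Cayley table. A sketch using the non-abelian symmetric group S3 (standard mathematics, not code from the paper):

```python
from itertools import permutations

# Elements of S3 as permutations of (0, 1, 2); compose(p, q)(i) = p[q[i]] (apply q, then p).
elements = list(permutations(range(3)))

def compose(p, q):
    return tuple(p[q[i]] for i in range(3))

# The training data for a "group composition" task is just this Cayley table:
cayley = {(p, q): compose(p, q) for p in elements for q in elements}
print(len(elements), len(cayley))   # 6 elements, 36 labeled pairs
print(compose((1, 0, 2), (0, 2, 1)) == compose((0, 2, 1), (1, 0, 2)))  # False: non-abelian
```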
5) Boolean satisfiability style reasoning (logical structure)
There is mechanistic-interpretability work on transformer-based models trained to solve 2-SAT, which is a canonical boolean-logic problem. This is a direct example of boolean structure expressed in transformer activations and circuits. https://arxiv.org/html/2407.13594v1
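For reference, the ground-truth semantics of 2-SAT fits in a few lines; the interpretability question is which parts of this logical structure (clauses, literals, satisfying assignments) appear as features or circuits in the trained model. The brute-force checker below is only a toy specification, not the paper's method (real solvers use the linear-time implication-graph/SCC algorithm).

```python
from itertools import product

def satisfies(assignment, clauses):
    """assignment: dict var -> bool; clauses: 2-literal tuples like (("x", True), ("y", False))."""
    return all(any(assignment[v] == want for v, want in clause) for clause in clauses)

def solve_2sat(variables, clauses):
    """Brute-force ground truth; fine for toy instances."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if satisfies(assignment, clauses):
            return assignment
    return None

# (x OR y) AND (NOT x OR y) AND (NOT y OR NOT x)
clauses = [(("x", True), ("y", True)), (("x", False), ("y", True)), (("y", False), ("x", False))]
print(solve_2sat(["x", "y"], clauses))  # {'x': False, 'y': True}
```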
6) Induction / copy (pointer-style algorithm)
Not boolean algebra per se, but it is a very simple learned algorithmic representation: a head learns to represent and retrieve repeated patterns (“copy what followed last time”). This often coexists with more symbolic-feeling representations in toy tasks. https://arxiv.org/abs/2312.03002
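A minimal sketch of the behavior an induction head implements, stated as an explicit algorithm over the token sequence rather than over attention weights: find the most recent earlier occurrence of the current token and predict whatever followed it.

```python
def induction_prediction(tokens):
    """Predict the next token by 'copy what followed the last time we saw this token'."""
    current = tokens[-1]
    # Scan backwards over earlier positions for a previous occurrence of `current`.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]   # copy its successor (a pointer back into the context)
    return None                    # no earlier match: nothing to copy

seq = ["the", "cat", "sat", "on", "the"]
print(induction_prediction(seq))   # "cat" -- it followed the previous "the"
```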