Tag: representations

Neurosymbolic reasoning

Neurosymbolic reasoning combines two different kinds of computation. A neural network is a learned function, usually trained from data, that maps inputs into vectors and uses those vectors to predict useful outputs. It is good at pattern recognition, fuzzy matching, language, perception, and guessing promising next steps. A symbolic system is a system that manipulates explicit objects such as rules, formulas, programs, graphs, constraints, proofs, or search states. It is good at exactness: a proof step is valid or invalid; a SAT assignment satisfies a formula or it does not; a type checker accepts a program or rejects it.

A GNN, or graph neural network, is a neural network designed for data represented as graphs. A graph has nodes and edges. In SAT, for example, variables and clauses can be represented as nodes, and edges connect variables to the clauses in which they appear. This makes GNNs relevant because many formal problems are not naturally sequences of words; they are structured objects. Code has ASTs, control-flow graphs, and data-flow graphs. SAT formulas have variable-clause graphs. Theorems have dependency graphs of definitions and lemmas. A GNN can learn which parts of such a structure look important.
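
The variable-clause graph described above can be sketched in a few lines. This is only an illustration: the DIMACS-style integer encoding (a positive integer for a variable, a negative one for its negation) and the function name `variable_clause_edges` are assumptions made here, not the input format of any particular GNN library.

```python
from typing import List, Tuple

Clause = List[int]  # assumed encoding: 3 means x3, -3 means NOT x3

def variable_clause_edges(clauses: List[Clause]) -> List[Tuple[int, int, int]]:
    """Return (variable, clause_index, polarity) edges of the bipartite graph.

    polarity is +1 if the variable appears positively in the clause, -1 if negated.
    """
    edges = []
    for clause_index, clause in enumerate(clauses):
        for literal in clause:
            polarity = 1 if literal > 0 else -1
            edges.append((abs(literal), clause_index, polarity))
    return edges

# (x1 OR NOT x2) AND (x2 OR x3)
formula = [[1, -2], [2, 3]]
edges = variable_clause_edges(formula)
print(edges)
```

A GNN would pass messages along these edges; the polarity is kept as an edge attribute so that a variable and its negation are not conflated.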

The core idea of neurosymbolic reasoning is simple: the neural part proposes, ranks, translates, or guides; the symbolic part represents, executes, constrains, or verifies. The neural system is allowed to be approximate because its output is not trusted directly. The symbolic system supplies the hard boundary between plausible and correct. In code, an LLM may propose a patch, but the compiler, tests, static analyzer, or verifier decide whether the patch is acceptable. In SAT, a neural model may suggest which variable to branch on, but the SAT solver performs exact search and a proof checker verifies the result. In theorem proving, a model may suggest a Lean tactic, but Lean checks whether the proof step is valid.

The reason this combination matters is that pure neural systems and pure symbolic systems have opposite strengths. Neural systems handle ambiguity and large messy inputs, but they can hallucinate or skip conditions. Symbolic systems are exact and compositional, but they often face enormous search spaces and require humans to formalize the problem precisely. Neurosymbolic reasoning is useful when a problem is both messy and exact: messy enough that learned guidance helps, but exact enough that unchecked guesses are dangerous.

Formal methods are a central use case because their hard part is usually not checking a finished proof. The hard part is finding the right specification, invariant, lemma, induction variable, proof tactic, or decomposition. A proof assistant can mechanically verify a proof, but it may not know which theorem to apply next. A SAT solver can prove unsatisfiability, but it may drown in bad branching choices. A verifier can check loop invariants, but someone must often invent those invariants. Neural networks help by searching this space of useful intermediate ideas.

SAT illustrates the division cleanly. For a satisfiable formula, the certificate is an assignment of truth values that makes every clause true. For an unsatisfiable formula, the certificate is a proof of contradiction, typically a resolution-style or clausal proof in a format such as DRAT. A neural network can suggest a promising variable, a likely unsat core, or a useful learned clause. But the final answer must still be checked by a symbolic solver or proof checker. The neural model does not make SAT “true” or “false”; it helps navigate the search.
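
The satisfiable half of this division can be sketched directly: an exact checker accepts or rejects each proposed assignment, no matter where the proposal came from. The list `guesses` below is a hypothetical stand-in for the output of a learned model, and the function names are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Optional

Clause = List[int]  # assumed encoding: 3 means x3, -3 means NOT x3

def satisfies(clauses: List[Clause], assignment: Dict[int, bool]) -> bool:
    """Exact check: every clause must contain at least one true literal.

    Variables missing from the assignment make none of their literals true.
    """
    def literal_true(literal: int) -> bool:
        var = abs(literal)
        return var in assignment and assignment[var] == (literal > 0)

    return all(any(literal_true(lit) for lit in clause) for clause in clauses)

def first_verified(
    clauses: List[Clause], proposals: Iterable[Dict[int, bool]]
) -> Optional[Dict[int, bool]]:
    """Return the first proposal that the checker accepts, or None."""
    for candidate in proposals:
        if satisfies(clauses, candidate):
            return candidate
    return None

# (x1 OR NOT x2) AND (x2 OR x3)
formula = [[1, -2], [2, 3]]
guesses = [  # hypothetical model output, best guess first
    {1: False, 2: True, 3: False},  # violates (x1 OR NOT x2)
    {1: True, 2: True, 3: False},   # satisfies both clauses
]
print(first_verified(formula, guesses))
```

A wrong guess costs time but never correctness, because nothing is returned unless `satisfies` accepts it.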

This does not mean neural SAT solvers are on a direct path to proving P = NP. P = NP would require a uniform polynomial-time method for solving every SAT instance in the worst case. Neural guidance can make many real instances dramatically easier, especially when they contain recurring structure from hardware, software, scheduling, planning, or verification problems. But worst-case SAT includes adversarial formulas designed to defeat heuristics. Better guidance can move the practical frontier without changing the worst-case complexity frontier.

The deeper promise is not that neural networks replace logic. The promise is that they make symbolic reasoning more usable. They can translate informal intent into candidate formal specifications, suggest missing invariants, rank lemmas, choose solver strategies, identify useful graph structure, and repair failed proof attempts. The symbolic system then checks whether these guesses actually satisfy the rules. This gives a division of labor: neural networks provide search intelligence; symbolic systems provide correctness.

There are two meanings of “neurosymbolic” that should be kept separate. The first is external neurosymbolic reasoning, where a neural model calls or guides explicit tools such as SAT solvers, proof assistants, compilers, planners, or databases. This is the practical and trustworthy version because the symbolic tool can reject invalid output. The second is internal symbolic representation, where researchers ask whether neural networks themselves learn vector representations that behave like variables, rules, types, objects, or relations. That is important for interpretability, but it is harder to trust because the “symbols” are implicit and distributed inside activations.

The main risk is not usually that the symbolic checker accepts an invalid proof. A good checker should catch that. The larger risk is proving or checking the wrong thing. A program can be verified against an incomplete specification. A sorter can be proved to return an ordered list while still dropping all input elements. A security function can be proved to require authentication while forgetting tenant isolation. Neurosymbolic systems therefore still depend on good specifications, not just good proof search.
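
The sorter example can be made concrete. In this sketch the names `weak_spec`, `full_spec`, and `bad_sort` are hypothetical, and the specifications are ordinary runtime predicates rather than formal proofs. The logical gap is the same, though: an implementation that drops every element passes the ordering-only specification.

```python
from collections import Counter
from typing import List

def is_ordered(xs: List[int]) -> bool:
    """True if the list is in non-decreasing order."""
    return all(a <= b for a, b in zip(xs, xs[1:]))

def is_permutation(xs: List[int], ys: List[int]) -> bool:
    """True if both lists contain the same elements with the same counts."""
    return Counter(xs) == Counter(ys)

def bad_sort(xs: List[int]) -> List[int]:
    """A deliberately wrong 'sorter' that drops all input elements."""
    return []

def weak_spec(inp: List[int], out: List[int]) -> bool:
    """Incomplete specification: only requires the output to be ordered."""
    return is_ordered(out)

def full_spec(inp: List[int], out: List[int]) -> bool:
    """Stronger specification: ordered AND a permutation of the input."""
    return is_ordered(out) and is_permutation(inp, out)

data = [3, 1, 2]
print(weak_spec(data, bad_sort(data)))  # the empty list is trivially ordered
print(full_spec(data, bad_sort(data)))  # rejected: elements were dropped
```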

Neural networks are useful for proposing promising moves in huge ambiguous spaces; symbolic systems are useful for exact manipulation and verification; neurosymbolic reasoning connects them by letting neural models guide search while symbolic tools enforce correctness; GNNs are relevant because many formal objects are graphs rather than plain text. The frontier is making formal reasoning scalable by wrapping exact checkers in learned search, translation, and repair loops.

Recent work has used graph neural networks to predict branching orders or guide branching decisions. One 2026 paper studies GNN-predicted initial branching orders for CDCL solvers, while earlier NeuroBack-style work plugged a neural heuristic into Kissat and reported solving more SAT Competition problems than the base solver on SATCOMP-2022 and SATCOMP-2023 sets.

Chronological list of known learned representations (increasing date)

The list covers learned representations that were explicitly identified, named, and evidenced in a paper or post with reproducible analysis.

The representation basis answers “what algebra the model chooses to live in.” The circuit answers “how the transformer computes in that algebra.”

| First reported (approx) | Representation (what it is) | Where it shows up | Canonical reference | Importance & generality (researcher comment) |
| --- | --- | --- | --- | --- |
| 1996 | Sparse / wavelet-like (Gabor-like) receptive-field bases | Unsupervised vision models learning efficient codes for natural images | Olshausen & Field, Nature 1996 (Courses at Washington University) | This is one of the earliest clean demonstrations that optimizing a simple objective (sparsity/efficient coding) yields structured bases resembling classical signal representations. It is highly general for natural-image statistics and still conceptually underlies why “edge-like” first-layer features are so universal. |
| 2013 (Jan) | Linear semantic substructure in word-vector spaces (directions encode relations; analogies ≈ parallelograms) | Word embeddings from neural objectives | Mikolov et al. 2013 (word2vec) (arXiv) and Pennington et al. 2014 (GloVe explicitly discusses the analogy geometry) (Stanford NLP) | This made “distributed representations” operational: relations become approximately linear operators/directions. Generality is high across corpora and embedding methods, though the reliability of specific analogies varies and is not guaranteed by training. |
| 2013–2014 (Nov → ECCV) | Early CNN layers learn oriented edge / color-opponency filters (Gabor-like) | Supervised convnets on natural images | Zeiler & Fergus visualization work (arXiv) | Important because it empirically tied deep vision features to classical linear-systems intuition: even with end-to-end supervision, the network “chooses” a near-optimal front-end basis for images. Very general across CNN families trained on natural images. |
| 2014 (Oct) | Differentiable addressing representations (content- and location-based “attention” over external memory) | Memory-augmented networks | Graves et al., Neural Turing Machines (arXiv) | This is a representation of state and retrieval rather than of sensory input: key/value-like addressing emerges as a learnable interface between computation and storage. Generality is moderate: powerful, but most mainstream models replaced explicit external memory with transformer attention over context. |
| 2015 (Nov) | Convolutional algorithmic state representations (Neural GPU learns internal states that generalize addition/multiplication to long lengths) | Algorithm learning on sequences | Kaiser & Sutskever, Neural GPUs Learn Algorithms (arXiv) | This is a landmark for “nets can learn algorithmic latent states,” not just pattern matching. Generality is medium: it works well for certain algorithmic tasks with the right inductive bias, but is not a universal recipe for systematic generalization. |
| 2017 (Oct) | Capsule pose-vector representations (entity presence + instantiation parameters; routing groups parts into wholes) | Vision architectures emphasizing part–whole structure | Sabour et al., Dynamic Routing Between Capsules (arXiv) | Conceptually important: it proposes a factorized internal code (pose/part structure) rather than “bags of features.” Generality is debated in mainstream practice, but the representational idea is crisp and has influenced later equivariant and compositional approaches. |
| 2018 (Mar) | Grid-like spatial codes (grid/border/band-cell-like units) | RNNs trained for path integration / navigation | Cueva & Wei 2018 (arXiv) | Very important scientifically: it shows a strong convergence between trained artificial networks and biological coding hypotheses. Generality is high within navigation/path-integration objectives; less directly portable to arbitrary domains. |
| 2018 (Aug) | Explicit arithmetic representations via specialized units (linear codes + gated primitive ops) | Neural arithmetic modules | Trask et al., NALU (arXiv) | This line is important because it cleanly separates “representation of quantity” from “operators on quantities,” targeting extrapolation. Generality is medium: works best when the task truly factors into arithmetic primitives and the architecture is used appropriately. |
| 2020 (Jun) | Fourier-feature positional encodings / spectral reparameterizations (map inputs through sinusoidal features to defeat spectral bias) | Implicit neural representations; MLPs for signals/scenes | Tancik et al., Fourier Features… (NeurIPS Papers) | Important as a unifying explanation for why plain MLPs underfit high frequencies and how a spectral basis fixes it. Generality is high for continuous regression/INR tasks; it is partly “designed,” but it formalizes the representational need very clearly. |
| 2022 (Sep) | Induction-head representations (“copy-from-previous-match” algorithm; pointer-like behavior) | Transformers doing in-context learning / pattern completion | Olsson et al., In-context Learning and Induction Heads (arXiv) | This is one of the most important circuit-level representational discoveries in transformers: it identifies a reusable mechanism that looks like learned algorithmic pointer-chasing. Generality is high across autoregressive transformers and many ICL-like behaviors. |
| 2022 (Sep) | Superposition of features (many sparse features packed into fewer dimensions; polysemanticity as a geometric tradeoff) | ReLU nets and plausibly large models | Elhage et al., Toy Models of Superposition (arXiv) | Foundational for interpretability: it reframes “neurons are messy” as “the representation is compressed and distributed by necessity.” Generality is extremely high: this is an architectural/optimization-level phenomenon, not a task-specific trick. |
| 2023 (Jan) | Discrete Fourier Transform (DFT) / trig-identity representation for modular addition | Small transformers that grok modular arithmetic | Nanda et al., Progress measures for grokking via mechanistic interpretability (arXiv) (plus walkthrough (Neel Nanda)) | The model represents elements in a Fourier basis where modular addition becomes phase addition/rotation. Importance is high as a proof-of-mechanism (nets rediscover classic algebraic representations). Generality is moderate: strongest for tasks with group structure (cyclic groups, convolutions, periodicity). |
| 2023 (Mar–Sep) | Linear “world-state” representations in sequence models (latent state corresponds to board state; controllable by vector arithmetic) | Othello-GPT-style models | Nanda’s exposition (Neel Nanda) and the associated paper on emergent linear representations (arXiv) | Important because it shows a model trained only to predict tokens can learn an explicit internal state (a “world model”) that is linearly recoverable and causally editable. Generality is promising but not universal; it likely emerges when the task forces consistent latent state tracking. |
| 2023 (Oct) | Feature dictionaries / “monosemantic” features via sparse autoencoders (dictionary learning on activations) | Mechanistic interpretability for transformers | Anthropic’s “Towards Monosemanticity” line (Anthropic) | This is less “the model’s native representation” and more “a recovered basis that better matches it,” but it’s crucial: it suggests models are organized around a large set of sparse features even when neurons are polysemantic mixtures. Generality is likely high, and it directly shapes practical interpretability workflows. |
| 2024 (Feb, community analysis) | Chess/Othello-like linear world representations (extensions/replications) | Board-game GPTs; “world model” probing and interventions | Example community writeup (LessWrong) | This is a continuation/expansion of the 2023 world-representation finding. Importance depends on replication rigor, but it is part of the emerging picture that “latent-state tracking” is a common representational strategy in sequence models under the right data/task constraints. |
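
The 2023 (Jan) entry, where modular addition becomes phase addition in a Fourier basis, can be checked numerically. The sketch below is a hand-written illustration of that idea, not the trained network's weights or the exact circuit from the paper. The modulus 113 and the frequencies (1, 2, 3) are arbitrary assumptions chosen for this example.

```python
import numpy as np

def fourier_mod_add(a: int, b: int, p: int, freqs=(1, 2, 3)) -> int:
    """Compute (a + b) mod p by multiplying phases and reading out the best match.

    Assumes p is prime and every frequency k satisfies 0 < k < p, so each
    cosine term below equals 1 only at the correct answer.
    """
    k = np.array(freqs)
    phase_a = np.exp(2j * np.pi * k * a / p)  # encode a as one phase per frequency
    phase_b = np.exp(2j * np.pi * k * b / p)  # encode b the same way
    phase_sum = phase_a * phase_b             # multiplying phases adds the angles

    candidates = np.arange(p)
    # logits[c] = sum_k cos(2*pi*k*(a + b - c)/p), maximal when c == (a + b) mod p
    readout = np.exp(-2j * np.pi * np.outer(candidates, k) / p)
    logits = (phase_sum[None, :] * readout).real.sum(axis=1)
    return int(np.argmax(logits))

print(fourier_mod_add(100, 50, 113), (100 + 50) % 113)
```

The readout is a sum of cosines, the same trigonometric-identity structure that the table entry refers to. Using several frequencies sharpens the peak at the correct answer.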

Update: Some more interesting representations

1) Finite-state / automaton-like representations (regular languages)

Transformers trained on formal languages can end up simulating automata, and recent work explicitly extracts finite state machines from trained transformers to characterize what they learned. This is close to “boolean/bitmap logic” in that the latent state is discrete and transitions are rule-like. https://arxiv.org/pdf/2410.06045

2) Stack-like representations for parentheses / Dyck-style tasks

Balanced bracket classification tasks are widely used in mech-interp pedagogy because they pressure the model toward a latent “depth” or stack surrogate. In practice, small transformers often learn a distributed state that tracks nesting structure, sometimes in a way that can be probed linearly.  https://arena-chapter1-transformer-interp.streamlit.app/%5B1.5.1%5D_Balanced_Bracket_Classifier
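
For reference, the symbolic algorithm that such a latent “depth” would be tracking is a single counter. This sketch is the classical reference procedure, not a description of what any particular trained transformer computes; it assumes the input contains only the characters "(" and ")".

```python
def is_balanced(brackets: str) -> bool:
    """Classify a string of '(' and ')' as balanced or not using a depth counter."""
    depth = 0
    for ch in brackets:
        depth += 1 if ch == "(" else -1
        if depth < 0:      # a ')' closed a bracket that was never opened
            return False
    return depth == 0      # every opened bracket must have been closed

print(is_balanced("(()())"), is_balanced("())("))
```

The two failure conditions, depth going negative at some point and depth being nonzero at the end, are the two quantities a probe on the model's activations would look for.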

3) “World-state bitmaps” (board-state as a linear code)

In Othello-GPT-style settings, the residual stream contains a linearly recoverable encoding of the board. This is arguably a learned bitmap-like representation (one direction per square / feature), embedded in a continuous space.  https://www.neelnanda.io/mechanistic-interpretability/othello

4) Group-operation representations beyond modular addition

A closely related line studies how small nets learn group composition more broadly (a “universality” testbed). This generalizes the “DFT for cyclic groups” story into a broader family of algebraic representations and circuits.  https://openreview.net/pdf?id=jCOrkuUpss

5) Boolean satisfiability style reasoning (logical structure)

There is mechanistic-interpretability work on transformer-based models trained to solve 2-SAT, which is a canonical boolean-logic problem. This is a direct example of boolean structure expressed in transformer activations and circuits.  https://arxiv.org/html/2407.13594v1

6) Induction / copy (pointer-style algorithm)

Not boolean algebra per se, but it is a very simple learned algorithmic representation: a head learns to represent and retrieve repeated patterns (“copy what followed last time”). This often coexists with more symbolic-feeling representations in toy tasks.  https://arxiv.org/abs/2312.03002
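
The “copy what followed last time” rule can be written down as an explicit algorithm. This is a symbolic sketch of the behavior attributed to induction heads, with a hypothetical function name; a real attention head implements a soft, learned version of this lookup.

```python
from typing import List, Optional

def induction_predict(tokens: List[str]) -> Optional[str]:
    """Predict the next token by copying what followed the most recent
    earlier occurrence of the current (last) token. Returns None if the
    current token has not appeared before."""
    if not tokens:
        return None
    current = tokens[-1]
    # Scan backwards over earlier positions, skipping the current one.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None

print(induction_predict(["A", "B", "C", "A"]))
```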

Learned Representations in Neural Networks


Neural networks transform raw inputs — pixels, text, audio — into internal descriptions built layer by layer through learned weights and nonlinearities. The core mechanism is hierarchical composition: early layers detect local patterns like edges or n-gram features, while deeper layers combine these into abstract structures like object parts, semantic concepts, or reasoning patterns. Rather than relying on hand-engineered features, the network discovers whatever internal geometry best serves its training objective.

Representation spaces are not mere lookup tables; they are high-dimensional manifolds with structure that can be analyzed with the tools of differential geometry and information geometry. The Fisher information metric, for instance, naturally measures distances between probability distributions that a network implicitly encodes, connecting the curvature of representation space to the model’s sensitivity and generalization behavior.
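
A one-parameter case makes the Fisher metric concrete. For a Bernoulli distribution with parameter p, the Fisher information is 1 / (p(1 − p)), and the KL divergence between two nearby distributions is approximately half the Fisher information times the squared parameter step. The values p = 0.3 and step = 0.001 below are arbitrary choices for illustration.

```python
import math

def bernoulli_kl(p: float, q: float) -> float:
    """KL divergence KL(Bernoulli(p) || Bernoulli(q))."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def bernoulli_fisher(p: float) -> float:
    """Fisher information of a Bernoulli distribution with parameter p."""
    return 1.0 / (p * (1.0 - p))

p, step = 0.3, 1e-3
exact_kl = bernoulli_kl(p, p + step)
quadratic = 0.5 * bernoulli_fisher(p) * step ** 2  # local distance from the metric
print(exact_kl, quadratic)
```

The two numbers agree closely for small steps, which is the sense in which the Fisher metric measures distance between nearby distributions.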

More visibly, semantic relationships in language models manifest as linear directions in activation space, enabling vector arithmetic over meaning. This regularity reflects the network solving a smooth optimization problem in which nearby inputs on the data manifold are mapped to nearby points in representation space.


A critical consequence of this structure is transferability. Representations learned on large datasets tend to capture the intrinsic geometry of the data distribution itself, making them reusable across tasks. This underpins the modern pretrain-and-adapt paradigm: a foundation model distills general representational structure from vast data, and fine-tuning merely redirects it.

Interpretability research has complicated this picture. Networks appear to use superposition, encoding more features than they have dimensions by distributing concepts across overlapping, near-orthogonal directions rather than isolated neurons. This is geometrically efficient, because nearly orthogonal vectors in high dimensions allow exponentially many features to coexist, but it makes the representation space harder to read.

Understanding a model now requires studying directions, circuits, and geodesics in activation space, not individual units. This is the project of mechanistic interpretability: recovering the internal computational geometry that produces a model’s behavior.
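
The geometric fact behind superposition, that a space can hold many more nearly orthogonal directions than it has dimensions, is easy to check numerically. The sizes and random seed below are arbitrary assumptions, and random directions are only a stand-in for learned feature directions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_features = 256, 1024  # four times more "features" than dimensions

# Random unit vectors as stand-ins for feature directions.
directions = rng.standard_normal((n_features, dim))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

# Pairwise cosine similarities, ignoring each vector's similarity with itself.
cosines = directions @ directions.T
np.fill_diagonal(cosines, 0.0)
max_overlap = float(np.abs(cosines).max())

print(f"{n_features} directions in {dim} dimensions, max |cosine| = {max_overlap:.3f}")
```

No pair is exactly orthogonal, but every overlap stays well below 1. That small interference is the price of packing in extra features, and it is why single neurons stop being a clean unit of analysis.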

Three frontiers concentrate current research. First, what geometric properties of a representation predict its generalizability — smoothness, dimensionality, curvature of the learned manifold? Second, how do large language models encode causal relations, abstractions, and multi-step reasoning, and does this reflect genuine geometric structure or brittle surface statistics? Third, can training objectives be designed to produce representations that are sparse, disentangled, or causally structured by construction — making the geometry legible from the start rather than reverse-engineered after the fact? This last question connects representation learning directly to AI safety: systems whose internal geometry can be inspected and tested are systems whose behavior can actually be understood.

Examples of these three frontiers.

1) Generalization of representations
The clearest example is CLIP, which learns a joint image-text embedding by aligning representations across modalities. Its learned geometry transfers remarkably to tasks it never saw — zero-shot classification, image retrieval, robotic perception — suggesting it captured something close to the intrinsic manifold of visual concepts rather than task-specific shortcuts. Studying why it transfers (low intrinsic dimensionality? smooth curvature? alignment with human semantic structure?) is an open and active question.
2) Reasoning structure in language models
Sparse-autoencoder work such as Anthropic’s “Towards Monosemanticity” line and OpenAI’s “Scaling and evaluating sparse autoencoders”, along with follow-on mechanistic interpretability research, has reported evidence that models trained purely on next-token prediction develop internal representations of entity states, spatial relations, and multi-step dependencies, structures that look suspiciously like world models. The cleaner controlled example is Othello-GPT (Nanda et al.), where a transformer trained only on legal move sequences was shown to linearly represent the board state internally, a clean demonstration that reasoning-like geometric structure emerges without explicit supervision.
3) More interpretable representations
β-VAEs are the canonical attempt: weighting the KL term by a factor β > 1 pushes the latent space toward an axis-aligned, disentangled geometry where individual dimensions correspond to independent generative factors. In the ideal case, traversing a single latent direction changes exactly one attribute (pose, lighting, or shape) and leaves the others fixed. The limitation is that disentanglement defined this way is coordinate-dependent and doesn’t guarantee causal structure, which has pushed more recent work toward causal representation learning (Schölkopf et al.) as the right geometric target.
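
The β-VAE objective is a reconstruction term plus a β-weighted KL term. The sketch below computes it for one example under stated assumptions: a diagonal Gaussian posterior with a standard normal prior, a squared-error reconstruction term, and β = 4 as an illustrative default, not a recommended setting.

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, logvar, beta: float = 4.0) -> float:
    """Reconstruction error plus beta times KL(N(mu, exp(logvar)) || N(0, I)).

    Assumptions for this sketch: squared-error reconstruction and a
    diagonal Gaussian posterior parameterised by mu and log-variance.
    """
    x, x_recon = np.asarray(x, float), np.asarray(x_recon, float)
    mu, logvar = np.asarray(mu, float), np.asarray(logvar, float)
    reconstruction = np.sum((x - x_recon) ** 2)
    # Closed-form KL between a diagonal Gaussian and the standard normal.
    kl = 0.5 * np.sum(mu ** 2 + np.exp(logvar) - 1.0 - logvar)
    return float(reconstruction + beta * kl)

x = [1.0, 2.0]
print(beta_vae_loss(x, x, mu=[0.0, 0.0], logvar=[0.0, 0.0]))  # posterior equals prior
print(beta_vae_loss(x, x, mu=[1.0, 0.0], logvar=[0.0, 0.0]))  # shifted mean is penalised
```

Raising β makes any departure of the posterior from the prior more expensive, which is the pressure toward factorized latents described above.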