What These Tools Actually Do
The computational tools built over the last decade are extraordinarily capable. They recognize patterns humans miss. They predict outcomes across domains. They generate text, images, and code that pass for human work. Some of them can plan, search, and optimize.
Each of these capabilities is real. None of them is magic. Each one computes a specific mathematical operation on data — and each one has a hard ceiling defined by what that operation can and cannot reach.
That ceiling matters. Not as an academic observation, but because these tools are now operating inside the systems that sustain civilization — financial infrastructure, energy grids, governance mechanisms, public health systems, ecological networks. They are optimizing within those systems. They are predicting on behalf of those systems. They are making decisions that reshape those systems in real time. And in every case, the same question goes unasked: what kind of system is this?
An engineer who deploys a load-bearing structure without specifying its failure modes is negligent. A scientist who publishes a forecast without specifying the mechanism behind it is extrapolating, not explaining. A governing body that regulates a system it has not formally described is performing theater. In each case, what is missing is not more computational power. What is missing is an authored commitment — a human who says this is what I believe this system is and accepts the consequences of being wrong.
This document catalogs computational capabilities honestly: what each class of tool computes, where it stops, and what kind of knowledge fills the gap. The catalog is organized not by technical architecture but by what people think these tools do — recognition, prediction, generation, reasoning, decision, discovery — because that's where the conversation usually starts. But the destination is the question none of them ask, and the cost of leaving it unanswered.
"It Can See"
The first capability people encounter is pattern recognition — the ability to look at an image, a sound, a dataset, and identify what's there. This is real and useful. But "seeing" is a metaphor. What these tools compute is feature extraction: the conversion of raw data into numerical representations that cluster similar things together.
The gap: recognition tells you what patterns exist in the data. It cannot tell you what kind of system produced those patterns, or what the patterns mean for the system's behavior.
"It Can Predict"
Prediction is the capability that feels most like understanding. If a tool can tell you what will happen next, it must "know" something. But prediction is extrapolation from pattern, not comprehension of mechanism. A barometer predicts rain without understanding atmospheric physics.
The gap: prediction tells you what is likely to happen next. It cannot tell you what would happen if you changed the system, because it doesn't know what the system is.
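The observation/intervention gap can be made concrete with a toy confounded system (all numbers invented): a hidden cause z drives both x and y, so a predictor fit on passive data is perfect right up until someone changes x by hand.

```python
# Toy sketch (invented numbers): a predictor fit on passive observations
# versus what happens under intervention. A hidden cause z drives both
# x and y (x = z, y = 2*z), so "y = 2*x" predicts perfectly -- until
# you set x yourself.

observations = [(z, 2 * z) for z in range(1, 6)]  # (x, y) pairs, x = z

# Least-squares slope through the origin: the learned prediction rule.
slope = sum(x * y for x, y in observations) / sum(x * x for x, _ in observations)
print(slope)  # -> 2.0, a perfect fit to the observed pattern

# Prediction: observe x = 10, the rule says y = 20. Correct.
# Intervention: *set* x = 10 while z stays at 3; the mechanism still
# gives y = 2 * z = 6. The rule cannot know this, because it encodes
# the pattern, not the mechanism.
y_predicted = slope * 10
y_actual_under_intervention = 2 * 3
print(y_predicted, y_actual_under_intervention)  # -> 20.0 6
```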
"It Can Create"
Generation is the capability that most captures public imagination. Tools that write, draw, compose, and code. The outputs can be indistinguishable from human work. But generation is sampling from a learned distribution — producing new instances that are statistically consistent with the training data. That is not the same as designing something for a purpose.
The gap: generation produces new instances that are consistent with observed data. It cannot produce designs that satisfy authored requirements — because requirements are commitments, not patterns.
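A minimal sketch of generation as sampling, assuming a one-dimensional toy "training set": fit a distribution to the data, then draw new instances from it.

```python
import random
import statistics

# Toy sketch: "generation" as sampling from a distribution fit to data.
# The training set is invented for illustration.
training = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0]
mu = statistics.mean(training)       # 5.0
sigma = statistics.pstdev(training)  # ~0.816

random.seed(0)
samples = [random.gauss(mu, sigma) for _ in range(5)]

# Each sample is statistically consistent with the training data. But
# nothing here can make a sample satisfy an authored requirement like
# "exactly 7.5": a requirement is a commitment, not a property of the
# fitted distribution.
print(mu, round(sigma, 3), len(samples))
```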
"It Can Reason"
This is where the conversation usually gets interesting — and where the language gets most misleading. "Reasoning" in the computational sense means decomposing a problem into steps and chaining operations toward a conclusion. That is genuinely useful. It is not the same as understanding why the steps work or whether the conclusion is true of the real system.
The gap: reasoning tools chain operations and propagate implications. They operate within a structure. They cannot author the structure itself — that requires an ontological commitment no computation produces.
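Chaining can be sketched as forward inference over an authored rule set (the rules below are invented for illustration): the computation propagates implications to a fixed point, but every rule is an assertion someone supplied.

```python
# Toy sketch: "reasoning" as chaining operations inside an authored
# structure. The rules are the structure; the chaining is the computation.

rules = {
    "rain": ["wet_ground"],
    "wet_ground": ["slippery"],
    "slippery": ["slow_traffic"],
}

def derive(facts, rules):
    """Forward-chain: add everything the rules imply, to a fixed point."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for fact in list(derived):
            for implied in rules.get(fact, []):
                if implied not in derived:
                    derived.add(implied)
                    changed = True
    return derived

print(sorted(derive({"rain"}, rules)))
# -> ['rain', 'slippery', 'slow_traffic', 'wet_ground']
# The chain is only as true as the authored rules; nothing in the
# chaining can tell you whether "rain -> wet_ground" holds of the world.
```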
"It Can Decide"
Decision-making tools optimize — they search for the action, policy, or configuration that maximizes some objective. This is genuine and powerful. But the objective itself must be specified. And the space within which the tool searches must be defined. Optimization without specification is a powerful engine with no destination.
The gap: decision tools optimize within authored constraints toward authored objectives. They are engines. The system model is the map and the destination.
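The engine-and-map distinction fits in a dozen lines (toy search space and objectives, both invented): the same search procedure returns different "best" answers depending entirely on what a human authored.

```python
# Toy sketch: optimization as search inside an authored frame. Both the
# search space and the objectives below are authored choices, not outputs
# of the algorithm; swap either, and the "best" answer changes.

search_space = [x / 10 for x in range(0, 101)]  # authored: 0.0 .. 10.0

def objective_a(x):  # authored objective 1
    return -(x - 3) ** 2

def objective_b(x):  # authored objective 2
    return -(x - 7) ** 2

best_a = max(search_space, key=objective_a)
best_b = max(search_space, key=objective_b)
print(best_a, best_b)  # -> 3.0 7.0, same engine, different destinations
```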
"It Can Discover"
The most recent and most ambitious claim — that these tools can discover new knowledge. Causal discovery algorithms propose candidate mechanisms. Physics-informed networks learn dynamics that respect conservation laws. World models build internal representations of how environments evolve. These are genuine advances. But each one still operates within a frame that someone had to define.
The gap: discovery tools propose hypotheses and learn approximate dynamics. They do not commit to what a system is. Commitment is what specification means.
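A toy sketch of discovery-as-proposal (data invented): score candidate edges among measured variables and keep the strong ones. The ranking is computed; the variable list, the frame, is not.

```python
# Toy sketch: "discovery" as proposing candidate structure inside an
# authored frame. The algorithm scores candidate links between variables,
# but the variable list itself was chosen by a human. Data and the 0.9
# threshold are invented.

data = {
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],  # strongly tied to x
    "z": [5, 3, 9, 1, 7],   # unrelated noise
}

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    sa = sum((p - ma) ** 2 for p in a) ** 0.5
    sb = sum((q - mb) ** 2 for q in b) ** 0.5
    return cov / (sa * sb)

names = list(data)
candidates = [
    (a, b) for i, a in enumerate(names) for b in names[i + 1:]
    if abs(corr(data[a], data[b])) > 0.9
]
print(candidates)  # -> [('x', 'y')]
# The algorithm can rank hypotheses over x, y, z. It cannot tell you that
# a fourth variable you never measured is what the system actually runs on.
```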
The Question None of Them Ask
Every instrument in this catalog is powerful. Every one performs a specific, well-defined, mathematically rigorous operation. Together, they constitute the most sophisticated toolkit for pattern extraction, prediction, generation, and optimization ever built.
None of them asks: what kind of system is this?
That question is not a pattern recognition question. It is not a prediction question. It is not a generation, reasoning, or optimization question. It is an ontological question — a question about what exists and how it is organized. Answering it requires:
Composition — specifying what entities constitute the system.
Environment — specifying what the system is embedded within.
Structure — specifying the relations among components.
Mechanism — specifying the processes that generate behavior.
These specifications are not inferred from data. They are authored by a human who accepts the consequences of being wrong. That is what makes them specifications and not predictions.
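One way to picture what an authored specification is, as opposed to a trained model, is as a plain data structure with a signature on it. This is a hypothetical sketch, not an existing format; only the four field names follow the text above, and the example system is invented.

```python
from dataclasses import dataclass

# Hypothetical sketch: a CESM-style system specification as an authored
# artifact. The field names follow Bunge's composition / environment /
# structure / mechanism; the example system and the author field are
# invented for illustration.

@dataclass
class SystemSpec:
    name: str
    author: str                       # the human who signs the claim
    composition: list[str]            # what entities constitute the system
    environment: list[str]            # what the system is embedded within
    structure: list[tuple[str, str]]  # relations among components
    mechanism: dict[str, str]         # processes that generate behavior

grid = SystemSpec(
    name="toy microgrid",
    author="j.doe",
    composition=["generator", "battery", "load"],
    environment=["weather", "wholesale market"],
    structure=[("generator", "battery"), ("battery", "load")],
    mechanism={"charging": "generator surplus flows to the battery"},
)
print(grid.author, len(grid.composition))  # -> j.doe 3
```

The point of the sketch is the author field: the artifact names a person who commits to the claim, which is exactly what a trained model does not contain.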
This distinction might sound philosophical. It is not. It is the distinction that separates every discipline that builds things from every discipline that merely describes them. And it is precisely the distinction that is missing from the most consequential systems of our time.
Specification Is What Makes Engineering Engineering
An engineer does not predict that a bridge will hold. An engineer specifies a bridge — its materials, its load paths, its tolerances, its failure modes — and signs their name. The specification is a commitment: I assert that this structure, built this way, will carry this load. When the bridge fails, it is the specification that gets interrogated. There is a document. There is a person. There is accountability.
This is not incidental to engineering. It is engineering. The computational instruments in this catalog — finite element analysis, structural optimization, fatigue prediction — are powerful tools that serve the specification. They do not replace it. No amount of simulation excuses the engineer from saying: this is what I believe the system is, and I am responsible for that claim.
The Boeing 737 MAX is the canonical illustration. MCAS — the Maneuvering Characteristics Augmentation System — was an optimization layer added to compensate for an aerodynamic shift introduced by larger engines. The airframe's aerodynamic envelope had changed. The specification of how the aircraft's control system interacted with that new envelope was inadequate. Sensors fed data to an algorithm. The algorithm acted on a model of the aircraft. The model was wrong. 346 people died.
The instruments worked. The sensors read correctly. The algorithm executed as coded. The optimization ran. What failed was the specification — the authored claim about what the system was and how it behaved. That gap was not computational. It was ontological. Someone needed to formally specify the full system — airframe, engines, control surfaces, sensor architecture, software logic, pilot interface — as an integrated whole, and commit to that description. The tools cannot do that. A human must.
Prediction without specification is forecasting. Engineering without specification is negligence. The difference is a human who signs the drawing.
Specification Is What Makes Science Science
Science uses prediction, but prediction is not the product. The product is the model — a mechanistic claim about how a system works. The prediction is the test. When the prediction fails, you don't adjust the curve. You revise the mechanism.
This is the deepest difference between science and extrapolation, and it is routinely obscured. A curve fit through historical data can predict accurately for years. It is not science. It contains no claim about why the pattern holds, and therefore no guidance about when it will stop holding. Science requires someone to say: I assert that this mechanism produces this behavior. The assertion is falsifiable. The curve is merely extendable.
Climate science works precisely this way. General circulation models are not curve fits through temperature records. They are authored specifications of atmospheric mechanism — radiation physics, ocean circulation, carbon cycling, ice-albedo feedback — assembled into a formal model and then tested against observation. When the model diverges from data, the divergence tells you something about the mechanism. It tells you where your understanding is wrong. A curve fit that diverges tells you nothing except that the future is not like the past.
The early COVID-19 modeling landscape demonstrated the cost of the alternative. Competing models with different structural assumptions produced wildly divergent forecasts. Some modeled airborne transmission, some didn't. Some included behavioral feedback loops, some didn't. Some specified hospital capacity constraints, some assumed infinite capacity. The forecasts disagreed — but there was no transparent way to adjudicate between the specifications themselves, because most of the specifications were informal, implicit, or buried in code. The public saw disagreeing numbers. What they needed to see was disagreeing mechanisms — formally stated, openly comparable, and subject to structured critique. That is what formal system specification provides.
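The structural point can be illustrated with two toy epidemic models (parameters and the feedback form are invented, not fitted to any data): identical except for one assumption, whether prevalence feeds back into contact behavior, and their long-run forecasts separate.

```python
# Toy sketch: two SIR-style models that differ only in one structural
# assumption -- whether infection prevalence feeds back into contact
# behavior. All parameters and the feedback form are invented.

def run_sir(beta, gamma, feedback, steps=100):
    s, i, r = 0.99, 0.01, 0.0
    for _ in range(steps):
        b = beta * (1 - i) if feedback else beta  # assumed feedback form
        new_inf = b * s * i
        new_rec = gamma * i
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
    return r  # cumulative share infected ("attack rate")

no_feedback = run_sir(beta=0.4, gamma=0.1, feedback=False)
with_feedback = run_sir(beta=0.4, gamma=0.1, feedback=True)
print(round(no_feedback, 3), round(with_feedback, 3))
```

The two specifications are equally computable; only a formal, comparable statement of the mechanism lets anyone argue about which one is right.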
Extrapolation without mechanism is curve fitting. Science without mechanism is — as Rutherford put it — stamp collecting. The mechanism is a commitment. The prediction is the test of that commitment.
Specification Is What Makes Governance Possible
Governance specifies what the system is — where authority lives, what flows are permitted, who is accountable when something goes wrong. Optimization finds the best move within a system. These are not the same activity. And when the second proceeds without the first, the result is automated power with no return address.
This is not hypothetical. It is the operational reality of March 2026.
The EU AI Act requires organizations deploying high-risk systems to define system boundaries, specify accountability chains, classify risks, and document the mechanisms by which the system affects people. Every compliance team working on implementation is discovering the same problem: they are trying to govern systems that no one has formally described. The regulation demands specification. The tools available produce prediction. The gap between these two is not a policy challenge. It is a structural impossibility — you cannot govern what you cannot describe.
The US executive orders on AI safety face the same structural problem. They mandate risk assessment, bias auditing, and transparency reporting for systems deployed across federal agencies. Each mandate implicitly assumes that someone, somewhere, has formally specified what the system is — its components, its data flows, its decision boundaries, its failure modes. In practice, that specification rarely exists. What exists is a trained model, a deployment pipeline, and a performance dashboard. The model predicts. The dashboard monitors. No one has committed to a structural claim about what the system is and how it works. When it fails — and complex systems always eventually do — there is no specification to interrogate. There is only output that stopped being useful, and no human who can explain why.
Cryptoeconomic systems illustrate the same gap from the design side. A staking mechanism, a governance token, a DeFi protocol — each is a complex adaptive system with interacting agents, feedback loops, and emergent dynamics. Optimization tools can find the parameter settings that maximize a given metric. But the question that precedes optimization is: what is this system? What are its subsystems? What flows between them? Where are the boundaries? What mechanisms generate the behaviors we observe? Without that specification, optimization is hill-climbing in the dark. You may reach a peak. You cannot know what landscape you are on.
You cannot steer what you have not specified. You cannot hold anyone accountable for a specification no one made. Governance without specification is theater.
The Cost of the Missing Question
In every domain — engineering, science, governance — the same structure holds. The specification creates the possibility of accountability. The engineer's drawing. The scientist's mechanism. The governor's charter. Each is an authored commitment: this is what I believe the system is, and I accept what follows from being wrong.
Without that commitment, there is no artifact to interrogate when things go wrong. There is only output.
The instruments in §01 through §06 of this catalog are powerful precisely because they operate on an authored structure. The forecaster tests the engineer's design. The simulation tests the scientist's mechanism. The optimizer searches the governor's policy space. Without the structure, the instruments spin. They produce numbers, images, forecasts, and recommendations that refer to nothing — that are accountable to no specification and falsifiable by no observation.
The scarce resource in March 2026 is not computation. We have more computational power than any generation in history. The scarce resource is commitment — the willingness to formally say this is what I believe this system is and to accept the consequences of that claim. That willingness is what separates engineering from prediction. It is what separates science from extrapolation. And it is what separates governance from optimization.
This is not an observation that only one community is making. Across multiple domains — independently, and without coordinating — sophisticated practitioners have converged on the same discovery: that authored specification is the load-bearing element in their work, and that no amount of computation replaces it.
In cryptoeconomic systems design, Michael Zargham and BlockScience built cadCAD — a computer-aided design framework grounded in control systems engineering — because they found that you cannot validate a protocol design without first formally specifying its state variables, update mechanisms, and feedback loops. In safety-critical systems engineering, Nancy Leveson at MIT developed STAMP — a systems-theoretic accident model — because she found that catastrophic failures occur not from component breakdowns but from inadequate system-level specification of control structures and safety constraints. In applied mathematics, John Baez and collaborators built AlgebraicJulia — a compositional modeling framework using category theory — because they found that scientific models become opaque and brittle when they are monolithic, and that composability requires formally specifying how subsystem models interface.
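The state-variable and update-mechanism pattern that cadCAD formalizes can be sketched generically. This is not the cadCAD API, just the shape of the commitment, with invented names and numbers.

```python
# Minimal sketch of the state-update pattern cadCAD-style frameworks
# formalize: named state variables, an explicit update mechanism, a loop
# that feeds state back into itself. All names and numbers are invented.

state = {"tokens_staked": 100.0, "reward_pool": 50.0}

def update_staking(s):
    # Authored mechanism: 10% of the pool is paid out and restaked.
    reward = 0.1 * s["reward_pool"]
    return {
        "tokens_staked": s["tokens_staked"] + reward,
        "reward_pool": s["reward_pool"] - reward,
    }

for _ in range(3):
    state = update_staking(state)

print(round(state["tokens_staked"], 2), round(state["reward_pool"], 2))
# -> 113.55 36.45
```

Writing the mechanism down is the commitment: the state variables, the update rule, and the conserved total are all open to interrogation before any simulation runs.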
Each of these communities is right. Each has independently discovered that the critical gap is one of specification, not computation. And yet none of them has a framework that explains why all three are right for the same reason.
There is a further subtlety that all three examples obscure. Zargham simulates. Leveson models control loops. Baez composes dynamical systems. In each case, the specification earns its keep by enabling some downstream computation — a simulation, an analysis, a composition. This makes it easy to conclude that the specification's value is instrumental: it matters because it lets you run the next step.
But consider the domains where simulation is impractical or inconclusive — political economy, institutional design, geopolitical systems, ecological governance. You cannot fabricate a counterfactual nation-state. The feedback loop between policy intervention and observable outcome is long, noisy, and confounded. The actors inside the system change their behavior in response to being modeled. In these domains, simulation may be optional. The specification is not.
The specification still forces the modeler to commit: these are the subsystems, these are the flows, here is where the boundary falls, this is the mechanism I believe generates the behavior we observe. Those commitments discipline thinking. They make disagreements structural rather than rhetorical — two people arguing about a formal specification are arguing about which components exist and how they interact, not about whose narrative is more persuasive. They create a shared artifact that can be critiqued, revised, and falsified even when no simulation is ever run.
This means specification is not downstream of simulation. It is upstream of everything — including the decision about whether to simulate at all. Simulation is one possible use of a formal model. Clarity, communication, accountability, and structured disagreement are others. In the domains where the stakes are highest and the systems are least amenable to controlled experiment — which is to say, in the domains that matter most — those other uses are primary.
What explains the convergence is this: every one of them presupposes an answer to a prior question that none of their frameworks formally asks. Before you can specify state update maps, you need to know what the system's components are. Before you can specify control structures, you need to know what the system is embedded in and where authority flows. Before you can compose subsystem models, you need to know what the structural relations between subsystems are. The prior question — in every case — is ontological:
Composition — what entities constitute the system.
Environment — what the system is embedded within.
Structure — the relations among components.
Mechanism — the processes that generate behavior.
That is Bunge's CESM ontology, formalized by Mobus into a rigorous systems science framework. It is the question that precedes Zargham's state variables, Leveson's control hierarchies, and Baez's compositional interfaces. It is the ground floor — and the fact that multiple independent communities have converged on the need for it, without having it, is itself the strongest evidence that the account is overdue.
The instruments are indispensable. They are also subordinate. They serve the specification. They do not replace it. They cannot replace it. That is not a limitation of current technology to be overcome by the next training run. It is a categorical distinction between learning structure from data and asserting structure from theory — a distinction Herbert Simon identified in 1969 and the field has spent fifty years forgetting.
The formal system model — authored, committed, falsifiable — occupies a different epistemic dimension from everything in this catalog. Not better. Not competing. Orthogonal. It answers the question none of these tools can ask. And in a world where the tools grow more powerful by the month, the question grows more urgent at exactly the same rate.
That is the work described in The Fourth Paradigm.