On the Agent-Native Research Artifact, or ARA

Converting a PDF paper into an Agent-Native Research Artifact (ARA) gives an agent research objects that it can inspect, execute, verify, and connect.

A conventional research paper is distributed as a PDF which presents research in a linear format designed for human reading. It includes motivation, related work, methods, results, figures, tables, and conclusions.

An Agent-Native Research Artifact, or ARA, treats the paper as a structured research object. The ARA includes the claims, code, data references, experiment settings, outputs, and research history that an AI agent needs to inspect, reproduce, and extend the work. Converting a PDF paper into an ARA turns a document into an executable and inspectable research package. An ARA should let an agent answer a direct question: can the main result be reproduced from this artifact?

A PDF might contain a sentence such as "We trained model X and obtained better results on benchmark Y."

An ARA would represent that claim with supporting structure:

claim_id: C1
claim: "Model X improves performance on benchmark Y."
metric: "accuracy"
baseline: "Model B"
effect_size: "+4.2 percentage points"
evidence:
  - run_042
  - table_2
  - figure_4
status: "supported"

A PDF paper compresses the research process into a narrative. It often excludes failed experiments, discarded hypotheses, intermediate configurations, debugging decisions, data-cleaning steps, and exact runtime details. This compression works for publication, but it creates problems for agents. An agent cannot reliably reconstruct the research process from prose alone. It has to infer missing details, search for unstated assumptions, and guess how tables and figures were produced.

The ARA framing identifies two costs.

The first is the Storytelling Tax. Research is not usually linear, but the paper presents it as if it were. The path from question to result often involves branches, failures, reversals, and partial results. A PDF removes most of that structure.

The second is the Engineering Tax. A description that is sufficient for a reviewer may not be sufficient for reproduction. Phrases such as “standard settings,” “following prior work,” or “we use the default implementation” leave out details that an agent needs.

ARA addresses these costs by making the missing structure explicit. An ARA can be organized into four layers.

The first layer is scientific logic. This contains the claims, assumptions, definitions, hypotheses, and relationships among them.

The second layer is executable code and specifications. This contains scripts, commands, configurations, environment files, dataset versions, model checkpoints, and test procedures.

The third layer is an exploration graph. This records the research path, including failed attempts, partial results, alternative branches, and decisions.

The fourth layer is evidence grounding. This links each claim to the experiments, logs, outputs, tables, and figures that support it.

A simple ARA directory could look like this:

research-project/
├── paper.pdf
├── paper.tex
├── src/
├── notebooks/
├── data/
├── configs/
├── results/
├── logs/
└── ara/
    ├── ara.yaml
    ├── claims.yaml
    ├── assumptions.yaml
    ├── evidence.yaml
    ├── experiments.yaml
    ├── exploration_graph.json
    └── reproduction_report.md
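A layout like this is concrete enough to check mechanically before an agent attempts anything else. The following is a minimal Python sketch, assuming that the file names in the example `ara/` directory are the required set; the function name `missing_ara_files` is hypothetical, and a real ARA specification may require different files.

```python
from pathlib import Path

# Assumed required contents of ara/, copied from the example layout.
# A real ARA specification may require a different set of files.
REQUIRED_ARA_FILES = [
    "ara.yaml",
    "claims.yaml",
    "assumptions.yaml",
    "evidence.yaml",
    "experiments.yaml",
    "exploration_graph.json",
    "reproduction_report.md",
]


def missing_ara_files(project_root: str) -> list[str]:
    """Return the expected ara/ files that are not present under project_root."""
    ara_dir = Path(project_root) / "ara"
    return [name for name in REQUIRED_ARA_FILES if not (ara_dir / name).is_file()]
```

Calling `missing_ara_files("research-project")` returns an empty list only when every expected file exists, which gives an agent a cheap first gate before any reproduction work.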

Explicit claims

A conventional paper often embeds claims inside paragraphs. An agent must identify which sentences are central, which are supporting, and which are background.

ARA makes claims explicit.

Instead of this: "Our method improves robustness under distribution shift."

An ARA records this:

claim_id: C3
claim: "Method A improves robustness on Dataset D under Shift S."
metric: "accuracy"
baseline: "Model B"
effect_size: "+4.2 percentage points"
evidence:
  - run_042
  - table_2
  - figure_4
status: "supported"

This gives the agent a unit of analysis. It can ask whether the claim has evidence, whether the metric is defined, whether the baseline is valid, and whether the result can be reproduced.
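Because the claim is a record rather than a sentence, some of these questions can be asked in code. Below is a minimal sketch, assuming the YAML has already been loaded into a plain Python dict; the required-field list, the status vocabulary, and the name `claim_problems` are illustrative assumptions, not part of any published ARA schema.

```python
# Claim C3 as a YAML loader would return it: a plain dictionary.
claim_c3 = {
    "claim_id": "C3",
    "claim": "Method A improves robustness on Dataset D under Shift S.",
    "metric": "accuracy",
    "baseline": "Model B",
    "effect_size": "+4.2 percentage points",
    "evidence": ["run_042", "table_2", "figure_4"],
    "status": "supported",
}

# Assumed schema: these fields must be present and non-empty.
REQUIRED_FIELDS = ("claim_id", "claim", "metric", "baseline", "status")
# Hypothetical status vocabulary; the example only shows "supported".
ALLOWED_STATUSES = {"supported", "refuted", "untested"}


def claim_problems(claim: dict) -> list[str]:
    """List the reasons a claim record cannot be checked as a standalone unit."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if not claim.get(f)]
    status = claim.get("status")
    if status and status not in ALLOWED_STATUSES:
        problems.append(f"unknown status: {status}")
    # A "supported" claim with nothing to trace is an unsupported assertion.
    if status == "supported" and not claim.get("evidence"):
        problems.append("marked supported but lists no evidence")
    return problems
```

For C3 as written, `claim_problems(claim_c3)` returns an empty list; removing its evidence list would produce `["marked supported but lists no evidence"]`. These checks only cover structure. Whether the baseline is a fair comparison still requires judgment.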

Exact experiment recipes

A paper may say that experiments were run using standard settings. That is not enough for an agent.

An ARA records the experiment as a reproducible recipe:

experiment_id: E7
purpose: "reproduce Table 2"
command: "python train.py --config configs/table2.yaml"
dataset_version: "dataset-v2-2026-03-01"
seed: 17
hardware: "8xA100-80GB"
expected_output: "results/table2.json"

This gives the agent the command to run, the configuration to use, the dataset version to expect, the seed, the hardware assumptions, and the expected output.

The agent does not have to infer the procedure from prose. It can execute or inspect the procedure directly.
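A recipe like E7 can be turned into an explicit run plan that the agent inspects before spending any compute. The sketch below is one way to do that, assuming the record is loaded as a dict; `plan_run` and the required-field list are hypothetical. It does not inject the seed into the command, because the recipe does not say how `train.py` receives it (possibly through the config file, but that is not stated).

```python
import shlex

# Experiment E7 as a YAML loader would return it.
recipe_e7 = {
    "experiment_id": "E7",
    "purpose": "reproduce Table 2",
    "command": "python train.py --config configs/table2.yaml",
    "dataset_version": "dataset-v2-2026-03-01",
    "seed": 17,
    "hardware": "8xA100-80GB",
    "expected_output": "results/table2.json",
}

# Assumed schema: a recipe is runnable only if all of these are filled in.
RECIPE_FIELDS = ("experiment_id", "command", "dataset_version", "seed",
                 "hardware", "expected_output")


def plan_run(recipe: dict) -> dict:
    """Turn a recipe into an explicit run plan without executing anything."""
    missing = [f for f in RECIPE_FIELDS if recipe.get(f) in (None, "")]
    if missing:
        raise ValueError(f"recipe {recipe.get('experiment_id')} is missing {missing}")
    # Split the command the way a POSIX shell would, without invoking a shell.
    argv = shlex.split(recipe["command"])
    # Pull out the --config argument, if the command has one.
    config = None
    if "--config" in argv:
        i = argv.index("--config")
        if i + 1 < len(argv):
            config = argv[i + 1]
    return {
        "argv": argv,
        "config": config,
        "dataset_version": recipe["dataset_version"],
        "seed": recipe["seed"],
        "hardware": recipe["hardware"],
        "expected_output": recipe["expected_output"],
    }
```

Once the agent has confirmed the dataset version and hardware, it could pass `plan["argv"]` to `subprocess.run(plan["argv"], check=True)` and then compare the file at `expected_output` with the published table. Separating planning from execution keeps the procedure inspectable even when the hardware is not available.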

Evidence links

In a PDF, a claim may refer to a table or figure. That table or figure is usually a processed summary.

ARA records the chain behind the summary: claim → experiment → command → config → raw log → processed result → table or figure

This chain allows an agent to verify whether the claim is supported by the underlying evidence.

For example:

evidence_id: EV12
supports_claim: C3
experiment: E7
raw_log: "logs/run_042.log"
processed_result: "results/table2.json"
rendered_output: "paper/table_2.tex"

The evidence link is important because it turns the paper from a static assertion into an inspectable object. The agent can trace a result back to its source.
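The first step of that trace, confirming that every file in the chain actually exists, can be sketched in a few lines. This is a minimal example, assuming evidence records are loaded as dicts and paths are relative to the project root; `trace_claim` is a hypothetical name. It checks existence only, not whether the processed result really follows from the log.

```python
from pathlib import Path

# Evidence record EV12 as a YAML loader would return it.
evidence_records = [
    {
        "evidence_id": "EV12",
        "supports_claim": "C3",
        "experiment": "E7",
        "raw_log": "logs/run_042.log",
        "processed_result": "results/table2.json",
        "rendered_output": "paper/table_2.tex",
    },
]

# The file-backed links in the chain, ordered from raw to rendered.
CHAIN_FIELDS = ("raw_log", "processed_result", "rendered_output")


def trace_claim(claim_id: str, records: list[dict], root: str = ".") -> dict[str, list[str]]:
    """Map each evidence record for claim_id to the chain files missing under root.

    An empty dict means no evidence record supports the claim at all.
    """
    base = Path(root)
    report: dict[str, list[str]] = {}
    for ev in records:
        if ev.get("supports_claim") != claim_id:
            continue
        missing = []
        for field in CHAIN_FIELDS:
            path = ev.get(field)
            if not path:
                missing.append(f"<no {field} recorded>")
            elif not (base / path).is_file():
                missing.append(path)
        report[ev["evidence_id"]] = missing
    return report
```

A result such as `{"EV12": []}` means the chain is complete on disk. Re-deriving `results/table2.json` from `logs/run_042.log` would be the next step, but that depends on project-specific parsing code.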

Exploration graph

The exploration graph records the research path.

A PDF usually shows the final path. An ARA also records paths that did not become part of the final paper.

Hypothesis H1 
├── Experiment E1: failed; learning rate unstable
├── Experiment E2: failed; data leakage found
├── Experiment E3: partial; worked only on small model
└── Experiment E4: success; used in final paper

This is useful for an agent that tries to extend the work. It can see which branches were tested, which failures were encountered, and which decisions led to the final method.

The exploration graph also reduces repeated work. An agent does not need to retry an abandoned path, because the graph shows that the path was already tested and why it was abandoned.
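The article lists `exploration_graph.json` but does not define its format. The sketch below assumes a simple node-and-edge schema (hypothetical) that encodes the H1 tree and shows how an agent might list the branches that did not succeed before proposing a new one.

```python
import json

# Hypothetical schema for ara/exploration_graph.json, encoding the H1 tree.
GRAPH_JSON = """
{
  "nodes": [
    {"id": "H1", "type": "hypothesis"},
    {"id": "E1", "type": "experiment", "outcome": "failed", "note": "learning rate unstable"},
    {"id": "E2", "type": "experiment", "outcome": "failed", "note": "data leakage found"},
    {"id": "E3", "type": "experiment", "outcome": "partial", "note": "worked only on small model"},
    {"id": "E4", "type": "experiment", "outcome": "success", "note": "used in final paper"}
  ],
  "edges": [["H1", "E1"], ["H1", "E2"], ["H1", "E3"], ["H1", "E4"]]
}
"""


def children(graph: dict, parent: str) -> list[dict]:
    """Return the nodes directly below parent, in edge order."""
    by_id = {node["id"]: node for node in graph["nodes"]}
    return [by_id[dst] for src, dst in graph["edges"] if src == parent]


def non_success_branches(graph: dict, parent: str) -> list[tuple[str, str]]:
    """List branches under parent that did not succeed, with the recorded reason."""
    return [(n["id"], n.get("note", "")) for n in children(graph, parent)
            if n.get("outcome") != "success"]


graph = json.loads(GRAPH_JSON)
for exp_id, reason in non_success_branches(graph, "H1"):
    print(f"{exp_id}: {reason}")
# → E1: learning rate unstable
# → E2: data leakage found
# → E3: worked only on small model
```

Partial results are reported alongside failures on purpose: E3 is not a dead end, but an agent extending the work should know it only held for the small model.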

Addressable concepts

Research papers define terms in prose. ARA makes central terms addressable.

concept_id: K1
concept: "agent-native research artifact"
definition: "A structured research package intended for agent inspection, execution, verification, and extension."
related_claims:
  - C1
  - C2
used_in_experiments:
  - E1
  - E4

This allows an agent to track how a concept is used across the artifact. It also helps avoid ambiguity when a term appears in different sections.
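Addressable concepts only help if their references resolve. Here is a minimal sketch of a cross-reference check, assuming concept records are loaded as dicts and that the sets of known claim and experiment IDs come from elsewhere in the artifact. The article does not say which file holds concepts, and `dangling_references` and `concepts_for_claim` are hypothetical names.

```python
# Concept K1 as a YAML loader would return it.
concepts = [
    {
        "concept_id": "K1",
        "concept": "agent-native research artifact",
        "definition": ("A structured research package intended for agent inspection, "
                       "execution, verification, and extension."),
        "related_claims": ["C1", "C2"],
        "used_in_experiments": ["E1", "E4"],
    },
]


def dangling_references(concepts: list[dict], claim_ids: set[str],
                        experiment_ids: set[str]) -> list[tuple[str, str, str]]:
    """Return (concept_id, kind, ref) for every reference that resolves to nothing."""
    problems = []
    for c in concepts:
        for ref in c.get("related_claims", []):
            if ref not in claim_ids:
                problems.append((c["concept_id"], "claim", ref))
        for ref in c.get("used_in_experiments", []):
            if ref not in experiment_ids:
                problems.append((c["concept_id"], "experiment", ref))
    return problems


def concepts_for_claim(concepts: list[dict], claim_id: str) -> list[str]:
    """Reverse lookup: which concepts does a given claim depend on?"""
    return [c["concept_id"] for c in concepts if claim_id in c.get("related_claims", [])]


# Illustrative ID sets; in practice they would be read from claims.yaml and
# experiments.yaml. C2 is left out here to show a dangling reference.
known_claims = {"C1", "C3"}
known_experiments = {"E1", "E4", "E7"}
print(dangling_references(concepts, known_claims, known_experiments))
# → [('K1', 'claim', 'C2')]
```

The reverse lookup `concepts_for_claim(concepts, "C1")` returns `["K1"]`, which lets an agent read the definition a claim relies on before judging the claim itself.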
