ARP in 10 minutes: one run, five artifacts

This post is a walkthrough of a single ARP run.

The goal is not to impress you with a model response. The goal is to show you the artifacts that make the system operable:

capability contract — NodeType definition
bounded candidate menu — CandidateSet
constraint envelope — budgets + gates
policy decision record — checkpoint result
durable event timeline — replayable evidence

Step 0: Run the stack

Follow the quickstart to run the reference stack locally:

/quickstart

Once it’s running, you’ll have endpoints for:

starting a run,
inspecting run state,
and viewing the run event timeline.

Step 1: Start a run with a root capability

Assume we start a run with a composite capability:

support.resolve_issue

Conceptual input:

{
  "customer_id": "cus_123",
  "message": "I was charged twice for last month. Can you fix it?"
}

Right away, the system creates a Run and a root NodeRun.

Artifact #1: Run record

{
  "run_id": "run_01J...",
  "root_node_run_id": "nr_01J...",
  "status": "running",
  "started_at": "..."
}

Step 2: Decompose into subtasks

The composite capability decomposes the goal into subtasks such as:

“Look up the customer”
“Retrieve billing history”
“Determine if duplicate charge occurred”
“If required, initiate refund”
“Send confirmation message”

Decomposition produces structured subtask specs and child NodeRun requests.

Artifact #2: Conceptual decomposition snapshot

{
  "node_run_id": "nr_01J...",
  "subtasks": [
    { "subtask_id": "t1", "goal": "lookup customer profile", "side_effect": "read" },
    { "subtask_id": "t2", "goal": "retrieve billing history", "side_effect": "read" },
    { "subtask_id": "t3", "goal": "initiate refund if eligible", "side_effect": "write" }
  ]
}

Step 3: Produce bounded candidate menus

For each subtask, selection produces a bounded CandidateSet: a ranked list of at most top_k candidate NodeTypes.

Artifact #3: CandidateSet for refund subtask

{
  "candidate_set_id": "cs_01J...",
  "subtask_id": "t3",
  "top_k": 3,
  "candidates": [
    { "node_type_id": "billing.initiate_refund", "score": 0.9 },
    { "node_type_id": "billing.create_case_for_agent", "score": 0.72 },
    { "node_type_id": "support.escalate_to_human", "score": 0.61 }
  ]
}

This is the key shift: the system is not choosing from “everything.” It is choosing from a small set of approved options.
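
To make "bounded" concrete, here is a minimal sketch of how such a menu could be built. It assumes candidates arrive as scored node types and that the menu is simply the top_k highest-scoring approved entries. The names build_candidate_set, Candidate, and internal.debug_shell are hypothetical and are not part of ARP's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    node_type_id: str
    score: float


def build_candidate_set(subtask_id, scored, approved, top_k=3):
    """Return a bounded menu: approved candidates only, best score first, at most top_k."""
    eligible = [c for c in scored if c.node_type_id in approved]
    ranked = sorted(eligible, key=lambda c: c.score, reverse=True)
    return {
        "subtask_id": subtask_id,
        "top_k": top_k,
        "candidates": [
            {"node_type_id": c.node_type_id, "score": c.score}
            for c in ranked[:top_k]
        ],
    }


# Scores for the three approved types mirror Artifact #3.
scored = [
    Candidate("support.escalate_to_human", 0.61),
    Candidate("billing.initiate_refund", 0.9),
    Candidate("internal.debug_shell", 0.95),  # hypothetical, not approved
    Candidate("billing.create_case_for_agent", 0.72),
]
approved = {
    "billing.initiate_refund",
    "billing.create_case_for_agent",
    "support.escalate_to_human",
}

candidate_set = build_candidate_set("t3", scored, approved)
```

Note that the unapproved entry never reaches the menu even though it has the highest score. Filtering happens before ranking, so a high score cannot widen the set.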

Step 4: Apply enforceable constraints and gates

Before execution, the system applies a constraint envelope: structural limits, the allowed candidate set, budgets, and gates.

Artifact #4: Conceptual ConstraintEnvelope

{
  "structural": { "max_depth": 3, "max_total_nodes_per_run": 25 },
  "candidates": {
    "allowed_node_type_ids": [
      "billing.initiate_refund",
      "billing.create_case_for_agent",
      "support.escalate_to_human"
    ]
  },
  "budgets": { "max_wall_time_ms": 45000, "max_steps": 35, "max_external_calls": 10 },
  "gates": { "side_effect_class": "write" }
}

If the planner tries to execute a capability outside the allowed set, enforcement rejects it.
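
A rough sketch of that rejection, under stated assumptions: the envelope is the dictionary from Artifact #4, and usage counters are tracked elsewhere and passed in. The enforce function, the ConstraintViolation exception, the usage keys, and the "budget is exhausted once the counter reaches the limit" rule are all hypothetical choices for illustration.

```python
class ConstraintViolation(Exception):
    """Raised when a requested action falls outside the envelope."""


ENVELOPE = {
    "structural": {"max_depth": 3, "max_total_nodes_per_run": 25},
    "candidates": {
        "allowed_node_type_ids": [
            "billing.initiate_refund",
            "billing.create_case_for_agent",
            "support.escalate_to_human",
        ]
    },
    "budgets": {"max_wall_time_ms": 45000, "max_steps": 35, "max_external_calls": 10},
    "gates": {"side_effect_class": "write"},
}

# Maps each budget in the envelope to a (hypothetical) usage counter name.
BUDGET_COUNTERS = {
    "max_wall_time_ms": "wall_time_ms",
    "max_steps": "steps",
    "max_external_calls": "external_calls",
}


def enforce(envelope, node_type_id, depth, usage):
    """Raise ConstraintViolation unless the request fits inside the envelope."""
    if node_type_id not in envelope["candidates"]["allowed_node_type_ids"]:
        raise ConstraintViolation(f"{node_type_id} is not in the allowed set")
    structural = envelope["structural"]
    if depth > structural["max_depth"]:
        raise ConstraintViolation(f"depth {depth} exceeds max_depth")
    if usage.get("total_nodes", 0) >= structural["max_total_nodes_per_run"]:
        raise ConstraintViolation("max_total_nodes_per_run reached")
    for budget, counter in BUDGET_COUNTERS.items():
        if usage.get(counter, 0) >= envelope["budgets"][budget]:
            raise ConstraintViolation(f"budget exhausted: {budget}")


# Inside the envelope: returns without raising.
enforce(ENVELOPE, "billing.initiate_refund", depth=2, usage={"steps": 4})
```

Calling enforce with a node type that is not on the list, for example a made-up billing.delete_account, raises ConstraintViolation before anything is invoked.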

Step 5: Policy checkpoint before side effects

Before invoking a write or irreversible action, the system records a policy checkpoint decision.

Artifact #5: Conceptual PolicyDecision record

{
  "checkpoint": "pre_invoke",
  "node_type_id": "billing.initiate_refund",
  "decision": "allow",
  "reason": "refund_amount <= limit and customer_verified == true",
  "recorded_at": "..."
}
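
As an illustration only, a checkpoint that yields a record shaped like the one above might look like this. The rule is taken from the "reason" field. The function name, the REFUND_LIMIT value, the "deny" decision, and the deny reasons are assumptions, since the post only shows an "allow" record and does not state a limit.

```python
from datetime import datetime, timezone

REFUND_LIMIT = 100.00  # hypothetical; the post does not state the actual limit


def pre_invoke_checkpoint(node_type_id, refund_amount, customer_verified,
                          limit=REFUND_LIMIT):
    """Evaluate the refund rule and return a decision record to be persisted."""
    within_limit = refund_amount <= limit
    allowed = within_limit and customer_verified
    if allowed:
        reason = "refund_amount <= limit and customer_verified == true"
    elif not within_limit:
        reason = "refund_amount > limit"
    else:
        reason = "customer_verified == false"
    return {
        "checkpoint": "pre_invoke",
        "node_type_id": node_type_id,
        "decision": "allow" if allowed else "deny",  # "deny" is an assumed value
        "reason": reason,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


decision = pre_invoke_checkpoint("billing.initiate_refund", 42.50, True)
```

The point of returning a record rather than a bare boolean is that the decision and its reason can be written to the timeline before the side effect happens.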

Step 6: Durable event timeline

Finally, the run timeline.

A durable event stream lets you reconstruct “what happened and why,” including:

decomposition,
candidate menu generation,
binding decisions,
policy decisions,
execution results,
recovery steps.

Simplified example:

RunStarted run_01J...
NodeRunStarted nr_01J... support.resolve_issue
DecompositionProposed nr_01J... subtasks=[t1,t2,t3]
CandidateSetCreated cs_01J... subtask=t3 top_k=3
PolicyCheckpoint pre_invoke allow billing.initiate_refund
NodeRunCompleted nr_01J... billing.initiate_refund status=ok
RunCompleted run_01J...
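
To show why an append-only stream is enough to reconstruct a run, here is a small sketch that writes events as JSON lines and then replays them into a summary. A Python list stands in for durable storage here. The field names, the seq counter, and the "completed" status are assumptions for illustration, not ARP's actual event schema.

```python
import json


def append_event(log, event_type, **fields):
    """Append one event as a JSON line; earlier lines are never rewritten."""
    log.append(json.dumps({"seq": len(log), "type": event_type, **fields}))


def replay(log):
    """Fold the event stream back into a summary of what happened and why."""
    state = {"status": "unknown", "policy_decisions": [], "completed_nodes": []}
    for line in log:
        event = json.loads(line)
        if event["type"] == "RunStarted":
            state["status"] = "running"
        elif event["type"] == "PolicyCheckpoint":
            state["policy_decisions"].append(
                (event["node_type_id"], event["decision"])
            )
        elif event["type"] == "NodeRunCompleted":
            state["completed_nodes"].append(event["node_type_id"])
        elif event["type"] == "RunCompleted":
            state["status"] = "completed"
    return state


log = []  # stand-in for a durable store
append_event(log, "RunStarted", run_id="run_01J...")
append_event(log, "PolicyCheckpoint", checkpoint="pre_invoke",
             node_type_id="billing.initiate_refund", decision="allow")
append_event(log, "NodeRunCompleted",
             node_type_id="billing.initiate_refund", status="ok")
append_event(log, "RunCompleted", run_id="run_01J...")

summary = replay(log)
```

Because replay reads only the log, the same summary can be rebuilt later for debugging or audit without access to the live system.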

What you should take away

This is the core of ARP:

Bound decisions using CandidateSets and constraints.
Gate side effects via policy checkpoints.
Keep durable evidence so systems are debuggable and improvable.

That’s what turns “tool-calling” into something you can operate.

Next in the series: ARP FAQ: what it is, what it isn’t, and how to adopt it