This post is a walkthrough of a single ARP run.
The goal is not to impress you with a model response. The goal is to show you the artifacts that make the system operable:
- capability contract — NodeType definition
- bounded candidate menu — CandidateSet
- constraint envelope — budgets + gates
- policy decision record — checkpoint result
- durable event timeline — replayable evidence
Step 0: Run the stack
Follow the quickstart to run the reference stack locally.
Once it’s running, you’ll have endpoints for:
- starting a run,
- inspecting run state,
- and viewing the run event timeline.
Step 1: Start a run with a root capability
Assume we start a run with a composite capability:
support.resolve_issue
Conceptual input:
{
  "customer_id": "cus_123",
  "message": "I was charged twice for last month. Can you fix it?"
}
Right away, the system creates a Run and a root NodeRun.
Artifact #1: Run record
{
  "run_id": "run_01J...",
  "root_node_run_id": "nr_01J...",
  "status": "running",
  "started_at": "..."
}
Step 2: Decompose into subtasks
The composite capability decomposes the goal into subtasks such as:
- “Look up the customer”
- “Retrieve billing history”
- “Determine if duplicate charge occurred”
- “If required, initiate refund”
- “Send confirmation message”
This produces structured subtask specs and child NodeRun requests.
Artifact #2: Conceptual decomposition snapshot
{
  "node_run_id": "nr_01J...",
  "subtasks": [
    { "subtask_id": "t1", "goal": "lookup customer profile", "side_effect": "read" },
    { "subtask_id": "t2", "goal": "retrieve billing history", "side_effect": "read" },
    { "subtask_id": "t3", "goal": "initiate refund if eligible", "side_effect": "write" }
  ]
}
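To make the snapshot concrete, here is a minimal Python sketch of how a consumer might load it and pick out the subtasks that will later need a policy checkpoint. The `Subtask` class and `needs_checkpoint` helper are hypothetical names for illustration, not part of ARP itself, and the rule "anything that is not a read gets gated" is an assumption based on the `side_effect` field shown above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subtask:
    """Hypothetical in-memory form of one entry in the decomposition snapshot."""
    subtask_id: str
    goal: str
    side_effect: str  # "read" or "write", as in the snapshot


def needs_checkpoint(subtask: Subtask) -> bool:
    # Assumption: every non-read subtask must pass a policy checkpoint later.
    return subtask.side_effect != "read"


snapshot = {
    "node_run_id": "nr_01J...",
    "subtasks": [
        {"subtask_id": "t1", "goal": "lookup customer profile", "side_effect": "read"},
        {"subtask_id": "t2", "goal": "retrieve billing history", "side_effect": "read"},
        {"subtask_id": "t3", "goal": "initiate refund if eligible", "side_effect": "write"},
    ],
}

subtasks = [Subtask(**entry) for entry in snapshot["subtasks"]]
gated = [s.subtask_id for s in subtasks if needs_checkpoint(s)]
print(gated)  # ['t3']
```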
Step 3: Produce bounded candidate menus
For each subtask, selection produces a bounded CandidateSet.
Artifact #3: CandidateSet for refund subtask
{
  "candidate_set_id": "cs_01J...",
  "subtask_id": "t3",
  "top_k": 3,
  "candidates": [
    { "node_type_id": "billing.initiate_refund", "score": 0.9 },
    { "node_type_id": "billing.create_case_for_agent", "score": 0.72 },
    { "node_type_id": "support.escalate_to_human", "score": 0.61 }
  ]
}
This is the key shift: the system is not choosing from “everything.” It is choosing from a small set of approved options.
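One way to picture "bounded" is as a filter followed by a cutoff. The sketch below assumes selection has already scored some capabilities, then keeps only approved ones and truncates to `top_k`. The function name `build_candidate_set`, the approved-set parameter, and the two extra node types (`internal.debug_shell`, `support.send_apology`) are made up for illustration; how ARP actually scores and filters candidates is not shown in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    node_type_id: str
    score: float


def build_candidate_set(subtask_id, scored, approved, top_k=3):
    """Keep approved candidates only, rank by score, and cut off at top_k."""
    eligible = [c for c in scored if c.node_type_id in approved]
    ranked = sorted(eligible, key=lambda c: c.score, reverse=True)
    return {"subtask_id": subtask_id, "top_k": top_k, "candidates": ranked[:top_k]}


approved = {
    "billing.initiate_refund",
    "billing.create_case_for_agent",
    "support.escalate_to_human",
    "support.send_apology",  # hypothetical approved capability
}
scored = [
    Candidate("internal.debug_shell", 0.95),  # hypothetical, not approved
    Candidate("support.escalate_to_human", 0.61),
    Candidate("billing.initiate_refund", 0.9),
    Candidate("support.send_apology", 0.4),
    Candidate("billing.create_case_for_agent", 0.72),
]

candidate_set = build_candidate_set("t3", scored, approved)
ids = [c.node_type_id for c in candidate_set["candidates"]]
print(ids)
```

Note that the highest-scoring entry never reaches the menu because it is not approved, and the lowest-scoring approved entry is dropped by the `top_k` cutoff.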
Step 4: Apply enforceable constraints and gates
Before execution, the system applies a constraint envelope.
Artifact #4: Conceptual ConstraintEnvelope
{
  "structural": { "max_depth": 3, "max_total_nodes_per_run": 25 },
  "candidates": {
    "allowed_node_type_ids": [
      "billing.initiate_refund",
      "billing.create_case_for_agent",
      "support.escalate_to_human"
    ]
  },
  "budgets": { "max_wall_time_ms": 45000, "max_steps": 35, "max_external_calls": 10 },
  "gates": { "side_effect_class": "write" }
}
If the planner tries to execute a capability outside the allowed set, enforcement rejects it.
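As a rough sketch of what that rejection could look like, the Python below checks a proposed invocation against the envelope before anything runs. The `enforce` function, the `ConstraintViolation` exception, the usage counters passed in, and the node type `billing.delete_customer` are all assumptions for illustration; the real enforcement layer may be structured quite differently.

```python
class ConstraintViolation(Exception):
    """Raised when a proposed invocation falls outside the envelope."""


ENVELOPE = {
    "structural": {"max_depth": 3, "max_total_nodes_per_run": 25},
    "candidates": {
        "allowed_node_type_ids": [
            "billing.initiate_refund",
            "billing.create_case_for_agent",
            "support.escalate_to_human",
        ]
    },
    "budgets": {"max_wall_time_ms": 45000, "max_steps": 35, "max_external_calls": 10},
    "gates": {"side_effect_class": "write"},
}


def enforce(envelope, node_type_id, *, depth, steps_used, external_calls_used):
    """Raise ConstraintViolation if the proposed invocation is out of bounds."""
    if node_type_id not in envelope["candidates"]["allowed_node_type_ids"]:
        raise ConstraintViolation(f"{node_type_id} is not in the allowed set")
    if depth > envelope["structural"]["max_depth"]:
        raise ConstraintViolation(f"depth {depth} exceeds max_depth")
    budgets = envelope["budgets"]
    # A budget that is already fully used leaves no room for another step or call.
    if steps_used >= budgets["max_steps"]:
        raise ConstraintViolation("step budget exhausted")
    if external_calls_used >= budgets["max_external_calls"]:
        raise ConstraintViolation("external call budget exhausted")


# An allowed capability within budget passes silently.
enforce(ENVELOPE, "billing.initiate_refund", depth=2, steps_used=5, external_calls_used=1)

# A capability outside the allowed set is rejected before it can run.
try:
    enforce(ENVELOPE, "billing.delete_customer", depth=2, steps_used=5, external_calls_used=1)
except ConstraintViolation as err:
    print(f"rejected: {err}")
```

The point of the design is that the check is plain data comparison: the planner's preferences never enter into it.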
Step 5: Policy checkpoint before side effects
Before invoking a write or irreversible action, the system records a policy checkpoint decision.
Artifact #5: Conceptual PolicyDecision record
{
  "checkpoint": "pre_invoke",
  "node_type_id": "billing.initiate_refund",
  "decision": "allow",
  "reason": "refund_amount <= limit and customer_verified == true",
  "recorded_at": "..."
}
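A checkpoint like this can be sketched as a pure function that evaluates the rule and returns the record to persist. The function name `pre_invoke_checkpoint`, the `"deny"` decision value, the wording of the deny reason, and the example amounts are assumptions; only the `"allow"` record shape above comes from the walkthrough.

```python
from datetime import datetime, timezone


def pre_invoke_checkpoint(node_type_id, *, refund_amount, limit, customer_verified):
    """Evaluate the refund rule and return a PolicyDecision-style record."""
    allowed = refund_amount <= limit and customer_verified
    return {
        "checkpoint": "pre_invoke",
        "node_type_id": node_type_id,
        # "deny" is an assumed counterpart to the "allow" value shown above.
        "decision": "allow" if allowed else "deny",
        "reason": (
            "refund_amount <= limit and customer_verified == true"
            if allowed
            else "refund_amount > limit or customer_verified == false"
        ),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


# Illustrative amounts only.
decision = pre_invoke_checkpoint(
    "billing.initiate_refund",
    refund_amount=42.0,
    limit=100.0,
    customer_verified=True,
)

# The record is produced first; the side effect runs only on "allow".
if decision["decision"] == "allow":
    print("proceed with", decision["node_type_id"])
else:
    print("blocked:", decision["reason"])
```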
Step 6: Durable event timeline
Finally: the run timeline.
A durable event stream lets you reconstruct “what happened and why,” including:
- decomposition,
- candidate menu generation,
- binding decisions,
- policy decisions,
- execution results,
- recovery steps.
Simplified example:
RunStarted run_01J...
NodeRunStarted nr_01J... support.resolve_issue
DecompositionProposed nr_01J... subtasks=[t1,t2,t3]
CandidateSetCreated cs_01J... subtask=t3 top_k=3
PolicyCheckpoint pre_invoke allow billing.initiate_refund
NodeRunCompleted nr_01J... billing.initiate_refund status=ok
RunCompleted run_01J...
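"Replayable" means you can fold the stream back into state without asking the system what it remembers. Here is a minimal Python sketch of that idea, with the timeline above rewritten as dictionaries. The field names (`type`, `subtasks`, `decision`, and so on) and the `replay` function are assumptions for illustration; the actual event schema is not shown in this post.

```python
EVENTS = [
    {"type": "RunStarted", "run_id": "run_01J..."},
    {"type": "NodeRunStarted", "node_run_id": "nr_01J...", "node_type_id": "support.resolve_issue"},
    {"type": "DecompositionProposed", "node_run_id": "nr_01J...", "subtasks": ["t1", "t2", "t3"]},
    {"type": "CandidateSetCreated", "candidate_set_id": "cs_01J...", "subtask": "t3", "top_k": 3},
    {"type": "PolicyCheckpoint", "checkpoint": "pre_invoke", "decision": "allow",
     "node_type_id": "billing.initiate_refund"},
    {"type": "NodeRunCompleted", "node_run_id": "nr_01J...",
     "node_type_id": "billing.initiate_refund", "status": "ok"},
    {"type": "RunCompleted", "run_id": "run_01J..."},
]


def replay(events):
    """Fold an ordered event list into a summary of what happened and why."""
    state = {"status": "pending", "subtasks": [], "policy_decisions": [], "completed": []}
    for event in events:
        kind = event["type"]
        if kind == "RunStarted":
            state["status"] = "running"
        elif kind == "DecompositionProposed":
            state["subtasks"] = list(event["subtasks"])
        elif kind == "PolicyCheckpoint":
            state["policy_decisions"].append((event["node_type_id"], event["decision"]))
        elif kind == "NodeRunCompleted":
            state["completed"].append((event["node_type_id"], event["status"]))
        elif kind == "RunCompleted":
            state["status"] = "completed"
        # Other event types are kept in the log but ignored by this summary.
    return state


summary = replay(EVENTS)
print(summary["status"])
print(summary["policy_decisions"])
```

Because the summary is derived only from the log, replaying a prefix of the events gives you the state of the run at that point in time.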
What you should take away
This is the core of ARP:
- Bound decisions using CandidateSets and constraints.
- Gate side effects via policy checkpoints.
- Keep durable evidence so systems are debuggable and improvable.
That’s what turns “tool-calling” into something you can operate.
Next in the series: ARP FAQ: what it is, what it isn’t, and how to adopt it