ARP FAQ: what it is, what it isn’t, and how to adopt it

This post answers the recurring questions we’ve heard while building ARP.

Why does ARP exist?

Because agentic reliability doesn’t scale automatically.

As workflows get longer and action inventories get larger:

mistakes compound,
policy becomes advisory,
and debugging becomes guesswork.

ARP’s thesis is that agentic systems become production systems when they have:

enforceable bounds,
durable evidence,
and an SDLC for capability reliability.

Is ARP an agent framework?

No.

Frameworks are great for authoring workflows and planner internals.

ARP is a layer that keeps the following stable:

capability contracts and artifacts,
bounded candidate menus,
constraint enforcement points,
policy checkpoints,
and durable run evidence.

You can keep your framework internals. ARP standardizes what must remain stable and operable.

Is ARP trying to replace “tool protocols”?

No.

Tool protocols help you discover and call actions.

ARP focuses on:

bounded decision-making,
enforcement,
and durable evidence.

Tool-calling becomes safer and more reliable when it is:

mapped through bounded candidate menus,
constrained by explicit budgets and gates,
and recorded in a run history you can inspect.

Quick comparison

| Category | Primary goal | What ARP adds |
| --- | --- | --- |
| Agent frameworks | Author planner logic and workflows | Bounded execution artifacts, enforcement points, conformance seams |
| Tool protocols | Discover and call actions | CandidateSet boundedness, constraint envelopes, policy checkpoints, durable evidence |
| ARP | Operable capability execution | Replaceable components + artifacts that make runs inspectable and improvable |

What exactly is standardized?

ARP Standard v1 defines contracts and schemas for:

capability definitions, called NodeTypes
runs and steps via Run and NodeRun records
bounded candidate menus via CandidateSets
enforceable envelopes via a ConstraintEnvelope
policy decisions recorded as PolicyDecision artifacts

It does not standardize planner internals.

What is JARVIS?

JARVIS is the first-party open-source reference implementation of ARP Standard v1.

It exists so you can:

run ARP end-to-end immediately,
inspect real artifacts,
and use it as a baseline when implementing your own components.

Do I have to adopt everything at once?

No. ARP is designed for incremental adoption.

A typical adoption path looks like:

Wrap 5–20 existing APIs as atomic capabilities with contracts and schemas
Start generating bounded candidate menus for a small set of subtasks
Record durable run artifacts and build basic debugging muscle memory
Add budgets and structural constraints to limit blast radius
Add policy checkpoints before side effects
Introduce evaluation and promotion gates for capability versions

You can stop at any step and still get value.

Example: adopt incrementally by wrapping one existing API as a NodeType:

id: support.lookup_customer
kind: atomic  # wraps a single existing API
input_schema:
  type: object
  required: [customer_id]
output_schema:
  type: object
  required: [name, plan, status]
semantics:
  side_effect_class: read  # read-only lookup
constraints_defaults:
  budgets:
    max_wall_time_ms: 2000  # 2 seconds
    max_external_calls: 1
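To make the definition above more concrete, here is a rough Python sketch of how a runtime might enforce it around an existing API call. This is an assumption about one possible approach, not how JARVIS does it: `run_atomic`, `CallMeter`, and `BudgetExceeded` are hypothetical names, the schema check only looks at `required` fields rather than doing full schema validation, and the wall-time check runs after the handler returns rather than interrupting it.

```python
import time
from typing import Any, Callable

# The YAML definition above, mirrored as a plain dict.
NODE_TYPE: dict[str, Any] = {
    "id": "support.lookup_customer",
    "input_schema": {"type": "object", "required": ["customer_id"]},
    "output_schema": {"type": "object", "required": ["name", "plan", "status"]},
    "constraints_defaults": {
        "budgets": {"max_wall_time_ms": 2000, "max_external_calls": 1}
    },
}


class BudgetExceeded(Exception):
    """Raised when a handler goes past one of its budgets."""


class CallMeter:
    """Counts external calls and refuses any call beyond the budget."""

    def __init__(self, max_calls: int) -> None:
        self.max_calls = max_calls
        self.used = 0

    def charge(self) -> None:
        if self.used >= self.max_calls:
            raise BudgetExceeded("max_external_calls exceeded")
        self.used += 1


def missing_required(schema: dict[str, Any], value: dict[str, Any]) -> list[str]:
    return [key for key in schema.get("required", []) if key not in value]


Handler = Callable[[dict[str, Any], CallMeter], dict[str, Any]]


def run_atomic(node_type: dict[str, Any], handler: Handler,
               inputs: dict[str, Any]) -> dict[str, Any]:
    missing = missing_required(node_type["input_schema"], inputs)
    if missing:
        raise ValueError(f"input is missing required fields: {missing}")

    budgets = node_type["constraints_defaults"]["budgets"]
    meter = CallMeter(budgets["max_external_calls"])
    started = time.monotonic()
    output = handler(inputs, meter)
    elapsed_ms = (time.monotonic() - started) * 1000
    if elapsed_ms > budgets["max_wall_time_ms"]:
        raise BudgetExceeded("max_wall_time_ms exceeded")

    missing = missing_required(node_type["output_schema"], output)
    if missing:
        raise ValueError(f"output is missing required fields: {missing}")
    return output


def lookup_customer(inputs: dict[str, Any], meter: CallMeter) -> dict[str, Any]:
    meter.charge()  # charge once per call to the existing API
    # Stand-in for the real API call; the values are placeholders.
    return {"name": "Ada", "plan": "pro", "status": "active"}


print(run_atomic(NODE_TYPE, lookup_customer, {"customer_id": "c-1"}))
```

The handler stays a thin wrapper around the API you already have. The contract and the budgets live in the definition, so they can be checked the same way for every capability.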

Is this only for large enterprises?

No.

Solo builders and small teams often benefit the most from:

bounded action spaces,
durable run traces,
and structured failure reasons.

Enterprises benefit from:

multi-team capability catalogs,
governance checkpoints,
and consistent operational evidence across heterogeneous stacks.

How do you measure reliability?

You measure it as capability behavior over repeated trials and regression suites.

At minimum:

schema validity,
deterministic post-conditions,
and outcome-based evaluation.

Over time:

multi-trial stability measures,
scorecards attached to capability versions,
and promotion gates based on evidence.
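As a small illustration of the "over time" items, the sketch below runs one check repeatedly, attaches the result to a capability version, and applies a promotion gate. The `Scorecard` shape, the version string, and the thresholds (20 trials, 95% pass rate) are assumptions chosen for the example, not values defined by ARP.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Scorecard:
    """Hypothetical scorecard attached to one capability version."""

    capability_version: str
    trials: int
    passes: int

    @property
    def pass_rate(self) -> float:
        return self.passes / self.trials if self.trials else 0.0


def run_trials(capability_version: str, trial: Callable[[], bool], n: int) -> Scorecard:
    # Each trial returns True when the outcome check passed.
    passes = sum(1 for _ in range(n) if trial())
    return Scorecard(capability_version, n, passes)


def may_promote(card: Scorecard, min_trials: int = 20,
                min_pass_rate: float = 0.95) -> bool:
    # Gate on evidence: enough trials and a high enough pass rate.
    return card.trials >= min_trials and card.pass_rate >= min_pass_rate


# A deterministic stand-in for a flaky capability: every 10th call fails.
_calls = iter(range(100))


def flaky_trial() -> bool:
    return next(_calls) % 10 != 0


card = run_trials("support.lookup_customer@1.2.0", flaky_trial, 20)
print(card.pass_rate, may_promote(card))
```

Here 18 of 20 trials pass, so the pass rate is 0.9 and the gate refuses promotion. A single successful run would have hidden that.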

How does ARP help with safety?

ARP’s safety model is mechanical:

constrain what can be executed via CandidateSets and allow/deny lists,
constrain how much can happen via budgets and structural limits,
gate side effects via policy checkpoints and approvals,
record evidence via durable events and artifacts.

This creates enforceable safety boundaries instead of best-effort instructions.
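A minimal sketch of such a checkpoint, in Python, might look like this. It is an illustration under assumptions: `Decision` is a hypothetical stand-in for a recorded policy decision, not the PolicyDecision schema itself, and `billing.refund` and the `write` side-effect class are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """Hypothetical stand-in for a recorded policy decision."""

    node_type_id: str
    allowed: bool
    reason: str


def checkpoint(node_type_id: str, side_effect_class: str, *,
               candidate_ids: set[str], deny_list: set[str],
               approved_ids: set[str]) -> Decision:
    # What can be executed: deny list first, then the bounded menu.
    if node_type_id in deny_list:
        return Decision(node_type_id, False, "deny_list")
    if node_type_id not in candidate_ids:
        return Decision(node_type_id, False, "not_in_candidate_set")
    # Side effects: anything other than a read needs an explicit approval.
    if side_effect_class != "read" and node_type_id not in approved_ids:
        return Decision(node_type_id, False, "approval_required")
    return Decision(node_type_id, True, "ok")


evidence: list[Decision] = []  # in practice this would be durable storage
requests = [
    ("support.lookup_customer", "read"),
    ("billing.refund", "write"),
]
for node_type_id, effect in requests:
    decision = checkpoint(
        node_type_id,
        effect,
        candidate_ids={"support.lookup_customer", "billing.refund"},
        deny_list=set(),
        approved_ids=set(),
    )
    evidence.append(decision)  # record the decision whether or not it allowed the call

for decision in evidence:
    print(decision)
```

The read is allowed and the refund is stopped with `approval_required`. Both decisions end up in the evidence list, so a denied action leaves the same kind of trace as an allowed one.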

Where should I start?

Choose one path:

Run the quickstart
COP mindset
ARP Standard
JARVIS

If you have a question we missed, open a discussion and we’ll add it here.

Next: Interoperability first: how ARP composes with MCP, A2A, and existing stacks