This post answers the recurring questions we’ve heard while building ARP.
Why does ARP exist?
Because agentic reliability doesn’t scale automatically.
As workflows get longer and action inventories get larger:
- mistakes compound,
- policy becomes advisory,
- and debugging becomes guesswork.
ARP’s thesis is that agentic systems become production systems when they have:
- enforceable bounds,
- durable evidence,
- and an SDLC for capability reliability.
Is ARP an agent framework?
No.
Frameworks are great for authoring workflows and planner internals.
ARP is a layer that keeps the following stable:
- capability contracts and artifacts,
- bounded candidate menus,
- constraint enforcement points,
- policy checkpoints,
- and durable run evidence.
You can keep your framework internals. ARP standardizes what must remain stable and operable.
Is ARP trying to replace “tool protocols”?
No.
Tool protocols help you discover and call actions.
ARP focuses on:
- bounded decision-making,
- enforcement,
- and durable evidence.
Tool-calling becomes safer and more reliable when it is:
- mapped through bounded candidate menus,
- constrained by explicit budgets and gates,
- and recorded in a run history you can inspect.
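The three properties above can be sketched in a few lines of Python. This is a minimal illustration, not ARP's or JARVIS's actual API: `BoundedExecutor`, its fields, and the reason strings are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class BoundedExecutor:
    """Illustrative wrapper around plain tool callables (hypothetical, not an ARP API)."""

    tools: dict[str, Callable[..., Any]]
    candidate_set: frozenset[str]  # bounded menu of actions allowed for this step
    max_calls: int                 # explicit call budget for the run
    history: list[dict[str, Any]] = field(default_factory=list)
    calls_used: int = 0

    def call(self, action: str, **kwargs: Any) -> Any:
        # Gate 1: the action must be on the bounded menu.
        if action not in self.candidate_set:
            self._record(action, kwargs, "rejected", "not_in_candidate_set")
            raise PermissionError(f"{action!r} is not in the candidate set")
        # Gate 2: the call budget must not be exhausted.
        if self.calls_used >= self.max_calls:
            self._record(action, kwargs, "rejected", "budget_exhausted")
            raise RuntimeError(f"call budget of {self.max_calls} exhausted")
        self.calls_used += 1
        result = self.tools[action](**kwargs)
        self._record(action, kwargs, "ok", None)
        return result

    def _record(self, action: str, kwargs: dict, status: str, reason) -> None:
        # Every attempt, allowed or rejected, lands in the inspectable history.
        self.history.append(
            {"action": action, "args": kwargs, "status": status, "reason": reason}
        )


executor = BoundedExecutor(
    tools={
        "lookup_customer": lambda customer_id: {"customer_id": customer_id, "plan": "pro"},
        "delete_customer": lambda customer_id: None,
    },
    candidate_set=frozenset({"lookup_customer"}),  # delete_customer is registered but not offered
    max_calls=1,
)
print(executor.call("lookup_customer", customer_id="c-1"))
```

The point of the sketch is that a rejected call fails with a structured reason and still leaves a record, so the history explains what was attempted as well as what ran.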
Quick comparison
| Category | Primary goal | What ARP adds |
|---|---|---|
| Agent frameworks | Author planner logic and workflows | Bounded execution artifacts, enforcement points, conformance seams |
| Tool protocols | Discover and call actions | CandidateSet boundedness, constraint envelopes, policy checkpoints, durable evidence |
| ARP | Operable capability execution | Replaceable components + artifacts that make runs inspectable and improvable |
What exactly is standardized?
ARP Standard v1 defines contracts and schemas for:
- capability definitions, called NodeTypes
- runs and steps via Run and NodeRun records
- bounded candidate menus via CandidateSets
- enforceable envelopes via a ConstraintEnvelope
- policy decisions recorded as PolicyDecision artifacts
It does not standardize planner internals.
What is JARVIS?
JARVIS is the first-party open-source reference implementation of ARP Standard v1.
It exists so you can:
- run ARP end-to-end immediately,
- inspect real artifacts,
- and use it as a baseline when implementing your own components.
Do I have to adopt everything at once?
No. ARP is designed for incremental adoption.
A typical adoption path looks like:
1. Wrap 5–20 existing APIs as atomic capabilities with contracts and schemas
2. Start generating bounded candidate menus for a small set of subtasks
3. Record durable run artifacts and build basic debugging muscle memory
4. Add budgets and structural constraints to limit blast radius
5. Add policy checkpoints before side effects
6. Introduce evaluation and promotion gates for capability versions
You can stop at any step and still get value.
Example: adopt incrementally by wrapping one existing API as a NodeType:
id: support.lookup_customer
kind: atomic
input_schema:
  type: object
  required: [customer_id]
output_schema:
  type: object
  required: [name, plan, status]
semantics:
  side_effect_class: read
constraints_defaults:
  budgets:
    max_wall_time_ms: 2000
    max_external_calls: 1
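To show how a definition like this can be enforced rather than merely documented, here is a rough Python sketch. It mirrors the NodeType above as a plain dict and wraps a handler with checks on required fields and the wall-time budget. The helper names `missing_required` and `run_checked` are hypothetical, and the wall-time check only measures after the handler returns; it does not interrupt a slow call.

```python
import time
from typing import Any, Callable

# The NodeType from the example above, mirrored as a plain dict.
node_type = {
    "id": "support.lookup_customer",
    "kind": "atomic",
    "input_schema": {"type": "object", "required": ["customer_id"]},
    "output_schema": {"type": "object", "required": ["name", "plan", "status"]},
    "semantics": {"side_effect_class": "read"},
    "constraints_defaults": {
        "budgets": {"max_wall_time_ms": 2000, "max_external_calls": 1}
    },
}


def missing_required(schema: dict, payload: dict) -> list[str]:
    """Return the required keys that are absent from the payload."""
    return [key for key in schema.get("required", []) if key not in payload]


def run_checked(node: dict, handler: Callable[[dict], dict], payload: dict) -> dict[str, Any]:
    """Hypothetical wrapper: validate input, run the handler, validate output and wall time."""
    missing_in = missing_required(node["input_schema"], payload)
    if missing_in:
        raise ValueError(f"{node['id']}: missing input fields {missing_in}")

    budget_ms = node["constraints_defaults"]["budgets"]["max_wall_time_ms"]
    start = time.monotonic()
    output = handler(payload)
    elapsed_ms = (time.monotonic() - start) * 1000
    if elapsed_ms > budget_ms:
        # Detected after the fact; a real enforcement point would also cancel the call.
        raise TimeoutError(f"{node['id']}: took {elapsed_ms:.0f} ms, budget is {budget_ms} ms")

    missing_out = missing_required(node["output_schema"], output)
    if missing_out:
        raise ValueError(f"{node['id']}: missing output fields {missing_out}")
    return output


def fake_lookup(payload: dict) -> dict:
    # Stand-in for the existing API being wrapped.
    return {"name": "Ada", "plan": "pro", "status": "active"}


print(run_checked(node_type, fake_lookup, {"customer_id": "c-1"}))
```

A fuller implementation would presumably validate against the complete schemas and count external calls as well; the sketch only covers the `required` lists and the time budget.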
Is this only for large enterprises?
No.
Solo builders and small teams often benefit the most from:
- bounded action spaces,
- durable run traces,
- and structured failure reasons.
Enterprises benefit from:
- multi-team capability catalogs,
- governance checkpoints,
- and consistent operational evidence across heterogeneous stacks.
How do you measure reliability?
Measure it as capability behavior observed across repeated trials and regression suites, not as a single successful run.
At minimum:
- schema validity,
- deterministic post-conditions,
- and outcome-based evaluation.
Over time:
- multi-trial stability measures,
- scorecards attached to capability versions,
- promotion gates based on evidence.
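As a rough illustration of a multi-trial stability measure feeding a promotion gate, the sketch below computes a pass rate over repeated trials and compares it to a threshold. The function names and the 0.95 default threshold are assumptions made for this example, not values defined by ARP.

```python
from typing import Callable


def pass_rate(trial: Callable[[int], bool], n_trials: int) -> float:
    """Run the same check n_trials times and return the fraction that passed."""
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    passes = sum(1 for i in range(n_trials) if trial(i))
    return passes / n_trials


def may_promote(rate: float, threshold: float = 0.95) -> bool:
    """Hypothetical promotion gate: promote only when the evidence clears the bar."""
    return rate >= threshold


def flaky_trial(i: int) -> bool:
    # Stand-in for one evaluated run of a capability version; fails on every 10th trial.
    return i % 10 != 0


rate = pass_rate(flaky_trial, 20)
print(rate, may_promote(rate))
```

In practice each trial would run the capability and check schema validity, post-conditions, or an outcome evaluator; the resulting rate is the kind of number a scorecard could attach to a capability version.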
How does ARP help with safety?
ARP’s safety model is mechanical:
- constrain what can be executed via CandidateSets and allow/deny lists,
- constrain how much can happen via budgets and structural limits,
- gate side effects via policy checkpoints and approvals,
- record evidence via durable events and artifacts.
This creates enforceable safety boundaries instead of best-effort instructions.
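The difference between a mechanical boundary and an instruction can be sketched as a checkpoint that runs before the handler does. `PolicyDecision` is the artifact name used in this post, but the fields shown here, the `checkpoint` and `execute` helpers, and the rule that any non-read action needs an approver are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class PolicyDecision:
    # Field names are assumed for this sketch, not taken from the ARP schema.
    action: str
    allowed: bool
    reason: str


def checkpoint(
    action: str,
    side_effect_class: str,
    candidate_set: frozenset[str],
    approved_by: Optional[str] = None,
) -> PolicyDecision:
    """Decide whether an action may run, before any side effect happens."""
    if action not in candidate_set:
        return PolicyDecision(action, False, "not_in_candidate_set")
    if side_effect_class != "read" and approved_by is None:
        return PolicyDecision(action, False, "approval_required")
    return PolicyDecision(action, True, "ok")


def execute(
    action: str,
    side_effect_class: str,
    handler: Callable[[], Any],
    candidate_set: frozenset[str],
    evidence: list[PolicyDecision],
    approved_by: Optional[str] = None,
) -> Any:
    """Record the decision as evidence, then run the handler only if allowed."""
    decision = checkpoint(action, side_effect_class, candidate_set, approved_by)
    evidence.append(decision)
    if not decision.allowed:
        return None
    return handler()


evidence: list[PolicyDecision] = []
menu = frozenset({"lookup_customer", "refund_order"})
execute("refund_order", "write", lambda: "refunded", menu, evidence)  # blocked: no approver
execute("refund_order", "write", lambda: "refunded", menu, evidence, approved_by="ops")
print([d.reason for d in evidence])
```

Because the handler is never invoked on a denied decision, the side effect cannot happen by accident, and the evidence list shows both the denial and the later approved run.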
Where should I start?
Choose one path:
If you have a question we missed, open a discussion and we’ll add it here.
Next: Interoperability first: how ARP composes with MCP, A2A, and existing stacks