Agentic demos are easy. Operating agentic systems on-call is not.
When you move from “one impressive run” to “a workflow you can depend on,” the failure modes look familiar:
- action spaces explode,
- errors compound across long horizons,
- natural language becomes an implicit interface contract,
- policy becomes advisory instead of enforceable,
- and debugging turns into guesswork.
ARP exists because reliability is not a prompting problem. It’s a systems problem.
What is ARP?
ARP is an open, capability-oriented approach to building bounded, auditable agentic workflows.
ARP is intentionally three things:
- COP — Capability-Oriented Programming — a mindset for engineering around what a system can reliably do
- ARP Standard v1 — versioned API contracts and artifacts for portable execution
- JARVIS — a first-party open-source reference stack that runs the Standard end-to-end
If you only remember one line:
ARP turns “agents calling tools” into capability execution that is bounded, inspectable, and improvable.
What ARP is not
To avoid category confusion:
- ARP is not “another agent framework.”
- ARP does not standardize planner internals or “the best prompts.”
- ARP is not a hosted platform you must adopt wholesale.
- ARP is not just a tool catalog protocol.
ARP focuses on the layer that’s missing across most stacks: capability contracts + enforceable bounds + durable evidence.
The core idea: shrink decisions, keep evidence
ARP systems are built around two beliefs:
- If you want reliability, you must shrink the decision surface of the planner.
- If you want systems that improve, you must keep durable evidence of what happened.
That is why ARP makes these first-class:
- Capability contracts: explicit schemas + semantics
- Bounded candidate menus: the system chooses from a small approved set at each step
- Constraints + budgets: time, steps, cost, depth, branching, external calls
- Policy checkpoints: allow/deny/require-approval decisions tied to the run record
- Durable artifacts: events + inputs/outputs + decisions, designed for replay
How ARP works at a high level
ARP systems separate four concerns: definition (capability contracts), selection (bounded candidate menus), execution (running capabilities under constraints and budgets), and enforcement (policy checkpoints applied at the coordinator).
Key point: the coordinator is the enforcement anchor. Planners can suggest; the system enforces.
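To make that split concrete, here is a minimal sketch of a single coordinator step in Python. It is not the Standard's API; the `planner`, `selector`, `executor`, and `policy` objects are hypothetical stand-ins, but the shape of the loop shows who gets to decide what.

```python
# Hypothetical coordinator step: the planner suggests, the coordinator enforces.
from dataclasses import dataclass, field

@dataclass
class Budget:
    max_steps: int
    steps_used: int = 0

    def charge_step(self) -> None:
        if self.steps_used >= self.max_steps:
            raise RuntimeError("budget exhausted: max_steps")
        self.steps_used += 1

@dataclass
class RunRecord:
    events: list = field(default_factory=list)

    def log(self, kind: str, **data) -> None:
        self.events.append({"kind": kind, **data})

def coordinate_step(subtask, planner, selector, executor, policy, budget: Budget, record: RunRecord):
    candidates = selector.candidates_for(subtask)        # bounded menu, not every tool
    allowed_ids = {c["node_type_id"] for c in candidates}
    record.log("candidate_set", allowed=sorted(allowed_ids))

    choice = planner.choose(subtask, candidates)          # a suggestion only
    if choice not in allowed_ids:
        record.log("rejected", reason="outside_candidate_set", choice=choice)
        raise ValueError(f"{choice} is not in the approved candidate set")

    decision = policy.check(choice, subtask)              # allow / deny / require_approval
    record.log("policy_decision", capability=choice, decision=decision)
    if decision != "allow":
        return {"status": decision}

    budget.charge_step()
    output = executor.execute(choice, subtask)
    record.log("output", capability=choice, output=output)
    return output
```

The point of the sketch is the order of operations: bounding, policy, and budget checks all happen in the coordinator, before the capability runs, and every decision lands in the run record.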
A concrete example in 60 seconds
Let’s say you’re building a support workflow. In ARP, you start with capabilities.
1) Define an atomic capability: contract + envelope
```yaml
id: support.create_ticket
kind: atomic
input_schema:
  type: object
  required: [customer_id, issue_summary, priority]
output_schema:
  type: object
  required: [ticket_id, status]
semantics:
  side_effect_class: write
  idempotency: idempotent_by_key
constraints_defaults:
  budgets:
    max_wall_time_ms: 5000
    max_external_calls: 2
evaluation:
  success_criteria:
    - ticket_id is present
    - status in ["created", "queued"]
```
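Because the contract carries real JSON Schemas, an executor can reject a malformed call before anything runs. A minimal sketch, assuming the `jsonschema` package and an illustrative envelope shape (not the Standard's normative one):

```python
# Validate a call against the capability's declared input_schema before execution.
from jsonschema import validate, ValidationError

input_schema = {
    "type": "object",
    "required": ["customer_id", "issue_summary", "priority"],
}

envelope = {
    "capability_id": "support.create_ticket",   # matches the contract's id
    "inputs": {
        "customer_id": "cus_1842",
        "issue_summary": "Cannot log in after password reset",
        "priority": "high",
    },
}

try:
    validate(instance=envelope["inputs"], schema=input_schema)
except ValidationError as err:
    # A contract violation is rejected before any side effect happens.
    raise SystemExit(f"input rejected: {err.message}")
```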
2) For a subtask, produce a bounded candidate menu
Instead of giving the system “every possible action,” selection returns a bounded CandidateSet.
```json
{
  "candidate_set_id": "cs_01J...",
  "subtask": "Open a support ticket with the customer and issue summary",
  "top_k": 4,
  "candidates": [
    { "node_type_id": "support.create_ticket", "score": 0.92 },
    { "node_type_id": "support.lookup_customer", "score": 0.71 },
    { "node_type_id": "support.search_similar_tickets", "score": 0.63 },
    { "node_type_id": "support.escalate_to_human", "score": 0.55 }
  ]
}
```
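How that menu gets built is up to the selection layer. One plausible sketch, assuming a capability `registry` and a relevance `score` function (neither is defined by the Standard):

```python
# Illustrative only: one way a selector might produce a bounded CandidateSet.
def build_candidate_set(subtask, registry, score, top_k=4):
    """Rank registered capabilities against the subtask and keep only the top_k."""
    ranked = sorted(
        ({"node_type_id": node_id, "score": score(subtask, meta)}
         for node_id, meta in registry.items()),
        key=lambda c: c["score"],
        reverse=True,
    )
    return ranked[:top_k]  # the planner never sees anything outside this menu
```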
3) Execute under explicit constraints and gates
```json
{
  "constraints": {
    "structural": { "max_depth": 3, "max_total_nodes_per_run": 20 },
    "candidates": { "allowed_node_type_ids": ["support.create_ticket", "support.escalate_to_human"] },
    "budgets": { "max_wall_time_ms": 30000, "max_steps": 30 },
    "gates": { "side_effect_class": "write" }
  }
}
```
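The coordinator can check these fields mechanically before each step. A small sketch, mirroring the field names above (the `enforce` function itself is hypothetical):

```python
# Pre-execution checks against the constraints block above.
import time

def enforce(constraints, chosen_node_type_id, steps_taken, run_started_at):
    allowed = constraints["candidates"]["allowed_node_type_ids"]
    if chosen_node_type_id not in allowed:
        raise PermissionError(f"{chosen_node_type_id} is not an allowed candidate for this run")

    budgets = constraints["budgets"]
    if steps_taken >= budgets["max_steps"]:
        raise RuntimeError("budget exceeded: max_steps")
    if (time.monotonic() - run_started_at) * 1000 > budgets["max_wall_time_ms"]:
        raise TimeoutError("budget exceeded: max_wall_time_ms")
    # Write-class capabilities also pass through the "gates" entry; in a real system
    # this is where an allow / deny / require-approval decision would be applied.
```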
4) Keep durable evidence
A run produces a durable timeline of “what happened and why”:
- what was decomposed,
- what candidates were allowed,
- what binding was chosen,
- what policy decision applied,
- what outputs were produced.
That evidence is what makes debugging and evaluation tractable.
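One simple way to picture that timeline is an append-only JSONL log, one event per decision. The event names and values below are illustrative, not the Standard's artifact schema:

```python
# Durable run evidence as an append-only JSONL timeline (illustrative events only).
import json
import time

def append_event(path, kind, **data):
    """Append one timestamped event so the run can be replayed and audited later."""
    event = {"ts": time.time(), "kind": kind, **data}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")

append_event("run_01J.jsonl", "candidate_set", allowed=["support.create_ticket", "support.escalate_to_human"])
append_event("run_01J.jsonl", "policy_decision", capability="support.create_ticket", decision="allow")
append_event("run_01J.jsonl", "output", capability="support.create_ticket", status="created")
```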
Where to start
Pick your entry point:
- Want the mindset first? Start with COP, Capability-Oriented Programming.
- Want the contracts and artifacts? Read the ARP Standard v1.
- Want something running end-to-end? Start from JARVIS, the open-source reference stack.
What’s next
ARP is designed to get stronger as the ecosystem grows:
- richer evaluation scorecards attached to capability versions
- promotion/demotion gates so reuse is evidence-driven
- more policy semantics around approvals for irreversible actions
- deeper observability mappings into your existing telemetry systems
If you want to follow along, start with the launch series on the blog and the roadmap.
Next in the series: COP: Capability-Oriented Programming for production agentic systems