v0.3.8 - Early Preview

Capability-Oriented Programming

The engineering mindset behind ARP: build agentic systems around capabilities you can prove, not prompts you hope will work.

Shrink action space

Expose capabilities as typed nodes and select from bounded candidate sets instead of a giant “tool list”.

Bound each decision point to a small, auditable candidate menu.

Keep durable evidence

Events and artifacts make runs auditable and replayable, enabling real evaluation and regression testing.

Make every decision inspectable: candidate sets, bindings, outputs, and evals.

Gate with policy

Use explicit checkpoints and allow/deny enforcement rather than prompt-only safety rules.

Enforce budgets, side-effect classes, and approvals at runtime checkpoints.

COP in one sentence

Capability-Oriented Programming treats capability reliability as the primary unit of engineering, for traditional services and agentic capabilities alike.

Atomic capabilities

Do one thing reliably, with explicit contracts.

Composite capabilities

Coordinate other capabilities under constraints, evaluation, and governance.

Why COP exists

Agentic systems fail in production for the same reason ad hoc scripts fail once they grow into systems: complexity scales faster than reliability.

Reliability collapses in predictable ways:

  • The decision surface explodes across long horizons and large action spaces.
  • Errors compound across multi-step trajectories.
  • Natural language, used as an implicit interface contract, drifts over time.
  • Policy and compliance can’t be enforced with best-effort instructions.
  • Debugging becomes forensic work without durable run records.

COP treats those as engineering realities, not temporary inconveniences.

The shift: autonomy → capability engineering

COP keeps LLMs — but treats capability reliability as the thing you engineer: contracts, bounded choices, enforced budgets, and durable evidence.

Prompt-first agent

One general agent, a huge tool list, and guardrails written in prompts.

  • A giant menu of possible actions at each step.
  • Policies live in instructions instead of enforcement points.
  • Logs show what happened, not why a choice was made.
  • Reliability is hard to measure and harder to reuse safely.

COP system

A catalog of versioned capabilities with bounded choices, budgets, and durable evidence.

  • Each decision point uses a bounded candidate menu (a CandidateSet).
  • Execution is constrained by budgets and gates, expressed in a ConstraintEnvelope.
  • Every decision emits durable artifacts for audit, replay, and evaluation.
  • Reuse is earned through evidence: test suites, scorecards, and promotion gates.

The COP principles

Make behavior repeatable, measurable, and governable.

Contracts over prompts

Prefer typed schemas + explicit semantics; keep natural language at the edges.

Reliability boundaries over “autonomy”

Define what a capability must do, must refuse, and must escalate.

Evaluation before reuse

Measure stability across runs and attach evidence to the version you ship.
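
As a rough illustration of measuring stability across runs, the sketch below calls a capability several times with the same input and reports how often the output matches the most common result. The `stability` function and the `steady` and `flaky` stand-in capabilities are hypothetical names for this example, not part of ARP.

```python
from collections import Counter
from itertools import cycle


def stability(capability, payload, trials=10):
    """Fraction of trials whose output matches the most common output."""
    if trials < 1:
        raise ValueError("trials must be at least 1")
    # repr() gives a hashable, comparable form of each output.
    outputs = [repr(capability(payload)) for _ in range(trials)]
    top_count = Counter(outputs).most_common(1)[0][1]
    return top_count / trials


# Hypothetical deterministic capability: always returns the same status.
def steady(payload):
    return {"status": "open"}


# Hypothetical flaky capability: three "open" results, then one "error".
_flaky_results = cycle(["open", "open", "open", "error"])


def flaky(payload):
    return {"status": next(_flaky_results)}
```

A score like this could be one piece of the evidence attached to a version; the threshold that counts as "stable enough" is a policy choice.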

Hierarchy and bounded action spaces

Decompose goals, map subtasks to top‑K candidates, and enforce constraints mechanically.

Governance is first-class

Identity, authorization, policy checkpoints, and auditability are part of the runtime.

Interop by composition

Import tools and endpoints as capabilities, inheriting the same constraints and audit surface.

Incremental adoption

Wrap existing services first; introduce composites only where tasks are open-ended.

The capability engineering loop

COP turns “agent building” into a repeatable SDLC.

  1. Define

    Schema, semantics, side-effect class, constraint defaults.

  2. Implement

    Build or import an atomic capability.

  3. Evaluate

    Offline suites, replay, multi-trial stability.

  4. Publish

    Versioned, discoverable, evidence attached.

  5. Promote

    Experimental → candidate → stable, gated by policy.

  6. Compose

    Bounded candidate menus, budgets, explicit recovery.

  7. Observe + audit

    Durable run history for every decision and side effect.
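
The Promote step can be pictured as a small state machine in which a capability advances one stage at a time, and only when its evidence clears a policy threshold. This is a minimal sketch: the `promote` function, the use of a single pass rate as the evidence, and the threshold value are assumptions for illustration, not ARP's actual promotion rules.

```python
# Stage names taken from the loop above, in promotion order.
STAGES = ("experimental", "candidate", "stable")


def promote(stage, pass_rate, min_pass_rate):
    """Return the next stage if the evidence clears the gate, else the same stage."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    if stage == STAGES[-1]:
        return stage  # already at the final stage
    if pass_rate < min_pass_rate:
        return stage  # gate not cleared: stay where we are
    return STAGES[STAGES.index(stage) + 1]
```

For example, with a hypothetical threshold of 0.95, `promote("experimental", 0.97, 0.95)` returns `"candidate"`, while `promote("candidate", 0.90, 0.95)` leaves the capability at `"candidate"`.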

COP makes bounded action spaces concrete

A COP system doesn’t say: “Here are 400 actions, pick one.” It says: “For this subtask, here are a few approved candidates — with constraints and evidence.”

Capability contract

Explicit I/O schema, semantics, and constraint defaults.

Capability contract — YAML
id: support.create_ticket
kind: atomic
input_schema:
  type: object
  required: [subject, body]
output_schema:
  type: object
  required: [ticket_id, status]
semantics:
  side_effect_class: write
constraints_defaults:
  budgets:
    max_wall_time_ms: 5000
    max_external_calls: 2
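
To show how a contract like the one above could be checked mechanically, here is a minimal Python sketch. `CapabilityContract`, `missing_fields`, and `check_call` are hypothetical names that mirror the YAML fields; they are not an ARP API, and a real implementation would likely use a full JSON Schema validator rather than a required-keys check.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CapabilityContract:
    """Hypothetical in-memory mirror of the contract fields shown above."""
    id: str
    kind: str
    input_required: tuple[str, ...]
    output_required: tuple[str, ...]
    side_effect_class: str
    max_wall_time_ms: int
    max_external_calls: int


def missing_fields(required, payload):
    """Return the required keys absent from payload, in contract order."""
    return [name for name in required if name not in payload]


def check_call(contract, payload, result):
    """Report which required input and output fields are missing."""
    return {
        "input_missing": missing_fields(contract.input_required, payload),
        "output_missing": missing_fields(contract.output_required, result),
    }


CREATE_TICKET = CapabilityContract(
    id="support.create_ticket",
    kind="atomic",
    input_required=("subject", "body"),
    output_required=("ticket_id", "status"),
    side_effect_class="write",
    max_wall_time_ms=5000,
    max_external_calls=2,
)
```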

Bounded candidate menu

An inspectable artifact: auditable, enforceable, replayable.

Bounded candidate menu — JSON
{
  "subtask": "Create a support ticket",
  "max_candidates": 5,
  "candidates": [
    { "node_type_id": "support.create_ticket@1.2.0", "side_effect_class": "write" },
    { "node_type_id": "support.queue_ticket@0.9.1", "side_effect_class": "write" },
    { "node_type_id": "support.ask_clarifying_question@1.0.0", "side_effect_class": "read" }
  ],
  "constraints": { "budgets": { "max_steps": 6, "max_wall_time_ms": 15000 } }
}
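
Because the menu is plain data, enforcing it takes little code. The sketch below loads the menu shown above, checks it against its own `max_candidates` bound, and refuses any selection that is not on the menu. `load_menu` and `select` are hypothetical helpers for illustration, not ARP functions.

```python
import json

# The candidate menu from above, embedded as a JSON string.
MENU_JSON = """
{
  "subtask": "Create a support ticket",
  "max_candidates": 5,
  "candidates": [
    { "node_type_id": "support.create_ticket@1.2.0", "side_effect_class": "write" },
    { "node_type_id": "support.queue_ticket@0.9.1", "side_effect_class": "write" },
    { "node_type_id": "support.ask_clarifying_question@1.0.0", "side_effect_class": "read" }
  ],
  "constraints": { "budgets": { "max_steps": 6, "max_wall_time_ms": 15000 } }
}
"""


def load_menu(text):
    """Parse a menu and reject it if it exceeds its own size bound."""
    menu = json.loads(text)
    if len(menu["candidates"]) > menu["max_candidates"]:
        raise ValueError("candidate menu exceeds max_candidates")
    return menu


def select(menu, node_type_id):
    """Return the chosen candidate, or refuse anything not on the menu."""
    for candidate in menu["candidates"]:
        if candidate["node_type_id"] == node_type_id:
            return candidate
    raise PermissionError(f"{node_type_id} is not in the candidate menu")


menu = load_menu(MENU_JSON)
```

Whatever planner or model proposes the choice, the check in `select` is mechanical: a proposal outside the menu fails regardless of how the prompt was worded.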

Governance and operations are first-class

Production systems require identity propagation, authorization, policy checkpoints, budgeting, rate limits, and auditability.

COP treats governance as part of the runtime loop — not a set of best-effort instructions that you hope the model follows.

What “gates” mean in practice

  • Selection-time allow/deny decisions tied to a run record.
  • Pre-invoke enforcement for budgets, timeouts, and rate limits.
  • Side-effect-aware checkpoints (read / write / irreversible).
  • Pre-irreversible approvals when required by policy.
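
The gates listed above could be approximated by a single pre-invoke check. This is a sketch under assumptions: the `Decision` enum, the `Budget` dataclass, and the `gate` function are hypothetical, and the rule that only irreversible actions need approval is one possible policy, not one ARP prescribes.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"


@dataclass
class Budget:
    max_steps: int
    steps_used: int = 0


# Side-effect classes named in the list above.
KNOWN_CLASSES = {"read", "write", "irreversible"}


def gate(side_effect_class, budget, approved=False):
    """Pre-invoke check: unknown classes and spent budgets are denied outright."""
    if side_effect_class not in KNOWN_CLASSES:
        return Decision.DENY
    if budget.steps_used >= budget.max_steps:
        return Decision.DENY
    # Assumed policy: irreversible actions wait for an explicit approval.
    if side_effect_class == "irreversible" and not approved:
        return Decision.REQUIRE_APPROVAL
    return Decision.ALLOW
```

The decision is returned as a value rather than raised as an exception, so it can be written to the run record alongside the invocation it governed.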

How ARP makes COP operational

COP is the mindset. ARP is the substrate that standardizes the contracts, artifacts, and enforcement points needed to make COP work at team/enterprise scale.

NodeType

Versioned capability definition with contract and metadata.

Run / NodeRun

Durable execution records that show what happened.

CandidateSet

Bounded candidate menu that shows what was allowed.

ConstraintEnvelope

Enforceable limits covering structure, budgets, and gates.

Policy checkpoints

Allow/deny/require-approval decisions tied to the run record.

Durable events + artifacts

Replayable evidence for debugging and evaluation.

FAQ

Is COP the same as capability-based security?

No. Capability-based security is about authority and least privilege.

COP here is about reliability engineering for agentic systems: explicit bounds, contracts, and durable evidence.

What does “bounded action space” mean?

For each subtask, the system selects from a small, approved candidate menu rather than an unbounded tool universe.

In ARP terms: a CandidateSet is an artifact that can be inspected, enforced, evaluated, and replayed.

Where do budgets and policy gates live?

In the constraint envelope and policy checkpoints — not only in prompts.

Budgets, side-effect classes, allow/deny rules, and approvals are enforced at runtime checkpoints and recorded in the run history.
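
To make "recorded in the run history" concrete, the sketch below keeps an append-only list of events and serializes it as JSON Lines, so a run can be read back later. `RunRecord`, `emit`, and `replay` are hypothetical names, and the event shape is an assumption for illustration, not ARP's event format.

```python
import json
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    """Append-only event log for one run (illustrative shape only)."""
    run_id: str
    events: list = field(default_factory=list)

    def emit(self, kind, **data):
        """Append an event with a monotonically increasing sequence number."""
        event = {"seq": len(self.events), "kind": kind, "data": data}
        self.events.append(event)
        return event

    def to_jsonl(self):
        """Serialize the log as one JSON object per line."""
        return "\n".join(json.dumps(e, sort_keys=True) for e in self.events)


def replay(jsonl):
    """Rebuild the event list from its JSON Lines form."""
    return [json.loads(line) for line in jsonl.splitlines() if line]


record = RunRecord(run_id="run-001")
record.emit("policy_decision", decision="allow", side_effect_class="write")
record.emit("budget_check", max_steps=6, steps_used=1)
```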

Do I need to rewrite my stack to adopt COP?

No. Start by wrapping existing APIs/workflows as atomic capabilities with contracts and constraints.

Then add composites only where tasks are inherently open-ended.
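
Wrapping an existing function can be as small as the sketch below, which attaches an id and side-effect class to a legacy call and reports whether it stayed within a wall-time budget. `as_capability` and `legacy_create_ticket` are hypothetical. Note that this wrapper measures time after the call returns; it does not interrupt a call that runs long.

```python
import time


def as_capability(fn, capability_id, side_effect_class, max_wall_time_ms):
    """Wrap an existing callable so each call reports contract metadata."""
    def invoke(payload):
        started = time.monotonic()
        output = fn(payload)
        elapsed_ms = (time.monotonic() - started) * 1000.0
        return {
            "capability_id": capability_id,
            "side_effect_class": side_effect_class,
            # Measured after the fact; the call is not preempted.
            "within_budget": elapsed_ms <= max_wall_time_ms,
            "output": output,
        }
    return invoke


# Stand-in for an existing service call.
def legacy_create_ticket(payload):
    return {"ticket_id": "T-1", "status": "open"}


create_ticket = as_capability(
    legacy_create_ticket, "support.create_ticket", "write", 5000
)
```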

Is COP an agent framework?

No. COP is a discipline.

ARP standardizes the contracts, artifacts, and enforcement points so COP can be applied across different planners and execution stacks.

Get started

Read the fundamentals, explore the standard artifacts, and run the reference stack.