v0.3.8 — active development

Agent Runtime Protocol

Capability-oriented, bounded, auditable agentic systems.

Build reliable agentic workflows by shrinking the action space for LLMs, enforcing policy checkpoints, and keeping durable events and artifacts.

Layered / Bounded / Auditable / Interoperable

What is this?

Agent Runtime Protocol, or ARP, is a capability-oriented agentic execution fabric: a set of service contracts for running workflows with bounded action spaces, policy checkpoints, and durable artifacts.

JARVIS is the first-party reference OSS stack that implements ARP Standard v1.

Capability-Oriented Programming, or COP, is the mindset behind ARP and JARVIS. Instead of programming around processes or objects, we design the system around nodes with certain provable capabilities.

Fixing Common Pain Points of Agentic Systems

Agentic systems fail in production not because models are weak, but because the execution fabric is unbounded, unauditable, and hard to evaluate.

ARP makes capability execution bounded and observable. With systems carefully designed around action candidate sets, constraints, and policy checkpoints, ARP turns non-deterministic “agent behavior” into clean, evidence-driven capability engineering.

Long-Horizon and Large-Action-Space Planning

How do we make tool use reliable at scale? Even the best LLMs fail to plan and orchestrate consistently across large action spaces. When hundreds of tools are available, they often fail to choose the right ones.

ARP Solution:

ARP mitigates large action spaces in two ways.

First, it provides ways to deterministically put structural constraints on the execution flow. For example, developers can require the planner to split a task into at most x subtasks. ARP is meant to enforce this limit deterministically.

Second, it provides a way to build composite capabilities from the ground up, reducing the number of capabilities visible to high-level LLMs.
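
To make the first mechanism concrete, here is a minimal Python sketch of a deterministic check on a planner's proposed decomposition. It is an illustration only, not ARP's actual API: the names `DecompositionConstraints`, `max_subtasks`, `ConstraintViolation`, and `enforce_decomposition` are all hypothetical.

```python
from dataclasses import dataclass


class ConstraintViolation(Exception):
    """Raised when a planner proposal breaks a structural constraint."""


@dataclass(frozen=True)
class DecompositionConstraints:
    # Hypothetical name for the "at most x subtasks" limit described above.
    max_subtasks: int


def enforce_decomposition(subtasks: list[str], constraints: DecompositionConstraints) -> list[str]:
    """Accept a planner's proposed subtasks only if they fit the limit.

    The check is ordinary code, so its outcome does not depend on the
    model following instructions in a prompt.
    """
    if not subtasks:
        raise ConstraintViolation("planner returned no subtasks")
    if len(subtasks) > constraints.max_subtasks:
        raise ConstraintViolation(
            f"planner proposed {len(subtasks)} subtasks; limit is {constraints.max_subtasks}"
        )
    return list(subtasks)


limits = DecompositionConstraints(max_subtasks=3)

# A proposal within the limit passes through unchanged.
accepted = enforce_decomposition(["fetch data", "summarize"], limits)

# A proposal over the limit is rejected before anything runs.
try:
    enforce_decomposition(["a", "b", "c", "d"], limits)
    rejected = False
except ConstraintViolation:
    rejected = True
```

A real system would probably retry or re-plan on a violation rather than simply raise; the sketch only shows that the limit lives in code, not in the prompt.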

Tool Execution Observability

How do we make behavior observable and debuggable? LLM tool-calling is driven by hidden reasoning, and the results are inconsistent, hard to control, and complicated to test. End-to-end observability is often difficult. There are no native durable artifacts for audit, replay, or regression evaluation.

ARP Solution:

Designed around Capability-Oriented Programming principles, ARP fully exposes the decision process around “what capability to call” and “how to call it.” This gives developers natively supported visibility into how “tool-calling” is done, which improves debuggability, governability and consistency.
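
As a rough illustration of what an exposed decision could look like, the Python sketch below records "what capability to call" and "how to call it" as one JSON line that can be stored and replayed. The record shape, the field names, and the `selection_decision` event type are assumptions made for this example, not the ARP Standard's schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SelectionDecision:
    subtask: str
    candidates: list[str]  # bounded candidate set offered to the model
    chosen: str            # "what capability to call"
    inputs: dict = field(default_factory=dict)  # "how to call it"


def record_decision(decision: SelectionDecision) -> str:
    """Serialize a decision as a single JSON line suitable for an event log."""
    if decision.chosen not in decision.candidates:
        # A choice outside the candidate set is a bug worth surfacing early.
        raise ValueError(f"{decision.chosen!r} is outside the candidate set")
    return json.dumps({"type": "selection_decision", **asdict(decision)}, sort_keys=True)


line = record_decision(
    SelectionDecision(
        subtask="generate an id",
        candidates=["uuid.generate", "time.now"],
        chosen="uuid.generate",
        inputs={"version": 4},
    )
)

# The stored line can be loaded later for audit or regression checks.
replayed = json.loads(line)
```

Because the candidate set and the final choice sit in the same record, a reviewer can see not only what ran but also what else could have run.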

Production-Grade Policy and Governance

How do we enforce real governance, not just prompt rules? Prompt-based rules are not deterministic, consistent or reliable enough for production governance requirements.

ARP Solution:

ARP and JARVIS provide a built-in policy control engine. Because execution in the system is deterministic, developers can also integrate their own policy solutions that fit their needs.

This means developers can easily control which capabilities are visible, based on user identity or other criteria. We are also working on Human-In-The-Loop policy controls so high-risk actions require confirmation.
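
The idea of identity-based visibility can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `Capability` shape, the role names, and the `visible_capabilities` and `decide` helpers are hypothetical and do not reflect the JARVIS policy engine's real interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    name: str
    required_role: str  # hypothetical: the single role allowed to see this capability


def visible_capabilities(catalog, user_roles):
    """Return only the capabilities the caller is allowed to see."""
    return [c for c in catalog if c.required_role in user_roles]


def decide(capability, user_roles):
    """Allow/deny decision of the kind made at a policy checkpoint."""
    return "allow" if capability.required_role in user_roles else "deny"


catalog = [
    Capability("report.read", "viewer"),
    Capability("report.publish", "editor"),
    Capability("db.drop_table", "admin"),
]

roles = {"viewer", "editor"}
visible_names = [c.name for c in visible_capabilities(catalog, roles)]
drop_decision = decide(catalog[2], roles)
```

Filtering before selection means the model never sees a capability it is not allowed to use, and the checkpoint decision remains as a second, independent guard.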

Interoperability in a Meaningful Way

How do we stay interoperable in the current wild west of agentic systems? Vendor lock-in is real, while interoperability standards like MCP and A2A define only interfaces; they do not guarantee behavioral consistency or reliability. We are seeing thousands of MCP servers with varying levels of scope and performance. How do we know which ones are good, and integrate them easily?

ARP Solution:

Treating reliable interoperability as first-class, ARP can seamlessly integrate MCP servers and A2A Agents as new “node types” that provide capabilities. ARP is not an agent framework; it is the execution fabric on top that unifies all capability providers.

In addition, ARP provides ways to evaluate capabilities, so they can be promoted or demoted according to their performance. This is a work in progress in JARVIS, so stay tuned!
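
Since evaluation-driven promotion is still in progress, the following Python sketch is purely speculative and shows only one way such a gate could work. The tier names, thresholds, and the `next_tier` function are invented for illustration and are not part of ARP or JARVIS.

```python
from dataclasses import dataclass

# Hypothetical trust tiers, ordered from least to most trusted.
TIERS = ["experimental", "candidate", "trusted"]


@dataclass(frozen=True)
class EvalStats:
    successes: int
    failures: int

    @property
    def total(self) -> int:
        return self.successes + self.failures

    @property
    def success_rate(self) -> float:
        return self.successes / self.total if self.total else 0.0


def next_tier(current, stats, promote_at=0.95, demote_below=0.80, min_runs=20):
    """Move a capability one tier up or down based on its evaluation record."""
    if stats.total < min_runs:
        return current  # not enough evidence to change anything
    index = TIERS.index(current)
    if stats.success_rate >= promote_at:
        index = min(index + 1, len(TIERS) - 1)
    elif stats.success_rate < demote_below:
        index = max(index - 1, 0)
    return TIERS[index]


promoted = next_tier("candidate", EvalStats(successes=98, failures=2))
demoted = next_tier("candidate", EvalStats(successes=10, failures=15))
unchanged = next_tier("candidate", EvalStats(successes=3, failures=0))
```

The minimum-run guard matters: without it, a capability with three lucky runs would be promoted on almost no evidence.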

Key differentiators

What's so special about ARP?

Spec-first, but real

ARP is built around well-defined OpenAPI + JSON Schema contracts. Any conformant client can talk to any conformant service. At the same time, ARP is backed by SDKs, Clients, Conformance Helpers, Component Templates, and most importantly, the JARVIS Reference Stack. These real artifacts make ARP easy to adopt.

Designed around realistic limits of LLMs

In 2025, we saw LLMs improve at a stunning speed, but serious shortcomings remain. For example, even the best models have low success rates on long-horizon, large-action-space planning and orchestration. Hallucinations are nearly impossible to fully remove. ARP is designed with these realistic limitations in mind. We acknowledge and actively mitigate these risks through proven software engineering and system design.

Interoperability by composition

ARP treats MCP/A2A services as capability providers by importing them as NodeTypes, and ARP is *NOT* designed to compete with these standards. MCP and A2A are wire-level protocols just like ARP. However, they focus on defining interoperability interfaces, while ARP focuses on providing infrastructure models to reliably build capabilities.

Core Components

ARP Standard v1 defines the core set of services that together form a bounded, auditable execution fabric. Note: since ARP is at an early stage, we may add more components if they are deemed beneficial to our goals of building the most reliable execution fabric.

Run Gateway

Client entrypoint for interacting with an ARP system.

Run Coordinator

Run orchestrator that manages state and enforcement checkpoints, and dispatches tasks.

Atomic Executor

Executor for atomic node types. It runs leaf nodes and produces durable events and artifacts.

Composite Executor

Decomposer for composite node types. It breaks them into sub-nodes as needed.

Selection Service

Generates candidate NodeTypes for decomposed subtasks.

Node Registry

Catalog of NodeTypes with metadata for the Selection Service.

Policy Decision Point — PDP

Optional service for allow/deny decisions at policy checkpoints.

Figure: Node-centric execution fabric: gateway → coordinator → executors, with bounded selection, registry lookups, and policy checkpoints. The diagram shows the Run Gateway forwarding to the Run Coordinator, which consults the Selection Service, the Node Registry, and a PDP, and dispatches work to the Composite and Atomic Executors.
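
The flow between these components can be mimicked in a single-process Python toy. Everything below is a simplification under loud assumptions: the real components are separate services with OpenAPI contracts, and the function names, registry entries, keyword matching, and event shapes here are invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    name: str
    keywords: tuple  # toy metadata used by the selection step


# Node Registry: catalog of NodeTypes (entries are made up for this sketch).
REGISTRY = [
    NodeType("uuid.generate", ("uuid",)),
    NodeType("text.return", ("return",)),
    NodeType("shell.exec", ("uuid", "return")),
]

DENIED = {"shell.exec"}  # toy policy: the PDP denies this NodeType


def decompose(goal):
    """Composite Executor: naive split of a goal on ', then '."""
    return [part.strip(" .") for part in goal.split(", then ")]


def select_candidates(subtask, limit=2):
    """Selection Service: bounded candidate list drawn from the registry."""
    text = subtask.lower()
    matches = [n.name for n in REGISTRY if any(k in text for k in n.keywords)]
    return matches[:limit]


def pdp_allows(node_type_name):
    """Policy Decision Point: allow/deny at a checkpoint."""
    return node_type_name not in DENIED


def run_atomic(node_type_name, subtask, events):
    """Atomic Executor: run a leaf node and emit an event."""
    events.append({"type": "node_completed", "node_type": node_type_name, "subtask": subtask})


def coordinate(goal):
    """Run Coordinator: decompose, select, check policy, dispatch."""
    events = []
    for subtask in decompose(goal):
        candidates = select_candidates(subtask)
        allowed = [c for c in candidates if pdp_allows(c)]
        events.append({"type": "candidates_selected", "subtask": subtask,
                       "candidates": candidates, "allowed": allowed})
        if not allowed:
            events.append({"type": "node_denied", "subtask": subtask})
            continue
        run_atomic(allowed[0], subtask, events)
    return events


def start_run(goal):
    """Run Gateway: client entrypoint that forwards to the coordinator."""
    return coordinate(goal)


events = start_run("Generate a UUID, then return it.")
executed = [e["node_type"] for e in events if e["type"] == "node_completed"]
```

In this toy, `shell.exec` matches both subtasks but never runs, because the policy check filters it out of each candidate set before dispatch.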

JARVIS-Specific Components

JARVIS includes additional, non-standard components to make the reference stack easy to use out of the box. These pieces are implementation choices and can be swapped out.

JARVIS_Release

End-to-end Docker Compose release bundle so you can deploy the stack anywhere.

First-Party Atomic Node Packs

First-party, trusted node packs that ship real capabilities for the ecosystem.

Run Store

Durable data store used to persist runs and node run state.

Artifact Store

Durable storage for run artifacts produced during execution.

Event Stream

Visibility service streaming NDJSON run history so executions are replayable and inspectable.

LLM Adapter — ARP_LLM

Shared model-call adapter that provides standardized LLM provider clients. It currently supports OpenAI, with more to come.

Auth Helpers — ARP_Auth

Client helpers to simplify inter-service authN.

Security Token Service

Standalone local Keycloak STS + dev tooling for custom stacks.

Get started now on macOS/Linux

You can try out the JARVIS open-source reference stack today!

Install and deploy

Install and run the JARVIS stack locally. Requires Docker Engine ≥ 24.0 and Docker Compose ≥ 2.20.

Bash

# Verify Docker + Compose
docker --version
docker compose version

# Get the release bundle
git clone https://github.com/AgentRuntimeProtocol/JARVIS_Release.git
cd JARVIS_Release

# Start the stack
# macOS/Linux or WSL:
bash ./start_dev.sh \
  --llm-api-key "<your_openai_api_key>" \
  --llm-chat-model "gpt-5-nano"

Sample run

Start a sample run via the Run Gateway, stream NDJSON events, and fetch the root NodeRun outputs.

Bash

RUN_ID="$(arp-jarvis --json runs start --goal "Generate a UUID, then return it." | \
  python3 -c 'import json,sys; print(json.load(sys.stdin)["run"]["run_id"])')"

echo "run_id: ${RUN_ID}"

# Stream run events (NDJSON)
arp-jarvis runs events "${RUN_ID}"

# Fetch root NodeRun outputs
arp-jarvis runs inspect "${RUN_ID}" --include-node-runs
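
Because the event stream is NDJSON, each line is a standalone JSON document and can be processed with a few lines of Python. The sketch below counts events by kind. It assumes each event carries a `type` field, which is a guess made for illustration; the sample events are invented, and real JARVIS events may use different field names.

```python
import json
from collections import Counter


def summarize_events(lines):
    """Count NDJSON events by their 'type' field, skipping blank lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Events without the assumed 'type' field are grouped as 'unknown'.
        counts[event.get("type", "unknown")] += 1
    return counts


# Invented sample lines standing in for real run events.
sample = [
    '{"type": "run_started", "run_id": "r-1"}',
    '',
    '{"type": "node_completed", "run_id": "r-1"}',
    '{"type": "node_completed", "run_id": "r-1"}',
    '{"run_id": "r-1"}',
]
summary = summarize_events(sample)
```

The function accepts any iterable of lines, so the same code could be fed `sys.stdin` when the output of the events command is piped into a script.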

Community and next steps

ARP is open source under the MIT license. Our vision is a capability ecosystem where proven nodes can be evaluated, published, and reused across stacks. Join the community, share feedback, and help shape the standard.

Now vs next

ARP Today

ARP Standard v1, node-centric services, plus the JARVIS reference stack.

ARP Tomorrow

Pinned full-stack releases, richer evaluation/promotion gates, and more integration templates.

By devs, for devs.

From the blog

Updates, release notes, and deep dives from the ARP maintainers.

Jan 5, 2026

Ready to build with ARP?

Start with the quickstart, explore the standard, or use JARVIS as a reference implementation.

From the blog

Introducing ARP: capability-oriented infrastructure for reliable agentic systems

ARP in 10 minutes: one run, five artifacts

ARP Standard v1: what’s standardized — and what’s intentionally not
