Agent Runtime Protocol
Capability-oriented, bounded, auditable agentic systems.
Build reliable agentic workflows by shrinking action space for LLMs, enforcing policy checkpoints, and keeping durable events and artifacts.
What is this?
Agent Runtime Protocol, or ARP, is a capability-oriented agentic execution fabric: a set of service contracts for running workflows with bounded action spaces, policy checkpoints, and durable artifacts.
JARVIS is the first-party reference OSS stack that implements ARP Standard v1.
Capability-Oriented Programming, or COP, is the mindset behind ARP and JARVIS. Instead of programming around processes or objects, we design the system around nodes with specific, provable capabilities.
Fixing Common Pain Points of Agentic Systems
Agentic systems fail in production not because models are weak, but because the execution fabric is unbounded, unauditable, and hard to evaluate.
ARP makes capability execution bounded and observable. With systems carefully designed around action candidate sets, constraints, and policy checkpoints, ARP turns non-deterministic “agent behavior” into clean, evidence-driven capability engineering.
Long-Horizon and Large-Action-Space Planning
How do we make tool use reliable at scale? Even the best LLMs fail to plan and orchestrate consistently over large action spaces. When hundreds of tools are available, they often fail to choose the right ones.
ARP Solution:
ARP mitigates large action spaces in two ways.
First, it provides ways to deterministically place structural constraints on the execution flow. For example, developers can require the planner to split a task into at most x subtasks. ARP is designed to enforce this limit deterministically.
Second, it provides a way to build composite capabilities from the ground up, reducing the number of capabilities visible to high-level LLMs.
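As a minimal sketch of the first point, a structural constraint can be checked in plain code after the planner proposes a decomposition, so the limit holds regardless of what the model outputs. The names below (StructuralConstraints, enforce, max_subtasks, max_depth) are hypothetical illustrations, not part of ARP Standard v1 or the JARVIS API.

```python
from dataclasses import dataclass


class ConstraintViolation(Exception):
    """Raised when a planner proposal breaks a structural constraint."""


@dataclass(frozen=True)
class StructuralConstraints:
    # Hypothetical field names; the real ARP constraint schema may differ.
    max_subtasks: int = 5
    max_depth: int = 3


def enforce(subtasks: list[str], depth: int, limits: StructuralConstraints) -> list[str]:
    """Accept or reject a proposed decomposition without consulting the model."""
    if depth > limits.max_depth:
        raise ConstraintViolation(
            f"depth {depth} exceeds max_depth {limits.max_depth}"
        )
    if len(subtasks) > limits.max_subtasks:
        raise ConstraintViolation(
            f"{len(subtasks)} subtasks exceeds max_subtasks {limits.max_subtasks}"
        )
    return subtasks


if __name__ == "__main__":
    limits = StructuralConstraints(max_subtasks=2)
    # A two-way split is within the limit and passes through unchanged.
    print(enforce(["generate uuid", "return uuid"], depth=1, limits=limits))
    # A three-way split is rejected deterministically.
    try:
        enforce(["a", "b", "c"], depth=1, limits=limits)
    except ConstraintViolation as err:
        print(f"rejected: {err}")
```

Because the check is ordinary code rather than a prompt instruction, a rejected proposal can be retried or surfaced as an error, but it cannot slip through.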
Tool Execution Observability
How do we make behavior observable and debuggable? LLM tool-calling is driven by hidden reasoning, and the results are inconsistent, hard to control and complicated to test. E2E observability is often complicated. There are no native durable artifacts for audit, replay, or regression evaluation.
ARP Solution:
Designed around Capability-Oriented Programming principles, ARP fully exposes the decision process around “what capability to call” and “how to call it.” This gives developers natively supported visibility into how “tool-calling” is done, which improves debuggability, governability and consistency.
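To illustrate what an exposed decision could look like, here is a minimal sketch that records the candidate set, the chosen capability, and the call inputs as one JSON line. The record shape and every field name are assumptions made for illustration; they are not the ARP Standard v1 schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SelectionRecord:
    # All field names are hypothetical, not taken from the ARP specification.
    subtask: str
    candidates: tuple[str, ...]  # the bounded menu that was offered
    chosen: str                  # "what capability to call"
    inputs: dict                 # "how to call it"


def record_selection(subtask: str, candidates: list[str], chosen: str, inputs: dict) -> str:
    """Validate a selection against its candidate set and serialize it as one JSON line."""
    if chosen not in candidates:
        raise ValueError(f"{chosen!r} is not in the candidate set {candidates}")
    record = SelectionRecord(subtask, tuple(candidates), chosen, dict(inputs))
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    line = record_selection(
        subtask="Generate a UUID",
        candidates=["uuid.generate", "echo"],  # hypothetical capability names
        chosen="uuid.generate",
        inputs={},
    )
    print(line)
```

Persisting one such line per decision gives a trail that can be inspected after the fact or compared between runs.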
Production-Grade Policy and Governance
How do we enforce real governance, not just prompt rules? Prompt-based rules are not deterministic, consistent or reliable enough for production governance requirements.
ARP Solution:
ARP and JARVIS provide a built-in policy control engine. The deterministic execution flow also means developers can easily integrate their own policy solutions to fit their needs.
This means developers can easily control which capabilities are visible, based on user identity or other criteria. We are also working on Human-In-The-Loop policy controls so that high-risk actions require confirmation.
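A minimal sketch of identity-based visibility, assuming a simple role model: each capability declares the role it requires, and a deterministic filter decides what a given user may see before any model is consulted. The Capability class, the required_role field, and the capability names are hypothetical and do not reflect the JARVIS policy engine's actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    name: str
    required_role: str  # hypothetical policy attribute


def visible_capabilities(capabilities: list[Capability], user_roles: set[str]) -> list[str]:
    """Return the names of capabilities the user is allowed to see, in catalog order."""
    return [c.name for c in capabilities if c.required_role in user_roles]


if __name__ == "__main__":
    catalog = [
        Capability("uuid.generate", required_role="user"),
        Capability("db.drop_table", required_role="admin"),
    ]
    # A regular user never sees the high-risk capability.
    print(visible_capabilities(catalog, {"user"}))
    # An admin who also holds the user role sees both.
    print(visible_capabilities(catalog, {"user", "admin"}))
```

Filtering before selection means a capability the user may not call never enters the candidate set in the first place.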
Interoperability in a Meaningful Way
How do we stay interoperable in the current wild west of agentic systems? Vendor lock-in is real, while interoperability standards like MCP and A2A define only interfaces, not guaranteed behavioral consistency or reliability. We are seeing thousands of MCP servers with varying levels of scope and performance. How do we know which ones are good, and how do we integrate them easily?
ARP Solution:
Treating reliable interoperability as first-class, ARP can seamlessly integrate MCP servers and A2A agents as new “node types” that provide capabilities. ARP is not an agent framework; it is the execution fabric on top that unifies all capability providers.
ARP also provides ways to evaluate capabilities, so they can be promoted or demoted according to their performance. This is WIP in JARVIS, stay tuned!
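Since evaluation is still WIP in JARVIS, the following is only a sketch of what a promotion gate might look like, assuming success rate is the metric. The function name, thresholds, and minimum sample size are all invented for illustration.

```python
def promotion_decision(
    successes: int,
    attempts: int,
    *,
    promote_at: float = 0.95,  # hypothetical threshold
    demote_at: float = 0.80,   # hypothetical threshold
    min_attempts: int = 20,    # avoid judging on too little evidence
) -> str:
    """Return 'promote', 'demote', or 'hold' from observed run outcomes."""
    if attempts < min_attempts:
        return "hold"
    rate = successes / attempts
    if rate >= promote_at:
        return "promote"
    if rate < demote_at:
        return "demote"
    return "hold"


if __name__ == "__main__":
    print(promotion_decision(20, 20))  # consistently successful
    print(promotion_decision(10, 20))  # frequently failing
    print(promotion_decision(5, 5))    # too few runs to judge
```

A real gate would likely weigh more than raw success rate, such as latency, cost, or regression results against recorded runs.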
Key differentiators
What's so special about ARP?
Spec-first, but real
ARP is built around well-defined OpenAPI + JSON Schema contracts, so any conformant client can talk to any conformant service. At the same time, ARP is backed by SDKs, Clients, Conformance Helpers, Component Templates, and most importantly, the JARVIS Reference Stack. These real artifacts make ARP easy to adopt.
Designed around realistic limits of LLM
In 2025, we saw LLMs improve at a stunning speed, but serious shortcomings remain. For example, even the best models have low success rates on long-horizon, large-action-space planning and orchestration. Hallucinations are nearly impossible to remove fully. ARP is designed with these realistic limitations in mind. We acknowledge and actively mitigate these risks through proven software engineering and system design.
Interoperability by composition
ARP treats MCP/A2A services as capability providers by importing them as NodeTypes, and ARP is *NOT* designed to compete with these standards. MCP and A2A are wire-level protocols, just like ARP. However, they focus on defining interoperability interfaces, while ARP focuses on providing infrastructure models to reliably build capabilities.
Core Components
ARP Standard v1 defines the core set of services that together form a bounded, auditable execution fabric. Note: since ARP is at an early stage, we may add more components if they are deemed beneficial to our goals of building the most reliable execution fabric.
Run Gateway
Client entrypoint for interacting with an ARP system.
Run Coordinator
Run orchestrator that manages state and enforcement checkpoints, and dispatches tasks.
Atomic Executor
Executor for atomic node types. Runs leaf nodes and produces durable events and artifacts.
Composite Executor
Decomposer for composite node types. Breaks them into sub-nodes as needed.
Selection Service
Generates candidate NodeTypes for decomposed subtasks.
Node Registry
Catalog of NodeTypes with metadata for the Selection Service.
Policy Decision Point — PDP
Optional service for allow/deny decisions at policy checkpoints.
JARVIS-Specific Components
JARVIS includes additional, non-standard components to make the reference stack easy to use out of the box. These pieces are implementation choices and can be swapped out.
JARVIS_Release
End-to-end docker-composed release bundle so you can deploy the stack anywhere.
First-Party Atomic Node Packs
First-party, trusted node packs that ship real capabilities for the ecosystem.
Run Store
Durable data store used to persist runs and node run state.
Artifact Store
Durable storage for run artifacts produced during execution.
Event Stream
Visibility service streaming NDJSON run history so executions are replayable and inspectable.
LLM Adapter — ARP_LLM
Shared model-call adapter that provides standardized LLM provider clients. Currently supports OpenAI, with more to come.
Security Token Service
Standalone local Keycloak STS + dev tooling for custom stacks.
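Because the Event Stream emits NDJSON (one JSON object per line), run history can be consumed with a few lines of standard-library code. The sketch below is one way to parse such a stream. The event fields shown in the sample ("seq" and "type") are hypothetical placeholders, not the actual ARP event schema.

```python
import json
from typing import Iterable, Iterator


def parse_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank keep-alive lines
        yield json.loads(line)


if __name__ == "__main__":
    # Hypothetical sample events; real field names may differ.
    sample = [
        '{"seq": 1, "type": "run_started"}',
        "",
        '{"seq": 2, "type": "node_run_completed"}',
    ]
    for event in parse_ndjson(sample):
        print(event["seq"], event["type"])
```

The same function accepts any iterable of lines, so it could read from sys.stdin when the output of an event-streaming command is piped into it.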
Get started now on macOS/Linux
You can try out the JARVIS open-source reference stack today!
Install and run the JARVIS stack locally. Requires Docker Engine ≥ 24.0 and Docker Compose ≥ 2.20.
# Verify Docker + Compose
docker --version
docker compose version
# Get the release bundle
git clone https://github.com/AgentRuntimeProtocol/JARVIS_Release.git
cd JARVIS_Release
# Start the stack
# macOS/Linux or WSL:
bash ./start_dev.sh \
--llm-api-key "<your_openai_api_key>" \
--llm-chat-model "gpt-5-nano"
Start a sample run via the Run Gateway, stream NDJSON events, and fetch the root NodeRun outputs.
RUN_ID="$(arp-jarvis --json runs start --goal "Generate a UUID, then return it." | \
python3 -c 'import json,sys; print(json.load(sys.stdin)["run"]["run_id"])')"
echo "run_id: ${RUN_ID}"
# Stream run events (NDJSON)
arp-jarvis runs events "${RUN_ID}"
# Fetch root NodeRun outputs
arp-jarvis runs inspect "${RUN_ID}" --include-node-runs
Community and next steps
ARP is open source under the MIT license. Our vision is a capability ecosystem where proven nodes can be evaluated, published, and reused across stacks. Join the community, share feedback, and help shape the standard.
Now vs next
Now: ARP Standard v1, node-centric services, plus the JARVIS reference stack.
Next: Pinned full-stack releases, richer evaluation/promotion gates, and more integration templates.
By devs, for devs.
From the blog
Updates, release notes, and deep dives from the ARP maintainers.
Introducing ARP: capability-oriented infrastructure for reliable agentic systems
ARP combines COP, ARP Standard v1, and JARVIS to deliver bounded, auditable agentic workflows.
ARP in 10 minutes: one run, five artifacts
A hands-on walkthrough of a single run that produces bounded candidate menus, enforceable constraints, policy decisions, and a durable event timeline.
ARP Standard v1: what’s standardized — and what’s intentionally not
ARP Standard v1 defines the contracts and artifacts for bounded, auditable capability execution—without standardizing planner internals.
Ready to build with ARP?
Start with the quickstart, explore the standard, or use JARVIS as a reference implementation.