Synthetic tests: end-to-end agentic workflows

01 What runs

A synthetic test hands a user goal to a real agent and runs it against your agent-facing interface. The target can be an MCP server, a CLI, an OpenAPI-backed surface, or a hosted Armature interface.

Goal: "find the latest failed deployment and explain why."
Agent: chooses tools, passes arguments, handles errors, and returns an answer.
Evidence: outcome, tool path, timing, errors, and trace quality.
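The three parts above (goal, agent behavior, evidence) can be modeled as plain records. The sketch below is a minimal illustration only. `TargetKind`, `SyntheticTest`, `ToolCall`, and `Evidence` are hypothetical names, not Armature's schema, and the tool names in the usage (`list_deployments`, `get_deployment_logs`) are invented. Trace quality is left out because this page does not say how it is scored.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class TargetKind(Enum):
    """Agent-facing surfaces a test can point at (hypothetical enum)."""
    MCP = "mcp"
    CLI = "cli"
    OPENAPI = "openapi"
    HOSTED = "hosted"


@dataclass
class SyntheticTest:
    goal: str           # natural-language user goal handed to the agent
    target: TargetKind  # interface the agent works against


@dataclass
class ToolCall:
    tool: str
    arguments: dict
    error: Optional[str] = None  # set when this call failed


@dataclass
class Evidence:
    outcome_ok: bool                                         # did the answer meet the goal?
    tool_path: list[ToolCall] = field(default_factory=list)  # ordered calls the agent made
    duration_s: float = 0.0                                   # wall-clock time of the run
    answer: str = ""                                          # agent's final answer

    @property
    def errors(self) -> list[str]:
        """Every tool error the agent hit, in call order."""
        return [c.error for c in self.tool_path if c.error is not None]


# Usage: the example goal from above, with an invented tool path.
test = SyntheticTest("find the latest failed deployment and explain why.", TargetKind.MCP)
evidence = Evidence(
    outcome_ok=True,
    tool_path=[
        ToolCall("list_deployments", {"status": "failed", "limit": 1}),
        ToolCall("get_deployment_logs", {"id": "dep_123"}, error="timeout"),
        ToolCall("get_deployment_logs", {"id": "dep_123"}),  # agent retried after the error
    ],
    duration_s=12.4,
)
print(evidence.errors)
```

Keeping the tool path as an ordered list lets a reviewer see not just whether the agent succeeded but how it recovered, such as the retry after a timeout here.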

02 How it runs

Connect. Point the test at an MCP server, a CLI, an OpenAPI-backed surface, or a hosted Armature interface.
Run. Execute one workflow across the harness × model matrix.
Review. Passes stay quiet; failures arrive with the trace needed to repair them.
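The run-and-review loop can be sketched as a cross product of harnesses and models followed by a filter that keeps only failures. This is a minimal sketch under stated assumptions: `run_matrix`, `review`, `Result`, and the `RunAgent` signature are hypothetical, and `fake_agent` stands in for a real agent run (its traces are invented).

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Tuple


@dataclass
class Result:
    harness: str
    model: str
    passed: bool
    trace: list[str]  # tool calls and errors recorded for this run


# Hypothetical runner signature: (goal, harness, model) -> (passed, trace).
RunAgent = Callable[[str, str, str], Tuple[bool, List[str]]]


def run_matrix(goal: str, harnesses: list[str], models: list[str],
               run_agent: RunAgent) -> list[Result]:
    """Run one workflow on every harness x model pair."""
    return [
        Result(harness, model, *run_agent(goal, harness, model))
        for harness, model in product(harnesses, models)
    ]


def review(results: list[Result]) -> list[Result]:
    """Passes stay quiet: surface only the failures, each with its trace."""
    return [r for r in results if not r.passed]


def fake_agent(goal: str, harness: str, model: str) -> tuple[bool, list[str]]:
    # Stand-in for a real agent run; fails one pair to show what review() keeps.
    if (harness, model) == ("Cursor", "Sonnet 4.6"):
        return False, ["list_deployments()", "error: missing required argument 'status'"]
    return True, ["list_deployments(status='failed')", "get_deployment_logs(id=...)"]


results = run_matrix(
    "find the latest failed deployment and explain why.",
    harnesses=["Claude Code", "Cursor"],
    models=["Sonnet 4.6", "GPT-5.5"],
    run_agent=fake_agent,
)
failures = review(results)
for f in failures:
    print(f"{f.harness} x {f.model}: {f.trace}")
```

Passing the runner in as a callable keeps the matrix logic independent of how each harness is actually driven.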

03 Harness coverage

Run the same workflow on every harness × model pair: one agent can pass a workflow that another fails.

Example workflow: investigate a failed deployment (✓ = pass, ✗ = fail)

Model / harness  Claude Code  Codex  Cursor  OpenClaw  Gemini CLI  OpenCode  ChatGPT
Sonnet 4.6       ✓            ✓      ✗       ✓         ✗           ✓         ✓
GPT-5.5          ✓            ✓      ✓       ✗         ✓           ✓         ✓
Kimi K2.5        ✗            ✓      ✓       ✓         ✗           ✓         ✗
Qwen3.5 Coder    ✓            ✗      ✓       ✓         ✓           ✗         ✓
Gemini 3 Pro     ✗            ✓      ✗       ✓         ✓           ✓         ✓
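Reading the example matrix as data makes the coverage argument concrete. The sketch below copies the marks from the example table and computes pass rates per model and per harness; `ROWS`, `matrix`, and `pass_rate_by` are hypothetical names for illustration. In this example no model passes on every harness and no harness passes with every model; Sonnet 4.6, for instance, passes in Claude Code but fails in Cursor.

```python
HARNESSES = ["Claude Code", "Codex", "Cursor", "OpenClaw", "Gemini CLI", "OpenCode", "ChatGPT"]

# Example results from the table: one string per model, one mark per harness.
ROWS = {
    "Sonnet 4.6": "✓✓✗✓✗✓✓",
    "GPT-5.5": "✓✓✓✗✓✓✓",
    "Kimi K2.5": "✗✓✓✓✗✓✗",
    "Qwen3.5 Coder": "✓✗✓✓✓✗✓",
    "Gemini 3 Pro": "✗✓✗✓✓✓✓",
}
assert all(len(marks) == len(HARNESSES) for marks in ROWS.values())

# Parse the marks into a {(model, harness): passed} map.
matrix = {
    (model, harness): mark == "✓"
    for model, marks in ROWS.items()
    for harness, mark in zip(HARNESSES, marks)
}


def pass_rate_by(axis: int) -> dict[str, float]:
    """Fraction of passing runs grouped by model (axis=0) or harness (axis=1)."""
    totals: dict[str, list[int]] = {}
    for key, passed in matrix.items():
        counts = totals.setdefault(key[axis], [0, 0])
        counts[0] += passed  # True counts as 1
        counts[1] += 1
    return {name: p / n for name, (p, n) in totals.items()}


# Every (model, harness) pair that failed, i.e. the runs a reviewer would see.
failing_pairs = sorted(key for key, ok in matrix.items() if not ok)
print(pass_rate_by(0))
print(pass_rate_by(1))
print(failing_pairs)
```

Grouping by either axis shows why a single model or harness run is not enough: every row and every column in this example contains at least one failure.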
