01 What runs
A synthetic test is a user goal run by a real agent against your agent-facing interface. The target can be an MCP server, CLI, OpenAPI-backed surface, or hosted Armature interface.
- Goal: "find the latest failed deployment and explain why."
- Agent: chooses tools, passes arguments, handles errors, and returns an answer.
- Evidence: outcome, tool path, timing, errors, and trace quality.
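As a rough illustration, the goal and the evidence from one run could be captured in a record like the one below. This is a minimal sketch, not Armature's actual schema: the class names, field names, tool names (`list_deployments`, `get_deployment_logs`), and all values are hypothetical, and trace quality is left out because the text does not say how it is scored.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToolCall:
    """One step in the agent's tool path (hypothetical shape)."""
    tool: str
    arguments: Dict[str, object]
    error: Optional[str] = None
    duration_ms: float = 0.0


@dataclass
class SyntheticTestResult:
    """Evidence from one run of one goal (hypothetical shape)."""
    goal: str
    passed: bool
    answer: str
    tool_path: List[ToolCall] = field(default_factory=list)

    @property
    def errors(self) -> List[str]:
        # Errors the agent hit along the way, in call order.
        return [c.error for c in self.tool_path if c.error is not None]

    @property
    def total_ms(self) -> float:
        # Total time spent in tool calls.
        return sum(c.duration_ms for c in self.tool_path)


# Made-up run: the agent hits a timeout, retries, and still reaches an answer.
result = SyntheticTestResult(
    goal="find the latest failed deployment and explain why.",
    passed=True,
    answer="(agent's explanation goes here)",
    tool_path=[
        ToolCall("list_deployments", {"status": "failed", "limit": 1}, duration_ms=120.0),
        ToolCall("get_deployment_logs", {"id": "dep_123"}, error="timeout", duration_ms=5000.0),
        ToolCall("get_deployment_logs", {"id": "dep_123"}, duration_ms=340.0),
    ],
)

print(result.passed, result.errors, result.total_ms)
```

Keeping the tool path as an ordered list means a run that passes only after a retry is still visible in the evidence, even though the outcome is a pass.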
02 How it runs
- Connect. MCP, CLI, OpenAPI, or hosted Armature surface.
- Run. One workflow across the harness × model matrix.
- Review. Passes stay quiet; failures carry the repair trace.
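The run and review steps can be pictured as a loop over every (harness, model) pair followed by a filter that keeps only failures. The sketch below assumes a caller-supplied `run_one` function that executes the goal and returns `(passed, trace)`. That function, the placeholder harness and model names, and the trace text are all hypothetical stand-ins for a real agent run.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]        # (harness, model)
Outcome = Tuple[bool, str]    # (passed, trace)


def run_matrix(
    goal: str,
    harnesses: List[str],
    models: List[str],
    run_one: Callable[[str, str, str], Outcome],
) -> Dict[Pair, Outcome]:
    """Run one goal once per (harness, model) pair and collect the outcomes."""
    results: Dict[Pair, Outcome] = {}
    for harness, model in product(harnesses, models):
        results[(harness, model)] = run_one(goal, harness, model)
    return results


def failures(results: Dict[Pair, Outcome]) -> Dict[Pair, str]:
    """Keep only failed pairs with their trace; passing pairs are dropped."""
    return {pair: trace for pair, (passed, trace) in results.items() if not passed}


def fake_run(goal: str, harness: str, model: str) -> Outcome:
    # Stand-in for a real agent run: one arbitrary pair fails for demonstration.
    if (harness, model) == ("harness-b", "model-x"):
        return False, "tool call rejected: missing required argument"
    return True, ""


results = run_matrix(
    "find the latest failed deployment and explain why.",
    ["harness-a", "harness-b"],
    ["model-x", "model-y"],
    fake_run,
)

for (harness, model), trace in failures(results).items():
    print(f"FAIL {harness} / {model}: {trace}")
```

Passing the runner in as a function keeps the matrix loop independent of how a given harness is actually launched.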
03 Harness coverage
Verify the same workflow across every harness × model pair because one agent can pass while another fails.
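Example workflow: Investigate a failed deployment

| Model / harness | Claude Code | Codex | Cursor | OpenClaw | Gemini CLI | OpenCode | ChatGPT |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Sonnet 4.6 | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ | ✓ |
| GPT-5.5 | ✓ | ✓ | ✓ | ✗ | ✓ | ✓ | ✓ |
| Kimi K2.5 | ✗ | ✓ | ✓ | ✓ | ✗ | ✓ | ✗ |
| Qwen3.5 Coder | ✓ | ✗ | ✓ | ✓ | ✓ | ✗ | ✓ |
| Gemini 3 Pro | ✗ | ✓ | ✗ | ✓ | ✓ | ✓ | ✓ |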
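To make the example matrix easier to read, the sketch below encodes the same ✓/✗ values as booleans, then lists the failing pairs and counts passes per harness. The variable and function names are illustrative only, and the values are the example data shown here, not measured results.

```python
from typing import Dict, List, Tuple

HARNESSES = ["Claude Code", "Codex", "Cursor", "OpenClaw", "Gemini CLI", "OpenCode", "ChatGPT"]

# One row per model, one boolean per harness, in HARNESSES order (True = pass).
MATRIX: Dict[str, List[bool]] = {
    "Sonnet 4.6":    [True,  True,  False, True,  False, True,  True],
    "GPT-5.5":       [True,  True,  True,  False, True,  True,  True],
    "Kimi K2.5":     [False, True,  True,  True,  False, True,  False],
    "Qwen3.5 Coder": [True,  False, True,  True,  True,  False, True],
    "Gemini 3 Pro":  [False, True,  False, True,  True,  True,  True],
}


def failing_pairs(matrix: Dict[str, List[bool]], harnesses: List[str]) -> List[Tuple[str, str]]:
    """Return every (model, harness) pair marked as a failure."""
    return [
        (model, harness)
        for model, row in matrix.items()
        for harness, ok in zip(harnesses, row)
        if not ok
    ]


def pass_counts_by_harness(matrix: Dict[str, List[bool]], harnesses: List[str]) -> Dict[str, int]:
    """Count how many models pass in each harness."""
    return {h: sum(row[i] for row in matrix.values()) for i, h in enumerate(harnesses)}


for model, harness in failing_pairs(MATRIX, HARNESSES):
    print(f"{model} fails in {harness}")
print(pass_counts_by_harness(MATRIX, HARNESSES))
```

In this example no model passes in every harness and no harness passes every model, which is why each pair is checked rather than each model or harness alone.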