codex

Lightweight coding agent that runs in your terminal

agent Rust openai/codex
74% pass rate
2/8 principles met

Spec Coverage

How many of the spec's requirements were verified for this tool. See /coverage for the full matrix.

Level Total Verified Unverified
MUST 28 19 9
SHOULD 21 13 8
MAY 10 10 0

Audience signal: mixed

This tool sends mixed signals: some agent-readable affordances are present, others are not. Treat the warnings below as friction points, not defects.

This is an informational signal, not an authoritative verdict — see methodology. The per-audit evidence below is the ground truth.

Top Issues

All Audits

P1: Non-Interactive by Default

PASS Non-interactive by default
SKIP Non-interactive gate flag advertised in --help target satisfies P1 via alternative gate (help-on-bare or stdin-primary)
PASS Flags advertise env-var bindings in --help
FAIL Secret-bearing flags expose stdin or *-file companion secret-bearing flag(s) without `*-file` companion or stdin path: --remote-auth-token-env. Flag values leak via process tables, shell history, and CI logs; provide stdin support or a `--<flag>-file` variant.
WARN `--help` advertises default values for flags no default-value annotations found in --help. SHOULD-tier — agents reading help text need to see what value a flag falls back to when omitted (`[default: <value>]` per clap convention).
PASS Rich-TUI affordance for TTY contexts

P2: Structured, Parseable Output

WARN Structured output support --output/--format flag detected but could not validate JSON via safe probes (--help/--version override output flags in most CLIs)
SKIP Structured-output CLI exposes its schema at runtime no structured-output indicator (--output / --format / json / jsonl) in --help
WARN --json / --jsonl short aliases for --output no --json or --jsonl short alias found. Agents and pipelines benefit from short forms alongside the canonical `--output` enum.
WARN `--raw` flag for pipe-safe unformatted output no `--raw` flag advertised. MAY-tier — useful for pipelines that want to strip formatting before piping to other tools.
SKIP `--output` advertises additional formats beyond text/json no `--output` or `--format` flag advertised; vacuous skip for MAY-tier extra formats.
PASS Bad invocation exits with structured usage-error code (2)
SKIP Errors emit JSON envelope with `error`/`kind`/`message` under `--output json` binary does not advertise `--output json` in --help; MUST applies only to CLIs that opt into the JSON contract.
SKIP JSON success and error envelopes share their non-payload key set binary does not advertise `--output json` in --help; envelope-consistency only applies to CLIs that opt into the JSON contract.

P3: Progressive Help Discovery

PASS Help flag produces useful output
PASS Version flag works (`--version` plus short alias)
PASS Version flag works (`--version` plus short alias)
WARN `examples` subcommand or `--examples` flag for curated usage patterns no `examples` subcommand or `--examples` flag found. MAY-tier — a curated usage block keeps agents from hunting through long help text.
PASS Short `-h` summary differs from `--help` long form
PASS Each subcommand's `--help` ships at least one invocation example
WARN Help text pairs human and `--output json` example invocations no paired text + `--output json` example found within 5 lines in top-level or any subcommand `--help`. Pairing keeps agents from reverse-engineering the JSON invocation from the text one.

P4: Fail-Fast, Actionable Errors

PASS Rejects invalid arguments
PASS Error messages include a hint or remediation phrase
SKIP `--output json` produces JSON-formatted errors binary does not advertise `--output json` in --help; SHOULD applies only to CLIs that opt into the JSON contract.

P5: Safe Retries & Mutation Boundaries

SKIP Destructive subcommands require `--force` or `--yes` no destructive subcommands detected; MUST applies conditionally to CLIs with destructive operations.
WARN Read and write surfaces are both visible in subcommand list write-pattern subcommand(s) present (update) but no read-pattern surface detected. If the CLI is write-only by design the MUST is satisfied vacuously; otherwise expose the read surface with agent-recognizable verbs (list/get/show/query/find/search).

P6: Composable, Predictable Command Structure

PASS Handles SIGPIPE gracefully
SKIP Pager-using CLI ships --no-pager escape hatch no pager signal (less/more/$PAGER/--pager) in --help
PASS Respects NO_COLOR
WARN Subcommand verbs follow community-standard names 7/24 subcommand(s) follow standard verb names. Non-standard: review, mcp, plugin, mcp-server, app-server, remote-control, completion, sandbox, debug, working, resume, the, fork, most, cloud, exec-server, features. MAY-tier — community-standard verbs (get/list/create/update/delete) help agents predict subcommand behavior across CLIs.
WARN `--color` flag for explicit color control no `--color` flag advertised. MAY-tier — `auto|always|never` lets agents and pipelines override the TTY-based default.
SKIP Input-accepting commands read from stdin when no file is given no input-accepting subcommand detected (process/parse/convert/transform/analyze/validate/format/lint/audit); vacuous skip for the conditional SHOULD.
WARN Subcommand naming follows a consistent verb/noun convention subcommand naming is inconsistent: 7 non-verb subcommand(s) (mcp, plugin, working, the, most, cloud, features) mix verb and non-verb children at the second level, so an agent cannot predict where the action lives. SHOULD-tier: pick a consistent shape (all verb-first, all noun-verb hierarchy, or any combination where each non-verb group's children are uniformly verbs). The verb list is a heuristic; inspect `--help` to confirm.
WARN Operations are subcommands, not verb-shaped flags top-level verb-shaped flag(s) found: --search. Operations belong under the `Commands:` block (`tool search "q"`), not on the flag namespace where they fight the `--help` filtering agents rely on.

P7: Bounded, High-Signal Responses

WARN Quiet mode available no --quiet/-q flag detected in --help output
WARN `--verbose` flag for diagnostic escalation no `--verbose` / `-v` flag advertised. SHOULD-tier — agents debugging failures need a way to escalate diagnostic detail.
SKIP `--limit` / `--max-results` flag for list operations no list-style subcommand detected (list/ls/search/query/find/show/get); vacuous skip for the list-only SHOULD.
SKIP Cursor-based pagination flags for list traversal no list-style subcommand detected; vacuous skip for the list-only MAY.
SKIP `--timeout` flag for long-running operations no long-running subcommand detected (serve/daemon/watch/tail/monitor/follow/run/start/stream); vacuous skip for the conditional SHOULD.
WARN Help text advertises TTY-aware verbosity behavior no TTY-aware language found in `--help`. MAY-tier — automatic verbosity reduction when stdout is piped or redirected lets agents skip the explicit `--quiet` flag. Behavioral probes cannot simulate a real TTY without a pty crate, so this audit relies on documented intent.

P8: Discoverable Through Agent Skill Bundles

PASS Skill bundle has install path (`tool skill install [<host>]`)
PASS `skill install --all` for multi-runtime install
PASS `skill update` / `skill upgrade` for bundle refresh

Details

Version scored
0.135.0
Audit date
2026-06-01 17:35:49 UTC
Duration
3.1s
Platform
linux/x86_64
Mode
command
Anc build
0.5.0
Install
bun add -g @openai/codex

Embed the badge

This score (74%) clears the badge floor (70%). Copy this into your README:

[![agent-native](https://anc.dev/badge/codex.svg)](https://anc.dev/score/codex)

Preview: agent-native badge for codex

Reproduce this scorecard for codex locally and inspect the failing audits:

anc audit --command codex --output json

Install anc first if you don't have it. Add --output json to get the same JSON shape committed under scorecards/.