# The agent-native CLI standard

Source: https://anc.dev/
Canonical-Markdown: https://anc.dev/index.md

CLI tools are how AI agents touch everything else. Compilers, databases, git, the cloud, the shell. An agent asked to ship code, rotate a credential, grep a log, or deploy a branch frequently shells out to a binary. It's the lowest-common-denominator interface where APIs don't exist or don't compose. The agent reads the output, decides what went right or wrong, and picks the next move.

There is no human between the request and the process. The CLI either makes that loop tractable or it does not.

This is the specification for CLIs that make it tractable. Eight principles, each expressing a requirement with [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119) tiers: **MUST** for the contract, **SHOULD** for the default, **MAY** for the optional affordance. The companion linter, [`anc`](https://anc.dev/audit), scores any CLI against them and reports results by stable audit ID (`p1-non-interactive`, `p2-json-output`, `p6-sigpipe`, …). Cite a principle by its anchor slug (`#p1-non-interactive-by-default` through `#p8-discoverable-skill-bundle`); those are permanent.

Each of the eight principles below has its own page (`/p1` through `/p8`) for deep-linking, and the same text is available as raw markdown at `/p1.md` … `/p8.md` for agent consumption. The entire spec as one file lives at [`/llms-full.txt`](https://anc.dev/llms-full.txt); a curated index for retrieval-heavy agents lives at [`/llms.txt`](https://anc.dev/llms.txt).

For the standard applied at scale, see the [**ANC 100 leaderboard**](https://anc.dev/scorecards): 100 widely-used CLI tools, scored against the same eight principles. The scoring methodology and per-field schema for the underlying JSON live at [`/methodology`](https://anc.dev/methodology) and [`/scorecard-schema`](https://anc.dev/scorecard-schema).

To install the linter (`anc`) locally, see [/install](https://anc.dev/install). To install the agent-native-cli **skill bundle** (the Claude Code / Codex / Cursor / OpenCode skill), see [/skill](https://anc.dev/skill).

---

# P1: Non-Interactive by Default

Source: https://anc.dev/p1
Canonical-Markdown: https://anc.dev/p1.md

## Definition

Every automation path MUST run without human input. A CLI tool that blocks on an interactive prompt is invisible to an agent: the agent hangs, the user sees nothing, and the operation times out silently.

## Why Agents Need It

An agent calling a CLI cannot type. When the tool prompts for a confirmation or a credential, the agent's process stalls until timeout: no tokens recovered, no structured signal that interaction was requested, and no way to distinguish "waiting for input" from "still processing." Interactive prompts in automation paths are a leading cause of agent-tool deadlock.

## Requirements

**MUST:**

- Every flag settable via environment variable. Use a falsey-value parser for booleans so that `TOOL_QUIET=0` and `TOOL_QUIET=false` correctly disable the flag rather than being treated as truthy non-empty strings. In Rust / clap:

  ```rust
  #[arg(long, env = "TOOL_QUIET", global = true, value_parser = FalseyValueParser::new())]
  quiet: bool,
  ```

- When `--no-interactive` is set, or when stdin is not a TTY, the tool does not enter any blocking-interactive surface. It uses defaults, reads from stdin, or exits with an actionable error. "Blocking-interactive surface" includes prompt library calls AND TUI session initialization.
- *(Applies when: CLI uses interactive OAuth.)* A headless authentication path. The canonical flag is `--no-browser`, which SHOULD trigger the OAuth 2.0 Device Authorization Grant ([RFC 8628](https://www.rfc-editor.org/rfc/rfc8628)) when the identity provider supports it: the CLI prints a URL and a code; the user authorizes on another device. Agents cannot open browsers.
  Non-canonical alternatives (`--device-code`, `--remote`, `--headless`) are acceptable but should migrate toward `--no-browser`. CLIs that authenticate via static API key, PAT, or pre-issued token satisfy this requirement through the env-var-settable flags MUST above (no browser to begin with).
- *(Applies when: CLI accepts secret material as input.)* At least one input path that does not leak the secret into process listings, shell history, or the parent environment. The two leak-resistant paths are stdin and a `--*-file` flag pointing at a credential file. Flag-value forms (`--token `) and environment variables (`TOOL_TOKEN`) MAY exist as convenience surfaces but MUST NOT be the only programmatic path. Cloud-CLI env-var conventions (`AWS_ACCESS_KEY_ID`, `GH_TOKEN`) count as convenience paths under this rule, not as substitutes for it. A CLI whose only secret-input path is `--password ` leaks the secret into `ps` output on every invocation.

**SHOULD:**

- Auto-detect non-interactive context via TTY detection (`std::io::IsTerminal` in Rust 1.70+, `process.stdin.isTTY` in Node, `sys.stdin.isatty()` in Python) and suppress prompts when stdin is not a terminal, even without an explicit `--no-interactive` flag.
- Document default values for prompted inputs in `--help` output so agents can pass them explicitly instead of accepting whatever default ships.

**MAY:**

- Offer rich interactive experiences (spinners, progress bars, multi-select menus) when a TTY is detected and `--no-interactive` is not set, provided the non-interactive path remains fully functional.

## Scope

"Agent" in this specification means a process invoking the CLI as a subprocess. This spec's automated audits verify behavior under non-TTY stdin. TTY-driving agents (tmux panes, `ssh -t` sandbox shells, `expect` automation, computer-use desktop agents) are affected by the same MUSTs, but `anc` currently does not allocate a PTY during verification.
Pass verdicts for TTY-driving-agent scenarios are probable-but-not-verified; see [/coverage](https://anc.dev/coverage) for the gap.

## Evidence

- `--no-interactive` flag in the CLI struct with an env-var binding.
- Boolean env vars parsed with a falsey-value parser (not the default string parser).
- TTY guard wrapping every `dialoguer`, `inquire`, or equivalent prompt call.
- `--no-browser` flag present on authenticated CLIs.
- `env = "TOOL_..."` attribute on every flag that takes user input.
- A stdin or `--*-file` path for every secret-accepting flag, present alongside (not instead of) any convenience flag-value or env-var alternative.

## Anti-Patterns

- Bare `dialoguer::Confirm::new().interact()` with no TTY check and no `--no-interactive` override: agents hang indefinitely.
- Boolean environment variables parsed as plain strings, so `TOOL_QUIET=false` is truthy because the string is non-empty.
- `stdin().read_line()` in a code path reached during normal operation without a TTY check first.
- Hard-coded credentials prompts with no env-var or config-file alternative.
- OAuth flow that unconditionally opens a browser with no headless escape hatch.
- A `--password ` flag with no stdin or file alternative: every invocation leaks the secret into process listings.

Measured by audit IDs `p1-non-interactive` (behavioral) and `p1-non-interactive-source` (source) today. Run `anc audit --principle 1 .` against the CLI under test to see each.

---

# P2: Structured, Parseable Output

Source: https://anc.dev/p2
Canonical-Markdown: https://anc.dev/p2.md

## Definition

CLI tools MUST separate data from diagnostics and offer machine-readable output formats. Mixing status messages with data forces agents into fragile regex extraction that breaks on any format change.

## Why Agents Need It

An agent calling a CLI needs three things from each invocation: the data, the error (if any), and the exit code.
When data goes to stdout, diagnostics go to stderr, and errors carry machine-readable fields, the agent parses the result reliably without heuristics. Mix these channels or ship human-formatted output only, and the agent falls back to best-effort text parsing that fails unpredictably across versions, locales, and edge cases: silently at first, catastrophically later.

## Requirements

**MUST:**

- Structured-output CLIs offer at least one machine-readable format selectable via `--output`, with `json` and `jsonl` as canonical values; `text` is the default human-facing form. The format selection threads through every output path, so a single invocation never mixes formats.
- Data goes to stdout. Diagnostics, progress indicators, and warnings go to stderr. An agent consuming JSON from stdout must never encounter an interleaved progress message.
- Exit codes are structured and documented. Codes 77 and 78 follow BSD `sysexits.h` (`EX_NOPERM`, `EX_CONFIG`); the broader sysexits range is intentionally not mandated to keep the surface small:

  | Code | Meaning                           |
  | ---: | --------------------------------- |
  |    0 | Success                           |
  |    1 | General command error             |
  |    2 | Usage error (bad arguments)       |
  |   77 | Authentication / permission error |
  |   78 | Configuration error               |

- When `--output json` is active, errors are emitted as JSON (to stderr) with at least `error`, `kind`, and `message` fields. A plain-text error in a JSON run breaks the consumer's parser on the only shape it was told to expect.
- *(Applies when: CLI emits structured output.)* The output schema is exposed at runtime via a `schema` subcommand or a `--schema` flag on each data-emitting subcommand. The schema identifies its format (canonical recommendation: JSON Schema 2020-12, the same dialect OpenAPI 3.1 uses), so an agent reading the schema loads the right validator without parsing prose. A consumer asking "what shape am I about to receive?" gets a machine-readable answer in one call.

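The exit-code table and the JSON-error requirement above can be sketched together. This is a minimal, standard-library-only illustration, not the reference implementation: `ErrorKind`, `render_error`, and `json_escape` are hypothetical names, and treating `error` as a boolean marker is an assumption, since the text names the three fields but does not fix their types. A real tool would more likely serialize with `serde_json`.

```rust
/// Hypothetical error categories for an illustrative tool.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    Command,
    Usage,
    Auth,
    Config,
}

impl ErrorKind {
    /// Exit codes from the table above (77 and 78 follow sysexits.h).
    pub fn exit_code(self) -> i32 {
        match self {
            ErrorKind::Command => 1,
            ErrorKind::Usage => 2,
            ErrorKind::Auth => 77,
            ErrorKind::Config => 78,
        }
    }

    /// Machine-readable kind string (the spellings are an assumption).
    pub fn as_str(self) -> &'static str {
        match self {
            ErrorKind::Command => "command",
            ErrorKind::Usage => "usage",
            ErrorKind::Auth => "auth",
            ErrorKind::Config => "config",
        }
    }
}

/// Escape a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Render one error line for stderr, as JSON when `--output json` is active.
pub fn render_error(kind: ErrorKind, message: &str, json: bool) -> String {
    if json {
        format!(
            "{{\"error\":true,\"kind\":\"{}\",\"message\":\"{}\"}}",
            kind.as_str(),
            json_escape(message)
        )
    } else {
        format!("error: {}", message)
    }
}
```

A caller would write the rendered line with `eprintln!` and then exit with `kind.exit_code()` from `main()`, which keeps stdout free of anything but data.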

**SHOULD:**

- JSON output uses a consistent envelope (a top-level object with predictable keys) across every command so agents can rely on the same shape. Passthrough tools whose value is the user's own JSON (`jq`, `dasel`) are exempt; the envelope applies to commands that emit tool-defined data.
- *(Applies when: CLI emits structured output.)* The same schema is also exported to a stable file path (e.g., `schema/.json` in the release artifact). Runtime discovery covers ad-hoc inspection; the file path lets CI and static-analysis consumers pin the schema without invoking the tool.
- `--json` and `--jsonl` are accepted as aliases for `--output json` and `--output jsonl`. The `--output` enum remains the canonical surface for `p2-must-output-flag`; a CLI shipping only the short forms still satisfies the canonical MUST through the alias path.

**MAY:**

- Additional output formats (CSV, TSV, YAML) beyond the core three. The core three remain mandatory.
- A `--raw` flag for unformatted output suitable for piping to other tools.

## Evidence

Rust reference implementation:

- `OutputFormat` enum with `Text`, `Json`, `Jsonl` variants deriving `ValueEnum`.
- `OutputConfig` struct with `format`, `use_color`, and `quiet` fields threaded through every output-producing function.
- `serde_json` in `Cargo.toml`.
- No `println!` in `src/` outside the output module: every print goes through `OutputConfig`.
- Exit-code constants or match arms mapping error variants to distinct numeric codes.
- `eprintln!` (or an equivalent diagnostic macro) for every diagnostic line.

## Anti-Patterns

- `println!` scattered across handlers instead of routing through the output config.
- A single exit code (1) for everything: agents cannot distinguish auth failures from config errors.
- Status lines ("Fetching data…") printed to stdout where they contaminate JSON output.
- `process::exit()` in library code, bypassing structured error propagation.
- Human-formatted tables as the only output mode with no JSON alternative.

Measured by audit IDs `p2-output-json`, `p2-output-format`, `p2-stderr-diagnostics`. Run `anc audit --principle 2 .` against the CLI under test to see each.

---

# P3: Progressive Help Discovery

Source: https://anc.dev/p3
Canonical-Markdown: https://anc.dev/p3.md

## Definition

Help text MUST be layered so agents (and humans) can drill from a short summary to concrete usage examples without reading the entire manual. The critical layer is the one that appears **after** the flags list, because that is where readers look for invocation patterns.

## Why Agents Need It

Agents discover how to use a tool by calling `--help` and scanning the output. They skip past flag definitions (which describe what is *possible*) and hunt for examples (which describe what to *do*). A flags list alone is enough rope to produce a failed invocation; examples are what turn discovery into action. Without examples in the help output, an agent trial-and-errors its way into a working call, burning tokens and sometimes landing on a wrong-but-silent success.

## Requirements

The principle is framework-agnostic. `clap`'s `after_help` is the worked example below; analogs include `cobra`'s `Example` (Go), `argparse`'s `epilog` (Python), `docopt`'s usage block, and the `Examples:` convention in `gh` / `kubectl`.

**MUST:**

- Every subcommand ships at least one concrete invocation example showing the command with realistic arguments, rendered in the section that appears after the flags list. In clap this is the `after_help` attribute.
- The top-level command ships at least one concrete example, and 2–3 when the tool has multiple primary use cases.
- The top-level command responds to `--version` with a non-empty version line on stdout and exits 0.
  Agents pin against tool versions to detect breaking changes; a `--version` that errors, exits non-zero, or prints nothing forces every consumer to scrape the binary path or manifest for a clue.

**SHOULD:**

- When the CLI exposes a structured-output mode (see [P2](https://anc.dev/p2)), examples show human and agent invocations side by side: a text-output example followed by its `--output json` equivalent. Readers see the pair; agents see the JSON form.
- Short `about` for command-list summaries; `long_about` reserved for detailed descriptions visible with `--help` but not `-h`.
- A short alias for `--version` works: `-V` (clap default; `curl`, `wget`, `gzip`), `-v` (`npm`, `node`, `bun`, `yarn`, `make`), or `-version` (Go's `flag` package). Any of the three is sufficient. Agents probing tool versions across many CLIs save token cost when they can pin against a one- or two-character flag; the long-only path forces an extra parse step.

**MAY:**

- A dedicated `examples` subcommand or `--examples` flag that outputs a curated set of usage patterns for agent consumption.

## Evidence

- `after_help` (or `after_long_help`) attribute on the top-level parser struct.
- `after_help` attribute on every subcommand variant.
- Example invocations in `after_help` text that include realistic arguments, not placeholder `` tokens.
- Both `about` (short) and `after_help` (examples) present on each subcommand.

## Anti-Patterns

- Relying solely on `///` doc comments: those populate `about` / `long_about`, not `after_help`, so no examples render after the flags list.
- A single `about` string serving as both summary and usage documentation.
- Examples buried in a README or man page but absent from `--help` output.
- `after_help` text that describes the flags in prose instead of demonstrating them in code.

Measured by audit IDs `p3-help`, `p3-after-help`, `p3-version`. Run `anc audit --principle 3 .` against the CLI under test to see each.

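Because the principle is framework-agnostic, the layering can be illustrated without any argument-parsing crate. The sketch below is an assumption-laden illustration of the layout only (summary, usage, options, then examples last); `Example`, `render_help`, and `version_line` are hypothetical names, and a clap-based tool would get the same effect from `about` plus `after_help` instead of hand-rendering.

```rust
/// One usage example: a short description plus the exact command line.
pub struct Example<'a> {
    pub description: &'a str,
    pub command: &'a str,
}

/// Render help in layers: summary, usage, options, then examples last,
/// so the examples land after the flags list where readers look for them.
pub fn render_help(
    summary: &str,
    usage: &str,
    options: &[(&str, &str)],
    examples: &[Example<'_>],
) -> String {
    let mut out = String::new();
    out.push_str(summary);
    out.push_str("\n\nUsage: ");
    out.push_str(usage);
    out.push_str("\n\nOptions:\n");
    for (flag, help) in options {
        out.push_str(&format!("  {:<20} {}\n", flag, help));
    }
    out.push_str("\nExamples:\n");
    for ex in examples {
        out.push_str(&format!("  # {}\n  {}\n", ex.description, ex.command));
    }
    out
}

/// Version line for `--version`: one non-empty line, printed to stdout by the caller.
pub fn version_line(name: &str, version: &str) -> String {
    format!("{} {}", name, version)
}
```

Passing a text-output example followed by its `--output json` twin to `render_help` produces the side-by-side pairing the SHOULD above describes.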

---

# P4: Fail Fast with Actionable Errors

Source: https://anc.dev/p4
Canonical-Markdown: https://anc.dev/p4.md

## Definition

CLI tools MUST detect invalid state early, exit with a structured error, and tell the caller three things: what failed, why, and what to do next. An error that says "operation failed" gives an agent nothing to act on.

## Why Agents Need It

Agents operate in a retry loop: attempt, observe, decide. When an error is vague or unstructured (a bare stack trace, a one-word failure, a mixed-channel splurge), the agent cannot tell whether to retry, re-authenticate, fix configuration, or escalate to the user. Distinct exit codes paired with this standard's published mapping let the agent act correctly on the first read.

The difference between exit code 77 (re-authenticate) and exit code 78 (fix config) determines whether the agent retries OAuth or asks the user to check their config file. Getting that wrong wastes entire conversation turns. Codes 77 and 78 follow BSD `sysexits.h` (`EX_NOPERM`, `EX_CONFIG`). Most CLIs today do not distinguish auth from config errors at the exit-code layer; this standard adopts `sysexits.h` numbering so agents can disambiguate.

## Requirements

**MUST:**

- Parse arguments with `try_parse()` instead of `parse()`. Clap's `parse()` calls `process::exit()` directly, bypassing custom error handlers, which means `--output json` cannot emit JSON parse errors. `try_parse()` returns a `Result` the tool can format:

  ```rust
  let cli = Cli::try_parse()?;
  ```

- Error types map to distinct exit codes. Use 77 when the CLI has an auth surface and 78 when it has a config surface; 0 (success), 1 (general command error), and 2 (usage / argument error) are universal.
- Every error message names **the failure**, **the cause**, and **a concrete remediation**: a command to run or a value to set, not a hint to consult documentation.
  Example:

  ```text
  Authentication failed: token expired (expires_at: 2026-03-25T00:00:00Z). Run `tool auth refresh` or set TOOL_TOKEN.
  ```

**SHOULD:**

- Error types use a structured enum (via `thiserror` in Rust) with variant-to-kind mapping for JSON serialization. Agents match on error kinds programmatically rather than parsing message text.
- Locally-verifiable config and auth invariants (file presence, token format, required keys) are checked before any network call. Remote validation is the network call's responsibility and SHOULD use distinct exit codes.
- Error output respects `--output json`: JSON-formatted errors go to stderr when JSON output is selected, consistent with [P2](https://anc.dev/p2)'s stream discipline (stdout for data, stderr for diagnostics).
- *(Applies when: rejecting input against an enum or a fixed-allowed-values set.)* The error message includes the valid set. `unknown command 'lst' (valid: list, get, create, update, delete)` is more useful than `unknown command 'lst'`: the agent learns the surface from the failure instead of going back to `--help`.

## Evidence

- `Cli::try_parse()` in `main()`, not `Cli::parse()`.
- Error enum with `#[derive(Error)]` and distinct variants for config, auth, and command errors.
- `exit_code()` method on the error type returning variant-specific codes.
- `kind()` method returning a machine-readable string for JSON serialization.
- `run()` function returning `Result<(), AppError>`, not calling `process::exit()` internally.
- Error messages containing remediation steps ("run X" or "set Y") alongside the cause.

## Anti-Patterns

- `Cli::parse()` anywhere in the codebase, because it silently prevents JSON error output.
- `process::exit()` in library code or command handlers. Only `main()` (and signal/panic handlers it installs) may call it, after all error handling.
- A single catch-all error variant that maps everything to exit code 1.
- Error messages that state the symptom without the cause or fix ("Error: request failed").
- Panics (`unwrap()`, `expect()`) on recoverable errors in production code paths.

Measured by audit IDs `p4-bad-args`, `p4-process-exit`, `p4-unwrap`, `p4-exit-codes`. Run `anc audit --principle 4 .` against the CLI under test to see each.

---

# P5: Safe Retries and Explicit Mutation Boundaries

Source: https://anc.dev/p5
Canonical-Markdown: https://anc.dev/p5.md

## Definition

Every CLI with write operations MUST support `--dry-run` so agents can preview a mutation before committing it. Commands MUST make the read-vs-write distinction visible from name and `--help` alone, and destructive writes MUST require explicit confirmation. An agent that cannot distinguish a safe read from a dangerous write will either avoid the tool or execute mutations blindly. Both are failure modes.

## Why Agents Need It

Agent harnesses commonly retry failed operations. If a write operation is not idempotent, a retry creates duplicates, corrupts data, or trips rate limits. When destructive operations require explicit confirmation (`--force`, `--yes`) and support preview (`--dry-run`), an agent can safely explore what a command would do before committing to it. Read-only tools are inherently safe for retries, but they still benefit from help text that names the mutation contract: "this does not modify state" is a better sentence to put in `--help` than to assume.

## Requirements

**MUST:**

- *(Applies when: CLI has destructive operations.)* Destructive operations (delete, overwrite, bulk modify) require an explicit `--force` or `--yes` flag. Without it, the tool refuses the operation or enters dry-run mode. It MUST NOT mutate silently.
- *(Applies when: CLI has both read and write operations.)* The distinction between read and write commands is clear from the command name and help text alone.
  An agent reading `--help` immediately knows whether a command mutates state.
- *(Applies when: CLI has write operations.)* A `--dry-run` flag is present on every write command. When set, the command validates inputs and reports what it would do without executing. Dry-run output respects `--output json` so agents can parse the preview programmatically.

**SHOULD:**

- Write operations are idempotent where the domain allows it. Running the same command twice produces the same end state, not a doubled effect.

## Evidence

- `--dry-run` flag on commands that create, update, or delete resources.
- `--force` or `--yes` flag on destructive commands.
- Command names that signal intent: `add`, `remove`, `delete`, `create` for writes; `list`, `show`, `get`, `search` for reads.
- Dry-run output that shows what *would* change without executing.

## Anti-Patterns

- A `delete` command that executes immediately without `--force` or confirmation.
- Write commands sharing a name pattern with read commands (e.g., a `sync` that silently overwrites local state).
- No `--dry-run` option on bulk operations, where a preview prevents costly mistakes.
- Operations that fail on retry because the first attempt partially succeeded: non-idempotent writes without rollback.

Measured by audit IDs `p5-dry-run`, `p5-destructive-guard`. Run `anc audit --principle 5 .` against the CLI under test to see each.

---

# P6: Composable and Predictable Command Structure

Source: https://anc.dev/p6
Canonical-Markdown: https://anc.dev/p6.md

## Definition

CLI tools MUST integrate cleanly with pipes, scripts, and other tools. That means handling SIGPIPE, detecting TTY for color and formatting decisions, supporting stdin for piped input, and maintaining a consistent, predictable subcommand structure.

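The "detecting TTY for color" part of this definition reduces to a small decision function. The following is a minimal sketch using only the standard library; `should_color` and `stdout_wants_color` are hypothetical names, and the rule that `NO_COLOR` disables color only when set to a non-empty value is an assumption taken from the common `NO_COLOR` convention rather than from this text.

```rust
use std::env;
use std::io::{self, IsTerminal};

/// Pure decision: color only when writing to a terminal, `NO_COLOR` is
/// unset or empty, and `TERM` is not `dumb`.
pub fn should_color(is_tty: bool, no_color: Option<&str>, term: Option<&str>) -> bool {
    if !is_tty {
        return false; // piped or redirected: never emit ANSI codes
    }
    if no_color.map_or(false, |v| !v.is_empty()) {
        return false; // user opted out via NO_COLOR
    }
    if term == Some("dumb") {
        return false; // terminal cannot render escape sequences
    }
    true
}

/// Convenience wrapper that reads the real environment and stdout.
pub fn stdout_wants_color() -> bool {
    let no_color = env::var("NO_COLOR").ok();
    let term = env::var("TERM").ok();
    should_color(
        io::stdout().is_terminal(),
        no_color.as_deref(),
        term.as_deref(),
    )
}
```

Keeping the decision pure and the environment lookup in a thin wrapper makes the rule testable without a PTY, which matters because the audit itself runs under non-TTY conditions.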

## Why Agents Need It

Agents compose CLI tools into pipelines:

```bash
tool list --output json | jaq '.[] | .id' | xargs tool get
```

Every link in that chain has to behave predictably. A tool that panics on SIGPIPE when piped to `head` breaks the pipeline. A tool that emits ANSI color codes into a pipe pollutes downstream JSON parsing. A tool with inconsistent subcommand naming forces the agent to memorize exceptions rather than apply patterns. Composability is what makes a CLI tool a building block rather than a dead end.

## Requirements

**MUST:**

- SIGPIPE is handled so that piping to `head`, `tail`, or any tool that closes the pipe early does not crash the process. In Rust, restore the default SIGPIPE handler as the first executable statement in `main()`:

  ```rust
  unsafe {
      libc::signal(libc::SIGPIPE, libc::SIG_DFL);
  }
  ```

  Equivalents in other languages: in Python, restore the default `SIGPIPE` handler at startup (`signal.signal(signal.SIGPIPE, signal.SIG_DFL)`); in Go, the runtime's default handling already exits cleanly on EPIPE writes; in Node.js, handle `EPIPE` on `process.stdout`.
- TTY detection, plus support for `NO_COLOR` and `TERM=dumb`. When stdout or stderr is not a terminal, color codes are suppressed automatically.
- Shell completions available via a `completions` subcommand (clap_complete in Rust; equivalents elsewhere). This is a Tier 1 meta-command: it works without config, auth, or network.
- *(Applies when: CLI makes network calls.)* A `--timeout` flag with a sensible default (30 seconds is the canonical recommendation for typical request/response operations; longer for streaming or upload commands). Agents operating under their own time budgets need to fail fast rather than block on a slow upstream.
- *(Applies when: CLI invokes a pager for output.)* Support `--no-pager` or respect `PAGER=""`. Pagers block headless execution indefinitely.
- *(Applies when: CLI uses subcommands.)* Agentic flags (`--output`, `--quiet`, `--no-interactive`, `--timeout`) propagate to every subcommand automatically (e.g., `global = true` in clap).
- *(Applies when: CLI runs long-running operations.)* SIGTERM is handled gracefully: in-flight writes flush or roll back, locks release, and the process exits non-zero within a bounded shutdown window. The next invocation succeeds without manual cleanup. Agents commonly cancel a long-running call when their own deadline expires; a tool that leaves half-written state behind on SIGTERM forces the agent to clean up before retrying.

**SHOULD:**

- Commands that accept input read from stdin when no file argument is provided. Pipeline composition depends on it.
- Pick one subcommand-naming convention and apply it consistently: `noun verb`, `verb noun`, or verb-only at the top level (e.g., `git commit`, `git push`). Mixing kebab-case compound verbs (`list-users`) with nested noun-verb (`user show`) in the same tool forces agents to learn exceptions.
- Three-tier dependency gating: Tier 1 (meta-commands like `completions`, `version`) needs nothing; Tier 2 (local commands) needs config; Tier 3 (network commands) needs config + auth. `completions` and `version` always work, even in broken environments.
- When a CLI exposes multiple distinct operations, model them as subcommands rather than mutually-exclusive flags. `tool search "query"` is preferable to `tool --search "query" --get "id"`. Single-operation tools (`grep`, `curl`, `jq`) are exempt: flags are their operation surface.

**MAY:**

- A `--color auto|always|never` flag for explicit color control beyond TTY auto-detection.
- *(Applies when: CLI uses subcommands.)* Subcommand verbs follow community-standard names (`get`, `list`, `create`, `update`, `delete`); flag spellings follow widely-used canonical forms (`--force` for confirmation bypass, `--yes` for prompt bypass, `--limit` for pagination, `--quiet` / `--verbose` for volume control). Convergence reduces an agent's per-tool relearning cost: an agent that has seen `kubectl get` and `gh repo list` recognizes `tool list` immediately, without re-reading `--help`.

## Evidence

- `libc::signal(libc::SIGPIPE, libc::SIG_DFL)` (or the equivalent in the target language) as the first statement of `main()`.
- `IsTerminal` trait usage (`std::io::IsTerminal` or the `is-terminal` crate).
- `NO_COLOR` and `TERM=dumb` checks.
- `clap_complete` in `Cargo.toml`.
- A `completions` subcommand in the CLI enum.
- Tiered match arms in `main()` separating meta-commands from config-dependent commands.

## Anti-Patterns

- Missing SIGPIPE handler: `cargo run -- list | head` panics with "broken pipe".
- Hard-coded ANSI escape codes without TTY detection.
- Color output in JSON mode: ANSI codes inside JSON string values break downstream parsing.
- A `completions` command that requires auth or config to run.
- No stdin support on commands where piped input is a natural use case.

Measured by audit IDs `p6-sigpipe`, `p6-no-color`, `p6-completions`, `p6-timeout`, `p6-agents-md`. Run `anc audit --principle 6 .` against the CLI under test to see each.

---

# P7: Bounded, High-Signal Responses

Source: https://anc.dev/p7
Canonical-Markdown: https://anc.dev/p7.md

## Definition

CLI tools MUST provide mechanisms to control output volume. Agent context windows are finite and expensive: a tool that dumps tens of thousands of lines of unfiltered output wastes tokens on every request and can exceed smaller context windows entirely, breaking the conversation that invoked it. "High-signal" here means the bytes that survive `--quiet` are the ones the caller asked for (data and errors), not progress, decoration, or chatter.

## Why Agents Need It

Unbounded CLI output is expensive for any agent: token cost and context-window capacity for LLM agents, parse cost and memory pressure for scripts, schedulers, and other automation. Either way, the agent ends up truncating (losing potentially important data) or consuming the full response (wasting cycles on noise). Bounded output with `--quiet`, `--verbose`, and `--limit` flags gives the agent precise control over how much data arrives, keeping responses high-signal and inside budget.

## Requirements

**MUST:**

- A `--quiet` flag suppresses non-essential output: progress indicators, informational messages, decorative formatting. When `--quiet` is set, only requested data and errors appear. Implementations route diagnostics through a macro that short-circuits when quiet is on:

  ```rust
  macro_rules! diag {
      ($cfg:expr, $($arg:tt)*) => {
          if !$cfg.quiet {
              eprintln!($($arg)*);
          }
      }
  }
  ```

- *(Applies when: CLI has list-style commands.)* List operations clamp to a documented default maximum. A `list` without `--limit` does not return more than a configurable ceiling (e.g., 100 items), and that ceiling is named in `--help` so callers can plan around it. If more items exist, the output indicates truncation: `"truncated": true` in JSON, a stderr note in text mode.

**SHOULD:**

- A `--verbose` flag (or `-v` / `-vv`) escalates diagnostic detail when agents need to debug failures.
- A `--limit` or `--max-results` flag lets callers request exactly the number of items they want.
- A `--timeout` flag bounds execution time. An agent waiting indefinitely on a hung network call cannot proceed.

**MAY:**

- Cursor-based pagination flags (`--after`, `--before`) for efficient traversal of large result sets.
- Automatic verbosity reduction in non-TTY contexts (the same behavior `--quiet` explicitly requests).


## Evidence

- A `--quiet` flag that respects both CLI and environment-variable input, with explicit override semantics (e.g., `--quiet=false` beats `QUIET=1`).
- A diagnostic macro (or equivalent gate) that short-circuits when `quiet` is true.
- `--limit` or `--max-results` on every list / search command.
- Pagination clamping logic (e.g., `min(requested, MAX_RESULTS)`).
- `--timeout` flag with a sensible default.
- `--verbose` flag for diagnostic escalation.
- A `suppress_diag()` method that returns true when quiet is set or when the output format is JSON / JSONL.

## Anti-Patterns

- List commands that return all results with no default limit: an agent listing 50,000 items floods its context window.
- No `--quiet` flag: agents consuming JSON output still receive interleaved diagnostic text on stderr.
- `--verbose` as the only output control. If there is no way to reduce output, bounded responses do not exist.
- Progress bars or spinners that write to stderr in non-TTY contexts, adding noise to agent logs.
- No `--timeout` on network operations. A stalled request blocks the agent indefinitely.

Measured by audit IDs `p7-quiet`, `p7-limit`, `p7-timeout`. Run `anc audit --principle 7 .` against the CLI under test to see each.

---

# P8: Discoverable Through Agent Skill Bundles

Source: https://anc.dev/p8
Canonical-Markdown: https://anc.dev/p8.md

## Definition

Without a skill bundle, every fresh agent invocation begins the same way: pull `--help`, infer the idioms, try a command, parse the error, try again. A skill bundle (canonical names `AGENTS.md` or `SKILL.md`) collapses that loop: agent-discoverable through filesystem convention rather than through `--help`, loaded once, recognized thereafter.

## Why Agents Need It

`--help` describes what is *possible* (the flag and subcommand surface); a skill bundle describes what to *do* (workflow knowledge, common compositions, recovery patterns).
Workflow knowledge does not fit in `after_help` examples. The bundle is also where conventions that span multiple subcommands live (exit-code tables, output-channel discipline, retry semantics): context that `--help` for a single subcommand cannot carry on its own. Without one, the agent has nowhere to durably register what it learned, and re-pays the discovery cost on every fresh session. ## Requirements **MUST:** - *(Applies when: CLI ships a skill bundle.)* An install path that registers the bundle with installed agent runtimes. The canonical form is a `tool skill install []` subcommand that writes into the runtime's filesystem cascade (`~/.claude/skills/`, `~/.cursor/skills/`, `~/.codex/skills/`, etc.). Non-canonical alternatives (`tool init --skill`, `tool skills add`, `tool agents add`) are acceptable but should migrate toward `tool skill install`. A bundle without an install path sits unread until a human manually copies it; the install path is what turns the bundle from documentation into discoverable runtime knowledge. **SHOULD:** - A top-level agent-discoverable markdown bundle. Canonical filenames are `AGENTS.md` or `SKILL.md`, both recognized by major agent runtimes. The bundle's first job is to be findable by filesystem convention; its second is to teach the agent how to invoke the tool well. YAML frontmatter at minimum names the tool and a one-line capability summary so agents can scan and route without reading the full body. **MAY:** - *(Applies when: CLI ships a skill bundle.)* An `--all` mode auto-detects installed agent runtimes (Claude Code, Cursor, Codex, OpenCode, and others as the ecosystem evolves) and installs the bundle across each. A user setting up a new machine with multiple coding agents installs once and gets coverage everywhere. 
- *(Applies when: CLI ships a skill bundle.)* An `update` (or `upgrade`) subcommand under `tool skill` pulls the latest bundle version, so agents stay current with the CLI's evolving surface without a full reinstall. ## Evidence - A top-level `AGENTS.md` or `SKILL.md` in the CLI's source tree, shipped in the release artifact, with YAML frontmatter declaring at least the tool name and a one-line capability summary. - A `skill` subcommand group in the CLI (e.g., `tool skill install`, `tool skill update`, `tool skill list`). - An installer that writes directly to the runtime cascade (`~/.claude/skills//`, `~/.cursor/skills//`) rather than requiring the runtime to be running. - Bundle content versioned alongside the CLI's release: the bundle ships from the same commit as the binary, not from a separate documentation tree that drifts. ## Anti-Patterns - A CLI shipping a skill bundle with no install path. The bundle sits unread until a human copies it manually. - An install path that requires the agent runtime to be running. The runtime cascade is filesystem-resident; writing to it should not need an active session. - A bundle whose contents drift from the CLI's actual surface: a skill bundle in a docs subtree maintained by a different cadence than the binary itself, naming flags or subcommands that no longer exist. Requirement IDs `p8-must-bundle-install`, `p8-should-bundle-exists`, `p8-may-install-all`, and `p8-may-bundle-update` define the contract; the antecedent audit `p8-bundle-exists` gates the conditional requirements. Run `anc audit --principle 8 .` against the CLI under test to see each. --- # Audit your CLI Source: https://anc.dev/audit Canonical-Markdown: https://anc.dev/audit.md # Audit your CLI `anc` is the reference linter for this standard. It scores any CLI tool against the eight principles and tells you, by audit ID, where it passes and where it falls short. 
## Install

See [/install](https://anc.dev/install) for `brew`, `cargo`, and platform-archive instructions. Once `anc` is on `$PATH`, the rest of this page is what to do with it.

## Run it

```bash
# Against the current project (cargo workspace, binary, or source tree)
anc audit .

# Against a compiled binary directly
anc audit ./target/release/mycli

# Agent-friendly output
anc audit . --output json

# Narrow to one principle
anc audit . --principle 3
```

## Read the output

```text
P1 — Non-Interactive by Default
[PASS] Non-interactive by default (p1-non-interactive)
[PASS] Has --no-interactive flag (p1-flag-existence)
P3 — Progressive Help
[PASS] Help flag produces useful output (p3-help)
[PASS] Version flag produces version output (p3-version)
P4 — Fail Fast with Actionable Errors
[PASS] Rejects invalid arguments (p4-bad-args)
P6 — Composable Command Structure
[PASS] Handles SIGPIPE cleanly (p6-sigpipe)
[PASS] Respects NO_COLOR (p6-no-color-behavioral)
P7 — Bounded, High-Signal Responses
[PASS] Has --quiet flag (p7-quiet)
```

Each line ends with a stable audit ID (`p1-non-interactive`, `p2-json-output`, `p6-sigpipe`, etc.). Cite those IDs in issues, commits, and agent output; they do not change between versions.

## Three audit layers

- **Behavioral**: runs your compiled binary and inspects `--help`, `--version`, `--output json`, SIGPIPE, NO_COLOR, and exit codes. Language-agnostic.
- **Source**: ast-grep pattern matching on source code. Catches `.unwrap()`, missing error types, naked `println!`. Rust and Python today; more languages as they land.
- **Project**: file and manifest inspection. Looks for `AGENTS.md`, recommended dependencies, dedicated error and output modules.

Pass `--binary` for behavioral-only (skip source). Pass `--source` for source-only (skip behavioral). Most projects want the default, which is "run everything."

## What a score means

A `[PASS]` is a requirement met, not a compliment.
A `[WARN]` is a SHOULD the tool doesn't satisfy; ignoring it is a choice, not a bug. A `[FAIL]` is a MUST the tool doesn't satisfy; agents will hit the edge it describes, and the tool will surprise them. Nothing here is a vanity metric — the audits map one-to-one to the requirements on the [principles page](https://anc.dev/). ## See how widely-used CLIs score The [**ANC 100 leaderboard**](https://anc.dev/scorecards) is what running `anc audit` produces at scale: every popular CLI tool, scored against the same eight principles, with full per-audit evidence under `/score/`. The scoring rules are documented on the [methodology page](https://anc.dev/methodology); the underlying JSON schema is enumerated at [/scorecard-schema](https://anc.dev/scorecard-schema). Source: [github.com/brettdavies/agentnative-cli](https://github.com/brettdavies/agentnative-cli). --- # Install anc Source: https://anc.dev/install Canonical-Markdown: https://anc.dev/install.md # Install anc `anc` is the reference linter for the agent-native CLI standard. It scores any CLI against the eight principles and tells you, by audit ID, where it passes and where it falls short. Install it locally, then point it at a binary or a project directory. ## Homebrew ```bash brew install brettdavies/tap/agentnative ``` The tap publishes prebuilt bottles for Apple Silicon Macs (macOS 14 Sonoma and 15 Sequoia) and x86_64 Linux, with SHA256 integrity hashes recorded in the formula. Intel Mac and arm64 Linux users compile from source via the same formula. 
To update in place: ```bash brew upgrade brettdavies/tap/agentnative ``` ## Cargo ```bash cargo install agentnative ``` For a prebuilt binary without compiling from source (requires [`cargo-binstall`](https://github.com/cargo-bins/cargo-binstall); skip if you don't already have it): ```bash cargo binstall agentnative ``` ## GitHub Releases Platform archives, including Windows builds and SHA256 checksums, live at [github.com/brettdavies/agentnative-cli/releases](https://github.com/brettdavies/agentnative-cli/releases). Download the archive for your platform, extract, and put the `anc` binary on `$PATH`. ## What's next Once installed, invoke the CLI as `anc`. See [/audit](https://anc.dev/audit) for usage: flags, output shapes, and how to interpret the per-principle audit IDs. The principles themselves are spelled out at [/](https://anc.dev/), with one page per principle (`/p1` through `/p8`). To install the **agent-native-cli skill bundle** instead (the Claude Code / Codex / Cursor / OpenCode skill that teaches an agent to write CLIs against this standard), see [/skill](https://anc.dev/skill). --- # About this standard Source: https://anc.dev/about Canonical-Markdown: https://anc.dev/about.md # About this standard The agent-native CLI standard is a specification for command-line tools that behave predictably under agent control. Eight principles, enforced by RFC 2119 requirement tiers (MUST / SHOULD / MAY), and measured by a companion linter ([`anc`](https://anc.dev/audit)) that scores any CLI against them. ## Provenance This spec is authored and maintained in the open by Brett Davies, with contributions accepted via the channels below. It is a proposal pressure-tested in public, not a ratified industry standard. The goal is to converge on something worth ratifying, by writing it down concretely first and inviting people to break it. ## Prior art The eight-principle structure draws on two distinct lineages. 
**Standards and methodologies that shaped the format:** - [Command Line Interface Guidelines (clig.dev)](https://clig.dev/): the closest direct prior art for CLI design guidance; the Unix-philosophy distillation this spec departs from when agents change the human-only assumptions. - [The Twelve-Factor App (12factor.net)](https://12factor.net/): the numbered-principle layout, environment-first configuration, and the discipline of writing each factor down concretely. - [IETF RFC 2119](https://www.rfc-editor.org/rfc/rfc2119): the MUST / SHOULD / MAY contract that turns prose into a conformance bar. **Writing that informed the principles directly:** - Cloudflare's [Building a CLI for all of Cloudflare](https://blog.cloudflare.com/cf-cli-local-explorer) (2026-04-13) and the [HN discussion that followed](https://news.ycombinator.com/item?id=47753689): the most concrete public statement of "agents are the primary customer" of CLI tools, paired with crowdsourced failure modes from people who run agents against CLIs every day. Quotes from that thread shaped the framing of P3 (progressive help), P4 (actionable errors), and P6 (composable structure) directly. P2 and P6 mirror the Cloudflare team's own rules (`get` not `info`, `--json` always, `--force` not `--skip-confirmations`). If a specific principle's framing seems to echo prior writing, it probably does. Credit accrues to the people whose public reasoning informed it; mistakes are mine. ## Versioning The spec uses semver-adjacent versioning with three tiers: MAJOR changes break citation IDs or remove MUSTs; MINOR changes add MUSTs or promote SHOULDs; PATCH changes edit prose without shifting requirements. The current version appears in the footer of every page. Principle anchor slugs (`#p1-non-interactive-by-default` through `#p8-discoverable-skill-bundle`) are permanent. 
If a principle merges or splits in a future MAJOR version, the old slug will resolve as a permanent redirect to wherever the requirement now lives; citations made today will not 404 after a future restructuring.

## RFC 2119

MUST, SHOULD, MAY, MUST NOT, and SHOULD NOT on this site carry their [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119) meanings. They are not emphatic; they are contractual. A tool that does not satisfy a MUST is non-conformant with the version of the standard it claims to target.

## License

Spec text on this site is available under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/). The `anc` linter is dual-licensed under MIT and Apache-2.0; see [its LICENSE files](https://github.com/brettdavies/agentnative-cli).

## Contributing

Pressure-testing is how the spec evolves. Four ways to contribute:

1. **[Submit a grading finding](https://github.com/brettdavies/agentnative/issues/new?template=grading-finding.yml):** score a real CLI against a principle you think the spec gets wrong, and report what you found. Name the CLI, the principle, and the specific MUST/SHOULD/MAY that failed (or passed unexpectedly).
2. **[Report a false positive or false negative](https://github.com/brettdavies/agentnative-cli/issues/new?template=false-positive.yml):** flag an audit where the `anc` auditor got the result wrong. Include the command, the output, and the audit ID.
3. **[Propose a principle edit](https://github.com/brettdavies/agentnative/issues/new?template=pressure-test.yml):** merge, split, rewording, demotion of a MUST to a SHOULD. Describe the problem before proposing a solution.
4. **[Add a tool to the registry](https://github.com/brettdavies/agentnative-cli/issues/new?template=add-tool-to-registry.yml):** propose a CLI for inclusion on the anc.dev/scorecards leaderboard. Include the install command, the source repo, and (optionally) the result of a fresh `anc audit` run.
For full routing guidance, see the spec repo's [CONTRIBUTING.md](https://github.com/brettdavies/agentnative/blob/main/CONTRIBUTING.md). ## Colophon Built with Cloudflare Workers. Typeset in Uncut Sans and Monaspace Xenon. Source: [github.com/brettdavies/agentnative-site](https://github.com/brettdavies/agentnative-site). --- # Agent-native badge Source: https://anc.dev/badge Canonical-Markdown: https://anc.dev/badge.md # Agent-native badge The agent-native badge is a small SVG that lets a CLI author advertise agent-native conformance on their tool's README. It links to the live scorecard for that tool, so any reader can follow the claim to current evidence rather than trust a self-declaration. This page is the convention. It defines what claiming the badge means, what it does not mean, and how the URL behaves when a tool's score changes. ## What the badge looks like The badge is rendered at build time and served as a static SVG. The site does not use shields.io, has no third-party render dependency, and requires no account. The renderer is [`badge-maker`](https://www.npmjs.com/package/badge-maker), the same library shields.io uses internally. Visually identical output, fully self-hosted. For a tool named ``: ```text https://anc.dev/badge/.svg ``` The label reads `agent-native vMAJOR.MINOR` (the spec version the score is rooted in); the message is the rounded percent score; the color tracks the cohort band the score falls into (see [Color bands](#color-bands) below). ## How to embed it ```markdown [![agent-native](https://anc.dev/badge/.svg)](https://anc.dev/score/) ``` Replace `` with the tool's slug: the same name that appears on the leaderboard and in the registry. The badge links to the tool's per-tool scorecard page so a reader who clicks lands on the live evidence. Per-tool scorecard pages whose tool clears the eligibility floor render this snippet inline, ready to copy. 
## Eligibility: the floor

A tool may legitimately embed the badge when its score is **70% or higher**. The floor sits at 70: the entry to the lowest eligible cohort band (Qualified). A tool that clears it has taken agent-readiness seriously enough to earn a public signal, and the bands above Qualified (Solid, Strong, Exemplary) tell a reader how far past the bar the tool sits.

A tool below the floor can still link to its scorecard page (that is the public-by-default posture of the standard), but should not embed the badge as a quality signal until it clears 70.

The floor is enforced by the per-tool scorecard page, not by the SVG endpoint. The SVG is rendered for every scored tool regardless of score. This is intentional: a tool that already embedded the badge should see the visual color shift if its score regresses, not a 404.

## Score format: `XX%`

The score on the badge is the same behavioral-layer score the leaderboard reports, rounded to the nearest integer percent. The [methodology](https://anc.dev/methodology#how-a-score-is-computed) defines the formula.

`91/100` and `6/8 principles` were both considered and rejected. The rounded percent reads cleanest at badge size and matches the leaderboard's score column so a reader sees the same number across surfaces.

## Color bands

The badge color tracks the cohort band the score falls into. The four bands at or above the floor are eligible; below the floor, the color shifts to a warm warning. The band thresholds are the spec-side contract; the colors are a site choice.

| Score    | Band        | Color  | What it means                   |
| -------- | ----------- | ------ | ------------------------------- |
| 85–100   | Exemplary   | navy   | Eligible; the strongest cohort  |
| 80–84    | Strong      | teal   | Eligible; broad conformance     |
| 75–79    | Solid       | green  | Eligible; solid agent-readiness |
| 70–74    | Qualified   | ochre  | Eligible; clears the floor      |
| 50–69    | Below floor | orange | Not eligible; meaningful gaps   |
| Under 50 | Below floor | red    | Not eligible; significant gaps  |

Tools below the floor still receive a rendered SVG so an embedded badge stays honest after a regression.

## Version pinning: URL always-latest, label cites the spec

The URL `/badge/.svg` always reflects the tool's most recent score against the most recent published spec. The spec version baseline is carried in the badge **label** (e.g., `agent-native v0.3`), not in the URL.

This is a deliberate trust-and-verify choice. A tool author embeds the badge once; the score and the spec-version label update on every site build. If the spec moves to v0.4 and a tool's score drops because a new MUST landed, the badge reflects the drop the next time it's served. If the URL pinned a spec version (`/badge//v0.3.svg`), readers would be looking at a snapshot, exactly what trust-and-verify says not to do.

The trade-off: there is no "I was 100% conformant on v0.3" archival URL. That is by design. The historical record lives in [the scorecard JSON archive](https://github.com/brettdavies/agentnative-site/tree/main/scorecards), which carries the spec version inside each file. The badge surface is for the present, not the past.

## Honesty expectation

Self-grading is acceptable. The badge URL must resolve to a scorecard that anyone can re-run. That is the whole story. In practice this means:

- The tool must be listed in the [registry](https://github.com/brettdavies/agentnative-site/blob/main/registry.yaml).
- The scorecard must be a real `anc audit --output json` run, committed under [`scorecards/`](https://github.com/brettdavies/agentnative-site/tree/main/scorecards). - Anyone reading the badge can run `anc audit --command ` locally and arrive at the same number, modulo scorecard-staleness. See the regression policy below. If the live re-run produces a different score than the badge, the live re-run wins. The badge is a pointer, not an authority. ## Regression policy If a tool regresses below the floor, no separate action is required from the tool author. The next site build will render a below-floor color in place of the eligible band, and the per-tool scorecard page will replace the embed snippet with a "top issues to address" hint. There is no maintainer takedown, no embargo, no retroactive edit of historical commits in the tool's README. This is the core promise: the badge is an outbound link, not a stamp. Embedding it is permission for the world to audit your work continuously, not a one-time award. ## Claiming the badge 1. Get on the [leaderboard](https://anc.dev/scorecards): file a registry entry per [the registry README](https://github.com/brettdavies/agentnative-site/blob/main/registry.yaml). The site auto-discovers the latest scorecard for each registry entry on every build. 2. Run `anc audit --command --output json > scorecards/-v.json` and commit the result. 3. When your tool's row on the leaderboard reads 70% or higher, the per-tool page at `/score/` renders the embed snippet inline. Copy it into your README. That is the whole flow. The convention is intentionally narrow. 
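
As an illustration of the eligibility floor and color bands defined on this page, the lookup reduces to a small match on the rounded score. This is only a sketch under stated assumptions: `Band`, `band`, `color`, `may_embed`, and `BADGE_FLOOR` are hypothetical names, not part of `anc` or the site's renderer, and scores above 100 are folded into the top band purely so the match stays exhaustive.

```rust
/// Cohort bands from the Color bands table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Band {
    Exemplary,
    Strong,
    Solid,
    Qualified,
    BelowFloor,
}

/// The eligibility floor: a score of 70 or higher may embed the badge.
const BADGE_FLOOR: u8 = 70;

/// Map a rounded percent score to its cohort band.
/// Assumption: values above 100 never occur; they fall into the top band here.
fn band(score: u8) -> Band {
    match score {
        85..=u8::MAX => Band::Exemplary,
        80..=84 => Band::Strong,
        75..=79 => Band::Solid,
        70..=74 => Band::Qualified,
        _ => Band::BelowFloor,
    }
}

/// Map a score to the color named in the table; below the floor the
/// color splits at 50 (orange for 50 to 69, red under 50).
fn color(score: u8) -> &'static str {
    match band(score) {
        Band::Exemplary => "navy",
        Band::Strong => "teal",
        Band::Solid => "green",
        Band::Qualified => "ochre",
        Band::BelowFloor if score >= 50 => "orange",
        Band::BelowFloor => "red",
    }
}

/// Whether a tool at this score may embed the badge as a quality signal.
fn may_embed(score: u8) -> bool {
    score >= BADGE_FLOOR
}
```

Keeping `color` separate from `band` mirrors the split this page draws: the thresholds are the spec-side contract, while the colors are a site choice that could change without touching the bands.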
## Related - [Methodology](https://anc.dev/methodology): how scores are computed and what the audience signal does and does not claim - [Scorecard schema](https://anc.dev/scorecard-schema): the shape of the underlying JSON - [Leaderboard](https://anc.dev/scorecards): every scored tool, sortable - [Install `anc`](https://anc.dev/install): the CLI that produces scorecards --- # Changelog Source: https://anc.dev/changelog Canonical-Markdown: https://anc.dev/changelog.md # Changelog Reverse-chronological record of changes to the agent-native CLI standard. The spec uses semver-adjacent versioning: MAJOR changes break citation IDs or remove MUSTs; MINOR changes add MUSTs or promote SHOULDs; PATCH changes edit prose without shifting requirements. See [Versioning](https://anc.dev/about) for the full policy. ## Unreleased Initial publication of the standard. Seven principles (P1 through P7) defining agent-native CLI behavior, enforced by RFC 2119 requirement tiers. Companion linter [`agentnative`](https://anc.dev/audit) scores any CLI against them. - P1: Non-Interactive by Default - P2: Structured, Parseable Output - P3: Progressive Help Discovery - P4: Fail Fast with Actionable Errors - P5: Safe Retries and Explicit Mutation Boundaries - P6: Composable and Predictable Command Structure - P7: Bounded, High-Signal Responses --- # Contribute Source: https://anc.dev/contribute Canonical-Markdown: https://anc.dev/contribute.md # Contribute The agent-native CLI spec is `status: active` because the contracts are stable enough to cite, not because anything is locked. The pressure-test mechanism is how the spec revises a position when a finding warrants it. This page is the navigation across the four repos that make up the project, plus the honest expectations on response time. ## What kinds of contribution are welcome Three tiers, all welcome, none required. The shape of the contribution determines the intake. 

| Tier | What | Where | Time |
| --- | --- | --- | --- |
| **1. Signal** | A finding against a principle's wording, a missing citation, a contradiction between two principles, a false positive in `anc`, a broken link on the site, a bundle content issue | A repo-specific issue template (see "Per-repo intake" below) | ~5 min |
| **2. Proposal** | A new principle the spec is missing, a MUST/SHOULD tier change with rationale, a counter-example that breaks an applicability clause, a new language auditor design, a new host runtime for the skill bundle | An issue with the full case in the body, against the relevant repo | ~1-2 hrs |
| **3. Code** | A new language auditor for `anc`, a tool scoring submission for the leaderboard, a site or skill-bundle improvement, a governance or workflow PR | A pull request against the relevant repo's `dev` branch | Variable |

## Per-repo intake

Each repo handles a different layer of the project. File against the one that matches the contribution's shape.

### Spec: [agentnative](https://github.com/brettdavies/agentnative)

The principle text, the requirement IDs, the versioning policy. Pressure-tests against the standard live here.

- [Pressure-test a principle](https://github.com/brettdavies/agentnative/issues/new?template=pressure-test.yml) (Tier 1 or 2)
- [Ask a spec question](https://github.com/brettdavies/agentnative/issues/new?template=spec-question.yml) (Tier 1)
- [Submit a grading finding](https://github.com/brettdavies/agentnative/issues/new?template=grading-finding.yml) (Tier 1 or 2): spec-feedback derived from scoring a real CLI against the standard
- [`CONTRIBUTING.md`](https://github.com/brettdavies/agentnative/blob/main/CONTRIBUTING.md) · [`principles/AGENTS.md` § Pressure-test protocol](https://github.com/brettdavies/agentnative/blob/main/principles/AGENTS.md#pressure-test-protocol)

### Linter: [agentnative-cli](https://github.com/brettdavies/agentnative-cli)

`anc`, the Rust linter that scores any repo against the spec. The scoring engine, the registry, the language auditors.

- [Report a false positive](https://github.com/brettdavies/agentnative-cli/issues/new?template=false-positive.yml) (Tier 1)
- [Request a feature](https://github.com/brettdavies/agentnative-cli/issues/new?template=feature-request.yml) (Tier 1 or 2)
- [Report a scoring bug](https://github.com/brettdavies/agentnative-cli/issues/new?template=scoring-bug.yml) (Tier 1)
- [Add a tool to the registry][add-tool] (Tier 3): propose a CLI for the anc.dev/scorecards leaderboard
- [Source repo](https://github.com/brettdavies/agentnative-cli)

[add-tool]: https://github.com/brettdavies/agentnative-cli/issues/new?template=add-tool-to-registry.yml

### Site: [agentnative-site](https://github.com/brettdavies/agentnative-site)

This site. The leaderboard renderer, the live-scoring loop, the per-tool scorecard pages, the Worker.
- [File a site bug](https://github.com/brettdavies/agentnative-site/issues/new?template=site-bug.yml) (Tier 1) - [Source repo](https://github.com/brettdavies/agentnative-site) ### Skill bundle: [agentnative-skill](https://github.com/brettdavies/agentnative-skill) The `agent-native-cli` bundle that agents discover via filesystem convention. The install paths, the host-runtime detection, the SKILL.md prose. - [Source repo + intake](https://github.com/brettdavies/agentnative-skill) ## Response expectations (the honest part) This is a solo-maintainer project. The honest framing: - **Tier 1 and 2** are welcome and get a substantive reply when time allows. A pressure-test that names a specific failure mode, with the reasoning behind it, is the contribution shape that lands fastest. - **Tier 3 PRs** are reviewed when scope and time permit. Real PRs land. No merge-window promise; the queue is what the maintainer can actually read. - **Status flips** are how the spec records work in progress on a finding. A principle moves to `status: under-review` when a substantive pressure-test is being processed, then back to `status: active` once the next MINOR release lands. Visible in the principle file's frontmatter. The standard takes positions because positions are useful. Positions held without willingness to revise them are dogma. Both halves of that are intentional. ## How the revision mechanism works For a Tier 2 proposal that changes a MUST/SHOULD/MAY tier or adds a new principle: 1. The pressure-test issue lands with a specific finding: which requirement, which direction, what failure mode argues for the change. 2. If the finding is substantive, the relevant principle file's `status` flips from `active` to `under-review`. 3. The next MINOR spec release resolves the finding: the prose is revised, `last-revised` updates, status returns to `active`. Or the finding is closed with a documented `[wontfix]` rationale appended to the principle's pressure-test notes section. 
Full description lives at [`principles/AGENTS.md` § Pressure-test protocol](https://github.com/brettdavies/agentnative/blob/main/principles/AGENTS.md#pressure-test-protocol). ## Adjacent reading - [Spec status lifecycle](https://github.com/brettdavies/agentnative/blob/main/principles/AGENTS.md#pressure-test-protocol) — the `draft → under-review → active → locked` flow - [BRAND.md](https://github.com/brettdavies/agentnative/blob/main/BRAND.md): voice and identity - [CHANGELOG.md](https://github.com/brettdavies/agentnative/blob/main/CHANGELOG.md): what landed when The leaderboard at [`/scorecards`](https://anc.dev/scorecards) is the running answer to "what does the spec catch in practice." --- # Methodology Source: https://anc.dev/methodology Canonical-Markdown: https://anc.dev/methodology.md # Methodology How the ANC 100 leaderboard is built, what each score means, and what the leaderboard does *not* claim to measure. For the field-by-field shape of the underlying JSON, see the [scorecard schema reference](https://anc.dev/scorecard-schema). ## What gets scored Every entry on the [leaderboard](https://anc.dev/scorecards) is the output of `anc audit ` against a real CLI tool, run on Brett's machine, committed to the site repo as JSON, and rendered into the per-tool page at `/score/`. The [registry](https://github.com/brettdavies/agentnative-site/blob/main/registry.yaml) is the single source of truth for which tools are in the set. Adding a tool means filing a registry entry. Removing a tool means filing a registry deletion. There is no other inclusion criterion. ### Contributor flow: registry PR and scorecard PR may land in either order A tool needs two artifacts to appear on the leaderboard: a registry entry (`registry.yaml`) and a scorecard (`scorecards/-v.json`). 
The build accepts these in either order:

- **Editorial-PR-first.** A registry entry without a matching scorecard is a "registry orphan": the build emits a warning and excludes the entry from the leaderboard until a scorecard PR lands. This is the expected steady-state for a freshly-nominated tool.
- **Scorecard-PR-first.** A scorecard whose filename slug has no registry entry is a "scorecard orphan": the build emits the symmetric warning and excludes the scorecard from the leaderboard until the editorial PR lands.

Both directions surface as a structured CI annotation on the PR (`WARNINGS_JSON: { scorecardOrphans, registryOrphans }`) so reviewers see drift without grepping logs. The build still passes in either orphaned state; the warning is the nudge, not a blocker. Once both halves land, the tool appears on the leaderboard at the next deploy.

## The seven outcomes

Each audit resolves to one of seven statuses. Whether a status counts in the denominator is the load-bearing distinction: it decides whether the audit moves the score at all.

| Status    | Credit | In denominator | Meaning                                                                                  |
| --------- | ------ | -------------- | ---------------------------------------------------------------------------------------- |
| `pass`    | full   | yes            | Requirement met.                                                                         |
| `warn`    | half   | yes            | A requirement was only partially satisfied.                                              |
| `fail`    | none   | yes            | A MUST-tier requirement was not satisfied.                                               |
| `opt_out` | none   | yes            | The tool could implement the requirement but deliberately does not.                      |
| `n_a`     | -      | no             | A conditional requirement whose antecedent is absent: it does not apply to this tool.    |
| `skip`    | -      | no             | The probe could not measure the property. A linter limitation, not a tool defect.        |
| `error`   | -      | no             | The audit raised an exception inside `anc`. A bug on the linter side, not a tool defect. |

`pass`, `warn`, `fail`, and `opt_out` all count toward the denominator, so a tool is measured against everything it could reasonably be expected to do.
`n_a`, `skip`, and `error` drop out of both sides of the ratio, so an audit that does not apply or could not be measured never moves the number in either direction.

`opt_out`, `n_a`, and `skip` separate three situations that a single status would blur together: a tool that chose not to ship a feature (`opt_out`), a requirement that does not apply to this tool (`n_a`), and a property the probe could not see (`skip`). Whether a missing feature is a deliberate `opt_out` or a genuine `n_a` is decided per audit by `anc`'s verifier and documented in its source; a disagreement about that call is filed against the [`agentnative` CLI](https://github.com/brettdavies/agentnative-cli/issues), not this site.

## How a score is computed

The headline number on each tool's row scores how the shipped binary behaves against the requirements that apply to it.

**Scope: behavioral audits only.** Only behavioral-layer requirements, the ones that invoke the binary and observe what it does, enter the headline score. Source-layer and project-layer results are reported on the per-tool page but do not move the number (see [Layers](#layers-behavioral-project-source)). A behavioral score compares every tool on the same ground: what an agent observes when it runs the tool, independent of implementation language.

**The denominator** is every behavioral audit whose status is `pass`, `warn`, `fail`, or `opt_out`. `n_a`, `skip`, and `error` are excluded from both numerator and denominator.

**The numerator credits each outcome.** A `pass` earns full credit (1.0); a `warn` earns half (0.5), because a partial satisfaction is worth more than none; a `fail` and an `opt_out` earn nothing (0.0). The score is the credit earned over the credit available:

```text
score = round(100 × (pass + 0.5 × warn) / (pass + warn + fail + opt_out))
```

When no behavioral audit applies, the denominator is empty and the score is 0.
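
The formula translates almost directly into code. The following is a minimal sketch rather than `anc`'s actual scoring engine: `Counts` and `headline_score` are hypothetical names, the per-tier weight is treated as flat, and `round` is assumed to mean round-half-away-from-zero (what Rust's `f64::round` does), which the formula itself does not specify.

```rust
/// Counts of behavioral audits by status. `n_a`, `skip`, and `error`
/// are omitted because they never enter either side of the ratio.
#[derive(Debug, Clone, Copy)]
struct Counts {
    pass: u32,
    warn: u32,
    fail: u32,
    opt_out: u32,
}

/// Headline score as a rounded integer percent.
fn headline_score(c: Counts) -> u32 {
    let denominator = c.pass + c.warn + c.fail + c.opt_out;
    // An empty denominator means no behavioral audit applied: score is 0.
    if denominator == 0 {
        return 0;
    }
    // Full credit per pass, half credit per warn, nothing for fail or opt_out.
    let credit = f64::from(c.pass) + 0.5 * f64::from(c.warn);
    // Assumption: ties round away from zero.
    (100.0 * credit / f64::from(denominator)).round() as u32
}
```

Checking for the empty denominator before dividing keeps the zero-audit case explicit instead of relying on floating-point `NaN` handling.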
**Worked example.** A tool with 20 `pass`, 7 `warn`, 0 `fail`, 1 `opt_out`, 1 `n_a`, and 14 `skip` behavioral audits: the denominator is the 28 audits that count, since the `n_a` and the 14 `skip`s drop out. The numerator is 20 + 0.5 × 7 = 23.5. The score is round(100 × 23.5 / 28) = 84.

The formula also carries a per-tier weight (see [Requirement tiers](#requirement-tiers)). It is flat today, so it does not change the arithmetic above; it is a published parameter rather than a hard-coded constant, so re-tuning it later is a documented change rather than a silent one.

The formula, the tier weights, and the badge floor are held stable for at least six months from publication.

## Requirement tiers

RFC 2119 defines three requirement levels, and each scored requirement carries its tier in the scorecard:

- **MUST** — required for conformance. A missed MUST is a `fail` (no credit).
- **SHOULD** — strongly recommended absent a good reason. A missed SHOULD is a `warn` (half credit).
- **MAY** — genuinely optional. A missed MAY is a `warn` (half credit).

The status already encodes severity: a missed MUST is scored `fail` (no credit), and a missed SHOULD or MAY is scored `warn` (half credit). The separate per-tier weight is the lever for valuing the tiers differently in the denominator; while it stays flat, missing a SHOULD and missing a MAY move the score by the same amount.

## Conditional requirements

Some requirements bind only when an antecedent feature is present. "If a CLI ships `--output json`, it MUST also expose its schema" is a MUST, but only for tools that ship JSON output.

The standard models these as conditional requirements with a named antecedent audit. When the antecedent is present, the requirement is evaluated normally. When the antecedent is absent, the requirement is `n_a` and drops out of the score. A tool is never penalized for skipping a requirement whose precondition it never met.
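
As a rough Python sketch, the resolution rule for a conditional requirement might look like the following. The names `resolve_dependent` and `evaluate` are hypothetical, and the grouping of antecedent statuses into present, absent, and unmeasured follows this section's own mapping; nothing here is taken from `anc`'s source.

```python
PRESENT = {"pass", "warn", "fail"}   # antecedent feature exists
ABSENT = {"opt_out", "n_a"}          # antecedent feature does not exist
UNMEASURED = {"skip", "error"}       # the linter could not tell


def resolve_dependent(antecedent_status: str, evaluate) -> str:
    """Return the dependent requirement's status.

    `evaluate` is a zero-argument callable that runs the dependent audit;
    it is only invoked when the antecedent is present.
    """
    if antecedent_status in PRESENT:
        return evaluate()
    if antecedent_status in ABSENT:
        return "n_a"
    if antecedent_status in UNMEASURED:
        # The dependent inherits the antecedent's skip or error.
        return antecedent_status
    raise ValueError(f"unknown status: {antecedent_status!r}")
```

Passing the dependent audit as a callable means it never runs when the antecedent is absent or unmeasured.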
The antecedent's own outcome decides what the dependent requirement emits:

| Antecedent outcome | Dependent requirement |
| --- | --- |
| `pass`, `warn`, `fail` (present) | evaluated normally |
| `opt_out`, `n_a` (absent) | `n_a` |
| `skip`, `error` (unmeasured) | inherits `skip` / `error` |

The tier is independent of the condition. A conditional MUST applies with full MUST force when its antecedent is met; a conditional SHOULD applies with full SHOULD force. The antecedent decides *whether* the requirement fires; the tier decides *how much* a miss costs once it does.

## Cohort bands and the badge floor

A tool clears the badge floor at a score of **70**. At or above the floor, scores fall into named cohort bands:

| Band | Score |
| --- | --- |
| Exemplary | 85–100 |
| Strong | 80–84 |
| Solid | 75–79 |
| Qualified | 70–74 |
| Below floor | under 70 |

The band thresholds are part of the standard; the color the site renders for each band is a site choice. A tool at or above 70 may embed the [agent-native badge](https://anc.dev/badge); below 70 it can still link to its scorecard but should not display the badge as a quality signal. See the [badge convention](https://anc.dev/badge) for the embed contract.

## Principles met

The **principles met** column counts how many of the eight principles (P1–P8) have *all* their audits passing: no warnings, no failures. A tool can post a 90% score and still meet only four of eight principles, if the misses cluster inside a few principle groups. Both numbers are surfaced because either, alone, hides the shape of the result. The per-tool page is the ground truth.

Bonus audits (`CodeQuality` and `ProjectStructure`) are listed on each tool's page but not blended into the score. They are language-specific and would create unfair comparisons across tools.
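
The band lookup and the principles-met count can be sketched in Python as below. Both function names are hypothetical. The treatment of `n_a`, `skip`, `error`, and `opt_out` rows in `principles_met` is an assumption: the text only says "no warnings, no failures", so this sketch ignores rows that are excluded from the score and treats `opt_out` as a miss.

```python
from collections import defaultdict

BADGE_FLOOR = 70
# Lower bound of each band, highest first.
BANDS = [(85, "Exemplary"), (80, "Strong"), (75, "Solid"), (BADGE_FLOOR, "Qualified")]


def cohort_band(score: int) -> str:
    """Map a 0-100 score to its named cohort band."""
    for lower_bound, name in BANDS:
        if score >= lower_bound:
            return name
    return "Below floor"


def principles_met(rows) -> int:
    """Count principle groups whose audits all pass.

    `rows` is an iterable of (group, status) pairs such as ("P3", "pass").
    Assumption: n_a, skip, and error rows are ignored, and a group with
    no remaining rows is not counted as met.
    """
    by_group = defaultdict(list)
    for group, status in rows:
        if status not in {"n_a", "skip", "error"}:
            by_group[group].append(status)
    return sum(
        1 for statuses in by_group.values() if all(s == "pass" for s in statuses)
    )
```

A single `warn` inside a group is enough to keep that principle from counting, which is how a high score can coexist with a low principles-met count.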
## What the audience signal is, and is not

`anc` classifies each scored tool as one of:

- `agent-optimized`: the four signal audits (P1 non-interactive, P2 JSON output, P6 NO_COLOR, P7 quiet) all pass or warn at most once. (One warn allowance reflects the reality that the four signal audits are correlated; a near-conformant tool may miss on one edge, e.g., honoring `NO_COLOR` but not `NO_COLOR=0`. Requiring zero warns would over-penalize otherwise-conformant tools.)
- `mixed`: two of the four signal audits warn.
- `human-primary`: three or more of the four signal audits warn.
- `null` with `audience_reason: "suppressed"`: when the active audit profile suppresses one or more of the four signal audits, the classifier has insufficient input and refuses to label. The per-tool page surfaces the reason so a reader can see *why* the field is empty rather than guessing.

The classifier is **informational, not authoritative**. It is a one-line summary derived from a fixed set of four behavioral audits. The per-audit evidence shown alongside is the ground truth. A tool labeled `human-primary` may still be safe to use from an agent in narrow, well-bounded ways. A tool labeled `agent-optimized` may still surprise an agent on an audit the classifier does not look at.

When the classifier disagrees with intuition (for example, a tool you consider agent-hostile gets `agent-optimized`), the fix lives in one of two places:

1. The tool fits an exception category that should suppress some audits → file a registry update adding an `audit_profile` (see below).
2. The classifier is missing a signal that ought to count → file an issue against the [`agentnative` CLI](https://github.com/brettdavies/agentnative-cli) proposing a new MUST-level audit.

Patching the *site* to override a CLI verdict is never the answer. The site renders what the CLI emits.
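
A minimal Python sketch of the classifier thresholds described in this section. The function name `classify_audience` is hypothetical, and one point is an assumption rather than documented behavior: the text only talks about warns, so this sketch counts any status other than `pass` as a miss.

```python
def classify_audience(signal_statuses, suppressed=False):
    """Return (audience, audience_reason) for the four signal audits.

    `signal_statuses` holds the statuses of the four signal audits
    (P1 non-interactive, P2 JSON output, P6 NO_COLOR, P7 quiet).
    `suppressed` is True when the active audit profile removed one or
    more of them, in which case the classifier refuses to label.
    """
    if suppressed:
        return (None, "suppressed")
    # Assumption: anything that is not a clean pass counts as a miss.
    misses = sum(1 for status in signal_statuses if status != "pass")
    if misses <= 1:
        return ("agent-optimized", None)
    if misses == 2:
        return ("mixed", None)
    return ("human-primary", None)
```

The returned pair mirrors the `audience` and `audience_reason` fields, so a suppressed run yields an empty label together with the reason it is empty.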
## Audit profiles: scoping the standard to a tool's category

Some tools intentionally do not satisfy parts of the standard because the standard does not apply to their category. Lazygit is interactive on purpose because it is a TUI. `find` does not emit JSON because POSIX utilities don't. Holding these tools to audits that punish their core design produces a misleading score and a hostile leaderboard.

`anc` v0.1.3 exposes four exception categories via `--audit-profile`. The exact suppression set lives in [`SUPPRESSION_TABLE`](https://github.com/brettdavies/agentnative-cli/blob/main/src/principles/registry.rs) in the CLI source and is the contract this site renders against:

| Category | Suppresses | Use when... |
| --- | --- | --- |
| `human-tui` | P1 non-interactive variants + P6 SIGPIPE | Tool's primary mode is an interactive terminal UI (e.g., `lazygit`). TUIs intercept the TTY by design and install their own signal handlers. |
| `file-traversal` | (no audits suppressed in v0.1.3) | Tool emits filenames as its output protocol (`fd`, `find`). Today the applicability filter on subcommand-shape audits already produces the right Skip outcome; the table entry is reserved for future audits. |
| `posix-utility` | P1 non-interactive variants | Tool predates structured output and follows POSIX conventions (`grep`, `awk`). The no-prompt MUST is satisfied vacuously by the stdin protocol. |
| `diagnostic-only` | P5 dry-run | Tool is read-only by design (`nvidia-smi`, `lsof`). Read-write-distinction and force-yes are still uncovered in v0.1.3. |

When a tool is scored under an audit profile, the suppressed audits still appear on the per-tool page, tagged **N/A by category** with a pointer to the profile that excluded them.
The reader sees what was excluded and why; the audits are not silently removed.

Every audit-profile change is a registry change, reviewed in the open. There is no per-tool override that does not show its work.

### Profiles applied to the current registry

| Tool | Profile | Why |
| --- | --- | --- |
| `lazygit` | `human-tui` | Git TUI - primary mode is full-screen interactive UI |
| `gitui` | `human-tui` | Git TUI - parallel project to lazygit |
| `tmux` | `human-tui` | Terminal multiplexer - bare invocation attaches/starts an interactive session |
| `fzf` | `human-tui` | Interactive fuzzy-match picker over stdin |
| `broot` | `human-tui` | Interactive directory-tree browser |
| `yazi` | `human-tui` | Interactive file manager - full-screen browse is the primary mode |
| `bottom` | `human-tui` | Interactive process/system monitor (htop-class) |
| `bandwhich` | `human-tui` | Interactive network bandwidth monitor |
| `atuin` | `human-tui` | Interactive shell-history search; bare-binary mode and `atuin search` are TUI-first |
| `navi` | `human-tui` | Interactive cheatsheet picker |
| `jnv` | `human-tui` | Interactive jq-filter editor over a JSON document |
| `fd` | `file-traversal` | Emits filenames as its output protocol; reserved for future suppressions in v0.1.3 |

Profiles **not** currently applied to any tool, with the criteria a future entry must meet:

- `posix-utility`: Tool predates structured output and follows POSIX-style stdin/stdout conventions. Modern stream processors (`jq`, `yq`, `dasel`, `miller`, etc.) already pass P1 non-interactive audits vacuously, so the suppression is unnecessary and `posix-utility` is not applied.
- `diagnostic-only`: Tool can never mutate state by design. Suppresses only P5 dry-run. The current registry's read-only candidates (`procs`, `dust`, `tree`) all pass P5 already, so the profile would be a no-op annotation. It will become useful when P5 grows audits beyond dry-run that warrant skipping for read-only diagnostics.

The general rule for adding a profile: **apply it only when an unsuppressed audit is fighting the tool's category, not its design quality**. A TUI legitimately blocks on a TTY; that's a category fact, not a defect. A CLI that *could* be non-interactive but isn't is a defect; no profile applies.

## Layers: behavioral, project, source

`anc` runs three layers of audits:

- **Behavioral**: invokes the binary and observes what it does — `--help`, `--version`, `--output json`, SIGPIPE, NO_COLOR, exit codes. Language-agnostic. This is the only layer that feeds the headline score.
- **Project**: inspects the project tree: `AGENTS.md`, manifest files, recommended dependencies. Language-agnostic. Reported on the per-tool page, not scored.
- **Source**: runs ast-grep patterns against source code. Catches `unwrap()`, naked `println!`, missing error types. Rust and Python today, more languages as they ship. Reported on the per-tool page, not scored.

Only behavioral results move the headline number. Project- and source-layer results are shown on the per-tool page for context but stay out of the score: blending them would penalize a tool for how many languages the linter covers, or for a project-tree convention, rather than for how the shipped binary behaves to an agent. A behavioral-only score keeps every tool measured on the same ground.

Note that P8 (discoverable skill bundles) spans both layers: its bundle-install and related behavioral audits count toward the score, while the presence of the bundle file itself is a project-layer audit that does not.

## Re-running the same audits locally

Every score on the leaderboard is reproducible. [Install `anc`](https://anc.dev/install), then run:

```bash
anc audit --output json
```

Pass `--audit-profile ` to apply the same suppression set the leaderboard applies.
The committed scorecards under [`scorecards/`](https://github.com/brettdavies/agentnative-site/tree/main/scorecards) record the exact CLI version each score was generated from, so anyone can pin to the same `anc` build and reproduce a row exactly.

## Re-scoring and challenges

Re-scoring is manual at launch. When a tool ships a release that changes its agent-readiness story:

- File an issue on [`agentnative-site`](https://github.com/brettdavies/agentnative-site/issues/new) titled `re-score: ` and link the release notes. The committed scorecard will be regenerated against the new version.
- If a tool's category is misclassified (e.g., a TUI is being scored as a general-purpose CLI), file an issue titled `audit-profile: ` with the rationale. Audit-profile changes are registry edits; they ship with the next site deploy.
- If an audit itself is wrong (false positives, weak signal, missing edge case), file the issue against the [`agentnative` CLI](https://github.com/brettdavies/agentnative-cli/issues), not this site. Site renders; CLI judges.

## Constructive framing

A low score is a snapshot, not a verdict. Each failing audit on a per-tool page links to the principle page that defines the requirement and the fix guidance. The leaderboard exists to make the standard concrete, not to shame tool authors who built before the standard existed. Most of the tools listed here predate `anc` by years.

If you maintain one of the tools on the leaderboard and want to improve its score, the per-tool page is your punch list. The audit IDs (`p1-non-interactive`, `p2-json-output`, etc.) are stable and citeable in commits and PRs.

---

# Scorecard schema

Source: https://anc.dev/scorecard-schema
Canonical-Markdown: https://anc.dev/scorecard-schema.md

# Scorecard schema

A scorecard is the structured output of `anc audit --output json`.
The site reads scorecards under [`scorecards/`](https://github.com/brettdavies/agentnative-site/tree/main/scorecards) and renders them at [/scorecards](https://anc.dev/scorecards) and at each tool's `/score/` page.

This page documents every field that appears in a scorecard, what it means, and where it comes from. It is intentionally exhaustive: anything in the JSON should be explainable from this page.

## Filename

Scorecards are stored on disk as:

```text
scorecards/-v.json
```

Where `` matches the registry's `name` field (URL slug) and `` is the SemVer string captured at scoring time.

The filename's `` segment is the **canonical version anchor**: the site reads it directly off disk and displays it as the scored version on every per-tool page. The scorecard's `tool.version` field (added in schema 0.4) is informational; when both are present and disagree, the build aborts with an integrity error. The filename never lies.

## Top-level fields

```json
{
  "schema_version": "0.6",
  "spec_version": "...",
  "tool": { "name": "...", "binary": "...", "version": "..." },
  "anc": { "version": "..." },
  "run": {
    "invocation": "...",
    "started_at": "...",
    "duration_ms": 0,
    "platform": { "os": "...", "arch": "..." }
  },
  "target": { "kind": "...", "path": null, "command": "..." },
  "audience": "...",
  "audience_reason": null,
  "audit_profile": null,
  "summary": { ... },
  "coverage_summary": { ... },
  "badge": { ... },
  "results": [ ... ]
}
```

| Field | Type | Source | Meaning |
| --- | --- | --- | --- |
| `schema_version` | string | `anc` emitted | Version of the JSON envelope itself. Pre-1.0; bumped when fields are added, removed, or renamed. Current: 0.6. |
| `spec_version` | string | `anc` emitted | Version of the [agentnative spec](https://anc.dev/principles) the run conformed to. Independent of `schema_version`. |
| `tool` | object | `anc` emitted | Self-describing identity for the tool that was scored. See [tool](#tool) below. **Added in 0.4.** |
| `anc` | object | `anc` emitted | Provenance of the `anc` build that produced the scorecard. See [anc](#anc) below. **Added in 0.4.** |
| `run` | object | `anc` emitted | Run-context: what was invoked, when, on what platform. See [run](#run) below. **Added in 0.4.** |
| `target` | object | `anc` emitted | What the run was scoring (command vs. binary vs. project). See [target](#target) below. **Added in 0.4.** |
| `audience` | string \| null | `anc` emitted | One-line audience classification. See [audience signal](https://anc.dev/methodology#what-the-audience-signal-is-and-is-not). |
| `audience_reason` | string \| null | `anc` emitted | Set to `"suppressed"` when an active audit profile blanks the audience signal. Otherwise null. |
| `audit_profile` | string \| null | registry | Exception category passed to anc via `--audit-profile`. See [audit profiles](https://anc.dev/methodology#audit-profiles). |
| `summary` | object | derived | Tally of how the runner's audits finished. See [summary](#summary) below. |
| `coverage_summary` | object | derived | Tally of how many spec requirements the run verified. See [coverage_summary](#coverage_summary) below. |
| `badge` | object | derived | Score, eligibility, and the copy-paste embed snippet for this tool. See [badge](#badge) below. **Added in 0.5.** |
| `results` | array of result obj | `anc` emitted | One entry per evaluated requirement row. See [results](#results) below. |

## `tool`

Self-describing tool identity. Lets a downstream consumer answer "what was scored?" without cross-referencing the registry.
```json
"tool": {
  "name": "rg",
  "binary": "rg",
  "version": "ripgrep 15.1.0"
}
```

| Field | Type | Meaning |
| --- | --- | --- |
| `name` | string | The literal `--command` argv passed to anc. For tools where the registry name differs from the binary (e.g., registry `ripgrep` → binary `rg`), this is the binary, not the registry slug. The filename slug owns the registry-name side of the join. |
| `binary` | string | Executable name resolved from `$PATH` at scoring time. Equals `tool.name` for command-mode runs except when a tool ships under an alias. |
| `version` | string \| null | Best-effort version string. The CLI dumps the first line of ` --version` here without further parsing; it may carry the marketing string ("eza - A modern, maintained replacement for ls"), the full multi-line block, or `null` when the binary doesn't print anything parseable. **The filename's `` is canonical**; this field is a courtesy. |

**Build-time invariant:** when `tool.version` contains a SemVer-shaped token (`X.Y` or `X.Y.Z`), it must equal the filename version. Drift fails the build with a parser-asymmetry error: the regen script's `version_extract` snippet and the CLI's internal probe are the only two places that derive a version from the binary, and they must agree.

## `anc`

Provenance for the scorecard: which `anc` build produced it.

```json
"anc": {
  "version": "0.2.1"
}
```

| Field | Type | Meaning |
| --- | --- | --- |
| `version` | string | Self-reported version of the `anc` binary that ran the score. Read from `Cargo.toml` at build time. |

The per-tool page renders `anc.version`.

## `run`

Run-context: the literal invocation, when it ran, how long it took, and what platform it ran on.

```json
"run": {
  "invocation": "anc audit --command rg --output json",
  "started_at": "2026-04-30T04:18:53.099683344Z",
  "duration_ms": 53,
  "platform": { "os": "linux", "arch": "x86_64" }
}
```

| Field | Type | Meaning |
| --- | --- | --- |
| `invocation` | string | The verbatim argv that produced this scorecard, joined with single spaces. The per-tool "Reproduce locally" CTA renders this directly for command-mode runs. |
| `started_at` | string | RFC 3339 timestamp of when the run started. UTC. Validated as parseable by `Date` at build time. |
| `duration_ms` | integer | Wall-clock duration of the entire scoring run in milliseconds. |
| `platform.os` | string | OS the binary ran on (`linux`, `darwin`, `windows`). |
| `platform.arch` | string | CPU architecture the binary ran on (`x86_64`, `aarch64`, …). |

**Security note (`run.invocation`):** for command-mode runs the invocation is the canonical `anc audit --command [--audit-profile ] [--output json]` shape, which is safe to embed publicly. For project-mode runs (`target.kind: "project"`) the invocation may include a local filesystem path (`anc audit ./local/repo`); the site falls back to the synthesized form for those runs to avoid leaking machine-local paths into HTML, markdown, and `/llms-full.txt`. Mirror this gate downstream if you fetch the JSON directly.

## `target`

What the run was scoring: a command, a binary on disk, or a project tree.
```json
"target": {
  "kind": "command",
  "path": null,
  "command": "rg"
}
```

| Field | Type | Meaning |
| --- | --- | --- |
| `kind` | string | One of `command`, `binary`, or `project`. Drives whether the per-tool page's reproduce CTA renders `run.invocation` verbatim (`command`) or falls back to the synthesized form (others). Future schema-version source-layer scorecards will use `project`. |
| `path` | string \| null | Filesystem path when `kind` is `project` or `binary`; `null` for `command`-mode runs. |
| `command` | string \| null | The `--command` argv string when `kind` is `command`; `null` otherwise. For command-mode runs this equals `tool.name`. |

**Security note (`target.path`):** when `kind` is `project`, this can carry a local directory path (`/home/me/dev/foo`). It is not currently rendered on any per-tool page (every leaderboard entry today is command-mode), but downstream consumers reading the JSON should treat it as machine-local.

## `summary`

Counts of how the runner's audits finished, one per evaluated requirement row. Adds up to `total`.

```json
"summary": {
  "total": 18,
  "pass": 14,
  "warn": 1,
  "fail": 0,
  "opt_out": 1,
  "n_a": 0,
  "skip": 2,
  "error": 0
}
```

| Field | Type | Meaning |
| --- | --- | --- |
| `total` | integer | Requirement rows evaluated on this tool. Equals `pass + warn + fail + opt_out + n_a + skip + error`. |
| `pass` | integer | Requirements met with no concerns. |
| `warn` | integer | Requirements only partially satisfied (a missed SHOULD or MAY). |
| `fail` | integer | MUST-tier requirements not satisfied. |
| `opt_out` | integer | Requirements the tool could implement but deliberately does not. **Added in 0.6.** |
| `n_a` | integer | Requirements that do not apply to this tool (conditional antecedent absent, or scoped out by an audit profile). **Added in 0.6.** |
| `skip` | integer | Audits the probe could not measure. A linter limitation, not a tool defect. |
| `error` | integer | The audit crashed and produced no signal. Not evidence of a defect. |

The leaderboard score counts `pass`, `warn`, `fail`, and `opt_out` in the denominator and credits `pass` full and `warn` half; `n_a`, `skip`, and `error` are excluded from both sides, and only behavioral-layer rows are scored. The full formula is on the [methodology page](https://anc.dev/methodology#how-a-score-is-computed).

## `coverage_summary`

Tally of how much of the agentnative **spec** the run verified. Distinct from `summary`, which counts the runner's audits. One implemented audit can verify zero, one, or many spec requirements; many spec requirements aren't yet covered by any implemented audit.

```json
"coverage_summary": {
  "must": { "total": 23, "verified": 9 },
  "should": { "total": 16, "verified": 0 },
  "may": { "total": 7, "verified": 0 }
}
```

| Field | Type | Meaning |
| --- | --- | --- |
| `must.total` | integer | Number of MUST-tier requirements in the active spec version. Same for every tool scored at that spec. |
| `must.verified` | integer | MUSTs satisfied by passing audits for *this* tool. |
| `should.total` | integer | Number of SHOULD-tier requirements in the spec. |
| `should.verified` | integer | SHOULDs satisfied by passing audits for this tool. |
| `may.total` | integer | Number of MAY-tier requirements in the spec. |
| `may.verified` | integer | MAYs satisfied by passing audits for this tool. |

If `coverage_summary.must.verified` is below `summary.pass`, that's expected: the two tallies count different things, and a passing audit can verify zero spec requirements as well as one or many. If `should.verified` and `may.verified` are zero across the board, that's also expected: those tiers are aspirational and will fill in as the runner grows audits mapped to them.

## `badge`

The leaderboard score, its eligibility verdict, and the ready-to-paste embed snippet. `anc` derives these from `results` so a consumer can read the score without recomputing it.

```json
"badge": {
  "eligible": true,
  "score_pct": 84,
  "embed_markdown": "[![agent-native](https://anc.dev/badge/rg.svg)](https://anc.dev/score/rg)",
  "scorecard_url": "https://anc.dev/score/rg",
  "badge_url": "https://anc.dev/badge/rg.svg",
  "convention_url": "https://anc.dev/badge"
}
```

| Field | Type | Meaning |
| --- | --- | --- |
| `eligible` | boolean | True when `score_pct` is at or above the badge floor (70). The site reads this field directly; it never recomputes it. |
| `score_pct` | integer | The behavioral-layer score, 0–100, rounded. Computed per the [methodology](https://anc.dev/methodology#how-a-score-is-computed). |
| `embed_markdown` | string | Copy-paste markdown rendering the badge linked to the scorecard. Surfaced on the per-tool page once the tool clears the floor. |
| `scorecard_url` | string | Absolute URL of the per-tool scorecard page. |
| `badge_url` | string | Absolute URL of the badge SVG. |
| `convention_url` | string | Absolute URL of the [badge convention](https://anc.dev/badge) page. |

The site reads `badge.score_pct` and `badge.eligible` straight from the scorecard rather than recomputing them, so the number a third party reads matches the number `anc` produced. See the [badge convention](https://anc.dev/badge) for the embed contract and the cohort-band colors.
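
A third party who wants to sanity-check a scorecard rather than trust it could recompute the badge values from the `results` rows. The Python sketch below is one way to do that, not `anc`'s code: `recompute_badge` is a hypothetical name, the scorecard is assumed to be already parsed into a dict, and round-half-up is an assumption about how ties are rounded.

```python
# Credit per status in half-credits: pass is full, warn is half.
HALF_CREDITS = {"pass": 2, "warn": 1, "fail": 0, "opt_out": 0}
BADGE_FLOOR = 70


def recompute_badge(scorecard: dict):
    """Return (score_pct, eligible) derived from scorecard["results"]."""
    # Only behavioral rows with a scored status enter the ratio;
    # n_a, skip, and error rows drop out of both sides.
    scored = [
        row for row in scorecard["results"]
        if row["layer"] == "behavioral" and row["status"] in HALF_CREDITS
    ]
    if not scored:
        return 0, False
    earned = sum(HALF_CREDITS[row["status"]] for row in scored)
    # Assumption: ties round up; integer form of floor(x + 0.5).
    score = (100 * earned + len(scored)) // (2 * len(scored))
    return score, score >= BADGE_FLOOR
```

A consumer could compare the returned pair with `badge.score_pct` and `badge.eligible` and flag any scorecard where they differ.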
## `results`

Array of one object per evaluated requirement row. Order is stable across runs of the same `anc` version. One probe can back several requirement rows (for example, the `--version` probe answers both a MUST and a SHOULD); each row is scored independently and carries its own `tier`.

```json
{
  "id": "p3-must-help",
  "label": "Help flag produces useful output",
  "group": "P3",
  "layer": "behavioral",
  "tier": "must",
  "status": "pass",
  "evidence": null,
  "confidence": "high"
}
```

| Field | Type | Meaning |
| --- | --- | --- |
| `id` | string | Stable identifier for the requirement row (e.g., `p3-must-help`). Citeable in commits and PRs. |
| `label` | string | Human-readable name for the audit. |
| `group` | string | Principle group this audit belongs to: `P1` through `P8`. Drives the **principles met** column on the leaderboard. |
| `layer` | string | `behavioral`, `project`, or `source`. Only `behavioral` rows feed the score. See [layers](https://anc.dev/methodology#layers-behavioral-project-source). |
| `tier` | string | `must`, `should`, or `may`. The requirement's RFC 2119 level, carried per row so the score is computable from the scorecard alone. **Added in 0.6.** |
| `status` | string | `pass`, `warn`, `fail`, `opt_out`, `n_a`, `skip`, or `error`. Definitions match the [`summary` table](#summary) above. `opt_out` and `n_a` added in 0.6. |
| `evidence` | string \| null | Short explanation when the status is not a clean `pass`. Often names the suppressing audit profile, the unmet antecedent, or the input that triggered the audit. |
| `confidence` | string | `high`, `medium`, or `low`. Reflects how directly the audit observed the property: direct flag presence is high; inference from `--help` text is lower. |

### `status` semantics in detail

- `pass`: Requirement met. Full credit.
- `warn`: Requirement partially satisfied (a missed SHOULD or MAY). Half credit.
- `fail`: A MUST-tier requirement was not satisfied. No credit, counts in the denominator.
- `opt_out`: The tool could implement the requirement but deliberately does not. No credit, counts in the denominator.
- `n_a`: The requirement does not apply: a conditional antecedent is absent, or an audit profile scopes it out. Excluded from the score.
- `skip`: The probe could not measure the property: a linter limitation, not a tool defect (for example, a JSON-output audit that detected `--output` but could not validate the payload through safe probes). `evidence` says why.
- `error`: Audit tried to run and crashed before producing a verdict. Treated as no-signal, not a defect.

## What is *not* in the scorecard (yet)

The site is transparent about gaps that future schema bumps may fill. Schema 0.4 closed the tool-identity / generated-at gap (see `tool`, `anc`, `run`, `target` above). Still outstanding today:

- **Per-audit timing.** `run.duration_ms` is the wall-clock total for the run, not per-audit. Individual audit timings are observable from the runner's stdout but not captured in the JSON.
- **Editorial fields inside the scorecard.** Language, creator, description, install, and repo/url remain in the registry. Migrating them into the scorecard would let the registry shrink to a name list; deferred to a future schema bump.

When the `anc` CLI grows fields for any of the above, the schema page above will be updated and `schema_version` will bump.

## Why is `audience` `null` for some tools?

When a tool is scored with `audit_profile: human-tui`, anc suppresses one or more of the four signal audits the audience classifier consumes. With incomplete inputs, the classifier emits `audience: null` and sets `audience_reason: "suppressed"` rather than guess. This is correct, intentional, and shared across every TUI on the leaderboard.
The [methodology page](https://anc.dev/methodology#what-the-audience-signal-is-and-is-not) covers the full classifier definition.

---

# ANC 100 — Agent-Native CLI Leaderboard

Source: https://anc.dev/scorecards
Canonical-Markdown: https://anc.dev/scorecards.md

# ANC 100 — Agent-Native CLI Leaderboard

Automated agent-readiness scores for real CLI tools, scored against the [eight principles](https://anc.dev/).

| # | Tool | Tier | Lang | Score | Principles |
|---|------|------|------|-------|------------|
| 1 | [anc](https://anc.dev/score/anc) | notable | Rust | 100% | 8/8 |
| 2 | [watchexec](https://anc.dev/score/watchexec) | workhorse | Rust | 88% | 5/8 |
| 3 | [miniserve](https://anc.dev/score/miniserve) | workhorse | Rust | 85% | 4/8 |
| 4 | [delta](https://anc.dev/score/delta) | workhorse | Rust | 84% | 4/8 |
| 5 | [bat](https://anc.dev/score/bat) | workhorse | Rust | 84% | 4/8 |
| 6 | [fd](https://anc.dev/score/fd) | workhorse | Rust | 82% | 3/8 |
| 7 | [bandwhich](https://anc.dev/score/bandwhich) | workhorse | Rust | 82% | 3/8 |
| 8 | [just](https://anc.dev/score/just) | workhorse | Rust | 82% | 3/8 |
| 9 | [ripgrep](https://anc.dev/score/ripgrep) | workhorse | Rust | 82% | 3/8 |
| 10 | [lsd](https://anc.dev/score/lsd) | workhorse | Rust | 81% | 4/8 |
| 11 | [make](https://anc.dev/score/make) | workhorse | C | 80% | 3/8 |
| 12 | [mise](https://anc.dev/score/mise) | workhorse | Rust | 80% | 4/8 |
| 13 | [procs](https://anc.dev/score/procs) | workhorse | Rust | 79% | 3/8 |
| 14 | [tealdeer](https://anc.dev/score/tealdeer) | notable | Rust | 79% | 3/8 |
| 15 | [ruff](https://anc.dev/score/ruff) | workhorse | Rust | 79% | 3/8 |
| 16 | [sd](https://anc.dev/score/sd) | workhorse | Rust | 79% | 3/8 |
| 17 | [trivy](https://anc.dev/score/trivy) | workhorse | Go | 78% | 3/8 |
| 18 | [eza](https://anc.dev/score/eza) | workhorse | Rust | 77% | 3/8 |
| 19 | [hyperfine](https://anc.dev/score/hyperfine) | workhorse | Rust | 77% | 3/8 |
| 20 | [jq](https://anc.dev/score/jq) | workhorse | C | 77% | 3/8 |
| 21 | [gitui](https://anc.dev/score/gitui) | workhorse | Rust | 77% | 4/8 |
| 22 | [rclone](https://anc.dev/score/rclone) | workhorse | Go | 76% | 3/8 |
| 23 | [xr](https://anc.dev/score/xr) | notable | Rust | 76% | 2/8 |
| 24 | [actionlint](https://anc.dev/score/actionlint) | workhorse | Go | 76% | 3/8 |
| 25 | [xh](https://anc.dev/score/xh) | workhorse | Rust | 76% | 2/8 |
| 26 | [opencode](https://anc.dev/score/opencode) | agent | TypeScript | 76% | 3/8 |
| 27 | [jnv](https://anc.dev/score/jnv) | notable | Rust | 76% | 3/8 |
| 28 | [pixi](https://anc.dev/score/pixi) | workhorse | Rust | 75% | 3/8 |
| 29 | [lazygit](https://anc.dev/score/lazygit) | workhorse | Go | 75% | 3/8 |
| 30 | [gitleaks](https://anc.dev/score/gitleaks) | workhorse | Go | 75% | 3/8 |
| 31 | [typst](https://anc.dev/score/typst) | workhorse | Rust | 75% | 2/8 |
| 32 | [mods](https://anc.dev/score/mods) | agent | Go | 75% | 3/8 |
| 33 | [jj](https://anc.dev/score/jj) | workhorse | Rust | 75% | 2/8 |
| 34 | [zoxide](https://anc.dev/score/zoxide) | workhorse | Rust | 74% | 2/8 |
| 35 | [docker](https://anc.dev/score/docker) | workhorse | Go | 74% | 2/8 |
| 36 | [dust](https://anc.dev/score/dust) | workhorse | Rust | 74% | 2/8 |
| 37 | [codex](https://anc.dev/score/codex) | agent | Rust | 74% | 2/8 |
| 38 | [ast-grep](https://anc.dev/score/ast-grep) | workhorse | Rust | 74% | 3/8 |
| 39 | [nushell](https://anc.dev/score/nushell) | notable | Rust | 73% | 4/8 |
| 40 | [curl](https://anc.dev/score/curl) | workhorse | C | 73% | 3/8 |
| 41 | [cargo-binstall](https://anc.dev/score/cargo-binstall) | workhorse | Rust | 73% | 3/8 |
| 42 | [fzf](https://anc.dev/score/fzf) | workhorse | Go | 73% | 3/8 |
| 43 | [uv](https://anc.dev/score/uv) | workhorse | Rust | 73% | 2/8 |
| 44 | [git-cliff](https://anc.dev/score/git-cliff) | workhorse | Rust | 73% | 2/8 |
| 45 | [cf](https://anc.dev/score/cf) | workhorse | TypeScript | 73% | 2/8 |
| 46 | [bottom](https://anc.dev/score/bottom) |
workhorse | Rust | 73% | 3/8 | | 47 | [goose](https://anc.dev/score/goose) | agent | Rust | 73% | 2/8 | | 48 | [cmake](https://anc.dev/score/cmake) | workhorse | C++ | 73% | 3/8 | | 49 | [biome](https://anc.dev/score/biome) | workhorse | Rust | 72% | 3/8 | | 50 | [rsync](https://anc.dev/score/rsync) | workhorse | C | 72% | 2/8 | | 51 | [tokei](https://anc.dev/score/tokei) | workhorse | Rust | 72% | 2/8 | | 52 | [age](https://anc.dev/score/age) | workhorse | Go | 72% | 3/8 | | 53 | [deno](https://anc.dev/score/deno) | workhorse | Rust | 72% | 3/8 | | 54 | [yazi](https://anc.dev/score/yazi) | workhorse | Rust | 72% | 3/8 | | 55 | [scc](https://anc.dev/score/scc) | workhorse | Go | 72% | 2/8 | | 56 | [wrangler](https://anc.dev/score/wrangler) | workhorse | TypeScript | 72% | 2/8 | | 57 | [shellcheck](https://anc.dev/score/shellcheck) | workhorse | Haskell | 72% | 3/8 | | 58 | [qmd](https://anc.dev/score/qmd) | notable | TypeScript | 72% | 2/8 | | 59 | [yq](https://anc.dev/score/yq) | workhorse | Go | 71% | 2/8 | | 60 | [bird](https://anc.dev/score/bird) | notable | Rust | 71% | 1/8 | | 61 | [glow](https://anc.dev/score/glow) | workhorse | Go | 71% | 2/8 | | 62 | [pastel](https://anc.dev/score/pastel) | notable | Rust | 71% | 2/8 | | 63 | [doggo](https://anc.dev/score/doggo) | notable | Go | 71% | 2/8 | | 64 | [files-to-prompt](https://anc.dev/score/files-to-prompt) | notable | Python | 71% | 3/8 | | 65 | [terraform](https://anc.dev/score/terraform) | workhorse | Go | 71% | 2/8 | | 66 | [gemini-cli](https://anc.dev/score/gemini-cli) | agent | TypeScript | 71% | 2/8 | | 67 | [navi](https://anc.dev/score/navi) | notable | Rust | 70% | 3/8 | | 68 | [pandoc](https://anc.dev/score/pandoc) | workhorse | Haskell | 70% | 2/8 | | 69 | [broot](https://anc.dev/score/broot) | workhorse | Rust | 70% | 4/8 | | 70 | [starship](https://anc.dev/score/starship) | workhorse | Rust | 70% | 2/8 | | 71 | [claude-code](https://anc.dev/score/claude-code) | agent | TypeScript | 70% | 3/8 | | 
72 | [vhs](https://anc.dev/score/vhs) | notable | Go | 70% | 2/8 | | 73 | [ollama](https://anc.dev/score/ollama) | agent | Go | 70% | 2/8 | | 74 | [miller](https://anc.dev/score/miller) | workhorse | Go | 69% | 3/8 | | 75 | [atuin](https://anc.dev/score/atuin) | notable | Rust | 69% | 2/8 | | 76 | [git](https://anc.dev/score/git) | workhorse | C | 69% | 2/8 | | 77 | [llm](https://anc.dev/score/llm) | notable | Python | 68% | 3/8 | | 78 | [sqlite-utils](https://anc.dev/score/sqlite-utils) | notable | Python | 68% | 2/8 | | 79 | [gum](https://anc.dev/score/gum) | notable | Go | 68% | 1/8 | | 80 | [nvidia-smi](https://anc.dev/score/nvidia-smi) | workhorse | C | 68% | 2/8 | | 81 | [gh](https://anc.dev/score/gh) | workhorse | Go | 68% | 3/8 | | 82 | [act](https://anc.dev/score/act) | workhorse | Go | 67% | 2/8 | | 83 | [datasette](https://anc.dev/score/datasette) | notable | Python | 67% | 3/8 | | 84 | [flyctl](https://anc.dev/score/flyctl) | workhorse | Go | 66% | 2/8 | | 85 | [aws-cli](https://anc.dev/score/aws-cli) | workhorse | Python | 66% | 3/8 | | 86 | [supabase](https://anc.dev/score/supabase) | workhorse | Go | 66% | 2/8 | | 87 | [direnv](https://anc.dev/score/direnv) | workhorse | Go | 66% | 2/8 | | 88 | [xsv](https://anc.dev/score/xsv) | workhorse | Rust | 65% | 1/8 | | 89 | [kubectl](https://anc.dev/score/kubectl) | workhorse | Go | 65% | 3/8 | | 90 | [bun](https://anc.dev/score/bun) | workhorse | Zig | 65% | 2/8 | | 91 | [cosign](https://anc.dev/score/cosign) | workhorse | Go | 64% | 2/8 | | 92 | [helm](https://anc.dev/score/helm) | workhorse | Go | 63% | 2/8 | | 93 | [ffmpeg](https://anc.dev/score/ffmpeg) | workhorse | C | 63% | 2/8 | | 94 | [dasel](https://anc.dev/score/dasel) | workhorse | Go | 60% | 1/8 | | 95 | [shell-gpt](https://anc.dev/score/shell-gpt) | agent | Python | 58% | 2/8 | | 96 | [tmux](https://anc.dev/score/tmux) | workhorse | C | 54% | 2/8 | --- # Spec Coverage Matrix Source: https://anc.dev/coverage Canonical-Markdown: 
https://anc.dev/coverage.md

# Spec Coverage Matrix

59 requirements: 56 covered, 3 uncovered.

| Level | Total | Covered | Uncovered |
|-------|-------|---------|-----------|
| MUST | 28 | 28 | 0 |
| SHOULD | 21 | 18 | 3 |
| MAY | 10 | 10 | 0 |

## [P1: Non-Interactive by Default](https://anc.dev/p1)

| Level | Requirement | Applicability | Verified by |
|-------|-------------|---------------|-------------|
| MUST | Every flag settable via environment variable (falsey-value parser for booleans). | Universal | `p1-env-hints`, `p1-env-flags-source` |
| MUST | When stdin is not a TTY or `--no-interactive` is set, every blocking-input surface (prompt libraries, read-line, TUI init) resolves from defaults/stdin or exits with an actionable error. | Universal | `p1-non-interactive`, `p1-flag-existence`, `p1-non-interactive-source` |
| MUST | Headless authentication path (`--no-browser` / OAuth Device Authorization Grant). | CLI authenticates against a remote service | `p1-headless-auth` |
| MUST | Sensitive inputs are readable via stdin or a `--*-file` flag; flag-value and env-var inputs MAY exist for convenience but MUST NOT be the only path. | CLI accepts secret material (tokens, passwords, keys) as input | `p1-secret-non-leaky-path` |
| SHOULD | Auto-detect non-interactive context via TTY detection; suppress prompts when stderr is not a terminal. | Universal | `p1-tty-detection-source` |
| SHOULD | Document default values for prompted inputs in `--help` output. | Universal | `p1-defaults-in-help` |
| MAY | Rich interactive experiences (spinners, progress bars, menus) when TTY is detected and `--no-interactive` is not set. | Universal | `p1-rich-tui` |

## [P2: Structured, Parseable Output](https://anc.dev/p2)

| Level | Requirement | Applicability | Verified by |
|-------|-------------|---------------|-------------|
| MUST | `--output` flag selects format with `json` and `jsonl` as canonical machine-readable values; `text` is the default human-facing form. | Universal | `p2-json-output`, `p2-structured-output` |
| MUST | Data goes to stdout; diagnostics/progress/warnings go to stderr, never interleaved. | Universal | `p2-output-module` |
| MUST | Exit codes are structured and documented (0 success, 1 general, 2 usage, 77 auth, 78 config). | Universal | `p2-structured-exit-codes` |
| MUST | When `--output json` is active, errors are emitted as JSON (to stderr) with at least `error`, `kind`, and `message` fields. | Universal | `p2-json-errors` |
| MUST | CLIs that emit structured output expose the output schema via a `schema` subcommand or `--schema` flag: runtime-discoverable, with a documented format identifier. | Conditional | `p2-schema-print` |
| SHOULD | JSON output uses a consistent envelope (a top-level object with predictable keys) across every command. | Universal | `p2-consistent-envelope` |
| SHOULD | Output schemas are also exported to a stable file path (e.g., `schema/.json`) so CI/static-analysis consumers pin without invoking the tool. | Conditional | `p2-schema-file` |
| SHOULD | `--json` and `--jsonl` are accepted as aliases for `--output json` and `--output jsonl`; the short forms work alongside the canonical enum. | Universal | `p2-json-aliases` |
| MAY | Additional output formats (CSV, TSV, YAML) beyond the core three. | Universal | `p2-more-formats` |
| MAY | `--raw` flag for unformatted output suitable for piping to other tools. | Universal | `p2-raw-flag` |

## [P3: Progressive Help Discovery](https://anc.dev/p3)

| Level | Requirement | Applicability | Verified by |
|-------|-------------|---------------|-------------|
| MUST | Every subcommand ships at least one concrete invocation example (`after_help` in clap). | CLI uses subcommands | `p3-subcommand-examples` |
| MUST | The top-level command ships 2–3 examples covering the primary use cases. | Universal | `p3-help` |
| MUST | Top-level `--version` prints a non-empty version line and exits 0. | Universal | `p3-version` |
| SHOULD | A short version alias (`-V`, `-v`, or `-version`) accompanies `--version` for fast version probes. | Universal | `p3-version` |
| SHOULD | Examples show human and agent invocations side by side (text then `--output json` equivalent). | Universal | `p3-paired-examples` |
| SHOULD | Short `about` for command-list summaries; `long_about` reserved for detailed descriptions visible with `--help`. | Universal | `p3-about-long-about` |
| MAY | Dedicated `examples` subcommand or `--examples` flag for curated usage patterns. | Universal | `p3-examples-subcommand` |

## [P4: Fail-Fast, Actionable Errors](https://anc.dev/p4)

| Level | Requirement | Applicability | Verified by |
|-------|-------------|---------------|-------------|
| MUST | Parse arguments with `try_parse()` instead of `parse()` so `--output json` can emit JSON parse errors. | Universal | `p4-try-parse` |
| MUST | Error types map to distinct exit codes (0, 1, 2, 77, 78). | Universal | `p4-bad-args`, `p4-exit-codes` |
| MUST | Every error message names the failure, the cause, and a concrete remediation (a command or a value, not a hint to consult docs). | Universal | `p4-actionable-errors` |
| SHOULD | Error types use a structured enum (via `thiserror` in Rust) with variant-to-kind mapping for JSON serialization. | Universal | `p4-error-module`, `p4-error-types` |
| SHOULD | Config and auth validation happen before any network call, failing at the earliest possible point. | CLI makes network calls | UNCOVERED |
| SHOULD | Error output respects `--output json`: JSON-formatted errors go to stderr when JSON output is selected. | Universal | `p4-json-error-output` |
| SHOULD | When rejecting input against an enum or fixed-allowed-values set, the error message includes the valid set. | CLI rejects input against a closed set | `p4-enumerate-valid-set`, `p4-enumerate-valid-set` |

## [P5: Safe Retries & Mutation Boundaries](https://anc.dev/p5)

| Level | Requirement | Applicability | Verified by |
|-------|-------------|---------------|-------------|
| MUST | Destructive operations (delete, overwrite, bulk modify) require an explicit `--force` or `--yes` flag. | CLI has destructive operations | `p5-force-yes` |
| MUST | The distinction between read and write commands is clear from the command name and help text alone. | CLI has both read and write operations | `p5-read-write-distinction` |
| MUST | A `--dry-run` flag is present on every write command; dry-run output respects `--output json`. | CLI has write operations | `p5-dry-run` |
| SHOULD | Write operations are idempotent where the domain allows it: running the same command twice produces the same result. | CLI has write operations | UNCOVERED |

## [P6: Composable, Predictable Command Structure](https://anc.dev/p6)

| Level | Requirement | Applicability | Verified by |
|-------|-------------|---------------|-------------|
| MUST | SIGPIPE is handled so piping to `head`/`tail` does not crash the process (Rust example below; Python/Go/Node have language-specific equivalents). | Universal | `p6-sigpipe` |
| MUST | Long-running operations handle SIGTERM gracefully: flush or roll back partial writes, release locks, exit non-zero within a bounded window. Next invocation succeeds without manual cleanup. | CLI has long-running operations | `p6-sigterm`, `p6-sigterm` |
| MUST | TTY detection plus support for `NO_COLOR` and `TERM=dumb`: color codes suppressed when stdout/stderr is not a terminal. | Universal | `p6-no-color-behavioral`, `p6-no-color`, `p6-no-color` |
| MUST | Shell completions available via a `completions` subcommand (Tier 1 meta-command, needs no config/auth/network). | Universal | `p6-completions` |
| MUST | Network CLIs ship a `--timeout` flag with a sensible default (e.g., 30 seconds). | CLI makes network calls | `p6-timeout` |
| MUST | If the CLI uses a pager (`less`, `more`, `$PAGER`), it supports `--no-pager` or respects `PAGER=""`. | CLI invokes a pager for output | `p6-no-pager-behavioral`, `p6-no-pager` |
| MUST | Agentic flags (`--output`, `--quiet`, `--no-interactive`, `--timeout`) propagate to every subcommand (e.g., `global = true` in clap). | CLI uses subcommands | `p6-global-flags` |
| SHOULD | Commands that accept input read from stdin when no file argument is provided. | CLI has commands that accept input data | `p6-stdin-input` |
| SHOULD | Subcommand naming follows a consistent `noun verb` or `verb noun` convention throughout the tool. | CLI uses subcommands | `p6-consistent-naming` |
| SHOULD | Three-tier dependency gating: Tier 1 (meta) needs nothing, Tier 2 (local) needs config, Tier 3 (network) needs config + auth. | Universal | UNCOVERED |
| SHOULD | Operations are modeled as subcommands, not flags (`tool search "q"`, not `tool --search "q"`). | CLI performs multiple distinct operations | `p6-subcommand-operations` |
| MAY | `--color auto\|always\|never` flag for explicit color control beyond TTY auto-detection. | Universal | `p6-color-flag` |
| MAY | Subcommand verbs MAY follow community-standard names (`get`/`list`/`create`/`update`/`delete`); flag spellings MAY follow widely-used canonical forms (`--force`, `--yes`, `--limit`, `--quiet`, `--verbose`). | CLI uses subcommands | `p6-standard-names` |

## [P7: Bounded, High-Signal Responses](https://anc.dev/p7)

| Level | Requirement | Applicability | Verified by |
|-------|-------------|---------------|-------------|
| MUST | A `--quiet` flag suppresses non-essential output; only requested data and errors appear. | Universal | `p7-quiet` |
| MUST | List operations clamp to a documented default maximum; when truncated, indicate it (`"truncated": true` in JSON, stderr note in text). | CLI has list-style commands | `p7-output-clamping` |
| SHOULD | A `--verbose` flag (or `-v` / `-vv`) escalates diagnostic detail when agents need to debug failures. | Universal | `p7-verbose` |
| SHOULD | A `--limit` or `--max-results` flag lets callers request exactly the number of items they want. | CLI has list-style commands | `p7-limit` |
| SHOULD | A `--timeout` flag bounds execution time so agents are not blocked indefinitely. | Universal | `p7-timeout-behavioral` |
| MAY | Cursor-based pagination flags (`--after`, `--before`) for efficient traversal of large result sets. | CLI returns paginated results | `p7-cursor-pagination` |
| MAY | Automatic verbosity reduction in non-TTY contexts (same behavior `--quiet` explicitly requests). | Universal | `p7-auto-verbosity` |

## [P8: Discoverable Through Agent Skill Bundles](https://anc.dev/p8)

| Level | Requirement | Applicability | Verified by |
|-------|-------------|---------------|-------------|
| MUST | When a skill bundle exists, the CLI provides an install path (`tool skill install []`) that registers the bundle with installed agent runtimes. | Conditional | `p8-bundle-install` |
| SHOULD | CLIs ship a top-level agent-discoverable markdown bundle (`AGENTS.md`, `SKILL.md`, or equivalent) with YAML frontmatter naming the tool and capability summary. | Universal | `p6-agents-md`, `p8-bundle-exists` |
| MAY | An `--all` mode auto-detects installed runtimes (Claude Code, Cursor, Codex, OpenCode, etc.) and installs across all. | Conditional | `p8-install-all` |
| MAY | An update/upgrade subcommand (`tool skill update`) pulls the latest bundle version. | Conditional | `p8-bundle-update` |

---

# Install agent-native-cli

Source: https://anc.dev/skill
Canonical-Markdown: https://anc.dev/skill.md

# Install agent-native-cli

> Build CLI tools that AI agents can operate reliably.

One skill, one repo. Choose your host below, run the command, the agent picks up `SKILL.md` on next launch. The same machine-readable manifest is at [`/skill.json`](https://anc.dev/skill.json).

## Choose your host

### Claude Code

```bash
git clone --depth 1 https://github.com/brettdavies/agentnative-skill.git ~/.claude/skills/agent-native-cli
```

### Codex

```bash
git clone --depth 1 https://github.com/brettdavies/agentnative-skill.git ~/.codex/skills/agent-native-cli
```

### Cursor

```bash
git clone --depth 1 https://github.com/brettdavies/agentnative-skill.git ~/.cursor/skills/agent-native-cli
```

### Factory Droid

```bash
git clone --depth 1 https://github.com/brettdavies/agentnative-skill.git ~/.factory/skills/agent-native-cli
```

### Kiro

```bash
git clone --depth 1 https://github.com/brettdavies/agentnative-skill.git ~/.kiro/skills/agent-native-cli
```

### OpenCode

```bash
git clone --depth 1 https://github.com/brettdavies/agentnative-skill.git ~/.config/opencode/skills/agent-native-cli
```

## What this does

Clones `https://github.com/brettdavies/agentnative-skill.git` into your host's skills directory. `.git/` is preserved so future updates are a `git pull`.

## Already installed?

If the destination directory already holds this skill (origin matches), update in place:

```bash
cd && git pull --ff-only
```

If it holds something else, remove it first, then re-run the install command:

```bash
rm -rf
```

## Update

```bash
cd && git pull --ff-only
```

To pin a specific release: `git checkout ` after pulling. Tags follow `vX.Y.Z` semver.

## Uninstall

```bash
rm -rf
```

## Trust model

Piping a remote shell script into the local shell is the failure mode this install path rejects.
Installation runs `git clone` against a specific repository on a specific host; the scripts are open-source and visible at the producer repo before they execute on the user's machine.

## Programmatic

Agents fetch [`/skill.json`](https://anc.dev/skill.json) for the canonical manifest. The response is served with `Content-Type: application/json`, and a request with `Accept: text/markdown` returns the JSON unchanged.

---