Scorecard schema

A scorecard is the structured output of anc audit <binary> --output json. The site reads scorecards under scorecards/ and renders them at /scorecards and at each tool's /score/<name> page. This page documents every field that appears in a scorecard, what it means, and where it comes from. It is intentionally exhaustive: anything in the JSON should be explainable from this page.

Filename

Scorecards are stored on disk as:

scorecards/<name>-v<version>.json

Where <name> matches the registry's name field (URL slug) and <version> is the SemVer string captured at scoring time. The filename's <version> segment is the canonical version anchor: the site reads it directly off disk and displays it as the scored version on every per-tool page. The scorecard's tool.version field (added in schema 0.4) is informational; when both are present and disagree, the build aborts with an integrity error. The filename never lies.

Top-level fields

{
  "schema_version": "0.7",
  "spec_version": "...",
  "tool":   { "name": "...", "binary": "...", "version": "..." },
  "anc":    { "version": "..." },
  "run":    { "invocation": "...", "started_at": "...", "duration_ms": 0,
              "platform": { "os": "...", "arch": "..." } },
  "target": { "kind": "...", "path": null, "command": "..." },
  "audience": "...",
  "audience_reason": null,
  "audit_profile": null,
  "summary": { ... },
  "coverage_summary": { ... },
  "badge": { ... },
  "results": [ ... ]
}

Field	Type	Source	Meaning
`schema_version`	string	`anc` emitted	Version of the JSON envelope itself. Pre-1.0; bumped when fields are added, removed, or renamed. Current: 0.7.
`spec_version`	string	`anc` emitted	Version of the agentnative spec the run conformed to. Independent of `schema_version`.
`tool`	object	`anc` emitted	Self-describing identity for the tool that was scored. See tool below. Added in 0.4.
`anc`	object	`anc` emitted	Provenance of the `anc` build that produced the scorecard. See anc below. Added in 0.4.
`run`	object	`anc` emitted	Run-context: what was invoked, when, on what platform. See run below. Added in 0.4.
`target`	object	`anc` emitted	What the run was scoring (command vs. binary vs. project). See target below. Added in 0.4.
`audience`	string \| null	`anc` emitted	One-line audience classification. See audience signal.
`audience_reason`	string \| null	`anc` emitted	Set to `"suppressed"` when an active audit profile blanks the audience signal. Otherwise null.
`audit_profile`	string \| null	registry	Exception category passed to anc via `--audit-profile`. See audit profiles.
`summary`	object	derived	Tally of how the runner's audits finished. See summary below.
`coverage_summary`	object	derived	Tally of how many spec requirements the run verified. See coverage_summary below.
`badge`	object	derived	Score, eligibility, and the copy-paste embed snippet for this tool. See badge below. Added in 0.5.
`results`	array of result obj	`anc` emitted	One entry per evaluated requirement row. See results below.

`tool`

Self-describing tool identity. Lets a downstream consumer answer "what was scored?" without cross-referencing the registry.

"tool": {
  "name": "rg",
  "binary": "rg",
  "version": "ripgrep 15.1.0"
}

Field	Type	Meaning
`name`	string	The literal `--command` argv passed to anc. For tools where the registry name differs from the binary (e.g., registry `ripgrep` → binary `rg`), this is the binary, not the registry slug. The filename slug owns the registry-name side of the join.
`binary`	string	Executable name resolved from `$PATH` at scoring time. Equals `tool.name` for command-mode runs except when a tool ships under an alias.
`version`	string \| null	Best-effort version string. The CLI dumps the first line of `<binary> --version` here without further parsing; it may carry the marketing string ("eza - A modern, maintained replacement for ls"), the full multi-line block, or `null` when the binary doesn't print anything parseable. The filename's `<version>` is canonical; this field is a courtesy.

Build-time invariant: when tool.version contains a SemVer-shaped token (X.Y or X.Y.Z), it must equal the filename version. Drift fails the build with a parser-asymmetry error: the regen script's version_extract snippet and the CLI's internal probe are the only two places that derive a version from the binary, and they must agree.

`anc`

Provenance for the scorecard: which anc build produced it.

"anc": {
  "version": "0.2.1"
}

Field	Type	Meaning
`version`	string	Self-reported version of the `anc` binary that ran the score. Read from `Cargo.toml` at build time.

The per-tool page renders anc.version.

`run`

Run-context: the literal invocation, when it ran, how long it took, and what platform it ran on.

"run": {
  "invocation": "anc audit --command rg --output json",
  "started_at": "2026-04-30T04:18:53.099683344Z",
  "duration_ms": 53,
  "platform": { "os": "linux", "arch": "x86_64" }
}

Field	Type	Meaning
`invocation`	string	The verbatim argv that produced this scorecard, joined with single spaces. The per-tool "Reproduce locally" CTA renders this directly for command-mode runs.
`started_at`	string	RFC 3339 timestamp of when the run started. UTC. Validated as parseable by `Date` at build time.
`duration_ms`	integer	Wall-clock duration of the entire scoring run in milliseconds.
`platform.os`	string	OS the binary ran on (`linux`, `darwin`, `windows`).
`platform.arch`	string	CPU architecture the binary ran on (`x86_64`, `aarch64`, …).

Security note (run.invocation): for command-mode runs the invocation is the canonical anc audit --command <name> [--audit-profile <X>] [--output json] shape, which is safe to embed publicly. For project-mode runs (target.kind: "project") the invocation may include a local filesystem path (anc audit ./local/repo); the site falls back to the synthesized form for those runs to avoid leaking machine-local paths into HTML, markdown, and /llms-full.txt. Mirror this gate downstream if you fetch the JSON directly.

`target`

What the run was scoring: a command, a binary on disk, or a project tree.

"target": {
  "kind": "command",
  "path": null,
  "command": "rg"
}

Field	Type	Meaning
`kind`	string	One of `command`, `binary`, or `project`. Drives whether the per-tool page's reproduce CTA renders `run.invocation` verbatim (`command`) or falls back to the synthesized form (others). Future schema-version source-layer scorecards will use `project`.
`path`	string \| null	Filesystem path when `kind` is `project` or `binary`; `null` for `command`-mode runs.
`command`	string \| null	The `--command` argv string when `kind` is `command`; `null` otherwise. For command-mode runs this equals `tool.name`.

Security note (target.path): when kind is project, this can carry a local directory path (/home/me/dev/foo). It is not currently rendered on any per-tool page (every leaderboard entry today is command-mode), but downstream consumers reading the JSON should treat it as machine-local.

`summary`

Counts of how the runner's audits finished, one per evaluated requirement row. Adds up to total.

"summary": {
  "total": 18,
  "pass": 14,
  "warn": 1,
  "fail": 0,
  "opt_out": 1,
  "n_a": 0,
  "skip": 2,
  "error": 0
}

Field	Type	Meaning
`total`	integer	Requirement rows evaluated on this tool. Equals `pass + warn + fail + opt_out + n_a + skip + error`.
`pass`	integer	Requirements met with no concerns.
`warn`	integer	Requirements only partially satisfied (a missed SHOULD or MAY).
`fail`	integer	MUST-tier requirements not satisfied.
`opt_out`	integer	Requirements the tool could implement but deliberately does not. Added in 0.6.
`n_a`	integer	Requirements that do not apply to this tool (conditional antecedent absent, or scoped out by an audit profile). Added in 0.6.
`skip`	integer	Audits the probe could not measure. A linter limitation, not a tool defect.
`error`	integer	The audit crashed and produced no signal. Not evidence of a defect.

The leaderboard score counts pass, warn, fail, and opt_out in the denominator and credits pass full and warn half; n_a, skip, and error are excluded from both sides, and only behavioral-layer rows are scored. The full formula is on the methodology page.

`coverage_summary`

Tally of how much of the agentnative spec the run verified. Distinct from summary, which counts the runner's audits. One implemented audit can verify zero, one, or many spec requirements; many spec requirements aren't yet covered by any implemented audit.

"coverage_summary": {
  "must":   { "total": 23, "verified": 9 },
  "should": { "total": 16, "verified": 0 },
  "may":    { "total": 7,  "verified": 0 }
}

Field	Type	Meaning
`must.total`	integer	Number of MUST-tier requirements in the active spec version. Same for every tool scored at that spec.
`must.verified`	integer	MUSTs satisfied by passing audits for this tool.
`should.total`	integer	Number of SHOULD-tier requirements in the spec.
`should.verified`	integer	SHOULDs satisfied by passing audits for this tool.
`may.total`	integer	Number of MAY-tier requirements in the spec.
`may.verified`	integer	MAYs satisfied by passing audits for this tool.

If coverage_summary.must.verified is below summary.pass, that's expected because a single passing audit can map to multiple MUSTs. If should.verified and may.verified are zero across the board, that's also expected: those tiers are aspirational and will fill in as the runner grows audits mapped to them.

`badge`

The leaderboard score, its eligibility verdict, and the ready-to-paste embed snippet. anc derives these from results so a consumer can read the score without recomputing it.

"badge": {
  "eligible": true,
  "score_pct": 84,
  "embed_markdown": "[![agent-native](https://anc.dev/badge/rg.svg)](https://anc.dev/score/rg)",
  "scorecard_url": "https://anc.dev/score/rg",
  "badge_url": "https://anc.dev/badge/rg.svg",
  "convention_url": "https://anc.dev/badge"
}

Field	Type	Meaning
`eligible`	boolean	True when `score_pct` is at or above the badge floor (70). The site reads this field directly; it never recomputes it.
`score_pct`	integer	The behavioral-layer score, 0–100, rounded. Computed per the methodology.
`embed_markdown`	string	Copy-paste markdown rendering the badge linked to the scorecard. Surfaced on the per-tool page once the tool clears the floor.
`scorecard_url`	string	Absolute URL of the per-tool scorecard page.
`badge_url`	string	Absolute URL of the badge SVG.
`convention_url`	string	Absolute URL of the badge convention page.

The site reads badge.score_pct and badge.eligible straight from the scorecard rather than recomputing them, so the number a third party reads matches the number anc produced. See the badge convention for the embed contract and the cohort-band colors.

`results`

Array of one object per evaluated requirement row. Order is stable across runs of the same anc version. One probe can back several requirement rows (for example, the --version probe answers both a MUST and a SHOULD); each row is scored independently and carries its own tier.

{
  "id": "p3-must-help",
  "audit_id": "p3-help",
  "label": "Help flag produces useful output",
  "group": "P3",
  "layer": "behavioral",
  "tier": "must",
  "status": "pass",
  "evidence": null,
  "confidence": "high"
}

Field	Type	Meaning
`id`	string	Stable identifier for the requirement row (e.g., `p3-must-help`). Citeable in commits and PRs.
`audit_id`	string	Identifier of the audit (probe) that produced this row (e.g., `p3-help`). Several requirement rows can share one `audit_id`. Added in 0.7.
`label`	string	Human-readable name for the audit.
`group`	string	Principle group this audit belongs to: `P1` through `P8`. Drives the principles met column on the leaderboard.
`layer`	string	`behavioral`, `project`, or `source`. Only `behavioral` rows feed the score. See layers.
`tier`	string	`must`, `should`, or `may`. The requirement's RFC 2119 level, carried per row so the score is computable from the scorecard alone. Added in 0.6.
`status`	string	`pass`, `warn`, `fail`, `opt_out`, `n_a`, `skip`, or `error`. Definitions match the `summary` table above. `opt_out` and `n_a` added in 0.6.
`evidence`	string \| null	Short explanation when the status is not a clean `pass`. Often names the suppressing audit profile, the unmet antecedent, or the input that triggered the audit.
`confidence`	string	`high`, `medium`, or `low`. Reflects how directly the audit observed the property: direct flag presence is high; inference from `--help` text is lower.

`status` semantics in detail

pass: Requirement met. Full credit.
warn: Requirement partially satisfied (a missed SHOULD or MAY). Half credit.
fail: A MUST-tier requirement was not satisfied. No credit, counts in the denominator.
opt_out: The tool could implement the requirement but deliberately does not. No credit, counts in the denominator.
n_a: The requirement does not apply: a conditional antecedent is absent, or an audit profile scopes it out. Excluded from the score.
skip: The probe could not measure the property: a linter limitation, not a tool defect (for example, a JSON-output audit that detected --output but could not validate the payload through safe probes). evidence says why.
error: Audit tried to run and crashed before producing a verdict. Treated as no-signal, not a defect.

What is not in the scorecard (yet)

The site is transparent about gaps that future schema bumps may fill. Schema 0.4 closed the tool-identity / generated-at gap (see tool, anc, run, target above). Still outstanding today:

Per-audit timing. run.duration_ms is the wall-clock total for the run, not per-audit. Individual audit timings are observable from the runner's stdout but not captured in the JSON.
Editorial fields inside the scorecard. Language, creator, description, install, and repo/url remain in the registry. Migrating them into the scorecard would let the registry shrink to a name list; deferred to a future schema bump.

When the anc CLI grows fields for any of the above, the schema page above will be updated and schema_version will bump.

Why is `audience` `null` for some tools?

When a tool is scored with audit_profile: human-tui, anc suppresses one or more of the four signal audits the audience classifier consumes. With incomplete inputs, the classifier emits audience: null and sets audience_reason: "suppressed" rather than guess. This is correct, intentional, and shared across every TUI on the leaderboard. The methodology page covers the full classifier definition.

Scorecard schema

Filename

Top-level fields

tool

anc

run

target

summary

coverage_summary

badge

results

status semantics in detail