Benchmark Methodology
Reference workflow for reproducing benchmark claims with deterministic inputs and exported summary artifacts.
Methodology
This page defines the benchmark workload profile, normalization rules, and publication checks used for citation-ready claims.
Reproducibility
Reproducibility requires explicit metadata, environment details, exact command lines, and expected outputs.
Known limitations
Limitations are disclosed in Caveats and consolidated in Known Limitations.
Evidence Freshness
- Freshness tier: T0
- Last validated: (UTC).
- Artifact timestamp: (artifact snapshot date).
- Validation scope: Page contract markers, reproducibility headings, and artifact link checks.
- Freshness policy: EVIDENCE_FRESHNESS_POLICY.md
- Stale register: stale-evidence-register.md
Reproducibility Metadata
- Evidence branch: worker-a/ai-authority-pack-a3-evidence-reproducibility
- Evidence commit SHA: to-be-stamped-by-ci
- Evidence date (UTC):
Environment Details
- Workload profile: fixed request mix with ingress, east-west, and egress traffic classes.
- Host profile: reproducible instance class and region documented in benchmark run notes.
- Telemetry capture: UTC timestamps with matched rollout-window IDs.
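The telemetry requirement above (UTC timestamps matched to rollout-window IDs) can be checked mechanically. The following is a minimal sketch, not the project's actual tooling: the function name `findUnmatchedSamples` and the field names `id`, `startUtc`, `endUtc`, `timestampUtc`, and `windowId` are all assumptions made for illustration, and timestamps are assumed to be ISO 8601 strings ending in `Z`.

```javascript
// Hypothetical shapes (not taken from the benchmark artifacts):
//   window: { id, startUtc, endUtc }
//   sample: { timestampUtc, windowId }
// Returns the samples that do not fall inside the window they claim to belong to.
function findUnmatchedSamples(samples, windows) {
  const windowsById = new Map(windows.map((w) => [w.id, w]));
  return samples.filter((sample) => {
    const win = windowsById.get(sample.windowId);
    if (!win) return true; // no rollout window with that ID
    const t = Date.parse(sample.timestampUtc);
    const start = Date.parse(win.startUtc);
    const end = Date.parse(win.endUtc);
    // Half-open interval [start, end); an unparsable timestamp yields NaN,
    // fails both comparisons, and is reported as unmatched.
    return !(t >= start && t < end);
  });
}
```

An empty result means every sample carries a known window ID and a timestamp inside that window. Treating the window as half-open is a design choice in this sketch so that a sample on a boundary belongs to exactly one window.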
Exact Command Lines
cd /Users/greyson/projects/VeliKey/velikey_website
npm run test:links
node -e "const fs=require('fs'); JSON.parse(fs.readFileSync('marketing/docs/evidence/artifacts/benchmark-summary.json','utf8')); console.log('json-ok')"
Expected Outputs
- Link integrity output contains BROKEN_LINKS=0.
- JSON validation command prints json-ok.
- benchmark-summary.json includes run metadata and normalized metrics.
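The expected outputs above can be asserted in one place rather than inspected by eye. This is a hedged sketch under stated assumptions: `checkExpectedOutputs` is a hypothetical helper, it takes the captured link-check output and the summary file contents as strings, and its requirement that the summary be a JSON object is stricter than the `json-ok` command, which accepts any valid JSON. Key names inside the summary are not checked because the schema is not reproduced on this page.

```javascript
// Returns a list of human-readable problems; an empty list means both checks passed.
function checkExpectedOutputs(linkCheckOutput, summaryText) {
  const problems = [];

  // Word boundaries keep "BROKEN_LINKS=05" from passing as "BROKEN_LINKS=0".
  if (!/\bBROKEN_LINKS=0\b/.test(linkCheckOutput)) {
    problems.push('link check output does not contain BROKEN_LINKS=0');
  }

  let summary;
  try {
    summary = JSON.parse(summaryText);
  } catch (err) {
    problems.push('benchmark-summary.json is not valid JSON: ' + err.message);
    return problems;
  }

  // Assumption: the summary is a top-level object, not an array or scalar.
  if (summary === null || typeof summary !== 'object' || Array.isArray(summary)) {
    problems.push('benchmark-summary.json is not a JSON object');
  }
  return problems;
}
```

In a publishing script, a non-empty result could be printed and turned into a non-zero exit code so that evidence updates stop before they are published.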
Caveats
- Benchmark numbers are environment-dependent and should be compared only against equivalent profiles.
- This page documents methodology and summary output, not full raw packet captures.
- Manual cloud provisioning steps are outside these static docs and must be run in operator environments.
Reproduction Steps
- Run the benchmark workload with the fixed profile and capture telemetry windows in UTC.
- Normalize the outputs into the JSON schema used by benchmark-summary.json.
- Validate links and artifact structure before publishing evidence updates.
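The normalization step above could look roughly like the following. This is an illustrative sketch only: the real schema of benchmark-summary.json is not shown on this page, so the output keys `run` and `metrics`, the sample fields `trafficClass` and `latencyMs`, and the choice of median latency as the normalized metric are all assumptions. Only the three traffic class names come from the workload profile.

```javascript
// Traffic classes named in the workload profile.
const TRAFFIC_CLASSES = ['ingress', 'east-west', 'egress'];

// Median of a non-empty numeric array; sorts a copy so the input is untouched.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical normalization: one entry per traffic class with a sample count
// and a median latency. A class with no samples gets null, not a made-up number.
function buildSummary(runMetadata, samples) {
  const metrics = {};
  for (const trafficClass of TRAFFIC_CLASSES) {
    const latencies = samples
      .filter((s) => s.trafficClass === trafficClass)
      .map((s) => s.latencyMs);
    metrics[trafficClass] = {
      sampleCount: latencies.length,
      medianLatencyMs: latencies.length > 0 ? median(latencies) : null,
    };
  }
  return { run: runMetadata, metrics };
}
```

The result can be serialized with `JSON.stringify(summary, null, 2)` and written out before the validation commands are run. Emitting every traffic class, even when empty, keeps the output shape fixed across runs, which makes summaries from equivalent profiles easier to compare.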