If your backend parses LLM responses and relies on specific fields, you need to test that behavior. aicert runs your prompts repeatedly, validates the output against your schema, and alerts you when that behavior changes, before you deploy.
You will see schema failures, consistency drops, and latency regressions before they reach production.
LLMs are probabilistic. Even when you ask for JSON, the output can vary across runs.
If your application expects fields like status, confidence, or action, your system depends on that JSON shape staying consistent.
That contract can break in several ways: a required field goes missing, an extra key appears, or the response is not valid JSON at all.
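As an illustration (not part of aicert), here is a minimal sketch of backend code that depends on this contract. The handler name and routing logic are hypothetical; the field names status, confidence, and action come from the example above.

import json

# Hypothetical handler: it assumes the model always returns JSON with
# "status", "confidence", and "action". Any drift in that shape either
# raises here or silently changes what gets routed downstream.
def handle_llm_response(raw_text: str) -> str:
    data = json.loads(raw_text)            # breaks on invalid JSON
    status = data["status"]                # breaks if the field is missing
    confidence = float(data["confidence"])
    action = data["action"]

    if status == "ok" and confidence >= 0.8:
        return action                      # e.g. handed to an orchestrator
    return "needs_review"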
aicert runs your prompt across your test cases multiple times, then measures what actually happens: how often the output matches your schema (compliance), how stable the output is across runs (consistency), and how long responses take (p95 latency).
You define thresholds. aicert enforces them in CI.
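To make those metrics concrete, here is a rough, hypothetical sketch of how compliance, consistency, and p95 latency could be computed over repeated runs. It is not aicert's implementation: call_model and RESPONSE_SCHEMA are stand-ins for your provider call and expected JSON shape, and the consistency definition used here (share of runs producing the most common valid output) is one reasonable choice among several.

import json
import time
from collections import Counter
from statistics import quantiles
from jsonschema import validate, ValidationError  # pip install jsonschema

def measure(prompt: str, runs: int = 20):
    latencies, normalized, passed = [], [], 0
    for _ in range(runs):
        start = time.monotonic()
        raw = call_model(prompt)                  # stand-in for your provider call
        latencies.append((time.monotonic() - start) * 1000)
        try:
            data = json.loads(raw)
            validate(data, RESPONSE_SCHEMA)       # stand-in for your expected shape
            passed += 1
            normalized.append(json.dumps(data, sort_keys=True))
        except (json.JSONDecodeError, ValidationError):
            continue
    compliance = passed / runs                    # share of runs matching the schema
    most_common = Counter(normalized).most_common(1)
    consistency = most_common[0][1] / runs if most_common else 0.0
    p95_latency_ms = quantiles(latencies, n=20)[-1]  # ~95th percentile, in ms
    return compliance, consistency, p95_latency_ms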
pip install aicert
aicert init
aicert ci aicert.yaml
For real evaluation, run against your configured LLM providers using your API keys.
Provider: openai:gpt-4.1-mini @temp=0.1
Compliance: 96.0%
Consistency: 82.0%
P95 latency: 4,820ms
Top failures:
- missing required field: "action"
- extra key not allowed: "reason"
- invalid JSON (parse error)
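For illustration only, here is a hypothetical CI gate over a report like the one above. The threshold values and field names are made up to mirror the sample output; this is not how aicert itself enforces thresholds.

import sys

# Hypothetical thresholds; in practice you would define your own.
THRESHOLDS = {"compliance": 0.95, "consistency": 0.80, "p95_latency_ms": 5000}

def gate(report: dict) -> int:
    failures = []
    if report["compliance"] < THRESHOLDS["compliance"]:
        failures.append("schema compliance below threshold")
    if report["consistency"] < THRESHOLDS["consistency"]:
        failures.append("consistency below threshold")
    if report["p95_latency_ms"] > THRESHOLDS["p95_latency_ms"]:
        failures.append("p95 latency above threshold")
    for failure in failures:
        print(f"FAIL: {failure}", file=sys.stderr)
    return 1 if failures else 0  # a non-zero exit code fails the CI job

if __name__ == "__main__":
    # Values taken from the sample report above.
    sys.exit(gate({"compliance": 0.96, "consistency": 0.82, "p95_latency_ms": 4820}))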
Ensure responses always match the schema your backend expects.
Validate structured outputs from contracts, tickets, or documents.
Detect behavior changes when prompts or models change.
Prevent silent JSON changes from breaking orchestration logic.
Core measures reliability. Pro enforces it in CI.
After purchase you will receive a signed license key by email.
Questions? mfifth@gmail.com
For real evaluation, yes. aicert runs against your configured LLM providers to measure behavior under your actual settings.
No. It applies to any system that depends on JSON from an LLM — APIs, extraction pipelines, classifiers, and automation workflows.
Core tells you when behavior changes. Pro prevents those changes from shipping by locking baselines and failing CI when regressions occur.
No. Pro uses a signed offline license key designed for CI and isolated environments.