aicert

AI responses change. Your code expects consistency.

If your backend parses LLM responses and relies on specific fields, you need to test that behavior. aicert runs your prompts repeatedly, validates the output against your schema, and alerts you when it changes — before you deploy.

  • Validate JSON against your schema
  • Measure output consistency
  • Track latency across runs
  • Fail CI when behavior changes

You will see schema failures, consistency drops, and latency regressions before they reach production.

The Problem

LLMs are probabilistic. Even when you ask for JSON, the output can vary across runs.

If your application expects fields like status, confidence, or action, your system depends on that JSON shape staying consistent.
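
For example, a backend might depend on a shape like the one below. The schema and values here are illustrative (only the field names status, confidence, and action come from the scenario above); the check uses Python's jsonschema package:

import json

from jsonschema import ValidationError, validate  # pip install jsonschema

# Illustrative schema for the shape the backend depends on.
RESPONSE_SCHEMA = {
    "type": "object",
    "properties": {
        "status": {"type": "string", "enum": ["ok", "error"]},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        "action": {"type": "string"},
    },
    "required": ["status", "confidence", "action"],
    "additionalProperties": False,
}

raw = '{"status": "ok", "confidence": 0.93, "action": "escalate"}'  # one LLM reply

try:
    validate(instance=json.loads(raw), schema=RESPONSE_SCHEMA)
    print("reply matches the expected shape")
except (json.JSONDecodeError, ValidationError) as exc:
    print(f"contract broken: {exc}")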

That contract can break when models are updated, prompts are revised, or sampling settings change. In production, that shows up as parse errors, missing required fields, and broken downstream logic.

How aicert Works

aicert runs your prompt across your test cases multiple times, then measures what actually happens:

  • Schema compliance — Does the JSON match your schema?
  • Consistency — Does it return the same structure repeatedly?
  • Latency — Are response times consistent?
  • Differences across configs — What changes between models or temperatures?

You define thresholds. aicert enforces them in CI.
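
The underlying idea looks like the sketch below: call the model several times, check each reply against the schema, compare structures across runs, and return a failing exit code when a threshold is missed. This is an illustration of the technique, not aicert's implementation; call_model stands in for your provider call.

import json

from jsonschema import ValidationError, validate

RUNS = 20
MIN_COMPLIANCE = 0.95   # fraction of runs that must match the schema
MIN_CONSISTENCY = 0.80  # fraction of runs that must share the most common key set


def call_model(prompt: str) -> str:
    """Stand-in for a real provider call (OpenAI, Anthropic, ...)."""
    raise NotImplementedError


def check(prompt: str, schema: dict) -> int:
    compliant, key_sets = 0, []
    for _ in range(RUNS):
        raw = call_model(prompt)
        try:
            payload = json.loads(raw)
            validate(instance=payload, schema=schema)
            compliant += 1
            key_sets.append(frozenset(payload))  # top-level keys (assumes an object-shaped reply)
        except (json.JSONDecodeError, ValidationError):
            key_sets.append(None)

    compliance = compliant / RUNS
    valid = [k for k in key_sets if k is not None]
    modal = max(set(valid), key=valid.count) if valid else None
    consistency = valid.count(modal) / RUNS if modal is not None else 0.0

    print(f"compliance={compliance:.0%}  consistency={consistency:.0%}")
    # A non-zero return value becomes a failing exit code in CI.
    return 0 if compliance >= MIN_COMPLIANCE and consistency >= MIN_CONSISTENCY else 1

# raise SystemExit(check(PROMPT, RESPONSE_SCHEMA))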

pip install aicert
aicert init
aicert ci aicert.yaml

For real evaluation, run against your configured LLM providers using your API keys.

What you get

  • Repeatable runs with concurrency
  • Schema validation (and optional JSON extraction; see the sketch after this list)
  • CI-friendly summaries and exit codes
  • Artifacts to debug failures

What you’ll see (example output)

Provider: openai:gpt-4.1-mini @temp=0.1
Compliance: 96.0%   Consistency: 82.0%
P95 latency: 4,820ms

Top failures:
- missing required field: "action"
- extra key not allowed: "reason"
- invalid JSON (parse error)

Where It Fits

LLM-backed APIs

Ensure responses always match the schema your backend expects.

Data Extraction

Validate structured outputs from contracts, tickets, or documents.

Classification Systems

Detect behavior changes when prompts or models change.

Automation & Workflows

Prevent silent JSON changes from breaking orchestration logic.

Core vs Pro

Core (Free, MIT)

  • JSON Schema validation
  • Consistency measurement
  • Latency tracking
  • CI threshold checks

Pro — Lock and Enforce Reliability

Core measures reliability. Pro enforces it in CI.

  • Baseline locking — capture a known-good state
  • Regression enforcement — fail CI when reliability drops
  • Prompt & schema change detection
  • Cost regression limits
  • Signed offline license — CI-ready, no SaaS dependency
  • Monthly: $29/month, cancel anytime
  • Annual (best value): $290/year, 2 months free

After purchase you will receive a signed license key by email.

Questions? mfifth@gmail.com

FAQ

Do I need API keys?

For real evaluation, yes. aicert runs against your configured LLM providers to measure behavior under your actual settings.

Is this only for agents?

No. It applies to any system that depends on JSON from an LLM — APIs, extraction pipelines, classifiers, and automation workflows.

What does Pro add?

Core tells you when behavior changes. Pro prevents those changes from shipping by locking baselines and failing CI when regressions occur.
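
Conceptually, baseline locking means recording a known-good set of metrics and failing the build when a later run drops below them by more than a tolerance. The sketch below only illustrates that idea; the file name, tolerance, and function names are hypothetical, not Pro's actual format:

import json
from pathlib import Path

BASELINE_FILE = Path("aicert-baseline.json")  # hypothetical file name
TOLERANCE = 0.02  # allow a two-point drop before failing


def lock_baseline(metrics: dict) -> None:
    """Record the current, known-good metrics as the baseline."""
    BASELINE_FILE.write_text(json.dumps(metrics, indent=2))


def enforce(metrics: dict) -> int:
    """Return a non-zero exit code if any rate regressed past the tolerance."""
    baseline = json.loads(BASELINE_FILE.read_text())
    regressions = {
        name: (baseline.get(name, 0.0), value)
        for name, value in metrics.items()
        if value < baseline.get(name, 0.0) - TOLERANCE
    }
    for name, (old, new) in regressions.items():
        print(f"regression: {name} dropped from {old:.2f} to {new:.2f}")
    return 1 if regressions else 0

# lock_baseline({"compliance": 0.96, "consistency": 0.82})              # on a known-good commit
# raise SystemExit(enforce({"compliance": 0.91, "consistency": 0.80}))  # in CI afterwards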

Does Pro require a SaaS connection?

No. Pro uses a signed offline license key designed for CI and isolated environments.