aicert

AI responses change. Your code expects consistency.

If your backend parses LLM responses and relies on specific fields, you need to test that behavior. aicert runs your prompts repeatedly, validates the output against your schema, and alerts you when it changes — before you deploy.

  • Validate JSON against your schema
  • Measure output consistency
  • Track latency across runs
  • Fail CI when behavior changes

You will see schema failures, consistency drops, and latency regressions before they reach production.

The Problem

LLMs are probabilistic. Even when you ask for JSON, the output can vary across runs.

If your application expects fields like status, confidence, or action, your system depends on that JSON shape staying consistent.
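
For example, a backend might depend on a shape like the one below. The schema and values here are illustrative (only the field names status, confidence, and action come from the scenario above); the check uses Python's jsonschema package:

import json

from jsonschema import ValidationError, validate  # pip install jsonschema

# Illustrative schema for the shape the backend depends on.
RESPONSE_SCHEMA = {
    "type": "object",
    "properties": {
        "status": {"type": "string", "enum": ["ok", "error"]},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        "action": {"type": "string"},
    },
    "required": ["status", "confidence", "action"],
    "additionalProperties": False,
}

raw = '{"status": "ok", "confidence": 0.93, "action": "escalate"}'  # one LLM reply

try:
    validate(instance=json.loads(raw), schema=RESPONSE_SCHEMA)
    print("reply matches the expected shape")
except (json.JSONDecodeError, ValidationError) as exc:
    print(f"contract broken: {exc}")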

That contract can break when models are updated, prompts are revised, or sampling settings change. In production, that shows up as parse errors, missing required fields, and broken downstream logic.

How aicert Works

aicert runs your prompt across your test cases multiple times, then measures what actually happens:

  • Schema compliance — Does the JSON match your schema?
  • Consistency — Does it return the same structure repeatedly?
  • Latency — Are response times consistent?
  • Differences across configs — What changes between models or temperatures?

You define thresholds. aicert enforces them in CI.
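
The underlying idea looks like the sketch below: call the model several times, check each reply against the schema, compare structures across runs, and return a failing exit code when a threshold is missed. This is an illustration of the technique, not aicert's implementation; call_model stands in for your provider call.

import json

from jsonschema import ValidationError, validate

RUNS = 20
MIN_COMPLIANCE = 0.95   # fraction of runs that must match the schema
MIN_CONSISTENCY = 0.80  # fraction of runs that must share the most common key set


def call_model(prompt: str) -> str:
    """Stand-in for a real provider call (OpenAI, Anthropic, ...)."""
    raise NotImplementedError


def check(prompt: str, schema: dict) -> int:
    compliant, key_sets = 0, []
    for _ in range(RUNS):
        raw = call_model(prompt)
        try:
            payload = json.loads(raw)
            validate(instance=payload, schema=schema)
            compliant += 1
            key_sets.append(frozenset(payload))  # top-level keys (assumes an object-shaped reply)
        except (json.JSONDecodeError, ValidationError):
            key_sets.append(None)

    compliance = compliant / RUNS
    valid = [k for k in key_sets if k is not None]
    modal = max(set(valid), key=valid.count) if valid else None
    consistency = valid.count(modal) / RUNS if modal is not None else 0.0

    print(f"compliance={compliance:.0%}  consistency={consistency:.0%}")
    # A non-zero return value becomes a failing exit code in CI.
    return 0 if compliance >= MIN_COMPLIANCE and consistency >= MIN_CONSISTENCY else 1

# raise SystemExit(check(PROMPT, RESPONSE_SCHEMA))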

pip install aicert
aicert init
aicert ci aicert.yaml

For real evaluation, run against your configured LLM providers using your API keys.

What you get

  • Repeatable runs with concurrency
  • Schema validation (and optional JSON extraction; see the sketch after this list)
  • CI-friendly summaries and exit codes
  • Artifacts to debug failures

What you’ll see (example output)

Provider: openai:gpt-4.1-mini @temp=0.1
Compliance: 96.0%   Consistency: 82.0%
P95 latency: 4,820ms

Top failures:
- missing required field: "action"
- extra key not allowed: "reason"
- invalid JSON (parse error)

Where It Fits

LLM-backed APIs

Ensure responses always match the schema your backend expects.

Data Extraction

Validate structured outputs from contracts, tickets, or documents.

Classification Systems

Detect behavior changes when prompts or models change.

Automation & Workflows

Prevent silent JSON changes from breaking orchestration logic.

Core vs Pro

Core (Free, MIT)

  • JSON Schema validation
  • Consistency measurement
  • Latency tracking
  • CI threshold checks

Pro — Lock and Enforce Reliability

Core measures reliability. Pro enforces it in CI.

  • Baseline locking — capture a known-good state
  • Regression enforcement — fail CI when reliability drops
  • Prompt & schema change detection
  • Cost regression limits
  • Signed offline license — CI-ready, no SaaS dependency
  • Monthly: $29/month, cancel anytime
  • Annual (best value): $290/year, 2 months free

After purchase you will receive a signed license key by email.

Questions? mfifth@gmail.com

FAQ

Do I need API keys?

For real evaluation, yes. aicert runs against your configured LLM providers to measure behavior under your actual settings.

Is this only for agents?

No. It applies to any system that depends on JSON from an LLM — APIs, extraction pipelines, classifiers, and automation workflows.

What does Pro add?

Core tells you when behavior changes. Pro prevents those changes from shipping by locking baselines and failing CI when regressions occur.
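
Conceptually, baseline locking means recording a known-good set of metrics and failing the build when a later run drops below them by more than a tolerance. The sketch below only illustrates that idea; the file name, tolerance, and function names are hypothetical, not Pro's actual format:

import json
from pathlib import Path

BASELINE_FILE = Path("aicert-baseline.json")  # hypothetical file name
TOLERANCE = 0.02  # allow a two-point drop before failing


def lock_baseline(metrics: dict) -> None:
    """Record the current, known-good metrics as the baseline."""
    BASELINE_FILE.write_text(json.dumps(metrics, indent=2))


def enforce(metrics: dict) -> int:
    """Return a non-zero exit code if any rate regressed past the tolerance."""
    baseline = json.loads(BASELINE_FILE.read_text())
    regressions = {
        name: (baseline.get(name, 0.0), value)
        for name, value in metrics.items()
        if value < baseline.get(name, 0.0) - TOLERANCE
    }
    for name, (old, new) in regressions.items():
        print(f"regression: {name} dropped from {old:.2f} to {new:.2f}")
    return 1 if regressions else 0

# lock_baseline({"compliance": 0.96, "consistency": 0.82})              # on a known-good commit
# raise SystemExit(enforce({"compliance": 0.91, "consistency": 0.80}))  # in CI afterwards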

Does Pro require a SaaS connection?

No. Pro uses a signed offline license key designed for CI and isolated environments.