Toolschecker

GitHub Actions CI/CD for AI apps

Validates AI feature tests in CI pipelines by checking for deterministic output patterns and test stability

Examples

Basic Test Validation

{
  "test_script": "def test_llm_response(): assert 'error' not in generate_text()",
  "non_determinism_threshold": "0.7",
  "strict_mode": "false"
}

Expected output

{ "stability_score": 0.68, "issues": ["Non-deterministic output detected"] }

Strict Mode Check

{
  "test_script": "def test_prompt_consistency(): assert generate_prompt() == 'version_2'",
  "strict_mode": "true"
}

Expected output

{ "stability_score": 0.92, "issues": [] }

How it works

Toolschecker analyzes AI test scripts by executing them multiple times, measuring output consistency, and comparing the result against the configured stability threshold. It flags flaky tests caused by non-deterministic LLM behavior while leaving room for acceptable variation via the non_determinism_threshold and strict_mode parameters.
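
To make the multi-run measurement concrete, here is a minimal Python sketch under one possible scoring rule: run the test several times and score the fraction of runs that agree with the most common output. The metric is an assumption for illustration, not Toolschecker's documented formula.

from collections import Counter
import random
from typing import Callable

def stability_score(test_fn: Callable[[], str], runs: int = 10) -> float:
    """Execute test_fn repeatedly and score output consistency.

    Returns 1.0 when every run produces identical output; otherwise the
    fraction of runs that agree with the modal output.
    """
    outputs = [test_fn() for _ in range(runs)]
    modal_count = Counter(outputs).most_common(1)[0][1]
    return modal_count / runs

# Toy stand-in for a non-deterministic LLM call.
def flaky_generate() -> str:
    return random.choice(["ok", "ok", "ok", "error"])

print(round(stability_score(flaky_generate, runs=100), 2))  # ~0.75 on average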
