Toolschecker

GitHub Actions CI/CD for AI apps

Validates AI feature tests in CI pipelines by checking for deterministic output patterns and test stability

Examples

Basic Test Validation

{
  "test_script": "def test_llm_response(): assert 'error' not in generate_text()",
  "non_determinism_threshold": "0.7",
  "strict_mode": "false"
}

Expected output

{ "stability_score": 0.68, "issues": ["Non-deterministic output detected"] }

Strict Mode Check

{
  "test_script": "def test_prompt_consistency(): assert generate_prompt() == 'version_2'",
  "strict_mode": "true"
}

Expected output

{ "stability_score": 0.92, "issues": [] }

How it works

Toolschecker analyzes AI test scripts by executing them multiple times, measuring output consistency, and comparing the result against the configured stability threshold. It flags flaky tests caused by non-deterministic LLM behavior while leaving room for acceptable variation via the non_determinism_threshold and strict_mode parameters.
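
To make the multi-run measurement concrete, here is a minimal Python sketch under one possible scoring rule: run the test several times and score the fraction of runs that agree with the most common output. The metric is an assumption for illustration, not Toolschecker's documented formula.

from collections import Counter
import random
from typing import Callable

def stability_score(test_fn: Callable[[], str], runs: int = 10) -> float:
    """Execute test_fn repeatedly and score output consistency.

    Returns 1.0 when every run produces identical output; otherwise the
    fraction of runs that agree with the modal output.
    """
    outputs = [test_fn() for _ in range(runs)]
    modal_count = Counter(outputs).most_common(1)[0][1]
    return modal_count / runs

# Toy stand-in for a non-deterministic LLM call.
def flaky_generate() -> str:
    return random.choice(["ok", "ok", "ok", "error"])

print(round(stability_score(flaky_generate, runs=100), 2))  # ~0.75 on average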
