GitHub Actions for CI/CD for AI Apps
Validates AI feature tests in CI pipelines by checking for deterministic output patterns and test stability
Examples
Basic Test Validation
{
"test_script": "def test_llm_response(): assert 'error' not in generate_text()",
"non_determinism_threshold": "0.7",
"strict_mode": "false"
}

Expected output

{ "stability_score": 0.68, "issues": ["Non-deterministic output detected"] }

Strict Mode Check
{
"test_script": "def test_prompt_consistency(): assert generate_prompt() == 'version_2'",
"strict_mode": "true"
}

Expected output

{ "stability_score": 0.92, "issues": [] }

How it works
Analyzes AI test scripts by executing them multiple times, measuring how consistent the outcomes are across runs, and comparing the resulting stability score against the configured threshold. This identifies flaky tests caused by non-deterministic LLM behavior while leaving configurable room for acceptable variation.
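The repeated-execution approach described above can be sketched in Python. This is an illustrative approximation, not the tool's actual implementation: the function name `stability_check`, the default run count, and the strict-mode cutoff of 0.9 (chosen so a score of 0.92 passes in strict mode, as in the second example) are all assumptions.

```python
import collections

def stability_check(test_fn, runs=10, threshold=0.7, strict=False):
    """Run a test repeatedly and score how consistent its outcomes are.

    Hypothetical sketch: the stability score is the fraction of runs
    that produced the most common outcome (pass, or a fail with the
    same assertion message).
    """
    outcomes = []
    for _ in range(runs):
        try:
            test_fn()
            outcomes.append(("pass", None))
        except AssertionError as exc:
            outcomes.append(("fail", str(exc)))

    # Score = share of runs matching the dominant outcome.
    counts = collections.Counter(outcomes)
    _, dominant = counts.most_common(1)[0]
    score = dominant / runs

    # Assumed: strict mode tightens the cutoff to 0.9; otherwise the
    # caller-supplied non-determinism threshold applies.
    limit = 0.9 if strict else threshold
    issues = [] if score >= limit else ["Non-deterministic output detected"]
    return {"stability_score": score, "issues": issues}
```

A fully deterministic test scores 1.0 with no issues; a test that alternates between pass and fail scores around 0.5 and is flagged.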