Open-source agent evaluation framework. Define expectations. Catch regressions before customers do. Enterprise-ready for regulated teams.
$ pip install rigr
$ rigr init
$ rigr test --agent my_agent.py
═══ Rigr Eval Report ═══
5/5 cases | 23/24 fields | 95.8%
Baseline comparison: 0 new errors, 0 resolved
✓ PASS
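The percentage in the sample report is consistent with a plain ratio of matched fields to checked fields. The page does not say how the figure is derived, so the sketch below is an assumption, and the variable names are illustrative rather than part of Rigr.

```python
# Assumption: accuracy = fields that matched / fields that were checked.
fields_passed, fields_total = 23, 24

accuracy = fields_passed / fields_total
print(f"{fields_passed}/{fields_total} fields | {accuracy:.1%}")
```

Read this way, one field out of 24 did not match. The report still passes with "0 new errors", which suggests that mismatch was already present in the baseline.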
Every model update, prompt change, or retrieval tweak risks degrading your agent. LLM eval tools test chat quality; they don't test whether your support agent still calculates refunds correctly or whether your compliance bot still catches policy violations.
Rigr tests what your agent does, not how it sounds.
JSON schema for what your agent must output. Field-level constraints. No ambiguous "looks good to me."
Inputs with expected outputs. Version-controlled, reviewable. The same cases run every time.
Lock known-good results. Every future run compares against them. Regressions are caught in testing, not discovered by customers.
Per-field accuracy, plus a changelog of what broke and what was fixed. Compliance-ready evidence for your team.
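The four ideas above (field-level expectations, fixed test cases, a frozen baseline, and per-field reporting) can be sketched in plain Python. This is a toy illustration of the concept, not Rigr's API: `SCHEMA`, `CASES`, `toy_agent`, `run_suite`, and `compare_to_baseline` are all hypothetical names, and a type map stands in for a real JSON schema.

```python
# Hypothetical sketch: none of these names come from Rigr itself.

# Field-level expectations: required fields and their types (a stand-in for a JSON schema).
SCHEMA = {"refund_amount": float, "currency": str, "approved": bool}

# Test cases: inputs paired with expected outputs, suitable for version control.
CASES = [
    {"id": "refund-full",
     "input": {"order_total": 40.0, "days_since_purchase": 3},
     "expected": {"refund_amount": 40.0, "currency": "USD", "approved": True}},
    {"id": "refund-late",
     "input": {"order_total": 40.0, "days_since_purchase": 45},
     "expected": {"refund_amount": 0.0, "currency": "USD", "approved": False}},
]

def toy_agent(inp):
    """Stand-in agent: full refund within 30 days, nothing afterwards."""
    ok = inp["days_since_purchase"] <= 30
    return {"refund_amount": inp["order_total"] if ok else 0.0,
            "currency": "USD", "approved": ok}

def check_fields(output, expected):
    """Return {field: passed}; a field passes if its type and value both match."""
    return {field: isinstance(output.get(field), ftype) and output.get(field) == expected[field]
            for field, ftype in SCHEMA.items()}

def run_suite(agent):
    """Run every case and collect per-field results keyed by 'case_id.field'."""
    results = {}
    for case in CASES:
        checked = check_fields(agent(case["input"]), case["expected"])
        for field, passed in checked.items():
            results[f"{case['id']}.{field}"] = passed
    return results

def compare_to_baseline(baseline, current):
    """New errors passed in the baseline and fail now; resolved ones are the reverse.
    Fields missing from the baseline are counted as neither."""
    new_errors = sorted(k for k, ok in current.items() if not ok and baseline.get(k, False))
    resolved = sorted(k for k, ok in current.items() if ok and not baseline.get(k, True))
    return new_errors, resolved

# Lock known-good results, then simulate a regression in a single field.
baseline = run_suite(toy_agent)

def regressed_agent(inp):
    out = toy_agent(inp)
    out["currency"] = "usd"  # simulated regression: wrong casing
    return out

current = run_suite(regressed_agent)
new_errors, resolved = compare_to_baseline(baseline, current)
passed = sum(current.values())
print(f"{passed}/{len(current)} fields | {passed / len(current):.1%}")
print("new errors:", new_errors)
print("resolved:", resolved)
```

Keying results by case and field is what makes the comparison useful: the diff names the exact field that changed (`currency` in both cases here) instead of reporting only that the overall score dropped.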
| Capability | DeepEval | Evidently | Rigr |
|---|---|---|---|
| LLM output quality | ✓ | ✓ | ✗ |
| Agent task verification | ✗ | ✗ | ✓ |
| Frozen baseline comparison | ✗ | ✗ | ✓ |
| Per-field regression detection | ✗ | ✗ | ✓ |
| Audit-ready compliance reports | ✗ | ✗ | ✓ |
| Zero agent code changes | ✓ | ✓ | ✓ |
For teams deploying agents in regulated environments. SSO, audit logs, SOC 2, on-prem deployment, priority support. Your compliance team will thank you.
Book a call