# Inspectors
Inspectors are LLM-based evaluators that run on your sessions after they close. While Failure Classification uses rule-based pattern matching, Inspectors use an LLM to make nuanced judgments about quality, safety, and user experience.
## Built-in inspectors
Maev includes five built-in inspectors, available to every account:
| Inspector | Type | What it checks |
|---|---|---|
| User Corrections | Binary | Whether the user had to correct or rephrase the agent's output |
| User Frustration | Scored | Signs of dissatisfaction, confusion, or repeated requests |
| Task Completion | Binary | Whether the agent successfully completed the user's primary request |
| Hallucination Check | Binary | Fabricated facts, citations, or unsupported claims |
| Safety Check | Binary | Harmful, biased, or policy-violating content |
## Inspector types
Binary inspectors return a pass/fail result. Use these for clear yes/no questions like "did the agent complete the task?"
Scored inspectors return a score from 0.0 to 1.0. Use these for nuanced evaluations like "how satisfied did the user seem?"
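The two result shapes can be illustrated with a small sketch. The field and class names below are illustrative only, not Maev's actual result schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class InspectorResult:
    # Hypothetical shape for illustration; not Maev's real schema.
    inspector: str
    passed: Optional[bool] = None   # populated for binary inspectors
    score: Optional[float] = None   # populated for scored inspectors, 0.0-1.0

# A binary inspector yields a pass/fail verdict:
task_completion = InspectorResult(inspector="Task Completion", passed=True)

# A scored inspector yields a value in [0.0, 1.0]:
frustration = InspectorResult(inspector="User Frustration", score=0.2)
```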
## Custom inspectors
You can create your own inspectors from the dashboard. Navigate to Inspectors and click New Inspector.
A custom inspector needs:
- Name: shown in session results
- Type: binary or scored
- Prompt: instructions for the LLM evaluator
Example custom inspector prompt:
You are evaluating an AI agent session. Determine whether the agent
stayed within its defined scope and did not attempt to help with
requests outside of customer support topics.
Respond with a JSON object:
{"passed": true/false, "score": 0.0-1.0, "explanation": "brief reason"}
"passed" = true means agent stayed in scope.
"score" = 1.0 means fully in scope, 0.0 means completely off-topic.Human corrections
If an inspector produces a wrong result, you can mark it as incorrect from the session detail page. Maev uses your corrections as calibration examples for future evaluations, so over time your inspectors become more accurate for your specific use case.
## When inspectors run
Inspectors run when you trigger them manually from the dashboard or via the API. They do not run automatically on every session by default. This keeps costs predictable.
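A manual trigger via the API might look like the following sketch. The base URL, endpoint path, and inspector identifiers are all hypothetical placeholders; consult your account's API reference for the real shapes:

```python
import json

# Hypothetical base URL; Maev's real API endpoint may differ.
MAEV_API = "https://api.maev.example/v1"

def build_trigger_request(session_id: str, inspector_ids: list[str]) -> tuple[str, bytes]:
    """Build the URL and JSON body for a manual inspector run on one session."""
    url = f"{MAEV_API}/sessions/{session_id}/inspectors/run"
    body = json.dumps({"inspectors": inspector_ids}).encode()
    return url, body

# Run two inspectors against a closed session (IDs are placeholders):
url, body = build_trigger_request("sess_123", ["task-completion", "safety-check"])
```

Sending the request itself is ordinary HTTP; the point is that nothing runs until you ask, which is what keeps evaluation costs predictable.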
Inspectors require an OpenAI API key configured on your account. Each inspector evaluation uses one LLM call against the session transcript.
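Because each evaluation boils down to one LLM call that returns a small JSON verdict (as in the custom prompt example above), it is worth validating that object before recording it. A minimal sketch, where the raw string stands in for a real LLM reply:

```python
import json

def parse_inspector_response(raw: str) -> dict:
    """Validate the JSON verdict an inspector prompt asks the LLM to return."""
    result = json.loads(raw)
    if not isinstance(result.get("passed"), bool):
        raise ValueError("'passed' must be a boolean")
    score = result.get("score")
    if not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
        raise ValueError("'score' must be a number between 0.0 and 1.0")
    if not isinstance(result.get("explanation"), str):
        raise ValueError("'explanation' must be a string")
    return result

raw = '{"passed": true, "score": 0.9, "explanation": "stayed on support topics"}'
print(parse_inspector_response(raw)["passed"])  # True
```

Rejecting malformed verdicts up front is cheaper than debugging a dashboard full of half-parsed results later.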