Failure Classification
Maev automatically classifies failures at the end of every session. The classification engine analyzes all captured events and assigns a category, subcategory, severity, and reason.
Failure categories
| Category | Description | Severity |
|---|---|---|
| Context exhaustion | Agent ran out of context window before completing the task | High |
| RAG failure | Retrieval returned empty, irrelevant, or low-quality results | Medium |
| Cost anomaly | Session cost exceeded expected thresholds | Low to Critical |
| Tool failure | A tool or function call returned an error or unexpected result | Medium to High |
| Infinite loop | The same prompt hash appeared 3 or more times within a 20-call window | High |
| Goal drift | Agent deviated from its original task or objective | Medium |
| Prompt injection | User input attempted to override system instructions | Critical |
| Hallucination | Agent produced fabricated or unsupported information | High |
| Latency spike | A single step took longer than expected | Low to High |
| Silent failure | Session completed but produced no meaningful output | Medium |
| Provider error | The model provider returned an error for a request | High |
Severity levels
| Severity | Meaning |
|---|---|
| low | Worth tracking but not urgent |
| medium | Investigate when you have time |
| high | Investigate soon, user experience impacted |
| critical | Investigate immediately |
Cost thresholds
Cost anomalies are triggered based on the total cost of a session:
- Over $1.00 per session: `high` severity
- Over $5.00 per session: `critical` severity
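
As a minimal sketch, the cost thresholds above can be written as a single lookup. The function and constant names are hypothetical and not part of Maev's API. The list defines no severity at or below $1.00, so the sketch returns `None` in that case.

```python
from typing import Optional

# Thresholds from the list above, in USD per session.
HIGH_COST_USD = 1.00
CRITICAL_COST_USD = 5.00


def cost_severity(session_cost_usd: float) -> Optional[str]:
    """Return the cost-anomaly severity, or None if no threshold is crossed.

    Hypothetical helper: "over" is read as strictly greater than.
    """
    if session_cost_usd > CRITICAL_COST_USD:
        return "critical"
    if session_cost_usd > HIGH_COST_USD:
        return "high"
    return None
```

Checking the larger threshold first keeps the branches from overlapping, so a $6.00 session is reported as `critical` rather than `high`.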
Latency thresholds
Latency spikes are triggered based on the duration of a single LLM call within a session:
- Over 10 seconds: `high` severity
- Over 30 seconds: `critical` severity
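
Because the latency rule looks at individual LLM calls rather than the whole session, one way to sketch it is to take the slowest call and compare it against the thresholds above. The helper name and the input shape (a list of call durations in seconds) are assumptions made for illustration.

```python
from typing import Iterable, Optional

# Thresholds from the list above, in seconds per LLM call.
HIGH_LATENCY_S = 10.0
CRITICAL_LATENCY_S = 30.0


def latency_severity(call_durations_s: Iterable[float]) -> Optional[str]:
    """Return the latency-spike severity for a session, or None.

    Hypothetical helper: only the slowest single call matters,
    not the sum of all call durations.
    """
    slowest = max(call_durations_s, default=0.0)
    if slowest > CRITICAL_LATENCY_S:
        return "critical"
    if slowest > HIGH_LATENCY_S:
        return "high"
    return None
```

Under this reading, a session with many fast calls never triggers a spike, while a single 31-second call does, regardless of how short the rest of the session was.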
Loop detection
Infinite loops are detected at the SDK level using rolling prompt hash analysis. If the same prompt hash appears 3 or more times within a single run's rolling 20-call window, the run is stopped immediately. This catches semantic loops even when the exact text varies slightly.
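
The rolling-window check described above could look roughly like the following sketch. The class name, the use of SHA-256, and the normalization step (lowercasing and collapsing whitespace before hashing) are all assumptions. The text does not say how Maev computes prompt hashes, and normalization is shown only as one plausible way for slightly different text to produce the same hash.

```python
import hashlib
from collections import deque


class LoopDetector:
    """Hypothetical rolling prompt-hash loop detector (not Maev's SDK code)."""

    def __init__(self, window: int = 20, threshold: int = 3) -> None:
        self.threshold = threshold
        # A bounded deque drops the oldest hash once the window is full.
        self._recent: deque = deque(maxlen=window)

    @staticmethod
    def _hash(prompt: str) -> str:
        # Assumption: normalize case and whitespace so trivially
        # different prompts map to the same hash.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def observe(self, prompt: str) -> bool:
        """Record a prompt; return True if the run should be stopped."""
        prompt_hash = self._hash(prompt)
        self._recent.append(prompt_hash)
        return self._recent.count(prompt_hash) >= self.threshold
```

A caller would invoke `observe` once per LLM call and stop the run as soon as it returns `True`. Repeats that are spread further apart than the window size are not counted together, because older hashes have already been evicted.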
How classification works
- The session closes (agent exits or sends a `session.end` event)
- All events for the session are loaded
- The classification engine runs each rule against the events in priority order
- The first matching rule wins and its result is stored
- If a failure is found, an alert is sent
Only one failure category is assigned per session. Because the engine stops at the first match, a session that triggers several rules is labeled with the highest-priority one.
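
The first-match-wins behavior can be sketched as a loop over an ordered list of rules. Everything named here is hypothetical: the `Classification` fields mirror the four attributes mentioned at the top of this page, but the event shape, the event type strings, the subcategory values, and the relative priority of the two example rules are assumptions, since the actual priority order is not documented here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Classification:
    category: str
    subcategory: str
    severity: str
    reason: str


Event = Dict[str, Any]
Rule = Callable[[List[Event]], Optional[Classification]]


def prompt_injection_rule(events: List[Event]) -> Optional[Classification]:
    # Hypothetical event type; the real event schema may differ.
    if any(e.get("type") == "prompt_injection_detected" for e in events):
        return Classification(
            "Prompt injection", "instruction_override", "critical",
            "User input attempted to override system instructions",
        )
    return None


def provider_error_rule(events: List[Event]) -> Optional[Classification]:
    # Hypothetical event type; the real event schema may differ.
    if any(e.get("type") == "provider_error" for e in events):
        return Classification(
            "Provider error", "request_failed", "high",
            "The model provider returned an error for a request",
        )
    return None


def classify(events: List[Event], rules: List[Rule]) -> Optional[Classification]:
    """Run rules in priority order and return the first match, if any."""
    for rule in rules:
        result = rule(events)
        if result is not None:
            return result  # first match wins; later rules are skipped
    return None


# Assumed priority order, for illustration only.
RULES: List[Rule] = [prompt_injection_rule, provider_error_rule]
```

With this ordering, a session containing both a provider error and an injection attempt is reported only as a prompt injection. The provider error is never evaluated, which is the trade-off of assigning a single category per session.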