Failure Classification

Maev automatically classifies failures at the end of every session. The classification engine analyzes all captured events and assigns a category, subcategory, severity, and reason.

Failure categories

Category	Description	Severity
Context exhaustion	Agent ran out of context window before completing the task	High
RAG failure	Retrieval returned empty, irrelevant, or low-quality results	Medium
Cost anomaly	Session cost exceeded expected thresholds	Low to Critical
Tool failure	A tool or function call returned an error or unexpected result	Medium to High
Infinite loop	The same prompt hash appeared 3 or more times within a 20-call window	High
Goal drift	Agent deviated from its original task or objective	Medium
Prompt injection	User input attempted to override system instructions	Critical
Hallucination	Agent produced fabricated or unsupported information	High
Latency spike	A single step took longer than expected	Low to High
Silent failure	Session completed but produced no meaningful output	Medium
Provider error	The model provider returned an error for a request	High

Severity levels

Severity	Meaning
`low`	Worth tracking but not urgent
`medium`	Investigate when you have time
`high`	Investigate soon, user experience impacted
`critical`	Investigate immediately

Cost thresholds

Cost anomalies are triggered based on the total cost of a session:

Over $1.00 per session: high severity
Over $5.00 per session: critical severity
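
The two thresholds above can be read as a simple mapping from total session cost to severity. The following is a minimal sketch under that reading, not Maev's implementation: the function name `cost_severity` and the constant names are hypothetical, and because only the high and critical thresholds are listed here, the sketch returns `None` for anything at or below $1.00.

```python
from typing import Optional

# Thresholds from the text above; all names here are illustrative assumptions.
COST_HIGH_USD = 1.00
COST_CRITICAL_USD = 5.00


def cost_severity(total_cost_usd: float) -> Optional[str]:
    """Return the cost-anomaly severity for a session, or None if no listed threshold is exceeded."""
    # Check the larger threshold first so a $6 session is critical, not high.
    if total_cost_usd > COST_CRITICAL_USD:
        return "critical"
    if total_cost_usd > COST_HIGH_USD:
        return "high"
    return None
```

Both comparisons are strict, matching the word "over": a session costing exactly $1.00 does not trigger an anomaly in this sketch.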

Latency thresholds

Latency spikes are triggered based on the duration of a single LLM call within a session:

Over 10 seconds: high severity
Over 30 seconds: critical severity
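
Because the check applies to a single LLM call rather than to the whole session, the slowest call determines the severity. A minimal sketch of that idea follows, assuming the per-call durations are available as a list of seconds; the function name `latency_severity` and the input shape are hypothetical, not part of Maev's API.

```python
from typing import Iterable, Optional

# Thresholds from the text above; all names here are illustrative assumptions.
LATENCY_HIGH_S = 10.0
LATENCY_CRITICAL_S = 30.0


def latency_severity(call_durations_s: Iterable[float]) -> Optional[str]:
    """Return the latency-spike severity based on the slowest single call, or None."""
    # A session with no LLM calls has no spike; default=0.0 covers the empty case.
    slowest = max(call_durations_s, default=0.0)
    if slowest > LATENCY_CRITICAL_S:
        return "critical"
    if slowest > LATENCY_HIGH_S:
        return "high"
    return None
```

Under this reading, a session with many fast calls and one 12-second call is flagged as high, even if its total duration is unremarkable.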

Loop detection

Infinite loops are detected at the SDK level using rolling prompt hash analysis. If the same prompt hash appears 3 or more times within a single run's rolling 20-call window, the run is stopped immediately. This catches semantic loops even when the exact text varies slightly.
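
The rolling-window check can be sketched as follows. This is an illustration of the counting logic only: the class name `LoopDetector` is hypothetical, and the sketch hashes the exact prompt text with SHA-256, so it matches only identical prompts. How Maev computes its prompt hash is not described here, so the hashing step in particular is an assumption.

```python
import hashlib
from collections import deque

# Values from the text above; all names here are illustrative assumptions.
WINDOW_SIZE = 20   # number of most recent calls to remember
REPEAT_LIMIT = 3   # occurrences of one hash that count as a loop


class LoopDetector:
    """Tracks prompt hashes for one run and flags repeated prompts."""

    def __init__(self, window_size: int = WINDOW_SIZE, repeat_limit: int = REPEAT_LIMIT):
        # A deque with maxlen drops the oldest hash automatically once full.
        self.window = deque(maxlen=window_size)
        self.repeat_limit = repeat_limit

    def record(self, prompt: str) -> bool:
        """Record one call; return True if the run should be stopped as a loop."""
        # Assumption: an exact-text hash stands in for the real prompt hash.
        digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        self.window.append(digest)
        return self.window.count(digest) >= self.repeat_limit
```

A caller would create one detector per run, call `record` before each LLM call, and stop the run as soon as it returns `True`.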

How classification works

1. The session closes (the agent exits or sends a session.end event).
2. All events for the session are loaded.
3. The classification engine runs each rule against the events in priority order.
4. The first matching rule wins and its result is stored.
5. If a failure is found, an alert is sent.

Only one failure category is assigned per session. Because the engine stops at the first matching rule, a session that would also match a lower-priority rule is reported only under the higher-priority category.
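
The first-match behavior described above can be sketched as an ordered list of rules. Everything in this example is hypothetical: the `Failure` fields mirror the category, subcategory, severity, and reason mentioned at the top of this page, but the event shapes, the two example rules, the subcategory values, and their relative priority are assumptions made for illustration, not Maev's actual rules or ordering.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Failure:
    category: str
    subcategory: str
    severity: str
    reason: str


# A rule inspects all session events and returns a Failure or None.
Rule = Callable[[Sequence[dict]], Optional[Failure]]


def provider_error_rule(events: Sequence[dict]) -> Optional[Failure]:
    # Hypothetical event shape: {"type": "provider_error", "message": "..."}
    for event in events:
        if event.get("type") == "provider_error":
            return Failure("Provider error", "request_failed", "high", event.get("message", ""))
    return None


def silent_failure_rule(events: Sequence[dict]) -> Optional[Failure]:
    # Hypothetical: a session with no "output" event produced nothing meaningful.
    if not any(event.get("type") == "output" for event in events):
        return Failure("Silent failure", "no_output", "medium", "session produced no output")
    return None


def classify(events: Sequence[dict], rules: Sequence[Rule]) -> Optional[Failure]:
    """Run rules in priority order and return the first match, or None."""
    for rule in rules:
        result = rule(events)
        if result is not None:
            return result  # first match wins; later rules are never evaluated
    return None
```

With the rules ordered as `[provider_error_rule, silent_failure_rule]`, a session that hit a provider error and also produced no output is classified as a provider error only, which is the single-category behavior the text describes.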
