Alerts
CritterWatch implements a fully event-sourced alert lifecycle. Every alert transition — raised, elevated, reduced, resolved, cleared — is stored as an immutable event in Marten, providing a complete audit trail of system health over time.
Alert Lifecycle
States
| State | Description |
|---|---|
| Raised | Threshold first exceeded. Initial alert created. |
| Elevated | Condition persists beyond the escalation period. Severity increased. |
| Reduced | Condition is improving but not yet resolved. |
| Resolved | Condition has cleared automatically (system-condition alerts only). |
| Cleared | Alert acknowledged and closed by an operator. |
System-condition alerts (DLQ counts, projection lag, circuit breakers) auto-resolve when the underlying condition clears. Operational alerts (node ejection, manual actions) require explicit operator acknowledgment.
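The lifecycle above can be pictured as a small state machine. The sketch below is illustrative only: the set of permitted transitions and the names `AlertState`, `ALLOWED`, and `can_transition` are assumptions inferred from the table, not CritterWatch's actual implementation.

```python
from enum import Enum


class AlertState(Enum):
    RAISED = "Raised"
    ELEVATED = "Elevated"
    REDUCED = "Reduced"
    RESOLVED = "Resolved"
    CLEARED = "Cleared"


# Assumed transition map (hypothetical): Resolved and Cleared are treated as terminal.
ALLOWED = {
    AlertState.RAISED: {AlertState.ELEVATED, AlertState.REDUCED,
                        AlertState.RESOLVED, AlertState.CLEARED},
    AlertState.ELEVATED: {AlertState.REDUCED, AlertState.RESOLVED, AlertState.CLEARED},
    AlertState.REDUCED: {AlertState.ELEVATED, AlertState.RESOLVED, AlertState.CLEARED},
    AlertState.RESOLVED: set(),
    AlertState.CLEARED: set(),
}


def can_transition(current: AlertState, target: AlertState, auto_resolves: bool) -> bool:
    """Return True if the assumed lifecycle permits current -> target.

    auto_resolves is True for system-condition alerts. Operational alerts
    must be closed by an operator, so Resolved is not available to them.
    """
    if target is AlertState.RESOLVED and not auto_resolves:
        return False
    return target in ALLOWED[current]
```

Under these assumptions, an operational alert in the Raised state can move to Cleared but never to Resolved.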
Alert Types
Dead Letter Queue Alerts
Triggered when the DLQ count for a service or message type exceeds a threshold:
- Warning — DLQ count exceeds DeadLetterQueueWarningCount
- Critical — DLQ count exceeds DeadLetterQueueCriticalCount
Auto-resolves when DLQ count drops below the threshold.
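As a rough illustration of the two-threshold rule, the following sketch classifies a DLQ count. The function name `dlq_severity` is hypothetical, the threshold values in the usage note are made up, and "exceeds" is read as strictly greater than, which is an assumption.

```python
from typing import Optional


def dlq_severity(count: int, warning_count: int, critical_count: int) -> Optional[str]:
    """Classify a DLQ count against the two thresholds.

    warning_count and critical_count stand in for DeadLetterQueueWarningCount
    and DeadLetterQueueCriticalCount. A count equal to a threshold does not
    exceed it in this sketch (assumption).
    """
    if count > critical_count:
        return "Critical"
    if count > warning_count:
        return "Warning"
    return None  # no alert; an existing alert would auto-resolve here
```

With hypothetical thresholds of 10 and 100, `dlq_severity(47, 10, 100)` returns `"Warning"`.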
Projection Stall Alerts
Triggered when a projection shard stops advancing:
- Warning — lag exceeds ProjectionLagWarningSeconds
- Critical — lag exceeds ProjectionLagCriticalSeconds, or shard appears fully stalled
Auto-resolves when the projection resumes advancing and lag returns below threshold.
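The stall rule differs from a plain threshold check in that a fully stalled shard is Critical regardless of measured lag. A minimal sketch, assuming the caller already knows whether the shard is stalled (how CritterWatch detects a stall is not described here, and `projection_severity` is a hypothetical name):

```python
from typing import Optional


def projection_severity(
    lag_seconds: float,
    stalled: bool,
    warning_seconds: float,
    critical_seconds: float,
) -> Optional[str]:
    """Classify projection lag; a stalled shard is always Critical.

    warning_seconds and critical_seconds stand in for
    ProjectionLagWarningSeconds and ProjectionLagCriticalSeconds.
    """
    if stalled or lag_seconds > critical_seconds:
        return "Critical"
    if lag_seconds > warning_seconds:
        return "Warning"
    return None  # lag is back below threshold
```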
Agent Health Alerts
Triggered when an agent reports unhealthy status:
- Warning — agent unhealthy for AgentUnhealthyWarningCount consecutive checks
- Critical — agent unhealthy for AgentUnhealthyCriticalCount consecutive checks
Auto-resolves when the agent reports healthy.
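The "consecutive checks" rule amounts to counting the current streak of unhealthy results and resetting it on any healthy one. The sketch below shows that idea; `agent_severity` and its parameters are hypothetical names, and the example thresholds are invented.

```python
from typing import Iterable, Optional


def agent_severity(
    checks: Iterable[bool],
    warning_count: int,
    critical_count: int,
) -> Optional[str]:
    """Return the severity after replaying a sequence of health checks.

    checks is ordered oldest first; True means the agent reported healthy.
    warning_count and critical_count stand in for AgentUnhealthyWarningCount
    and AgentUnhealthyCriticalCount.
    """
    streak = 0
    for healthy in checks:
        # A healthy report resets the streak, which models auto-resolution.
        streak = 0 if healthy else streak + 1
    if streak >= critical_count:
        return "Critical"
    if streak >= warning_count:
        return "Warning"
    return None
```

With hypothetical thresholds of 3 and 5, three unhealthy checks in a row yield `"Warning"`, and a single healthy check afterwards returns the result to `None`.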
Circuit Breaker Alerts
Triggered immediately when a circuit breaker trips on any endpoint. Severity is always Critical. Auto-resolves when the circuit breaker resets.
Back Pressure Alerts
Triggered when back pressure activates on any endpoint. Auto-resolves when back pressure lifts.
Alerts Page
The Alerts page lists alerts across all services and supports the following filters:
- Status — Raised, Elevated, Reduced, Resolved, Cleared, or All
- Severity — Warning or Critical
- Service — scope to a specific service
- Type — DLQ, Projection, Agent, CircuitBreaker, BackPressure
The Active tab shows only open alerts requiring attention. The History tab shows all alerts including resolved and cleared.
Alert Detail
Click an alert to open the detail panel:
State Timeline
A chronological list of all state transitions for this alert, showing:
- Timestamp of each transition
- From state → To state
- The metric value that triggered the transition (e.g., "DLQ count: 47")
- For operator actions: who took the action and any notes
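Because each transition is stored as an immutable event, a timeline like this can be produced by sorting and formatting the events for one alert. The following sketch assumes a hypothetical event shape (`AlertTransition`) and output format; neither is taken from CritterWatch or Marten.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class AlertTransition:
    """One immutable transition event (hypothetical shape)."""
    at: datetime
    from_state: Optional[str]        # None for the initial Raised event
    to_state: str
    metric: Optional[str] = None     # e.g. "DLQ count: 47"
    operator: Optional[str] = None   # set only for operator actions
    note: Optional[str] = None


def render_timeline(events: List[AlertTransition]) -> List[str]:
    """Format events oldest first, one line per transition."""
    lines = []
    for e in sorted(events, key=lambda ev: ev.at):
        parts = [e.at.isoformat(), f"{e.from_state or '(new)'} -> {e.to_state}"]
        if e.metric:
            parts.append(e.metric)
        if e.operator:
            parts.append(f"by {e.operator}" + (f": {e.note}" if e.note else ""))
        lines.append(" | ".join(parts))
    return lines
```

The events are never mutated; the timeline is a read-only view derived from them each time it is rendered.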
Remediation Actions
Each alert includes context-appropriate action buttons:
| Alert Type | Available Actions |
|---|---|
| DLQ Alert | Replay All, Discard All, View DLQ |
| Projection Stall | Restart Projection, Rebuild Projection, View Projection |
| Agent Unhealthy | View Service, Eject Node |
| Circuit Breaker | View Endpoint |
Acknowledge / Snooze / Clear
Acknowledge — mark the alert as acknowledged without clearing it. The alert remains visible but is no longer considered "unattended."
Snooze — suppress the alert for a specified duration (1 hour, 4 hours, 24 hours). The alert will resurface after the snooze expires.
Clear — close the alert and record the operator action in the audit trail. A note can be added explaining the resolution.
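Snoozing can be thought of as storing a "hidden until" timestamp on the alert. A minimal sketch of that idea, using the three durations listed above; the function names and the rule that the alert resurfaces exactly at expiry are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The three snooze durations listed above, in hours.
SNOOZE_HOURS = (1, 4, 24)


def snooze_until(now: datetime, hours: int) -> datetime:
    """Return the time at which a snoozed alert resurfaces."""
    if hours not in SNOOZE_HOURS:
        raise ValueError(f"unsupported snooze duration: {hours}h")
    return now + timedelta(hours=hours)


def is_visible(now: datetime, snoozed_until: Optional[datetime]) -> bool:
    """An alert is visible if it was never snoozed or the snooze has expired."""
    return snoozed_until is None or now >= snoozed_until
```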
Threshold Configuration
Alert thresholds are configured when setting up the CritterWatch server. See Configuration Reference for the full options.
Thresholds support three levels of specificity. They are applied in the following order, so the most specific matching level takes precedence:
- Per-message-type — most specific, applies to alerts for a specific message type
- Per-service — applies to all alerts for a specific service
- Global defaults — applies to all alerts across all services
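One way to read this precedence is as a lookup that falls through from the most specific level to the global defaults. The sketch below is an assumption about how such a lookup could work; `resolve_threshold`, the dictionary layout, and the service and message type names in the usage note are all hypothetical.

```python
from typing import Dict, Optional, Tuple


def resolve_threshold(
    name: str,
    service: str,
    message_type: Optional[str],
    global_defaults: Dict[str, int],
    per_service: Dict[str, Dict[str, int]],
    per_message_type: Dict[Tuple[str, str], Dict[str, int]],
) -> int:
    """Look up a threshold value, most specific level first."""
    # 1. Per-message-type override, keyed by (service, message type) in this sketch.
    if message_type is not None:
        override = per_message_type.get((service, message_type), {})
        if name in override:
            return override[name]
    # 2. Per-service override.
    service_override = per_service.get(service, {})
    if name in service_override:
        return service_override[name]
    # 3. Global default.
    return global_defaults[name]
```

For example, if the global default for DeadLetterQueueWarningCount is 10, a hypothetical `orders` service overrides it to 25, and its `PlaceOrder` message type overrides it to 5, then alerts for `PlaceOrder` use 5, other `orders` messages use 25, and every other service uses 10.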
See also the preset profiles:
- Production Profile — strict thresholds, suitable for production environments
- Development Profile — relaxed thresholds, reduces noise during development
In addition to programmatic configuration at startup, thresholds can be edited in the CritterWatch UI under Settings > Alert Configuration.
