
Alerts

CritterWatch implements a fully event-sourced alert lifecycle. Every alert transition — raised, elevated, reduced, resolved, cleared — is stored as an immutable event in Marten, providing a complete audit trail of system health over time.

Alert Lifecycle

States

State      Description
Raised     Threshold first exceeded. Initial alert created.
Elevated   Condition persists beyond the escalation period. Severity increased.
Reduced    Condition is improving but not yet resolved.
Resolved   Condition has cleared automatically (system-condition alerts only).
Cleared    Alert acknowledged and closed by an operator.

System-condition alerts (DLQ counts, projection lag, circuit breakers) auto-resolve when the underlying condition clears. Operational alerts (node ejection, manual actions) require explicit operator acknowledgment.
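As a rough illustration of the event-sourced model described above, the sketch below folds a stream of immutable transition events into an alert's current state. It is not CritterWatch or Marten code: the `AlertEvent` shape and the `current_state` function are hypothetical, and the rule that a stream starts with Raised and accepts no further events once Resolved or Cleared is an assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# The five documented states; everything else in this block is hypothetical.
STATES = ("Raised", "Elevated", "Reduced", "Resolved", "Cleared")
CLOSED = {"Resolved", "Cleared"}


@dataclass(frozen=True)
class AlertEvent:
    """One immutable transition in an alert's event stream (illustrative shape)."""
    state: str
    at: datetime
    operator: Optional[str] = None  # set only for operator actions


def current_state(events):
    """Fold the event stream, oldest first, into the alert's current state."""
    state = None
    for event in events:
        if event.state not in STATES:
            raise ValueError(f"unknown state: {event.state}")
        if state is None and event.state != "Raised":
            raise ValueError("an alert stream must start with Raised")
        if state in CLOSED:
            # Assumption of this sketch: closed alerts accept no more events.
            raise ValueError(f"alert already closed as {state}")
        state = event.state
    return state
```

Because the events are never mutated, replaying the same list always yields the same state, which is what makes the stream usable as an audit trail.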

Alert Types

Dead Letter Queue Alerts

Triggered when the DLQ count for a service or message type exceeds a threshold:

  • Warning — DLQ count exceeds DeadLetterQueueWarningCount
  • Critical — DLQ count exceeds DeadLetterQueueCriticalCount

Auto-resolves when DLQ count drops below the threshold.
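The two-threshold rule can be sketched as a small classification function. The function name `dlq_severity` is hypothetical, and the parameters simply stand in for DeadLetterQueueWarningCount and DeadLetterQueueCriticalCount. Reading "exceeds" as strictly greater than is an assumption.

```python
from typing import Optional


def dlq_severity(count: int, warning_count: int, critical_count: int) -> Optional[str]:
    """Classify a DLQ count against warning and critical thresholds.

    Assumes "exceeds" means strictly greater than the threshold.
    Returns None when neither threshold is exceeded.
    """
    if count > critical_count:
        return "Critical"
    if count > warning_count:
        return "Warning"
    return None
```

With illustrative thresholds of 10 and 100, a count of 47 would classify as Warning, and a count of exactly 10 would not trigger an alert under this reading.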

Projection Stall Alerts

Triggered when a projection shard stops advancing:

  • Warning — lag exceeds ProjectionLagWarningSeconds
  • Critical — lag exceeds ProjectionLagCriticalSeconds, or shard appears fully stalled

Auto-resolves when the projection resumes advancing and lag returns below threshold.

Agent Health Alerts

Triggered when an agent reports unhealthy status:

  • Warning — agent unhealthy for AgentUnhealthyWarningCount consecutive checks
  • Critical — agent unhealthy for AgentUnhealthyCriticalCount consecutive checks

Auto-resolves when the agent reports healthy.
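Counting consecutive unhealthy checks can be sketched as follows. This is an illustrative model only: `agent_severity` is a hypothetical name, the health history is assumed to be a list of booleans ordered oldest first, and a streak equal to the configured count is assumed to be enough to trigger.

```python
from typing import Optional, Sequence


def agent_severity(checks: Sequence[bool], warning_count: int,
                   critical_count: int) -> Optional[str]:
    """Classify an agent from its health-check history.

    checks: results ordered oldest first, True meaning healthy.
    Only the trailing run of unhealthy results counts, so a single
    healthy check resets the streak (matching the auto-resolve rule).
    """
    streak = 0
    for healthy in reversed(checks):
        if healthy:
            break
        streak += 1
    if streak >= critical_count:
        return "Critical"
    if streak >= warning_count:
        return "Warning"
    return None
```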

Circuit Breaker Alerts

Triggered immediately when a circuit breaker trips on any endpoint. Severity is always Critical. Auto-resolves when the circuit breaker resets.

Back Pressure Alerts

Triggered when back pressure activates on any endpoint. Auto-resolves when back pressure lifts.

Alerts Page

The Alerts page lists alerts across all services. The list can be narrowed with these filters:

  • Status — Raised, Elevated, Reduced, Resolved, Cleared, or All
  • Severity — Warning or Critical
  • Service — scope to a specific service
  • Type — DLQ, Projection, Agent, CircuitBreaker, BackPressure

The Active tab shows only open alerts requiring attention. The History tab shows all alerts including resolved and cleared.
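The filter and tab behavior can be modeled with a few lines of code. The `Alert` record and both functions are hypothetical. Treating "open" as any status other than Resolved or Cleared is an inference from the tab descriptions, not a documented rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Illustrative alert record with the four filterable fields."""
    status: str
    severity: str
    service: str
    alert_type: str


# Assumption: an alert is "open" unless it is Resolved or Cleared.
CLOSED_STATUSES = {"Resolved", "Cleared"}


def filter_alerts(alerts, status="All", severity=None, service=None, alert_type=None):
    """Return the alerts matching every filter that is set."""
    def keep(alert):
        return ((status == "All" or alert.status == status)
                and (severity is None or alert.severity == severity)
                and (service is None or alert.service == service)
                and (alert_type is None or alert.alert_type == alert_type))
    return [a for a in alerts if keep(a)]


def active_tab(alerts):
    """Alerts shown on the Active tab: open alerts only."""
    return [a for a in alerts if a.status not in CLOSED_STATUSES]
```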

Alert Detail

Click an alert to open the detail panel:

State Timeline

A chronological list of all state transitions for this alert, showing:

  • Timestamp of each transition
  • From state → To state
  • The metric value that triggered the transition (e.g., "DLQ count: 47")
  • For operator actions: who took the action and any notes

Remediation Actions

Each alert includes context-appropriate action buttons:

Alert Type         Available Actions
DLQ Alert          Replay All, Discard All, View DLQ
Projection Stall   Restart Projection, Rebuild Projection, View Projection
Agent Unhealthy    View Service, Eject Node
Circuit Breaker    View Endpoint

Acknowledge / Snooze / Clear

Acknowledge — mark the alert as acknowledged without clearing it. The alert remains visible but is no longer considered "unattended."

Snooze — suppress the alert for a specified duration (1 hour, 4 hours, 24 hours). The alert will resurface after the snooze expires.

Clear — close the alert and record the operator action in the audit trail. A note can be added explaining the resolution.
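The snooze behavior amounts to a timestamp comparison, sketched below. The function names are hypothetical, and restricting durations to exactly the three listed values is an assumption of this sketch.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The three snooze durations listed above, in hours.
SNOOZE_HOURS = (1, 4, 24)


def snooze_until(now: datetime, hours: int) -> datetime:
    """Return the moment a snooze started at `now` expires."""
    if hours not in SNOOZE_HOURS:
        raise ValueError(f"unsupported snooze duration: {hours}h")
    return now + timedelta(hours=hours)


def is_suppressed(now: datetime, snoozed_until: Optional[datetime]) -> bool:
    """True while the snooze is in effect; the alert resurfaces afterwards."""
    return snoozed_until is not None and now < snoozed_until
```

Using timezone-aware timestamps throughout (for example `datetime.now(timezone.utc)`) avoids comparing naive and aware values, which raises a `TypeError` in Python.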

Threshold Configuration

Alert thresholds are configured when setting up the CritterWatch server. See Configuration Reference for the full options.

Thresholds can be set at three levels of specificity, checked in this order so that the most specific matching level is used:

  1. Per-message-type — most specific, applies to alerts for a specific message type
  2. Per-service — applies to all alerts for a specific service
  3. Global defaults — applies to all alerts across all services
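The lookup order above can be sketched as a fallback chain. This is a minimal model, not the CritterWatch configuration API: `resolve_threshold` and the dictionary layout are hypothetical, and keying per-message-type overrides by message type name alone is an assumption.

```python
def resolve_threshold(name, service, message_type,
                      per_message_type, per_service, global_defaults):
    """Return the threshold `name` from the most specific scope that sets it.

    per_message_type: {message_type: {threshold_name: value}}
    per_service:      {service: {threshold_name: value}}
    global_defaults:  {threshold_name: value}
    """
    scopes = (
        per_message_type.get(message_type, {}),  # 1. most specific
        per_service.get(service, {}),            # 2. service-wide
        global_defaults,                         # 3. fallback
    )
    for scope in scopes:
        if name in scope:
            return scope[name]
    raise KeyError(name)
```

A scope only needs to list the thresholds it overrides; anything it omits falls through to the next level.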

See also the preset profiles:

  • Production Profile — strict thresholds, suitable for production environments
  • Development Profile — relaxed thresholds, reduces noise during development

To change thresholds, use the CritterWatch UI under Settings > Alert Configuration, or set them programmatically at startup. See Configuration Reference for details.

Released under the MIT License.