Introduction
CritterWatch is a purpose-built production monitoring and management console for distributed systems built on the Critter Stack — Wolverine, Marten, and Polecat. It delivers real-time visibility into service health, message flow, dead letter queues, event store projections, and alerting, all from a single unified interface.
The Problem
Modern distributed systems built on messaging, event sourcing, and CQRS patterns introduce operational challenges that general-purpose monitoring tools were never designed to address.
Event-driven architectures have unique blind spots. Event-sourced systems rely on projections and subscriptions to derive read models from append-only event stores. When a projection falls behind the high water mark or stalls entirely, the application continues to accept writes while read models silently go stale. Generic APM tools have no concept of projection lag, subscription health, or event stream continuity.
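To make projection lag and stalls concrete, here is a minimal Python sketch. It is illustrative only, not CritterWatch's or Marten's implementation. The `ProjectionSample` shape, the `lag` and `is_stalled` helpers, and the 30-second stall window are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProjectionSample:
    """One observation of a projection's progress (hypothetical shape)."""
    timestamp: float       # seconds, from any monotonic clock
    high_water_mark: int   # highest event sequence known to the store
    projection_seq: int    # last event sequence the projection processed


def lag(sample: ProjectionSample) -> int:
    """Number of events the projection is behind the high water mark."""
    return max(0, sample.high_water_mark - sample.projection_seq)


def is_stalled(samples: List[ProjectionSample], stall_seconds: float = 30.0) -> bool:
    """True if the projection is behind and has not advanced for stall_seconds."""
    if not samples:
        return False
    latest = samples[-1]
    if lag(latest) == 0:
        return False  # caught up, so not stalled
    # Walk backwards to find when the projection last reached its current position.
    unchanged_since = latest.timestamp
    for sample in reversed(samples):
        if sample.projection_seq != latest.projection_seq:
            break
        unchanged_since = sample.timestamp
    return latest.timestamp - unchanged_since >= stall_seconds


# Writes keep arriving (high water mark grows) while the projection stays at 90.
stalled = [
    ProjectionSample(0.0, 100, 90),
    ProjectionSample(20.0, 120, 90),
    ProjectionSample(40.0, 150, 90),
]
# The projection is behind but still advancing between samples.
progressing = [
    ProjectionSample(0.0, 100, 90),
    ProjectionSample(20.0, 120, 110),
    ProjectionSample(40.0, 150, 140),
]
```

In the first series the application is still accepting writes while the read model goes stale. A generic APM tool would not flag this, because nothing has thrown an error.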
Dead letter queues are a ticking clock. Every message-driven system produces failed messages. Whether caused by transient faults, schema mismatches, or application bugs, each dead-lettered message represents a business transaction that did not complete. Without centralized visibility, teams discover DLQ buildup through customer complaints rather than dashboards. Replaying or discarding those messages typically requires direct database access or custom scripts.
Circuit breakers and back pressure need real-time awareness. Wolverine provides sophisticated error handling — circuit breakers that pause listeners after repeated failures, back pressure that throttles producers when consumers fall behind. These mechanisms protect systems from cascading failure, but operators need to know when they activate, why, and whether the underlying condition has resolved.
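The following Python sketch shows a generic circuit breaker that records when it activates, why, and when it resumes. It is not Wolverine's implementation or configuration API. The `CircuitBreaker` class, the consecutive-failure threshold, and the fixed pause duration are all simplifying assumptions made for illustration.

```python
from typing import List, Optional, Tuple


class CircuitBreaker:
    """Pauses a listener after repeated failures and logs every state change."""

    def __init__(self, failure_threshold: int = 3, pause_seconds: float = 10.0):
        self.failure_threshold = failure_threshold
        self.pause_seconds = pause_seconds
        self.consecutive_failures = 0
        self.paused_until: Optional[float] = None
        # (timestamp, new state, reason): the record an operator would want to see
        self.transitions: List[Tuple[float, str, str]] = []

    def is_paused(self, now: float) -> bool:
        """Report the pause state, resuming first if the pause has elapsed."""
        if self.paused_until is not None and now >= self.paused_until:
            self.paused_until = None
            self.consecutive_failures = 0
            self.transitions.append((now, "resumed", "pause elapsed"))
        return self.paused_until is not None

    def record_success(self, now: float) -> None:
        """A successful message resets the failure streak."""
        self.consecutive_failures = 0

    def record_failure(self, now: float, reason: str) -> None:
        """Count a failure and trip the breaker once the threshold is reached."""
        if self.is_paused(now):
            return  # listener is already paused, nothing is being processed
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.paused_until = now + self.pause_seconds
            self.transitions.append((now, "paused", reason))


breaker = CircuitBreaker(failure_threshold=3, pause_seconds=10.0)
for t in (0.0, 1.0, 2.0):
    breaker.record_failure(t, "timeout")
paused_at_5 = breaker.is_paused(5.0)     # still inside the pause window
paused_at_12 = breaker.is_paused(12.0)   # pause has elapsed, listener resumes
```

The `transitions` list is the important part. The breaker protects the system whether or not anyone is watching, but without a record of each state change an operator cannot tell that it tripped or whether the cause has cleared.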
Multi-tenant systems multiply complexity. Organizations running multi-tenant Wolverine applications need per-tenant visibility into message processing, projection health, and failure rates. A problem in one tenant's event stream should not require sifting through logs from all tenants to diagnose.
Message routing topology is invisible at runtime. Understanding which handlers process which messages — and how they perform — requires tracing through source code rather than observing the live system.
The Solution
CritterWatch connects directly to your Wolverine-based services through a lightweight observer library. It collects telemetry in real time, stores operational history using Marten event sourcing, and provides a rich web interface for both observing and controlling your distributed system.
Adding CritterWatch to an existing Wolverine application requires a single NuGet package and two lines of configuration. There are no external dependencies beyond the PostgreSQL database your Marten-based services already use.
Key Capabilities
| Capability | Description |
|---|---|
| Service Dashboard | Live health indicators, node/agent status, per-service metrics |
| DLQ Management | Query, filter, replay, edit-and-replay, batch operations |
| Projection Monitoring | Lag tracking, stall detection, rebuild/rewind controls |
| Alerting | Configurable thresholds, full event-sourced lifecycle, auto-resolve |
| Endpoint Management | Pause/restart listeners, circuit breaker visibility, buffer limits |
| Scheduled Messages | View, reschedule, cancel, or edit scheduled messages before delivery |
| Durability Monitor | Inbox/outbox sparklines, persistence queue depth |
| Multi-Tenancy | Dynamic tenant management, per-tenant DLQ and projections |
| Chaos Monkey | Controlled fault injection for resilience testing |
| Message Topology | Visual message routing graph, handler chain visualization |
| Activity Timeline | Real-time audit log of all system events and operator actions |
Architecture Overview
CritterWatch uses a hub-and-spoke architecture. Your monitored services run Wolverine.CritterWatch, a lightweight observer that publishes telemetry via RabbitMQ (or any Wolverine transport) at 1-second intervals. The CritterWatch server receives these updates, records them in a Marten event store, and relays live updates to the browser over SignalR.
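The hub-and-spoke flow can be sketched in memory with a few lines of Python. This is a conceptual model only, under stated assumptions: a `deque` stands in for the message transport, a plain list for the event store, and callbacks for browser connections. The `Hub` class, `publish_tick` function, and `inbox_count` metric are hypothetical names, not part of CritterWatch.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class Hub:
    """Stand-in for the central server: stores history and relays live updates."""

    def __init__(self) -> None:
        self.history: List[dict] = []                      # append-only log
        self.latest: Dict[str, dict] = {}                  # service -> newest snapshot
        self.subscribers: List[Callable[[dict], None]] = []  # live listeners

    def receive(self, snapshot: dict) -> None:
        self.history.append(snapshot)                 # keep operational history
        self.latest[snapshot["service"]] = snapshot   # update the current view
        for notify in self.subscribers:               # push to connected clients
            notify(snapshot)


def publish_tick(service: str, tick: int, transport: Deque[dict]) -> None:
    """Stand-in for an observer (spoke) emitting one telemetry snapshot per interval."""
    transport.append({"service": service, "tick": tick, "inbox_count": tick * 2})


transport: Deque[dict] = deque()
hub = Hub()
seen: List[dict] = []
hub.subscribers.append(seen.append)

# Two services each publish three snapshots, one per simulated interval.
for tick in range(3):
    publish_tick("orders", tick, transport)
    publish_tick("billing", tick, transport)

# The hub drains the transport in arrival order.
while transport:
    hub.receive(transport.popleft())
```

Keeping the full history separate from the latest snapshot per service reflects the two uses described above: a stored operational record and a live view pushed to the browser.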
