What Gets Reported
This page describes what Wolverine.CritterWatch publishes from your service to the console — useful when telemetry isn't flowing as expected, or when you want to know what kind of data is leaving the process.
Publish cadence
| Surface | Cadence | Notes |
|---|---|---|
| Telemetry batch | Every 1 second | Always publishes, even if nothing changed (heartbeat). |
| Heartbeat ping | Every 30 seconds | Drives the per-node liveness dot in the UI. |
| Agent health probe | Every 60 seconds | Active probe; catches silent agent failures. |
| Broker health probe | Every ~60 seconds | One probe per configured transport. |
| Capability snapshot | On startup, then on Wolverine reinit | Topology — handlers, endpoints, stores, tenancy. |
| Source code (handler / HTTP chain) | On demand | Returned when an operator opens the corresponding detail page. |
The 1-second batch interval is the dominant source of latency between a change in the service and its appearance in the console. See Architecture → Message Flow for the full picture.
What's in a telemetry batch
Each batch carries:
- Service identity — service name, label, Wolverine version.
- Endpoint snapshot — every listener and sender with its current status (Accepting / Stopped / TooBusy / Latched / Paused / Draining), transport type, mode.
- Subscription / handler catalog — every message type the service handles or publishes, with handler bindings and routing.
- Recent changes — everything that happened since the last batch: node added/removed, agent started/stopped, leadership change, circuit breaker tripped/reset, back pressure triggered/lifted, and exceptions.
- Agent health snapshot — Healthy / Degraded / Offline for each registered agent.
- Persistence counts — inbox, outbox, scheduled, handled, and dead-letter counts per durability store. Per-tenant for multi-tenant services.
- Shard states — current sequence and high-water mark for each projection shard.
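As a rough mental model, the batch contents listed above can be pictured as a single record. The following Python sketch is illustrative only: every class name, field name, and sample value is hypothetical and does not reflect the actual wire contract of Wolverine.CritterWatch. The handler catalog is omitted for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class EndpointStatus(Enum):
    # Status values as listed in the endpoint snapshot bullet above.
    ACCEPTING = "Accepting"
    STOPPED = "Stopped"
    TOO_BUSY = "TooBusy"
    LATCHED = "Latched"
    PAUSED = "Paused"
    DRAINING = "Draining"


class AgentHealth(Enum):
    HEALTHY = "Healthy"
    DEGRADED = "Degraded"
    OFFLINE = "Offline"


@dataclass
class PersistenceCounts:
    # Hypothetical field names: one record per durability store (and per tenant).
    inbox: int = 0
    outbox: int = 0
    scheduled: int = 0
    handled: int = 0
    dead_letter: int = 0


@dataclass
class ShardState:
    current_sequence: int
    high_water_mark: int


@dataclass
class TelemetryBatch:
    service_name: str
    wolverine_version: str
    endpoints: Dict[str, EndpointStatus] = field(default_factory=dict)
    recent_changes: List[str] = field(default_factory=list)
    agents: Dict[str, AgentHealth] = field(default_factory=dict)
    persistence: Dict[str, PersistenceCounts] = field(default_factory=dict)
    shards: Dict[str, ShardState] = field(default_factory=dict)


# Sample values are made up for illustration.
batch = TelemetryBatch(service_name="orders", wolverine_version="0.0.0")
batch.endpoints["rabbitmq://queue/orders"] = EndpointStatus.ACCEPTING
batch.agents["agent-1"] = AgentHealth.HEALTHY
batch.shards["Orders:All"] = ShardState(current_sequence=120, high_water_mark=125)
```

Note that nothing in this shape carries a message body or a connection string, which matches the exclusions described in the next section.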
What's not in a telemetry batch
- Message bodies. Bodies are only sent on demand when an operator opens a specific dead-letter or scheduled message for inspection.
- Database connection strings. Database URIs (host + database name) are reported for identification; credentials are not.
- Application data. Your domain events, aggregates, and read models stay in your service's database.
- Application logs / traces. CritterWatch isn't an APM. For traces, configure an OpenTelemetry trace provider in Settings → Trace Providers.
- Per-endpoint and per-handler configuration trees. Reported on demand — see Lazy-fetched detail panes below. Keeping them off the heartbeat path is what lets a service with hundreds of endpoints or tenants stay under the broker / SQS payload cap.
Lazy-fetched detail panes
A handful of detail surfaces in the console are populated by a one-time round trip to the service, not by the 1-second telemetry batch:
| Pane | Wire request | What it fetches |
|---|---|---|
| Handler chain → Source Code | RequestHandlerSourceCode | Generated handler source for the message type |
| HTTP chain → Source Code | RequestHttpChainSourceCode | Generated source for the HTTP chain |
| HTTP chain → OpenAPI | RequestHttpChainOpenApi | OpenAPI operation descriptor for the chain |
| Pipeline tab → endpoint Properties | RequestEndpointProperties | Endpoint Properties + Children config tree |
| Handler chain → Properties | RequestMessageHandlerProperties | Per-handler Properties rows for the message type |
What you see: a brief spinner the first time you open one of these panes after the service starts (or after a service rollout). The response is cached on the console side keyed by service version, so re-opening the same pane is instant for the rest of the session. A rollout invalidates the cache.
If the round trip fails (service unreachable, license refused), the pane surfaces an inline error rather than retrying silently — re-open the page after the underlying cause clears to retry.
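The caching behavior described above can be modeled as a lookup keyed by service version. This is a minimal sketch under assumptions, not the console's actual implementation: PaneCache, its method names, and the fake fetch function are all hypothetical. Only the wire request name comes from the table above.

```python
from typing import Callable, Dict, List, Tuple


class PaneCache:
    """Console-side cache for lazily fetched panes (illustrative only)."""

    def __init__(self, fetch: Callable[[str, str], str]) -> None:
        self._fetch = fetch  # performs the round trip to the service
        self._entries: Dict[Tuple[str, str, str], str] = {}

    def get(self, service: str, version: str, pane: str) -> str:
        key = (service, version, pane)
        if key not in self._entries:
            # First open for this version: one round trip. If fetch raises,
            # nothing is stored, so re-opening the pane tries again.
            self._entries[key] = self._fetch(service, pane)
        return self._entries[key]


calls: List[Tuple[str, str]] = []


def fake_fetch(service: str, pane: str) -> str:
    # Stand-in for the real request to the service.
    calls.append((service, pane))
    return f"{pane} for {service}"


cache = PaneCache(fake_fetch)
cache.get("orders", "1.0.0", "RequestEndpointProperties")  # round trip
cache.get("orders", "1.0.0", "RequestEndpointProperties")  # served from cache
cache.get("orders", "2.0.0", "RequestEndpointProperties")  # rollout: new key, new round trip
```

Because the version is part of the key, a rollout never serves a stale entry, and a failed fetch leaves no entry behind to mask the error.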
When telemetry stops flowing
If a service goes silent in the UI (the heartbeat dot turns red), the most likely causes, in order, are:
- Process is gone. Crashed, killed, or shut down without graceful shutdown. The next telemetry batch never publishes. Check the host's logs.
- Broker is unreachable from the service. Telemetry never reaches the transport. Check broker connectivity from the service's network namespace.
- Console is unreachable and telemetry is queueing. The transport buffers messages, so the dashboard shows the service as silent until the console reconnects and drains the backlog.
- Service is hung but the process is alive. A deadlock or runaway GC pause stops the publish timer. The 30-second heartbeat is the leading indicator; use the Per-node detail page to confirm.
- Snapshot too big for the transport. Look in the service's own logs for "ServiceUpdates payload exceeded 240 KiB after compression". This is a defensive guard: a snapshot that big is at risk of being silently dropped by SQS (256 KiB cap) or other size-limited transports. The most common cause is a sudden tenant-count explosion, because every tenant adds its own ShardStateSnapshot rows and a PersistenceCounts entry. If you see this message, consult Architecture → Multi-tenancy → Snapshot size.
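To get a feel for how the size guard relates to tenant count, the check can be approximated as "serialize, compress, compare against 240 KiB". The sketch below is an approximation under stated assumptions: gzip and JSON are stand-ins (the codec and serializer the library actually uses are not stated here), and the snapshot shape and function names are hypothetical.

```python
import gzip
import json

GUARD_BYTES = 240 * 1024    # threshold quoted in the log message above
SQS_CAP_BYTES = 256 * 1024  # SQS cap quoted above; the guard leaves 16 KiB of headroom


def compressed_size(snapshot: dict) -> int:
    # Assumption: JSON + gzip stand in for the real serializer and codec.
    raw = json.dumps(snapshot, separators=(",", ":")).encode("utf-8")
    return len(gzip.compress(raw))


def exceeds_guard(snapshot: dict) -> bool:
    return compressed_size(snapshot) > GUARD_BYTES


def snapshot_for(tenant_count: int) -> dict:
    # Hypothetical shape: one persistence entry and one shard row per tenant.
    return {
        "persistence": {
            f"tenant-{i}": {"inbox": i, "outbox": 0, "dead_letter": 0}
            for i in range(tenant_count)
        },
        "shards": [
            {"tenant": f"tenant-{i}", "sequence": i, "high_water_mark": i}
            for i in range(tenant_count)
        ],
    }


small = snapshot_for(10)
print(compressed_size(small), exceeds_guard(small))
```

Running snapshot_for with increasing tenant counts shows the compressed size growing with every added tenant, which is why a tenant-count explosion is the usual trigger.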
The amber-then-red transition on the heartbeat dot (60s → 150s) gives you ~2.5 minutes to spot a stuck node before the UI calls it dead.
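The thresholds above amount to a simple mapping from "seconds since the last heartbeat" to a dot color. In this sketch the function name, the "green" label for a healthy node, and the inclusive boundaries are assumptions; only the 60-second and 150-second values come from the text.

```python
AMBER_AFTER_S = 60   # from the text above
RED_AFTER_S = 150    # from the text above


def dot_color(seconds_since_last_heartbeat: float) -> str:
    # Assumption: boundaries are inclusive and measured from the last
    # heartbeat the console received.
    if seconds_since_last_heartbeat >= RED_AFTER_S:
        return "red"
    if seconds_since_last_heartbeat >= AMBER_AFTER_S:
        return "amber"
    return "green"


for elapsed in (30, 60, 149, 150):
    print(elapsed, dot_color(elapsed))
```

If the thresholds are measured this way, a 30-second ping interval means amber corresponds to about two missed pings and red to about five.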
Capability snapshot
On startup, the service advertises its full topology to the console:
- Every registered message type and its handler binding.
- Every messaging endpoint with its configuration (mode, buffering limits, circuit-breaker settings).
- Every Wolverine durability store (inbox/outbox database).
- Every Marten event store and document store the service uses.
- Multi-tenancy mode and the current tenant list (for dynamic tenancy).
- The Wolverine assembly version.
The snapshot replaces the prior shape wholesale on each rollout. So if you redeploy with a model change, the new shape appears the moment the new version checks in — there's no merge or migration logic to worry about.
The snapshot is also re-issued whenever the Wolverine runtime is reinitialized (e.g., after a hot reload during development).
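The practical consequence of wholesale replacement is that anything missing from the new snapshot simply disappears from the console. A minimal sketch, assuming a dictionary keyed by service name (the function, keys, and handler names are hypothetical):

```python
def apply_snapshot(console_state: dict, service: str, snapshot: dict) -> None:
    # Replace, never merge: whatever the new snapshot omits is gone.
    console_state[service] = dict(snapshot)


state: dict = {}
apply_snapshot(state, "orders", {"version": "1.0.0", "handlers": ["PlaceOrder", "CancelOrder"]})
apply_snapshot(state, "orders", {"version": "2.0.0", "handlers": ["PlaceOrder"]})

# After the second call, only the 2.0.0 shape remains for "orders".
print(state["orders"])
```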
Graceful shutdown
On IAsyncDisposable.DisposeAsync(), the observer publishes one final telemetry batch tagged as a shutdown, cancels the periodic timers, and waits for in-flight publishes to complete. This produces a clean "service stopped" timeline entry rather than a heartbeat-timeout-induced "service silent" entry.
If your service is killed without graceful shutdown (SIGKILL, OOM kill, container forced termination), the final batch is lost — the service goes silent and the heartbeat dot transitions to red after the threshold.
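The shutdown sequence can be sketched with Python's asyncio as an analog of the .NET disposal path. This is not the library's code: the Observer class, its methods, and the timings are hypothetical, and the sketch only mirrors the three steps named above (final tagged batch, cancel the timer, wait for in-flight publishes).

```python
import asyncio
from typing import List, Optional, Set


class Observer:
    """Illustrative stand-in for the telemetry observer; not the real class."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self.published: List[str] = []
        self._in_flight: Set[asyncio.Task] = set()
        self._timer: Optional[asyncio.Task] = None

    async def _publish(self, kind: str) -> None:
        await asyncio.sleep(0.01)  # placeholder for the transport round trip
        self.published.append(kind)

    def _send(self, kind: str) -> None:
        # Track each publish so shutdown can wait for it to finish.
        task = asyncio.create_task(self._publish(kind))
        self._in_flight.add(task)
        task.add_done_callback(self._in_flight.discard)

    async def _tick(self) -> None:
        while True:
            self._send("telemetry")
            await asyncio.sleep(self.interval)

    def start(self) -> None:
        self._timer = asyncio.create_task(self._tick())

    async def dispose(self) -> None:
        self._send("shutdown")                  # final batch, tagged as a shutdown
        if self._timer is not None:
            self._timer.cancel()                # stop the periodic timer
        await asyncio.gather(*self._in_flight)  # wait for in-flight publishes


async def main() -> List[str]:
    observer = Observer(interval=0.05)
    observer.start()
    await asyncio.sleep(0.12)
    await observer.dispose()
    return observer.published


published = asyncio.run(main())
print(published)
```

A process that is killed outright never reaches dispose, so the "shutdown" entry is never published and the console can only infer the stop from the missing heartbeats.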
Custom transports
The default integration uses RabbitMQ. The library works with any Wolverine-supported transport, but RabbitMQ is recommended for production:
- Reliable message delivery — telemetry survives a console outage.
- Decouples services from console availability.
- Standard tooling for observing the queue depth on the console side.
In-memory transport is fine for development and tests but loses messages if either process restarts. SQL Server transport works but has higher persistence overhead than RabbitMQ for the telemetry volume.
