
Multi-Tenancy

CritterWatch supports services that run multiple tenants on separate databases — a common pattern for SaaS applications using Marten / Wolverine. This page covers what you'll see, what you can do, and the gotchas that come with managing tenants at runtime.

For a step-by-step UI walkthrough, see Services → Tenants tab. For the integration code in your monitored service, see Wolverine.CritterWatch → Registration.

What CritterWatch sees

When a multi-tenant service starts up, it advertises its tenancy mode. The console picks this up automatically — there's nothing for the operator to configure on the console side.

The tenancy modes you'll see in the UI:

| Mode | Meaning | Tenants tab behavior |
| --- | --- | --- |
| None / Single | Not multi-tenant. | Tenants tab not shown. |
| ConjoinedTenancy | All tenants share one database, partitioned by tenant id. | Tenants tab shown read-only; tenants are configured at startup. |
| StaticMultiple | Tenants pinned to specific databases at startup. | Tenants tab shown read-only with the "Read-only" hint. |
| DynamicMultiple | Tenants and databases added/removed at runtime. | Full lifecycle controls (Add / Disable / Enable / Remove / Hard delete). |

Per-tenant scoping

Most operational surfaces have a tenant filter for multi-tenant services:

  • Dead Letter Queue — tenant filter scopes queries and replay/discard operations to one tenant's message store. Replay against acme-corp won't touch globex-inc.
  • Projections — each tenant has its own projection shards with independent sequence tracking. The Projections page renders one row per (shard, tenant) pair. Pause / Restart / Rebuild buttons are per-row: a rebuild of TripSummary:All for acme-corp only rebuilds that tenant's read model. Other tenants' shards keep advancing untouched.
  • Scheduled Messages — tenant filter scopes the list and per-message edits/cancels.
  • Durability Monitor — each tenant's inbox/outbox shows as a separate row.

This is enforced server-side by routing the operation through the target service's command handler with the tenant id attached. Operators can't accidentally cross-pollute data between tenants. The PauseProjection / RestartProjection / RebuildProjection / RewindSubscription commands all carry an optional TenantId for the same reason.
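To make the scoping concrete, here is a minimal Python sketch of how an optional tenant id on a command narrows the (shard, tenant) rows it may touch. The RebuildProjection name comes from the text above; the field names, ShardRow, and rows_affected are hypothetical and only illustrate the idea. They are not the actual Wolverine.CritterWatch types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ShardRow:
    shard: str       # e.g. "TripSummary:All"
    tenant_id: str   # e.g. "acme-corp"


@dataclass(frozen=True)
class RebuildProjection:
    shard: str
    # Assumption: None means the command is not scoped to a single tenant.
    tenant_id: Optional[str] = None


def rows_affected(command: RebuildProjection, rows: list[ShardRow]) -> list[ShardRow]:
    """Return only the rows this command is allowed to touch."""
    return [
        row for row in rows
        if row.shard == command.shard
        and (command.tenant_id is None or row.tenant_id == command.tenant_id)
    ]


rows = [
    ShardRow("TripSummary:All", "acme-corp"),
    ShardRow("TripSummary:All", "globex-inc"),
]
scoped = rows_affected(RebuildProjection("TripSummary:All", "acme-corp"), rows)
```

In this sketch a rebuild scoped to acme-corp selects only that tenant's row, and the globex-inc row is left alone.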

Per-tenant traffic columns

The Tenants tab includes per-tenant traffic columns when the service emits per-tenant Prometheus labels:

  • Executions (1h): handler executions in the last hour
  • Failures (1h): handler failures in the last hour (shown in red when non-zero)
  • DLQ depth: current dead-letter depth (shown in red when non-zero)

These let you spot the noisy-neighbour pattern at a glance: a tenant whose DLQ depth is climbing while the rest are flat is the one to investigate. See Services → Tenants → When to drill in for the patterns.

If the columns aren't showing, your service either has zero tenants reporting metrics or doesn't yet emit tenant_id labels in its OpenTelemetry / Prometheus config.
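If you export the same per-tenant numbers elsewhere, the noisy-neighbour check is easy to script. The sketch below is illustrative only: it assumes you already have a series of DLQ depth samples per tenant (the sampling and the climbing_tenants helper are hypothetical, not part of CritterWatch) and flags tenants whose depth never drops and ends higher than it started.

```python
def climbing_tenants(dlq_depth: dict[str, list[int]]) -> list[str]:
    """Return tenant ids whose DLQ depth is non-decreasing and has grown overall."""
    flagged = []
    for tenant_id, depths in dlq_depth.items():
        # Every sample is at least as large as the one before it.
        non_decreasing = all(a <= b for a, b in zip(depths, depths[1:]))
        if len(depths) >= 2 and non_decreasing and depths[-1] > depths[0]:
            flagged.append(tenant_id)
    return sorted(flagged)


samples = {
    "acme-corp": [0, 4, 9, 15],   # climbing: worth a look
    "globex-inc": [0, 0, 0, 0],   # flat
    "initech": [3, 1, 0, 0],      # draining
}
noisy = climbing_tenants(samples)
```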

Adding a tenant at runtime

For DynamicMultiple services, the Add Tenant button on the Tenants tab opens a dialog with two fields: tenant id and connection string. Submitting it sends the AddTenant command to the service.

The service handler:

  1. Adds the database to its multi-tenant store via Wolverine's tenant management API.
  2. Applies the Marten / EF Core schema to the new database (table creation, etc.).
  3. Reports the new tenant in the next telemetry batch — the row appears in the Tenants tab with Active status.

End-to-end this is typically 1–3 seconds, dominated by the schema-apply step on the new database.
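The three handler steps can be pictured with a small in-memory stand-in. This is a sketch under stated assumptions: FakeTenantService and its methods are hypothetical, the schema step is stubbed out, and none of this is the real Wolverine or Marten API.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTenantService:
    databases: dict[str, str] = field(default_factory=dict)      # tenant id -> connection string
    schema_applied: set[str] = field(default_factory=set)         # tenants with schema in place
    pending_telemetry: list[dict] = field(default_factory=list)   # rows for the next batch

    def handle_add_tenant(self, tenant_id: str, connection_string: str) -> None:
        # Step 1: register the database with the multi-tenant store.
        self.databases[tenant_id] = connection_string
        # Step 2: apply the schema to the new database (stubbed; typically the slow part).
        self.schema_applied.add(tenant_id)
        # Step 3: queue the tenant so the next telemetry batch reports it as Active.
        self.pending_telemetry.append({"tenant_id": tenant_id, "status": "Active"})

    def flush_telemetry(self) -> list[dict]:
        """Hand over the queued rows and start a fresh batch."""
        batch, self.pending_telemetry = self.pending_telemetry, []
        return batch


service = FakeTenantService()
service.handle_add_tenant("acme-corp", "Host=db1.internal;Database=acme_corp")
batch = service.flush_telemetry()
```

The ordering is the point of the sketch: the tenant is only reported after the schema step, which is why the row shows up in the Tenants tab a moment after you submit the dialog.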

Connection-string security

The connection string travels over the transport (RabbitMQ) from the console to the target service. The console does not persist it; it stores only the database URI (host + database name) for identification. Two practical consequences:

  • Use TLS on your transport (amqps://) in production. The connection string is a credential.
  • If you want a clean audit trail, rotate the database password after cutover. The audit log captures the click but not the secret.
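As an illustration of "database URI only", the following sketch reduces a semicolon-delimited key=value connection string to host and database name and discards everything else. The database_uri helper and the host/database output format are assumptions made for this example; the console's actual parsing is not documented here, and quoted values containing semicolons are not handled.

```python
def database_uri(connection_string: str) -> str:
    """Keep only host and database name; drop credentials and other settings."""
    parts = {}
    for segment in connection_string.split(";"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            parts[key.strip().lower()] = value.strip()
    # Accept either "Host" or "Server" as the host key.
    host = parts.get("host") or parts.get("server", "")
    database = parts.get("database", "")
    return f"{host}/{database}"


uri = database_uri(
    "Host=db1.internal;Port=5432;Database=acme_corp;Username=app;Password=s3cret"
)
```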

Tenant ID normalization

If your Marten store sets TenantIdStyle to ForceLowerCase or ForceUpperCase, both AddTenant and session-open silently rewrite the tenant id before doing anything with it. The Add Tenant dialog warns the operator inline if the typed id would be rewritten — see Services → Add Tenant dialog.

What this means in practice: you cannot have two tenants whose ids differ only in case. The second AddTenant will resolve to the same row as the first.

The mode the service is using is shown on the Service Overview tab under "Tenant ID Style".
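A rough sketch of the rewrite check the dialog performs, assuming the style simply upper-cases or lower-cases the id. The function names are hypothetical, any other style value is treated as "leave the id alone", and Python's case mapping can differ from .NET's for some non-ASCII characters.

```python
def normalize_tenant_id(tenant_id: str, style: str) -> str:
    """Apply the tenant id style the way the text above describes it."""
    if style == "ForceLowerCase":
        return tenant_id.lower()
    if style == "ForceUpperCase":
        return tenant_id.upper()
    return tenant_id  # assumption: any other style leaves the id untouched


def would_be_rewritten(tenant_id: str, style: str) -> bool:
    """True when the typed id differs from what the store would actually use."""
    return normalize_tenant_id(tenant_id, style) != tenant_id


warn = would_be_rewritten("Acme-Corp", "ForceLowerCase")
same_row = (
    normalize_tenant_id("Acme-Corp", "ForceLowerCase")
    == normalize_tenant_id("acme-corp", "ForceLowerCase")
)
```

Both flags come out true here, which is the collision described above: the two spellings resolve to one tenant.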

Disable / Re-enable / Remove / Hard delete

The three destructive operations, in increasing severity (Re-enable simply reverses Disable):

| Action | Effect | Reversible? |
| --- | --- | --- |
| Disable | Soft toggle. Sessions against the tenant throw UnknownTenantIdException. Data preserved. | Yes: re-enable. |
| Remove | Drops only the master-table row. Per-tenant database left intact. | Yes: re-add with the same id + connection string. |
| Hard delete | Drops the per-tenant database and removes the master row. Permanent. | No. |

The full lifecycle picture is in Services → Tenants → Lifecycle.

Hard delete is gated behind a typed-id confirmation modal — the operator must type the exact tenant id before the Confirm button enables. The intent is friction proportional to the blast radius. Every hard delete records an extra-verbose audit entry with the typed id, database URI, and confirmation time.
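The gate can be sketched as an exact string comparison that produces an audit record only when it passes. This is a hypothetical model: HardDeleteAudit and confirm_hard_delete are made-up names, and the sketch assumes "exact" means case-sensitive with no trimming.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class HardDeleteAudit:
    typed_id: str
    database_uri: str
    confirmed_at: datetime


def confirm_hard_delete(
    typed_id: str,
    tenant_id: str,
    database_uri: str,
    now: Optional[datetime] = None,
) -> Optional[HardDeleteAudit]:
    """Return an audit entry if the typed id matches exactly, otherwise None."""
    if typed_id != tenant_id:
        return None  # Confirm stays disabled; nothing is recorded or deleted.
    return HardDeleteAudit(typed_id, database_uri, now or datetime.now(timezone.utc))


rejected = confirm_hard_delete("Acme-Corp", "acme-corp", "db1.internal/acme_corp")
accepted = confirm_hard_delete("acme-corp", "acme-corp", "db1.internal/acme_corp")
```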

What if I add a tenant outside the console?

If your service code adds tenants programmatically (e.g. during onboarding, from your own app's admin UI), CritterWatch will see the new tenant on the next telemetry batch — within ~1 second.

If the new tenant doesn't appear, click the Refresh button on the Tenants tab. That dispatches a RequestTenantList, which forces the service to re-publish its current tenant list.

Multiple tenant sources

A service can register more than one IDynamicTenantSource<string> — for example one source backed by your Marten master-table store and a second backed by a Wolverine-managed config table. When that happens, the Tenants tab disambiguates by showing the source name in a column next to the tenant id. The same tenant id can appear in multiple sources without colliding; CritterWatch keys per-tenant state by (tenantId, sourceName).

This matters operationally for two reasons:

  • Lifecycle commands target one source at a time. Adding acme-corp via the Marten source doesn't add it to the Wolverine source. If you only see the tenant under one source name and expect it under both, the missing registration is on the service side.
  • Per-tenant projection rows multiply by source. A tenant that exists in two sources shows up in two (shard, tenant) rows on the Projections page. This is correct — the two sources back distinct projection shards — but it's worth knowing before you go counting rows.
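Keying by (tenantId, sourceName) can be pictured as a dictionary with a composite key. The sketch below is illustrative only: TenantStateTable and the source names "marten" and "wolverine" are hypothetical stand-ins, not CritterWatch internals.

```python
from typing import Optional

TenantKey = tuple[str, str]  # (tenant_id, source_name)


class TenantStateTable:
    """Per-tenant state keyed by tenant id AND source name."""

    def __init__(self) -> None:
        self._status: dict[TenantKey, str] = {}

    def set_status(self, tenant_id: str, source_name: str, status: str) -> None:
        self._status[(tenant_id, source_name)] = status

    def status(self, tenant_id: str, source_name: str) -> Optional[str]:
        return self._status.get((tenant_id, source_name))

    def sources_for(self, tenant_id: str) -> list[str]:
        """List every source in which this tenant id is registered."""
        return sorted(src for (tid, src) in self._status if tid == tenant_id)


table = TenantStateTable()
table.set_status("acme-corp", "marten", "Active")
table.set_status("acme-corp", "wolverine", "Disabled")
```

Because the source name is part of the key, disabling acme-corp in one source leaves its entry in the other source untouched.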

Snapshot size on very large tenant counts

The per-second telemetry batch (ServiceUpdates) carries one ShardStateSnapshot row per (shard, tenant) plus a PersistenceCounts entry per tenant. On services with hundreds of tenants this can push the compressed payload close to the broker / SQS payload cap (256 KiB on SQS).

The integration ships with a defensive guard: if a snapshot exceeds 240 KiB after compression, the service logs a warning:

warn: Wolverine.CritterWatch[0]
  ServiceUpdates payload exceeded 240 KiB after compression
  (compressed: 244832 bytes, raw: 1834291 bytes).
  Snapshot is at risk of being dropped by size-limited transports.

What to do:

  • If you only see one or two of these: this is likely transient, for example a churning rebuild that briefly fanned shard rows out, or a one-off snapshot during a tenant migration. The next batch will be back under the cap.
  • If you see them sustained: the service has grown past the heartbeat's comfortable working set. The fix is to split it: either federate the monitored service into smaller per-region services, or move very-low-traffic tenants to a separate service with its own telemetry queue. The 256 KiB SQS cap is the hard ceiling; once a batch exceeds it, the batch is silently dropped, and the console will show gaps in your shard / persistence-count history.

This guard runs inside the service. It doesn't fire on the console side.
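If you want to estimate how close your own tenant count gets to the threshold, a size check along these lines is enough. It is a sketch under stated assumptions: JSON serialization and gzip are stand-ins chosen for the example (the integration's real wire format and compressor are not described here), and check_snapshot_size is a hypothetical helper.

```python
import gzip
import json

WARN_THRESHOLD_BYTES = 240 * 1024  # the 240 KiB guard described above


def check_snapshot_size(payload: dict) -> tuple[int, int, bool]:
    """Return (compressed_bytes, raw_bytes, over_threshold) for a payload."""
    raw = json.dumps(payload).encode("utf-8")
    compressed = gzip.compress(raw)
    return len(compressed), len(raw), len(compressed) > WARN_THRESHOLD_BYTES


# One hypothetical snapshot row per (shard, tenant) pair.
snapshot = {
    "shards": [
        {"shard": "TripSummary:All", "tenant": f"tenant-{i}", "sequence": i}
        for i in range(500)
    ]
}
compressed_size, raw_size, over = check_snapshot_size(snapshot)
```

Repetitive rows like these compress well, so the compressed size is what matters for the cap, not the raw size.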

What CritterWatch does not do

  • No tenant onboarding workflow. The Add Tenant button is a low-level admin tool. Production tenant onboarding (with billing, provisioning, SLA assignment, etc.) belongs in your application's own admin UI; use CritterWatch for ad-hoc additions and incident-time fixes.
  • No bulk operations. Each Add / Remove / Hard delete is one tenant at a time. This is deliberate — bulk operations on tenant databases are usually a red flag.
  • No "rebuild for all tenants" button today. Per-tenant projection rebuild is one click per tenant. Bulk rebuild across tenants is tracked on #286; script it against the HTTP API for now if you really need it.
  • No per-tenant alert thresholds today. Alert thresholds cascade global → service → message-type, but not per-tenant. If you need to suppress alerts for one tenant during a known issue, the workaround is a per-shard suppress on the Alert Configuration page.

Released under the MIT License.