Chaos Monkey
Chaos Monkey enables controlled fault injection into monitored services for resilience testing. Use it to validate that your error handling, circuit breakers, dead letter policies, and alert configurations behave as expected under real failure conditions.
Production Warning
Chaos Monkey injects real failures into real message processing. Do not enable it in production unless you fully understand the consequences and have a tested recovery plan. Fault injection can be switched off at any time: disabling Chaos Monkey immediately stops all fault injection, although messages that have already failed still go through the normal error handling pipeline.
How It Works
When Chaos Monkey is enabled on a service, the Wolverine.CritterWatch observer injects faults into the Wolverine message processing pipeline based on the configured rates. Faults are applied randomly with the specified probability on each message.
Chaos operates independently per service. Enabling chaos on trip-service does not affect repair-shop.
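To make the per-message, per-service behavior concrete, here is a minimal sketch of a probabilistic fault decision. It is illustrative only and is not the Wolverine.CritterWatch implementation; `ChaosSettings` and `ChaosDecider` are hypothetical names introduced for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration; not part of Wolverine or Critter Watch.
public sealed record ChaosSettings(bool Enabled, double FailureRate);

public sealed class ChaosDecider
{
    // Settings are keyed by service name, so each service is independent.
    private readonly Dictionary<string, ChaosSettings> _settings = new();
    private readonly Random _random;

    public ChaosDecider(Random random) => _random = random;

    public void Configure(string serviceName, ChaosSettings settings)
        => _settings[serviceName] = settings;

    // Called once per message: returns true when this message should fail.
    public bool ShouldFail(string serviceName)
    {
        if (!_settings.TryGetValue(serviceName, out var settings)) return false;
        if (!settings.Enabled) return false;

        // NextDouble() is uniform in [0, 1), so this is true with
        // probability FailureRate (never at 0.0, always at 1.0).
        return _random.NextDouble() < settings.FailureRate;
    }
}
```

In this sketch, configuring `trip-service` has no effect on `repair-shop`, because a service without an entry never fails.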
Enabling Chaos Monkey
From the Chaos Monkey page, select a service and click Enable Chaos Monkey. The same operation can be issued as a command:
// Enable chaos monkey: no failures yet, this just arms the system
await bus.SendAsync(new EnableChaosMonkey(ServiceName: "trip-service"));
Enabling chaos does not immediately inject failures; it only arms the system. Set specific rates to begin injecting faults.
Failure Types
Handler Failure Rate
A configured percentage (0–100%) of message handler executions will throw an exception. The rate is passed as a fraction between 0 and 1, so 0.10 means 10%:
// 10% of message handler executions will throw an exception
await bus.SendAsync(new SetChaosFailureRate(
    ServiceName: "trip-service",
    FailureRate: 0.10
));
Failed messages enter Wolverine's normal retry pipeline and will eventually dead-letter if retries are exhausted.
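When choosing a failure rate, it can help to estimate how many messages would actually reach the dead letter queue. The sketch below assumes that every attempt, including retries, fails independently with the configured rate, and that the total number of attempts comes from your own retry policy; both are assumptions made for this example, and `ChaosMath` is a hypothetical helper.

```csharp
using System;

// Hypothetical helper for back-of-the-envelope planning.
public static class ChaosMath
{
    // Probability that one message fails on every attempt, assuming each
    // attempt fails independently with the same probability.
    public static double DeadLetterProbability(double failureRate, int totalAttempts)
    {
        if (failureRate < 0.0 || failureRate > 1.0)
            throw new ArgumentOutOfRangeException(nameof(failureRate));
        if (totalAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(totalAttempts));

        // Independent failures multiply: p * p * ... * p (totalAttempts times).
        return Math.Pow(failureRate, totalAttempts);
    }
}
```

Under these assumptions, `ChaosMath.DeadLetterProbability(0.10, 3)` is about 0.001, so roughly one message in a thousand would be expected to dead-letter, while a 50% rate with the same three attempts gives 0.125.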
Slow Handler Rate
A configured percentage of message handler executions will be delayed by a fixed amount of artificial latency. As with the failure rate, the rate is passed as a fraction, and the delay is given in milliseconds:
// 20% of handler executions will be delayed by 5 seconds
await bus.SendAsync(new SetChaosSlowHandlerRate(
    ServiceName: "trip-service",
    SlowRate: 0.20,
    DelayMilliseconds: 5000
));
Useful for testing back pressure behavior, circuit breaker sensitivity to slow consumers, and alert latency thresholds.
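To judge whether a given slow rate and delay are likely to trigger back pressure, a rough estimate of the added latency is often enough. The following sketch assumes a single handler that processes one message at a time with a known baseline duration; `SlowHandlerMath` and its parameters are hypothetical names for this example.

```csharp
using System;

// Hypothetical helper for estimating the effect of slow handler chaos.
public static class SlowHandlerMath
{
    // Average extra latency per message: only a fraction of messages
    // (slowRate) pay the full delay.
    public static double ExpectedAddedDelayMs(double slowRate, int delayMilliseconds)
        => slowRate * delayMilliseconds;

    // Upper bound on messages per second for one sequential handler,
    // given its normal duration plus the expected chaos delay.
    public static double MaxSequentialThroughputPerSecond(
        double baselineMs, double slowRate, int delayMilliseconds)
    {
        var averageMs = baselineMs + ExpectedAddedDelayMs(slowRate, delayMilliseconds);
        if (averageMs <= 0.0)
            throw new ArgumentOutOfRangeException(nameof(baselineMs));
        return 1000.0 / averageMs;
    }
}
```

With the settings shown above (a 0.20 slow rate and a 5000 ms delay), the expected added latency is about 1000 ms per message. For a handler that normally takes 50 ms, sequential throughput would drop from 20 messages per second to a little under 1, which is usually enough for queues to build up.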
Projection Failure Rate
Inject failures into async projection processing:
// Apply a 5% failure rate to async projection processing
await bus.SendAsync(new SetChaosProjectionFailureRate(
    ServiceName: "trip-service",
    FailureRate: 0.05
));
Useful for testing projection stall detection, automatic restart behavior, and stall alerts.
Disabling Chaos Monkey
// Disable chaos monkey: immediately stops all fault injection
await bus.SendAsync(new DisableChaosMonkey(ServiceName: "trip-service"));
Disabling Chaos Monkey immediately stops all fault injection. In-flight messages that already failed will continue through the normal error handling pipeline.
Scripted Scenarios
For structured resilience testing, define a sequence of chaos operations with observation steps between them:
// Run a scripted chaos scenario:
// 1. Start with 5% failure rate
// 2. Verify alerts are raised
// 3. Escalate to 25% failure rate
// 4. Observe circuit breaker behavior
// 5. Disable chaos and verify system recovery
await bus.SendAsync(new EnableChaosMonkey(ServiceName: "trip-service"));
await bus.SendAsync(new SetChaosFailureRate(
    ServiceName: "trip-service",
    FailureRate: 0.05
));
// ... wait, observe alerts ...
await bus.SendAsync(new SetChaosFailureRate(
    ServiceName: "trip-service",
    FailureRate: 0.25
));
// ... observe circuit breaker trip ...
await bus.SendAsync(new DisableChaosMonkey(ServiceName: "trip-service"));
// ... verify recovery, alerts resolve ...
Chaos Monkey Status
The Chaos Monkey page shows the current configuration for each service:
| Column | Description |
|---|---|
| Service | Monitored service name |
| Enabled | Whether chaos is armed |
| Failure Rate | Current handler failure rate |
| Slow Rate | Current slow handler rate |
| Slow Delay | Delay injected for slow handlers |
| Projection Failure Rate | Current projection failure rate |
All settings are updated in real time as commands are sent.
What to Test
Recommended chaos testing scenarios:
- DLQ accumulation — set a 50% failure rate, watch DLQ counts rise, verify warning/critical alerts fire, disable chaos, verify DLQ count alerts resolve
- Circuit breaker — set a high failure rate, watch a circuit breaker trip, observe listener status change to Latched, verify circuit breaker alert fires, disable chaos, watch breaker recover
- Projection stall — set projection failure rate to 100%, watch a projection lag widen and stall, verify stall alert fires, disable chaos, verify projection resumes
- Back pressure — set slow handler rate to 100% with high delay, watch back pressure activate, verify TooBusy endpoint status appears
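Scenarios like these, and the scripted sequence shown earlier, can be wrapped in a small runner that always disarms chaos when it finishes. This is a sketch under stated assumptions: the three records are hypothetical stand-ins that mirror the command shapes used on this page (in a real project they come from Critter Watch), and `send` is a delegate you supply that forwards each command to your message bus.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-ins mirroring the commands shown on this page.
public record EnableChaosMonkey(string ServiceName);
public record SetChaosFailureRate(string ServiceName, double FailureRate);
public record DisableChaosMonkey(string ServiceName);

public static class ChaosScenario
{
    // Arms chaos, steps through the failure rates with an observation
    // window after each one, then disables chaos.
    public static async Task RunAsync(
        Func<object, Task> send,
        string serviceName,
        IReadOnlyList<double> failureRates,
        TimeSpan observationWindow)
    {
        await send(new EnableChaosMonkey(serviceName));
        try
        {
            foreach (var rate in failureRates)
            {
                await send(new SetChaosFailureRate(serviceName, rate));

                // Observation step: leave time to check alerts and breakers.
                await Task.Delay(observationWindow);
            }
        }
        finally
        {
            // Always disarm, even if a step above throws.
            await send(new DisableChaosMonkey(serviceName));
        }
    }
}
```

The `try`/`finally` is the main design choice here: if a step throws partway through, the service is not left with fault injection still running. A call such as `ChaosScenario.RunAsync(send, "trip-service", new[] { 0.05, 0.25 }, TimeSpan.FromMinutes(5))` would reproduce the escalation from 5% to 25% described under Scripted Scenarios; the five-minute window is an arbitrary example value.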
