Run controlled failures across a small cloud sandbox.
Trigger a fault, watch the agents diagnose it, then recover the system.
Waiting for backend state updates.