Knowing when to roll back an agent is the most underdeveloped operator discipline in the industry. Most operators either roll back too eagerly (one bad week and the project is shelved permanently) or not eagerly enough (problems accumulate until the agent is doing more damage than it creates value, and the team is too invested to admit it). Neither extreme is right. The decision should be rule-based, not emotional.
The three rollback triggers
What "rollback" actually means
Three levels, depending on severity. Level 1: revert the specific affected run via the agent's rollback procedure (Tier 1 from the rollback design lesson). The agent keeps running, the affected guest gets corrected, the audit log records both the original action and the correction. Level 2: pause this specific workflow, route inbound work to humans, investigate. The agent keeps running on other workflows. Level 3: pause the agent entirely. Rare, but the option must exist.
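Choosing the level can be made rule-based too. The following is a minimal sketch, assuming an incident can be summarized by which workflows it touched and whether the problem is sustained. The `Incident` fields and the escalation rules in `choose_level` are hypothetical illustrations, not a documented procedure.

```python
from dataclasses import dataclass
from enum import IntEnum


class RollbackLevel(IntEnum):
    """The three rollback levels, ordered by severity."""
    REVERT_RUN = 1       # Level 1: revert the affected run; agent keeps running
    PAUSE_WORKFLOW = 2   # Level 2: pause one workflow, route its work to humans
    PAUSE_AGENT = 3      # Level 3: pause the agent entirely (rare)


@dataclass
class Incident:
    # Hypothetical summary of an incident; how these are measured is an assumption.
    affected_workflows: set[str]  # workflows where the problem was observed
    sustained: bool               # True if the problem persists beyond isolated runs


def choose_level(incident: Incident) -> RollbackLevel:
    """Map an incident's scope to a rollback level (assumed escalation rules)."""
    if incident.sustained and len(incident.affected_workflows) > 1:
        # A sustained problem across several workflows: stop the whole agent.
        return RollbackLevel.PAUSE_AGENT
    if incident.sustained:
        # A sustained problem confined to one workflow: pause only that workflow.
        return RollbackLevel.PAUSE_WORKFLOW
    # An isolated bad run: correct it and keep the agent running.
    return RollbackLevel.REVERT_RUN
```

Under these assumed rules, a sustained quality drop confined to the group-block management workflow maps to `PAUSE_WORKFLOW`, while the reservation-modification workflow keeps running.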
At the property in Krakow, we hit Level 2 once in 18 months: a sustained quality drop on the group-block management workflow after a Mews API behavior change that the vendor had not announced. We paused that workflow for 11 days, investigated, updated the integration, validated against the benchmark, and resumed in week-2 rollout mode. The reservation-modification workflow kept running throughout. Total impact: 11 days of human-handled group blocks, zero impact on other workflows.
The political dimension
Rolling back is hard not because the technical procedure is hard (it is straightforward) but because someone has to make the call and explain it to a stakeholder. The operator who built the project does not want to admit it is not working. The owner who funded it does not want to be embarrassed. Neither is wrong to feel that way, but both lose more if the agent keeps running through known problems.
The cultural pattern that works: in the project charter, state explicitly that rollback is a normal operational tool, not a failure of the team. List the three triggers. Pre-commit to the response. When a trigger fires, the response is the documented one, not a debate. The team that builds rollback into its operating culture from week 1 rolls back when needed and re-launches stronger. The team that treats rollback as a public failure delays it past the point where it would have been useful.
What to do after a rollback
Three steps. First, write a one-page post-incident document: what happened, why, what was fixed, what changed in the agent or the operational process to prevent recurrence. Distribute to the team and the owner. Second, update the benchmark and the drift-detection check to catch this specific failure mode in the future. Third, decide whether to re-launch the affected workflow at the current rollout phase or to back up one phase as a safety margin. The default should be one phase back; over-aggressive re-launches are how the second incident happens.
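The "one phase back" default is simple enough to encode so that nobody has to argue it case by case. A minimal sketch, assuming the rollout is an ordered list of phases; the phase names in `ROLLOUT_PHASES` are hypothetical placeholders, not a list defined in this lesson.

```python
# Hypothetical rollout phases, ordered from most cautious to full operation.
ROLLOUT_PHASES = ["week-1", "week-2", "week-3", "full"]


def relaunch_phase(current: str, back_up: bool = True) -> str:
    """Return the phase in which to re-launch a paused workflow.

    By default, back up one phase from where the workflow was when it was
    paused, never going below the first phase. Pass back_up=False to
    re-launch at the current phase instead.
    """
    index = ROLLOUT_PHASES.index(current)  # raises ValueError for an unknown phase
    if back_up:
        index = max(index - 1, 0)
    return ROLLOUT_PHASES[index]
```

Making `back_up=True` the default means re-launching at the current phase is the explicit, deliberate choice, which matches the advice that over-aggressive re-launches are how the second incident happens.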
I have rolled back four times in my career. Two were straightforward Mews API breakages; one was a prompt-drift issue after a model provider update; one was a genuine misjudgment of a workflow's complexity. All four projects are still running. The rollback is not the failure; the failure would have been refusing to roll back.