Most of the failed agentic projects I have seen were not killed by technical problems; they were killed by operational rollouts that moved too fast and lost stakeholder trust before the agent had time to demonstrate value. The 4-week pilot pattern is the rollout structure that has worked across both regional chains I have led, and it consistently produces a "go / no-go" answer the owner can defend.
Week 1: Shadow mode
The agent runs on real inputs but does not take any action. It produces what it would have done — the proposed PMS change, the draft email, the rate adjustment — and writes it to the audit log. A human reads each proposed action and either approves (which then triggers the real action manually) or rejects with a written reason.
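To make the shadow-mode contract concrete, here is a minimal in-memory sketch. It assumes an audit log that stores each proposal alongside the reviewer's decision. The names `ShadowRun`, `ShadowAuditLog`, `record_proposal` and `review` are hypothetical and not part of any real PMS or agent framework. The sketch enforces the two rules from the paragraph above: nothing is executed, and every rejection carries a written reason.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


# Hypothetical record for one shadow run: the agent's proposal plus the
# human reviewer's decision. Field names are illustrative, not a real schema.
@dataclass
class ShadowRun:
    run_id: str
    proposed_action: str  # e.g. the PMS change or draft email, serialized
    approved: Optional[bool] = None
    reject_reason: Optional[str] = None
    reviewed_at: Optional[datetime] = None


class ShadowAuditLog:
    """In-memory stand-in for the audit log; the agent never executes anything."""

    def __init__(self) -> None:
        self.runs: dict[str, ShadowRun] = {}

    def record_proposal(self, run_id: str, proposed_action: str) -> ShadowRun:
        # Shadow mode: the proposal is only written down, never executed.
        run = ShadowRun(run_id=run_id, proposed_action=proposed_action)
        self.runs[run_id] = run
        return run

    def review(self, run_id: str, approved: bool,
               reason: Optional[str] = None) -> ShadowRun:
        run = self.runs[run_id]
        if run.approved is not None:
            raise ValueError(f"run {run_id} already reviewed")
        if not approved and not (reason and reason.strip()):
            # Rejections must carry a written reason so the prompt can be improved.
            raise ValueError("a rejection requires a written reason")
        run.approved = approved
        run.reject_reason = None if approved else reason
        run.reviewed_at = datetime.now(timezone.utc)
        # On approval, a human performs the real action manually; nothing runs here.
        return run
```

In a real deployment the log would be persistent storage rather than a dict. The shape matters more than the storage: proposal and decision live in the same record, so the agreement rate can be computed straight from the log.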
Week 1 deliverable: 80-150 shadow runs, with the agreement rate (how often the human approved what the agent proposed) tracked daily. Target by end of week 1: 75% agreement. If you are below 60%, the agent is not ready: spend an extra week iterating on the prompt before starting week 2 of the rollout.
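The daily tracking and the end-of-week check reduce to simple counting. This sketch makes several assumptions. Each reviewed shadow run arrives as a `(review_date, approved)` pair. The end-of-week figure is computed over all of the week's runs, although the final day's rate is an equally defensible reading. The function names are hypothetical. The plan does not say what to do between 60% and 75%, so the sketch leaves that band as an explicit judgment call rather than inventing a rule.

```python
from collections import defaultdict
from datetime import date

TARGET = 0.75     # end-of-week-1 agreement target from the pilot plan
NOT_READY = 0.60  # below this, iterate on the prompt before week 2


def daily_agreement(reviews: list[tuple[date, bool]]) -> dict[date, float]:
    """reviews: (review date, approved?) for each reviewed shadow run."""
    approved: dict[date, int] = defaultdict(int)
    total: dict[date, int] = defaultdict(int)
    for day, ok in reviews:
        total[day] += 1
        approved[day] += int(ok)
    return {day: approved[day] / total[day] for day in sorted(total)}


def week1_gate(reviews: list[tuple[date, bool]]) -> str:
    """Assumption: the gate uses the agreement rate over the whole week."""
    if not reviews:
        raise ValueError("no reviewed shadow runs")
    rate = sum(ok for _, ok in reviews) / len(reviews)
    if rate < NOT_READY:
        return "not ready: iterate on the prompt"
    if rate < TARGET:
        return "below target: judgment call"
    return "on target: proceed to week 2"
```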
Week 2: Human-in-loop, agent acts
The agent now takes real action, but only after a human approves the proposed action in a queue. This adds latency (typically 5-30 minutes per action, depending on staffing), but the action is real. The agent's audit log captures both the proposal and the human decision.
Week 2 deliverable: 150-300 real runs with human approval. Track: how often the human modified the proposed action before approving (target <10%), how often the human rejected outright (target <5%), and the average time from proposal to human decision (target <30 minutes for routine workflows).
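The three week-2 numbers can be computed from the approval queue's records. This is a minimal sketch, assuming each queued run records when it was proposed, when the human decided, and whether the action was approved as-is, modified then approved, or rejected. `Decision`, `QueuedRun`, `week2_metrics` and `meets_week2_targets` are hypothetical names. The sketch also assumes every run counts as "routine" for the 30-minute target; if yours mixes in non-routine workflows, filter them out before averaging.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Decision(Enum):
    APPROVED = "approved"  # approved exactly as proposed
    MODIFIED = "modified"  # edited by the human, then approved
    REJECTED = "rejected"  # rejected outright


@dataclass
class QueuedRun:
    proposed_at: datetime
    decided_at: datetime
    decision: Decision


def week2_metrics(runs: list[QueuedRun]) -> dict[str, float]:
    """Rates use all queued runs as the denominator (an assumption)."""
    if not runs:
        raise ValueError("no human-in-loop runs")
    n = len(runs)
    modified = sum(r.decision is Decision.MODIFIED for r in runs)
    rejected = sum(r.decision is Decision.REJECTED for r in runs)
    waits = [(r.decided_at - r.proposed_at).total_seconds() / 60 for r in runs]
    return {
        "modify_rate": modified / n,
        "reject_rate": rejected / n,
        "avg_minutes_to_decision": sum(waits) / n,
    }


def meets_week2_targets(m: dict[str, float]) -> bool:
    # Targets from the plan: <10% modified, <5% rejected, <30 min to decision.
    return (m["modify_rate"] < 0.10
            and m["reject_rate"] < 0.05
            and m["avg_minutes_to_decision"] < 30)
```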
Week 3: Limited autonomy
The agent now acts autonomously on a defined sub-segment of the workflow — typically the lowest-risk subset (e.g., for reservation modifications: changes that do not affect pricing, do not affect VIPs, and are at least 7 days from arrival). The rest of the workflow remains in week-2 mode. Daily review of all autonomous runs by the operations director.
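The sub-segment boundary works best as an explicit, testable predicate rather than something left to the prompt. This sketch encodes the reservation-modification example from the paragraph above. It assumes the three facts (does the change affect pricing, is the guest a VIP, what is the arrival date) are already available on the request. `ModificationRequest`, `eligible_for_autonomy` and `route` are hypothetical names; a real PMS integration would have to derive those fields itself.

```python
from dataclasses import dataclass
from datetime import date

MIN_DAYS_BEFORE_ARRIVAL = 7  # "at least 7 days from arrival"


@dataclass
class ModificationRequest:
    # Hypothetical fields; a real PMS integration would populate these.
    changes_price: bool
    guest_is_vip: bool
    arrival: date


def eligible_for_autonomy(req: ModificationRequest, today: date) -> bool:
    """True only for the lowest-risk subset; everything else stays queued."""
    days_to_arrival = (req.arrival - today).days
    return (not req.changes_price
            and not req.guest_is_vip
            and days_to_arrival >= MIN_DAYS_BEFORE_ARRIVAL)


def route(req: ModificationRequest, today: date) -> str:
    # Anything outside the safe subset falls back to week-2 (approval) mode.
    return "autonomous" if eligible_for_autonomy(req, today) else "human_in_loop"
```

Routing defaults to the approval queue, so a request that fails any one check is never acted on autonomously. Widening the segment later means changing one visible rule.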
Week 3 deliverable: 100-200 autonomous runs and 200-400 human-in-loop runs. The owner gets a written summary at the end of week 3: how many runs were autonomous, the error rate, hours saved, and escalations.
Week 4: Full rollout decision
At the end of week 4, the team and the owner sit together with the audit log and make the go / no-go call. Three numbers drive the decision: agreement rate (target >92%), hours saved per week (target >12), and incident count (target ≤2 minor, 0 severe). If all three are within target, the rollout expands. If any one is out of target, the rollout pauses for two weeks of focused iteration on that specific metric.
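Writing the decision rule down as code keeps the week-4 meeting about the numbers rather than the rule. This is a minimal sketch, assuming the agreement rate is a fraction and incidents are already classified as minor or severe. `PilotResults` and `go_no_go` are hypothetical names. The function returns the failing metrics alongside the verdict, because the plan pauses for iteration on the specific metric that missed.

```python
from dataclasses import dataclass


@dataclass
class PilotResults:
    agreement_rate: float  # fraction, e.g. 0.94
    hours_saved_per_week: float
    minor_incidents: int
    severe_incidents: int


def go_no_go(r: PilotResults) -> tuple[str, list[str]]:
    """Apply the three week-4 targets; report which ones missed."""
    misses: list[str] = []
    if not r.agreement_rate > 0.92:  # target: >92%
        misses.append("agreement_rate")
    if not r.hours_saved_per_week > 12:  # target: >12 hours/week
        misses.append("hours_saved_per_week")
    if r.minor_incidents > 2 or r.severe_incidents > 0:  # <=2 minor, 0 severe
        misses.append("incidents")
    if not misses:
        return "go: expand rollout", misses
    return "pause: two weeks of focused iteration", misses
```

For example, `go_no_go(PilotResults(0.92, 15, 3, 0))` pauses the rollout and names both `agreement_rate` (exactly 92% is not above the target) and `incidents`.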
Why 4 weeks (and not 2 or 8)
Two weeks is not enough to see drift, edge cases, or the second-order effects of front-office behavior change. Eight weeks is long enough that the project loses political momentum and the team starts to question whether the agent will ever ship. Four weeks is the minimum that lets the operations team get past the initial novelty and see what the agent actually does, while keeping the stakeholder timeline tight enough to maintain commitment.