When to keep a human in the loop · Agentic Workflows & PMS Integration

"Fully autonomous agent" is a phrase that sells well in vendor pitches and almost never describes the right architecture for hospitality. The question is not whether to keep a human in the loop — you almost always should — but at which step, with what context, and with how much friction. Get this design wrong in either direction and you either drown the team in reviews (defeats the ROI) or skip reviews on the cases that needed them (creates the asymmetric tail risk).

The four placements of the human

How to choose the right placement

Three factors decide the placement. First, blast radius: how bad is the worst-case mistake? A wrong reservation modification is bounded; an unauthorized refund is not. Second, reversibility: can the action be undone in minutes (a date change is recoverable), or does it create lasting consequences (an email already sent to a VIP is much harder to take back)? Third, volume: at 600 runs per month, pre-action review is workable; at 6,000 runs per month, you need sampling review and good exception logic.
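The three factors can be sketched as a small decision function. This is only an illustration: the `ActionProfile` fields, the `PRE_ACTION_VOLUME_LIMIT` cut-off of 1,000 runs per month, and the rule that risky actions always get pre-action review are assumptions made for the example, not thresholds taken from any operator's data.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    PRE_ACTION = "pre-action review"
    POST_ACTION = "post-action review"
    SAMPLING = "sampling plus exception review"


@dataclass
class ActionProfile:
    bounded_blast_radius: bool   # is the worst-case mistake contained?
    reversible_in_minutes: bool  # can the action be undone quickly?
    runs_per_month: int          # expected volume for this action type


# Hypothetical cut-off between "reviewable by hand" and "needs sampling".
# The text only gives 600 (workable) and 6,000 (not workable) as anchors.
PRE_ACTION_VOLUME_LIMIT = 1000


def choose_placement(profile: ActionProfile) -> Placement:
    """Pick a review placement from blast radius, reversibility, and volume."""
    risky = not profile.bounded_blast_radius or not profile.reversible_in_minutes
    if risky:
        # Assumption: unbounded or irreversible actions are gated before
        # execution, whatever the volume.
        return Placement.PRE_ACTION
    if profile.runs_per_month <= PRE_ACTION_VOLUME_LIMIT:
        return Placement.POST_ACTION
    return Placement.SAMPLING


refund = ActionProfile(bounded_blast_radius=False, reversible_in_minutes=False,
                       runs_per_month=600)
date_change = ActionProfile(bounded_blast_radius=True, reversible_in_minutes=True,
                            runs_per_month=6000)
print(choose_placement(refund).value)
print(choose_placement(date_change).value)
```

Treat the output as a starting point for discussion with the operations team rather than a policy; the point is that the decision is made per action type, not once for the whole agent.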

The pattern that works for most operators starting out: pre-action review for the first 4 weeks, post-action review for the next 8 weeks once the team trusts the agent, then sampling-plus-exception review in steady-state production. The team stays in the loop throughout, but with progressively less friction.
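A minimal sketch of that rollout schedule, assuming weeks are counted from 0 at launch. The function names, the `is_exception` flag, and the 10% default `sample_rate` are hypothetical; the calendar is only a default, since the move to post-action review also depends on the team actually trusting the agent.

```python
import random


def review_mode_for_week(weeks_since_launch: int) -> str:
    """Map time since launch to a review mode (weeks 0-3, 4-11, 12+)."""
    if weeks_since_launch < 0:
        raise ValueError("weeks_since_launch must be non-negative")
    if weeks_since_launch < 4:
        return "pre-action"
    if weeks_since_launch < 12:
        return "post-action"
    return "sampling-plus-exception"


def needs_human_review(mode: str, is_exception: bool,
                       sample_rate: float = 0.1, rng=random) -> bool:
    """Decide whether a single run is shown to a human reviewer."""
    if mode in ("pre-action", "post-action"):
        # Every run is reviewed; the two modes differ only in timing.
        return True
    # Steady state: always review exceptions, plus a random sample of the rest.
    return is_exception or rng.random() < sample_rate


mode = review_mode_for_week(14)
print(mode, needs_human_review(mode, is_exception=True))
```

Passing `rng` explicitly makes the sampling reproducible in tests, for example with `random.Random(0)`.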

The "low-confidence escalation" technique

Modern LLMs can self-report confidence well enough to serve as a routing signal if you ask them to, even though the stated number is not a calibrated probability. In the agent system prompt, instruct the model: "If you are less than 90% confident in the proposed action, set escalate=true in your final response and provide a one-sentence explanation of the uncertainty." Then route any escalated run to a human. In practice, this catches 60-80% of the runs that would have failed silently, at the cost of 15-25% of runs being escalated unnecessarily. The unnecessary escalations are a small price; the silent failures are not.
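The routing step might look like the sketch below. It assumes the agent's final response is a JSON object with an `escalate` boolean and an `explanation` string; that response shape and the `route_run` helper are illustrative, not a fixed API. The sketch fails safe: anything other than an explicit `escalate: false` goes to a human.

```python
import json


def route_run(final_response: str) -> dict:
    """Return {'route': 'human' | 'auto', 'reason': str} for one agent run."""
    try:
        payload = json.loads(final_response)
    except (json.JSONDecodeError, TypeError):
        # An unparseable response is itself a reason to involve a human.
        return {"route": "human", "reason": "response was not valid JSON"}

    if not isinstance(payload, dict):
        return {"route": "human", "reason": "response was not a JSON object"}

    if payload.get("escalate") is not False:
        # Covers escalate=true and a missing or malformed flag.
        reason = payload.get("explanation") or "escalate flag missing or true"
        return {"route": "human", "reason": str(reason)}

    return {"route": "auto", "reason": ""}


print(route_run('{"escalate": true, "explanation": "Two guests share this surname."}'))
print(route_run('{"escalate": false}'))
```

Defaulting to escalation when the flag is absent trades a few extra reviews for protection against the silent-failure case the technique is meant to catch.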

What the human is actually doing

The human in the loop is not "double-checking the AI" in a generic sense. The reviewer is answering a specific question that the audit log frames: "Given the input, the proposed action, and the reasoning, is this action correct enough to proceed?" The decision should take 15-45 seconds for routine reviews. If your reviews are taking 5 minutes each, the audit-log presentation is wrong: the human is being asked to reconstruct what the agent did, which is the audit log's job. Fix the presentation, not the human.
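One way to make this concrete is to render each review as a compact card holding exactly the three items the reviewer needs, and to track how long reviews take. The sketch below is illustrative: the function names, the card layout, and the use of the median against the 45-second upper bound are assumptions, not a prescribed format.

```python
from statistics import median


def render_review_card(run_input: str, proposed_action: str, reasoning: str) -> str:
    """Format one run so the reviewer never has to reconstruct what happened."""
    return "\n".join([
        f"INPUT: {run_input}",
        f"PROPOSED ACTION: {proposed_action}",
        f"REASONING: {reasoning}",
        "Correct enough to proceed? [approve / reject]",
    ])


def presentation_needs_work(review_seconds: list, target_max: float = 45.0) -> bool:
    """Flag the audit-log presentation when the median review runs long."""
    if not review_seconds:
        return False  # no reviews recorded yet, nothing to judge
    return median(review_seconds) > target_max


print(render_review_card(
    "Guest asks to move stay from 12-14 May to 13-15 May",
    "Modify reservation dates to 13-15 May at the same rate",
    "Room type is available on the new dates; rate plan allows free changes",
))
print(presentation_needs_work([22, 31, 40, 28]))
```

The median is used rather than the mean so that one unusually hard review does not trigger a redesign of the card.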