Human-in-the-Loop Patterns for Agentic AI in Regulated Industries
Human-in-the-loop is the most over-used and under-designed phrase in regulated AI conversations. Almost every vendor claims their system is human-in-the-loop; almost no vendor pitch describes specifically where the human is in the loop, what the human sees at that moment, and what the firm has done to make sure the review is real rather than nominal. This piece is a practical guide to designing human-in-the-loop checkpoints that the regulator, the second line, and the senior manager accountable for the system can all stand behind.
Where to put the checkpoint
The default mistake is putting the human-in-the-loop checkpoint at the end of an agentic AI workflow: the agent does everything, and a human approves the final output. This pattern is intuitive but usually wrong. By the time the human sees the output, the chain of decisions has already happened, the cost is sunk, and the human review is essentially “does this look right?” rather than “is this right?”
The patterns that actually work in regulated environments put checkpoints at the moments that matter, not at the end.
Confidence-threshold checkpoint. Whenever the agent's confidence falls below a documented threshold, the case routes to a human with the agent's reasoning attached. This is the most common pattern, and the threshold is where the design thinking sits. Too high and everything escalates; too low and risky cases slip through. The threshold is set during the design phase based on the eval results, then tuned in the pilot phase based on real production data.
Scope-boundary checkpoint. Whenever the agent is about to take an action outside its designed-in scope (calling a tool it does not normally call, taking an action with legal effect, escalating to a higher-stakes pathway), a human approves the action before it is taken. This is the pattern most regulators want for material decisions.
Material-decision checkpoint. Some decisions always have a human in the loop: decisions that have legal effect on a customer (UK GDPR Article 22), decisions that affect a customer outcome under Consumer Duty, and decisions that have material financial or operational consequence. In these cases the human is making the decision rather than approving the agent's decision.
Periodic sample-review checkpoint. Beyond per-case checkpoints, the design includes periodic sample review of cases the agent handled end to end. The aim is to catch slow drift that no individual case would surface. This is typically structured as 1 to 5% of cases reviewed, sampled across confidence buckets.
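As a rough illustration, the four checkpoints above can be sketched as a routing function plus a stratified sampler. This is a minimal sketch, not a reference implementation: the names Case, Route, route and sample_for_review, the 0.85 threshold, the list of in-scope actions and the 3% sampling rate are all hypothetical, and it assumes the agent reports a confidence score between 0 and 1.

```python
import random
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO = "auto"                      # agent completes the case
    HUMAN_REVIEW = "human_review"      # low confidence: human reviews with reasoning attached
    HUMAN_APPROVAL = "human_approval"  # out of scope: human approves before the action is taken
    HUMAN_DECISION = "human_decision"  # material decision: the human decides


@dataclass
class Case:
    case_id: str
    confidence: float      # assumed agent-reported confidence, 0.0 to 1.0
    requested_action: str  # the action the agent wants to take next
    is_material: bool      # legal, customer-outcome, or material financial effect


# Hypothetical values; in practice these come from eval results and pilot tuning.
CONFIDENCE_THRESHOLD = 0.85
IN_SCOPE_ACTIONS = {"draft_reply", "lookup_account", "categorise_document"}


def route(case: Case) -> Route:
    """Apply the per-case checkpoints, strictest first."""
    if case.is_material:
        return Route.HUMAN_DECISION
    if case.requested_action not in IN_SCOPE_ACTIONS:
        return Route.HUMAN_APPROVAL
    if case.confidence < CONFIDENCE_THRESHOLD:
        return Route.HUMAN_REVIEW
    return Route.AUTO


def sample_for_review(cases, rate=0.03, n_buckets=10, seed=None):
    """Sample a fraction of fully automated cases from each confidence bucket."""
    rng = random.Random(seed)
    buckets = {}
    for case in cases:
        # Cap the index so a confidence of exactly 1.0 joins the top bucket.
        index = min(int(case.confidence * n_buckets), n_buckets - 1)
        buckets.setdefault(index, []).append(case)
    sampled = []
    for index in sorted(buckets):
        members = buckets[index]
        k = max(1, round(len(members) * rate))  # at least one case per non-empty bucket
        sampled.extend(rng.sample(members, k))
    return sampled
```

Checking materiality first means a high-confidence, in-scope case still goes to a human decision when it has legal or customer-outcome effect. Sampling per bucket rather than across the whole population keeps high-confidence cases in the review set, which is where slow drift is most likely to go unnoticed.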
What the human sees at the checkpoint
The most common reason human-in-the-loop reviews are nominal rather than real is that the human sees too little context to make the review meaningful. The pattern that works in production: the human sees the agent's output, the inputs the agent worked from, the chain of intermediate decisions and tool calls (with the reasoning the agent generated), and a clear surfacing of any uncertainty signals the agent flagged.
Equally important is what the human does not see. They do not see a clean, confident-looking output stripped of context, because that nudges them toward rubber-stamp approval. They do not see only the cases the agent already decided to escalate, because that misses the cases the agent should have escalated but did not. The checkpoint UI is a first-class part of the design, not an afterthought.
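One way to make the checkpoint UI's contract explicit is to define the payload the reviewer receives and refuse to present a case when context is missing. The sketch below is illustrative only: ReviewPacket, Step and missing_context are hypothetical names, and the fields simply mirror the items listed above (output, inputs, intermediate steps with reasoning, uncertainty signals).

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    description: str  # what the agent did, for example a tool call
    reasoning: str    # the reasoning the agent generated for this step


@dataclass
class ReviewPacket:
    case_id: str
    agent_output: str
    inputs: list[str]
    steps: list[Step]
    uncertainty_flags: list[str] = field(default_factory=list)

    def missing_context(self) -> list[str]:
        """Name the parts a reviewer needs that are absent from this packet."""
        missing = []
        if not self.inputs:
            missing.append("inputs")
        if not self.steps:
            missing.append("steps")
        if any(not step.reasoning.strip() for step in self.steps):
            missing.append("step reasoning")
        return missing
```

A UI built on this shape could block the approve action whenever missing_context() returns a non-empty list, so that a clean output stripped of its context never reaches the reviewer.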
How to make the review real, not nominal
Three patterns we use consistently in regulated agentic AI deployments.
Visible reasoning, visible uncertainty. The agent's reasoning is shown to the human reviewer, including the points where the agent was uncertain. Calibrated uncertainty is one of the more valuable signals a modern agent can produce, and surfacing it correctly turns rubber-stamp review into informed review.
Forced positive engagement. The reviewer cannot approve in one click. The UI requires acknowledgement of specific aspects of the case: a brief note, a category selection, a confirmation of a key fact. This adds friction, which is exactly the point; it makes the review something the reviewer actively performs rather than passively accepts.
Sampled second-line oversight. A subset of cases is reviewed by a senior reviewer or the second line, with disagreement rates tracked over time. Where a primary reviewer's decisions consistently diverge from the senior reviewer's, that is a signal, usually that the threshold is wrong, the reviewer is under-trained, or the case type needs different handling.
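The second and third patterns lend themselves to simple checks in code. The following is a minimal sketch under stated assumptions: the category names, the 20-character minimum note length, and the function names validation_errors and disagreement_rates are all hypothetical, and a real deployment would choose its own acknowledgement fields.

```python
from collections import defaultdict
from dataclasses import dataclass

ALLOWED_CATEGORIES = {"approve", "amend", "reject"}  # hypothetical categories
MIN_NOTE_LENGTH = 20  # hypothetical minimum, in characters


@dataclass
class ReviewSubmission:
    reviewer_id: str
    case_id: str
    category: str
    note: str
    key_fact_confirmed: bool


def validation_errors(submission: ReviewSubmission) -> list[str]:
    """Return what the reviewer still has to do before the review is accepted."""
    errors = []
    if submission.category not in ALLOWED_CATEGORIES:
        errors.append("select a category")
    if len(submission.note.strip()) < MIN_NOTE_LENGTH:
        errors.append("add a brief note")
    if not submission.key_fact_confirmed:
        errors.append("confirm the key fact")
    return errors


def disagreement_rates(pairs):
    """pairs: iterable of (reviewer_id, primary_category, second_line_category)."""
    totals = defaultdict(int)
    disagreements = defaultdict(int)
    for reviewer_id, primary, second_line in pairs:
        totals[reviewer_id] += 1
        if primary != second_line:
            disagreements[reviewer_id] += 1
    return {reviewer: disagreements[reviewer] / totals[reviewer] for reviewer in totals}
```

The per-reviewer rate is only meaningful once each reviewer has a reasonable number of second-line samples; a dashboard built on this would want to show the sample count alongside the rate.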
UK GDPR Article 22 specifically
For agentic AI that produces decisions with legal or similarly significant effect on individuals, UK GDPR Article 22 applies. The default obligation is that decisions cannot be based solely on automated processing, meaning the human-in-the-loop checkpoint must be real, not nominal, and the firm must be able to explain the decision and provide an avenue for the individual to contest it.
The practical implications: the material-decision checkpoint above is non-optional for any Article 22-relevant case. The reviewer must have meaningful authority to override the agent's recommendation. The decision log must capture the human's reasoning, not just their approval. And the firm must have a documented process for the individual to contest the decision and have it reviewed by a different human.
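The logging and contest requirements above can be expressed as a small record type. This is a sketch under assumptions, not a statement of what Article 22 requires in data terms: DecisionRecord, record_decision and assign_contest_reviewer are hypothetical names, and the fields are one plausible way to capture the reviewer's reasoning and guarantee an independent second review.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DecisionRecord:
    case_id: str
    agent_recommendation: str
    human_decision: str
    human_reasoning: str
    reviewer_id: str
    decided_at: datetime

    @property
    def overrode_agent(self) -> bool:
        return self.human_decision != self.agent_recommendation


def record_decision(case_id, agent_recommendation, human_decision,
                    human_reasoning, reviewer_id) -> DecisionRecord:
    """Create a log entry, rejecting approvals that carry no reasoning."""
    if not human_reasoning.strip():
        raise ValueError("the log must capture the reviewer's reasoning, not just approval")
    return DecisionRecord(
        case_id=case_id,
        agent_recommendation=agent_recommendation,
        human_decision=human_decision,
        human_reasoning=human_reasoning.strip(),
        reviewer_id=reviewer_id,
        decided_at=datetime.now(timezone.utc),
    )


def assign_contest_reviewer(original: DecisionRecord, available_reviewers):
    """Pick a reviewer for a contested decision who did not make the original one."""
    for reviewer_id in available_reviewers:
        if reviewer_id != original.reviewer_id:
            return reviewer_id
    raise LookupError("no independent reviewer available")
```

Tracking overrode_agent over time also gives a simple evidence point for meaningful authority: a reviewer population that never overrides the agent is worth a closer look.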
Consumer Duty considerations
For UK financial services firms specifically, Consumer Duty raises the bar on customer-affecting decisions. The human-in-the-loop checkpoint for agentic AI producing customer-facing outputs needs to be designed around the three cross-cutting Consumer Duty rules (act in good faith, avoid causing foreseeable harm, and enable and support customers to pursue their financial objectives), particularly the last of these.
In practice that means the checkpoint UI surfaces information relevant to whether the proposed decision treats the customer fairly, including any vulnerable-customer signals the agent flagged. The reviewer is empowered (and trained) to override the agent on Consumer Duty grounds, not only on factual or technical grounds.
How this shows up at the regulator
When the FCA, the ICO, or the firm's own internal audit examines an agentic AI system, the human-in-the-loop questions concentrate on three things: where the checkpoints are, what the human sees at each one, and how the firm evidences that the review was real. The answers to these are part of the control pack the senior manager maintains; the design decisions are made during the Evolve Workflow Audit and refined through the build and pilot phases.
For more on agentic AI engineering for regulated UK industries, see the agentic AI pillar, the practical companion Building an agentic AI eval harness, or the deployment pattern From pilot to production: a 12-week pattern for agentic AI deployments.