Security & Compliance

Auditable AI Automation: Meeting FCA and ICO Expectations


Auditable AI automation is the difference between an AI deployment that survives a regulator visit and one that produces an unhappy conversation with the firm's second line nine months in. The artefacts the FCA, the ICO, and your internal audit team expect are not especially exotic, but they have to be designed in from the first sprint, not retrofitted under deadline pressure when the supervisory letter arrives. This piece walks through what those artefacts actually are, what good looks like, and what we build by default for regulated UK clients.

What “auditable” actually means

Auditable, in the regulator's sense, means three things working together. The firm can explain what the AI did. The firm can evidence that the AI did it within agreed boundaries. And the firm can show that someone was accountable for the output. Each of these is a distinct engineering and governance commitment, and each has to be in place before the automation goes anywhere near client-affecting work.

For AI automation in regulated UK firms, the translation into concrete artefacts is well-established. The list below is what we build by default; if you are evaluating an AI vendor and any of these is missing, that is the conversation to have first.

The artefacts the FCA expects

Model and prompt version logging. Every output captured with the exact model and prompt version that produced it. Months later, when the firm needs to reconstruct why a particular case was handled a particular way, the trail is the artefact that answers the question. Without it, the firm cannot evidence that the AI was operating inside agreed parameters at the relevant time.
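
To make that concrete, here is a minimal sketch of version-stamping in Python. The field names and the idea of hashing the prompt text are illustrative assumptions rather than a prescribed schema; the point is that every stored output carries the exact model and prompt version that produced it.

```python
import hashlib
from datetime import datetime, timezone

# The prompt lives in version control; the version label is bumped on every change.
PROMPT_VERSION = "triage-prompt-v14"   # illustrative label, not a standard
PROMPT_TEXT = "You are a complaints triage assistant. Classify the case as..."

def stamp_output(case_id: str, model_id: str, output: str) -> dict:
    """Attach the exact model and prompt version to an output before it is stored."""
    return {
        "case_id": case_id,
        "model_id": model_id,                 # the provider's exact model string
        "prompt_version": PROMPT_VERSION,
        "prompt_sha256": hashlib.sha256(PROMPT_TEXT.encode()).hexdigest(),
        "output": output,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
```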

Decision logs with reasoning. For every classification, extraction, draft, or routing decision the AI takes, the input it received, the output it produced, and the reasoning it generated (where the model produces structured reasoning) are captured in a queryable log. Decision logs are the single most useful artefact in a regulatory conversation: they let the firm answer specific case questions with specific case data.
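
A minimal sketch of what a queryable decision log can look like, using SQLite for illustration; the table and column names are assumptions, not a required schema.

```python
import sqlite3

conn = sqlite3.connect("decision_log.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS decisions (
        decision_id    INTEGER PRIMARY KEY AUTOINCREMENT,
        case_id        TEXT NOT NULL,
        decision_type  TEXT NOT NULL,   -- classification, extraction, draft, routing
        input          TEXT NOT NULL,
        output         TEXT NOT NULL,
        reasoning      TEXT,            -- captured where the model produces structured reasoning
        model_id       TEXT NOT NULL,
        prompt_version TEXT NOT NULL,
        created_at     TEXT NOT NULL DEFAULT (datetime('now'))
    )
""")

def log_decision(case_id, decision_type, input_text, output_text,
                 reasoning, model_id, prompt_version):
    """Write one AI decision to the audit log."""
    conn.execute(
        "INSERT INTO decisions (case_id, decision_type, input, output, reasoning,"
        " model_id, prompt_version) VALUES (?, ?, ?, ?, ?, ?, ?)",
        (case_id, decision_type, input_text, output_text, reasoning,
         model_id, prompt_version),
    )
    conn.commit()

# Answering "what happened on case X?" becomes a query, not an archaeology exercise.
history = conn.execute(
    "SELECT created_at, decision_type, output, reasoning FROM decisions"
    " WHERE case_id = ? ORDER BY created_at",
    ("CASE-2024-0173",),
).fetchall()
```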

Eval test results. A curated test set covering the cases the automation must handle and the failure modes it must avoid, run on every change. The results are the evidence the firm can offer that the system performs to the standard the firm has committed to. The FCA does not want to read your evals; it wants to see that you have them, that they run, and that you act on the results.
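
As a sketch of what “run on every change” can look like in practice: a small harness that replays a curated test set through the automation and fails the build when the pass rate drops. The JSONL file path, the classify callable, and the pass-rate figure are assumptions about how the system is wired, not recommendations.

```python
import json

def run_evals(classify, path="evals/complaint_routing.jsonl", required_pass_rate=0.98):
    """Replay the curated test set and report pass/fail plus the failing cases."""
    failures, total = [], 0
    with open(path) as f:
        for line in f:
            case = json.loads(line)          # {"input": ..., "expected": ..., "note": ...}
            total += 1
            got = classify(case["input"])
            if got != case["expected"]:
                failures.append({"input": case["input"],
                                 "expected": case["expected"], "got": got})
    pass_rate = (total - len(failures)) / total if total else 0.0
    return {"total": total, "pass_rate": pass_rate,
            "passed": pass_rate >= required_pass_rate, "failures": failures}

# Archiving the returned results on every run is the evidence that the system
# performed to the committed standard at the moment of each change.
```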

Confidence thresholds and escalation paths. Documented thresholds for when the AI handles a case, when it flags for human review, and when it escalates further. Documented as part of the design and tested as part of the eval harness rather than assumed. Consumer Duty in particular makes the thresholds for customer-affecting outputs material to the supervisor.
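
A minimal sketch of thresholds as code rather than as a paragraph in a policy document; the numbers and tier names are illustrative, not recommended values.

```python
AUTO_HANDLE_THRESHOLD = 0.92    # at or above: the AI completes the case
REVIEW_THRESHOLD = 0.70         # between the two: flagged for human review

def route(confidence: float) -> str:
    """Return the handling tier for a case under the documented escalation policy."""
    if confidence >= AUTO_HANDLE_THRESHOLD:
        return "auto_handle"
    if confidence >= REVIEW_THRESHOLD:
        return "human_review"
    return "escalate"           # below the lower threshold: senior or specialist queue

# Keeping the constants in version control alongside the prompts means the eval
# harness exercises the thresholds actually in force, not the ones in a PDF.
```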

Rollback procedure. A documented and rehearsed way to disable the AI and fall back to manual handling. Rehearsed before launch, not in the middle of an incident. Operational resilience expectations apply to AI in the same way they apply to any other production system.
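
A minimal sketch of a kill switch, assuming a flag the operations team can flip without a code deployment; the flag name and the queue are illustrative placeholders for the firm's existing manual process.

```python
import os
import queue

manual_queue = queue.Queue()        # stand-in for the existing manual work queue

def ai_enabled() -> bool:
    """One check consulted before any AI handling; defaults to off if the flag is unset."""
    return os.environ.get("AI_TRIAGE_ENABLED", "false").lower() == "true"

def handle_case(case, ai_pipeline):
    if not ai_enabled():
        manual_queue.put(case)      # the rehearsed manual fallback path
        return None
    return ai_pipeline(case)
```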

What the ICO expects on top of that

For automation processing personal data, the ICO has a separate set of expectations that sit alongside the FCA artefacts.

Lawful basis and processing-purpose mapping. Documented at the start of the engagement, not at the end. The Workflow Audit covers this as part of the discovery work: what personal data is involved, what the lawful basis is, what processing happens, and what the data minimisation decisions look like.

DPIA where required. Article 35 DPIAs are required for high-risk processing, and most regulated AI automation lands in that bracket. The DPIA is one of the early artefacts produced from the audit; without it, the rest of the engagement does not proceed.

Article 22 considerations. Where the automation produces decisions that affect individuals, the firm has obligations around explanation, contestation, and human review. For most regulated AI automation, the answer is a clear human-in-the-loop checkpoint for any case the model handles in a way that has legal or similarly significant effect, and the design has to make that real, not nominal.
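
One way to make that checkpoint real rather than nominal is to encode it so a confident model cannot bypass it. A sketch, with illustrative decision types and the threshold reused from the sketch above:

```python
AUTO_HANDLE_THRESHOLD = 0.92     # as in the thresholds sketch above

# Decision types with legal or similarly significant effect; names are illustrative.
LEGAL_EFFECT_DECISIONS = {"claim_rejection", "account_closure", "credit_decline"}

def requires_human_review(decision_type: str, confidence: float) -> bool:
    """Article 22 checkpoint: legal-effect decisions never auto-complete."""
    if decision_type in LEGAL_EFFECT_DECISIONS:
        return True              # mandatory human review, regardless of confidence
    return confidence < AUTO_HANDLE_THRESHOLD
```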

Subject-access readiness. The decision logs above also need to be retrievable for subject access requests within the statutory one-month window. We design log schemas with that retrievability in mind from day one.
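
Against the decision-log sketch above, subject-access readiness can be as simple as keying records to the data subject; the data_subject_id column here is an assumed addition to that schema rather than part of it.

```python
def export_for_sar(conn, data_subject_id: str) -> list[dict]:
    """Pull every AI decision touching one data subject, ready to fold into the SAR response."""
    rows = conn.execute(
        "SELECT created_at, decision_type, output, reasoning FROM decisions"
        " WHERE data_subject_id = ? ORDER BY created_at",
        (data_subject_id,),
    ).fetchall()
    keys = ("created_at", "decision_type", "output", "reasoning")
    return [dict(zip(keys, row)) for row in rows]
```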

What internal audit expects

The firm's second line and internal audit add a layer of practical scrutiny on top of the regulator-facing artefacts. The pattern we recommend is a single document we call the “control pack”, refreshed quarterly, owned by the senior manager accountable for the automation, and built from the underlying logs and eval results.

The control pack includes: model documentation (what the model is, what data it was trained on if relevant, what its known limitations are), prompt and policy versions in operation, eval results from the most recent run, monitoring dashboard summary, incidents and near-misses since the last review, and the planned changes for the coming quarter. It is the artefact the internal audit team reads, the second line uses for ongoing oversight, and the senior manager refers to when the question lands on their desk.
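
A sketch of the control pack as a structure that can be generated from the underlying logs and eval results rather than assembled by hand each quarter; the field names simply mirror the list above and carry no special meaning beyond it.

```python
from dataclasses import dataclass

@dataclass
class ControlPack:
    quarter: str                  # e.g. "2025-Q3"
    owner: str                    # the senior manager accountable for the automation
    model_documentation: str      # what the model is, relevant training data, known limitations
    prompt_versions: list[str]    # prompt and policy versions in operation
    eval_results: dict            # output of the most recent eval run
    monitoring_summary: str       # dashboard summary for the period
    incidents: list[str]          # incidents and near-misses since the last review
    planned_changes: list[str]    # changes planned for the coming quarter
```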

Where this fits in the SM&CR conversation

The Senior Managers and Certification Regime means a named senior manager has to own the AI outputs. The auditable artefacts above are what makes that ownership practical rather than theoretical: they are the evidence that the senior manager is exercising effective oversight. For firms that have rolled out AI automation without these artefacts, the SM&CR conversation gets uncomfortable quickly, because the senior manager genuinely cannot answer the supervisor's questions.

Building these in from the first sprint is the cheapest point at which to do it. Retrofitting is the most expensive, typically two or three times the original build cost, and usually under deadline pressure when the regulator is already in the room. The pattern we see most often is firms that built without the artefacts spending year two rebuilding to include them; we prefer to do it once, properly, in year one.

How to evaluate a vendor

Three questions that surface auditability fast at procurement.

1. Show me the audit trail of what happened on a real case. If the trail is conversation logs only, the system is not auditable in the regulated sense. If it is model and prompt version per case, plus inputs and outputs, that is the foundation.

2. Show me the eval test set. If there isn't one, or if it has not been updated since deployment, the system is not being maintained. The eval test set is the living document that tells the firm whether the AI is doing what it is meant to.

3. Walk me through the rollback path. If the answer is “turn it off and use the old process,” the rollback is theoretical. A real rollback path is rehearsed, documented, and owned by a named operator.

For more on how AI automation lands across regulated UK industries, see the AI automation pillar or the industry-specific guide for financial services. For the related question of how to approach agentic AI deployments where the audit bar is higher again, Agentic AI explained: a UK operator's guide is the natural next read.

Ready to transform your business with AI?

Book a free strategy session to discuss how Evolve AI can help your organisation harness AI safely and compliantly.
