FCA Compliance Automation with AI: What to Automate, What to Keep Human
FCA compliance automation with AI is no longer a question of whether the technology can do the work; large language models are perfectly capable of reading a financial promotion, summarising a complaint, or scanning a handbook update. The real question is which compliance processes can sensibly be automated, which must stay human, and what governance wrapper makes the automated parts defensible when the supervisor asks how the firm knows the system is behaving as intended. This article works through the compliance functions in a typical UK FCA-regulated mid-market firm, one by one, and sets out what AI does, what stays human, and the controls that hold the whole thing together.
The framing matters because the failure mode is predictable. Firms automate the attractive parts (the volume work, the first-draft work) and discover eighteen months later that they cannot evidence how any of it was governed. The approach we take with regulated clients is governance-first: design the audit trail, the escalation thresholds, and the rollback before automating anything customer-affecting. For the broader regulatory context, our pillar article on what compliance officers need to know about the FCA and AI sets out the FCA's stance; this piece is the operational layer beneath it.
The governance wrapper comes first
Before going through individual processes, it is worth being explicit about the wrapper, because the same five controls apply to every automation below. If any of these is missing, the process is not ready to automate regardless of how capable the model is.
- Audit trail. Every AI-assisted output captured with its inputs, the model and prompt version that produced it, the output before any human edit, the human decision on top, and the eventual outcome. This is the artefact that answers a supervisory question nine months later.
- Model and prompt logging. The exact version of the model and the exact version of the prompt or policy in force at the time, so the firm can reconstruct the conditions a decision was made under rather than guessing.
- Escalation thresholds. Documented confidence levels that govern when the AI handles a case, when it flags for human review, and when it escalates further. These are designed in rather than assumed, and tested rather than asserted.
- Eval harness. A curated test set covering the cases the automation must handle and the failure modes it must avoid, run on every change, so the firm can evidence that the system still performs to the committed standard after each update.
- Rollback. A documented and rehearsed route to disable the AI and fall back to manual handling, exercised before launch rather than improvised mid-incident.
These artefacts are covered in more depth in our piece on auditable AI automation for the FCA and ICO. Treat them as the non-negotiable baseline for everything that follows.
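To make the five controls concrete, here is a minimal Python sketch of how an audit record and threshold-based routing might fit together. The field names, the two threshold values, the version strings and the `ai_enabled` rollback flag are all illustrative assumptions, not a prescribed design; real thresholds would be documented, approved and tested as the list above requires.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Illustrative thresholds only; real values are documented, approved and tested.
HANDLE_AT = 0.95  # at or above: the AI handles the case
REVIEW_AT = 0.70  # at or above (but below HANDLE_AT): flag for human review


@dataclass
class AuditRecord:
    """One AI-assisted output, with what is needed to reconstruct it later."""
    case_id: str
    inputs: dict
    model_version: str   # exact model version in force at the time
    prompt_version: str  # exact prompt or policy version in force at the time
    ai_output: str       # the output before any human edit
    confidence: float
    route: str = ""
    human_decision: Optional[str] = None  # completed when a person acts
    outcome: Optional[str] = None         # completed when the result is known
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def route_case(confidence: float, ai_enabled: bool = True) -> str:
    """Apply the escalation thresholds; ai_enabled=False is the rollback switch."""
    if not ai_enabled:
        return "manual"  # rollback: every case falls back to manual handling
    if confidence >= HANDLE_AT:
        return "ai_handles"
    if confidence >= REVIEW_AT:
        return "human_review"
    return "escalate"


def append_to_log(record: AuditRecord, log: list) -> None:
    """Append a serialised snapshot; earlier entries are never rewritten."""
    log.append(json.dumps(asdict(record), sort_keys=True))


# Usage: a mid-confidence case is flagged for review and logged.
audit_log: list = []
record = AuditRecord(
    case_id="C-001",
    inputs={"text": "customer message"},
    model_version="model-2025-01",      # hypothetical identifier
    prompt_version="triage-prompt-v7",  # hypothetical identifier
    ai_output="draft summary",
    confidence=0.82,
)
record.route = route_case(record.confidence)
append_to_log(record, audit_log)
print(record.route)  # human_review
```

When the human decision and the eventual outcome are known, they would be appended as further snapshots for the same `case_id` rather than overwriting the first entry, so the pre-edit AI output is always preserved.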
Financial-promotions review
Financial promotions are one of the strongest candidates for automation because the work is high-volume, rules-based, and well-documented in COBS 4 and the wider perimeter guidance. AI can triage a queue of promotions, check each against a structured rule set (risk warnings present, balanced presentation of benefits and risks, prominence requirements met, target-market consistency), flag the specific clause that fails, and draft the suggested correction. For a firm processing hundreds of promotions a month, this turns a backlog into a same-day first pass.
What stays human is the sign-off, and the judgement calls the rules cannot fully capture. Whether a promotion is "fair, clear and not misleading" in context, whether a particular claim crosses from puffery into a misleading impression, and the approval of promotions for unauthorised persons under the s21 gateway are decisions a competent person must own. The AI compresses the review; it does not replace the approver. The escalation threshold here is conservative: anything novel, anything aimed at retail clients in a high-risk product, and anything the model scores below a high-confidence band routes to a human before it goes anywhere near publication.
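A minimal Python sketch of the triage-and-route step might look like this. The rule check is a single placeholder (a real rule set would encode COBS 4 requirements in far more detail, and the phrase used here is not the prescribed wording for any product), and the `HIGH_CONFIDENCE` value, field names and route labels are assumptions. What it shows is the routing logic: any of the three escalation triggers sends a promotion to a reviewer first, and no route leads to publication without an approver.

```python
from dataclasses import dataclass

HIGH_CONFIDENCE = 0.90  # illustrative lower edge of the "high-confidence band"
RISK_WARNING = "your capital is at risk"  # placeholder, not prescribed FCA wording


@dataclass
class Promotion:
    promo_id: str
    text: str
    retail_audience: bool
    high_risk_product: bool
    novel: bool              # e.g. a format or product the rule set has not seen
    model_confidence: float  # the model's score for its own review (assumed available)


def check_rules(promo: Promotion) -> list:
    """Deterministic checks; a single illustrative rule stands in for the full set."""
    failures = []
    if RISK_WARNING not in promo.text.lower():
        failures.append("risk warning missing")
    return failures


def route_promotion(promo: Promotion) -> tuple:
    """Return (route, failures). Both routes still end with a human approver."""
    failures = check_rules(promo)
    needs_reviewer_first = (
        promo.novel
        or (promo.retail_audience and promo.high_risk_product)
        or promo.model_confidence < HIGH_CONFIDENCE
    )
    route = "escalate_to_reviewer" if needs_reviewer_first else "approver_signoff"
    return route, failures


retail_fund = Promotion("P-001", "High returns on our new fund.", True, True, False, 0.97)
print(route_promotion(retail_fund))  # ('escalate_to_reviewer', ['risk warning missing'])
```

Note that the high model confidence in the example does not override the retail, high-risk trigger: the conservative conditions are combined with `or`, so any one of them is enough to escalate.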
Consumer Duty outcomes monitoring
Consumer Duty generates a monitoring burden that is genuinely well suited to AI, because the obligation is continuous and the data is spread across systems. The four outcomes (products and services, price and value, consumer understanding, and consumer support) each produce signals that an AI layer can aggregate, summarise, and surface. AI can read complaint narratives and call transcripts to detect understanding failures, cluster outcomes by customer segment to find where vulnerable customers are faring worse, and draft the narrative sections of the annual board report from the underlying data.
The hard line here is between detection and conclusion. AI is good at surfacing "this segment shows a pattern of poorer outcomes"; it is not the right tool to conclude "and that is acceptable" or "and no remediation is required." Those are judgements the firm makes and the accountable senior manager owns. The governance wrapper matters acutely for Consumer Duty because the thresholds for customer-affecting outputs are themselves material to the supervisor: the firm has to be able to show not just that it monitors outcomes, but how it decided what counts as a poor outcome and when monitoring escalates into action. AI that quietly normalises a drifting metric is worse than no automation at all.
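One way to stop a metric from being quietly normalised is to compare segments against a fixed, versioned definition of a poor outcome rather than against a baseline the system recomputes from recent data. The Python sketch below assumes hypothetical segment counts and two illustrative policy parameters (`max_rate` and `max_gap`). It only detects and returns segments for escalation; whether an outcome is acceptable remains a judgement for the accountable people.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoorOutcomePolicy:
    """A documented, versioned definition of a poor outcome, set by people."""
    version: str
    max_rate: float  # fixed ceiling on a segment's poor-outcome rate
    max_gap: float   # how far a segment may sit above the overall rate


def segments_to_escalate(counts: dict, policy: PoorOutcomePolicy) -> list:
    """counts maps segment -> (poor_outcomes, customers); returns segments to escalate."""
    total_poor = sum(poor for poor, _ in counts.values())
    total = sum(n for _, n in counts.values())
    if total == 0:
        return []
    overall = total_poor / total
    flagged = []
    for segment, (poor, n) in counts.items():
        if n == 0:
            continue
        rate = poor / n
        # The fixed ceiling catches drift that also moves the overall rate.
        if rate > policy.max_rate or rate - overall > policy.max_gap:
            flagged.append(segment)
    return sorted(flagged)


policy = PoorOutcomePolicy("policy-v1", max_rate=0.10, max_gap=0.05)
print(segments_to_escalate({"vulnerable": (30, 200), "other": (50, 1000)}, policy))  # ['vulnerable']
print(segments_to_escalate({"a": (12, 100), "b": (11, 100)}, policy))  # ['a', 'b']
```

In the second call both segments sit close to the overall rate, so a relative comparison alone would pass them; the fixed ceiling is what catches the book-wide drift.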
Transaction and communications surveillance
Surveillance is already partly automated in most firms, and AI improves the quality of the automation rather than introducing it for the first time. The long-standing problem with rules-based surveillance is alert volume: a large proportion of flags are false positives that analysts close without action. AI can read the surrounding context (the full communication thread, the trading pattern, the relationship history), prioritise alerts by genuine risk, draft the rationale for closing the low-risk ones, and surface the handful that warrant real scrutiny. For market-abuse surveillance under MAR and for financial-crime monitoring, this is a material efficiency gain.
What does not change is who decides to file. The decision to submit a Suspicious Activity Report or to escalate a potential market-abuse case is a human one, made by a person with the relevant authority, and the AI's role stops at preparing the case. There is a specific governance trap here worth naming: an AI that closes alerts must never silently close a true positive. The eval harness for surveillance has to include known true-positive cases, and the escalation threshold has to be biased towards human review. It is better to over-refer than to train the firm to trust an automation that occasionally lets a real case through.
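The true-positive requirement can be expressed as a hard release gate. This Python sketch assumes a hypothetical labelled eval set, a 0-1 risk score from the model, and an illustrative closure threshold. The gate fails on a single missed true positive rather than on an average accuracy figure, which is the bias towards human review the paragraph argues for.

```python
from dataclasses import dataclass

CLOSE_BELOW = 0.10  # illustrative and deliberately low: most alerts reach an analyst


@dataclass
class EvalAlert:
    alert_id: str
    risk_score: float       # model's prioritisation score on a 0-1 scale (assumed)
    is_true_positive: bool  # label from a previously confirmed case


def triage(score: float) -> str:
    """Propose closure only for very low scores; everything else goes to a person."""
    return "propose_close" if score < CLOSE_BELOW else "analyst_review"


def release_gate(cases: list) -> tuple:
    """Fail the release if any known true positive would be proposed for closure."""
    missed = [
        c.alert_id
        for c in cases
        if c.is_true_positive and triage(c.risk_score) == "propose_close"
    ]
    return (len(missed) == 0, missed)


eval_set = [
    EvalAlert("TP-001", 0.62, True),
    EvalAlert("TP-002", 0.04, True),   # a real case the model scores too low
    EvalAlert("FP-001", 0.03, False),
]
print(release_gate(eval_set))  # (False, ['TP-002'])
```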
Complaints handling and root-cause analysis
Complaints handling splits cleanly into a part that automates well and a part that does not. AI handles the intake and the analysis: extracting the substance of a complaint from a long, emotive, or rambling message, categorising it under DISP, identifying the products and the timeline, pulling the relevant account history, and drafting a structured summary for the handler. It is also genuinely strong at root-cause analysis across the complaints book: reading hundreds of resolved complaints and clustering them by underlying cause, work that is slow and inconsistent when done by hand.
The resolution and the redress decision stay human. A final response that determines whether the firm upholds a complaint and what redress is due is a decision with direct customer consequences and a clear Consumer Duty dimension, and it belongs to a person. Where AI drafts customer-facing wording, that wording passes through the same validation as any human-authored response, and arguably more, given the known tendency of language models to produce fluent but inaccurate text. A confidently worded but wrong final response is a DISP breach and a poor outcome in one move. The root-cause output, by contrast, feeds the firm's thematic reviews and Consumer Duty work, where its value is highest and its risk is lowest.
RegData and regulatory reporting preparation
Regulatory reporting is where the phrase "automating FCA compliance" needs the most care, because the obvious target, the regulatory return itself, is the part you automate last. AI is well suited to the preparation around a return: reconciling source data, flagging values that fall outside expected ranges, comparing this period against prior submissions to catch anomalies, drafting the commentary, and assembling the supporting evidence pack. This is the slow, error-prone manual work that surrounds RegData submissions, and compressing it is a clear win.
The submitted numbers, and the attestation that they are correct, stay firmly human. A regulatory return is a formal representation to the FCA; the firm cannot delegate responsibility for its accuracy to a model, and an AI that "mostly" reconciles figures is not good enough when the figures are the thing being attested to. The right pattern is AI as preparer and checker, with deterministic validation on the numbers themselves, and a named individual who reviews and signs. The audit trail here is doubly important: if a return is later found to be wrong, the firm needs to show exactly which figures the AI touched, what it changed, and who confirmed the final values.
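The deterministic validation on the numbers can be as plain as the Python sketch below: reconciliation to source totals, a period-on-period movement check, and a record of which figures an AI step changed. The field names, figures and the 25% movement tolerance are hypothetical. `Decimal` is used so that monetary comparisons are exact rather than subject to floating-point rounding.

```python
from decimal import Decimal


def validate_return(current: dict, source_totals: dict, prior: dict,
                    max_change: Decimal = Decimal("0.25")) -> list:
    """Deterministic checks on the figures themselves; returns issues for a human."""
    issues = []
    for name, value in sorted(current.items()):
        expected = source_totals.get(name)
        if expected is not None and value != expected:
            issues.append(f"{name}: {value} does not reconcile to source {expected}")
        previous = prior.get(name)
        if previous:  # skip figures with no prior value or a prior value of zero
            change = abs(value - previous) / abs(previous)
            if change > max_change:
                issues.append(f"{name}: {float(change):.0%} change vs prior period")
    return issues


def ai_touched(before: dict, after: dict) -> dict:
    """Record which figures an AI step changed, as (before, after) pairs."""
    return {
        name: (before.get(name), value)
        for name, value in after.items()
        if before.get(name) != value
    }


current = {"total_assets": Decimal("1300"), "client_money": Decimal("500")}
source = {"total_assets": Decimal("1300"), "client_money": Decimal("480")}
prior = {"total_assets": Decimal("1000"), "client_money": Decimal("490")}
for issue in validate_return(current, source, prior):
    print(issue)
# client_money: 500 does not reconcile to source 480
# total_assets: 30% change vs prior period
```

The issues list is an input to the named reviewer, not a substitute for the sign-off, and the output of `ai_touched` belongs in the audit trail alongside the reviewer's confirmation of the final values.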
SM&CR evidence collation
The Senior Managers and Certification Regime runs on evidence, and collating that evidence is administrative work that AI handles well. It can assemble the material that supports a senior manager's "reasonable steps" position (the meeting records, the decisions taken, the risks considered, the actions tracked) into a coherent pack, keep statements of responsibilities aligned with what is actually happening, and draft the certification and fitness-and-propriety documentation from underlying HR and training data. For a compliance team that spends real time chasing and formatting this material, the relief is immediate.
What AI does not do is exercise the accountability. The FCA is explicit that a senior manager cannot delegate accountability to an algorithm, and that applies to the AI doing the collation as much as to any AI doing regulated work. The senior manager still has to engage with the substance, make the judgements, and own the reasonable-steps position. The AI makes the evidence easier to maintain; it does not make the decision for the person whose name is on the statement of responsibilities. There is also a neat recursion here worth noting: the AI that collates SM&CR evidence is itself a regulated automation, so it needs its own named owner and its own place in the firm's control framework.
Horizon-scanning of regulatory change
Keeping up with regulatory change is a perennial drain on compliance teams, and horizon-scanning is one of the lowest-risk, highest-value automations available. AI can monitor FCA publications, consultation papers, Dear CEO letters, policy statements, and the wider stream of speeches and feedback statements, then summarise what is new, classify it by relevance to the firm's permissions and business lines, and route the material that matters to the right owner. It can also map a new requirement against the firm's existing policies to flag where a gap has opened.
The interpretation of what a change means for the firm, and the decision on how to respond, stay human. AI summarising a consultation paper is useful; AI deciding that the firm does or does not need to change a process in response is a judgement that requires understanding the firm's specific permissions, client base, and risk appetite. The governance bar here is lower than for customer-affecting automation (no customer is directly affected by a horizon-scanning summary), but the audit-trail discipline still applies, because the firm will want to evidence that it identified and considered a change at the time, rather than reconstructing that claim after the fact.
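The routing and record-keeping half of horizon-scanning is simple to sketch. In the Python below, the topic tags are assumed to come from the model's classification, and the owner mapping and queue names are hypothetical. The two design choices that matter are that every item is timestamped when it is identified, and that the interpretation field stays empty until a person completes it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical mapping from the firm's business lines to accountable owners.
OWNERS = {
    "consumer credit": "lending-compliance",
    "investments": "wealth-compliance",
}
FALLBACK_OWNER = "compliance-triage"  # unmatched items are routed, never dropped


@dataclass
class Publication:
    title: str
    topics: set  # relevance tags, assumed to come from the model's classification


@dataclass
class ScanEntry:
    title: str
    owners: list
    identified_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    human_assessment: Optional[str] = None  # the owner's interpretation, added later


def route_publication(pub: Publication) -> ScanEntry:
    """Route a publication to its owners and timestamp when it was identified."""
    owners = sorted({OWNERS[t] for t in pub.topics if t in OWNERS})
    return ScanEntry(pub.title, owners or [FALLBACK_OWNER])


entry = route_publication(
    Publication("Policy statement on consumer credit", {"consumer credit"})
)
print(entry.owners)  # ['lending-compliance']
```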
A pattern across all seven
Step back and the same shape recurs in every process above. AI takes the volume work, the first-draft work, the reading-and-summarising work, and the detection work. The human keeps the decision, the sign-off, the attestation, and the accountability. The boundary is always drawn at the point where an output has a legal effect on a customer, a formal standing with the regulator, or a consequence that someone has to answer for personally.
"The firms that automate compliance well are not the ones with the most ambitious models; they are the ones that drew the human-machine boundary deliberately and built the audit trail before they crossed it. Automation without that boundary is not efficiency; it is undocumented risk waiting for a supervisory letter."
That boundary is also where the eval harness earns its keep. For each process, the test set should encode exactly the cases that must escalate to a human, so that a change to the model or the prompt cannot quietly move the boundary without the firm noticing. An automation that drifts -closing alerts it should escalate, signing off promotions it should query -is far more dangerous than one that simply stops working, because it fails silently and on the wrong side of the line.
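Because silent drift is asymmetric, the comparison run on every model or prompt change is worth splitting by direction. This Python sketch assumes each eval case has an expected route recorded in the test set and that the candidate version's routes have already been collected; the route labels are illustrative.

```python
HUMAN_ROUTES = frozenset({"human_review", "escalate"})  # illustrative route labels


def boundary_moves(expected: dict, candidate: dict) -> tuple:
    """Compare encoded expected routes with a candidate version's routes.

    Returns (unsafe, conservative): cases that moved away from human review,
    and cases that moved towards it.
    """
    unsafe, conservative = [], []
    for case_id, want in expected.items():
        got = candidate.get(case_id)  # a missing result counts as not escalated
        if want in HUMAN_ROUTES and got not in HUMAN_ROUTES:
            unsafe.append(case_id)
        elif want not in HUMAN_ROUTES and got in HUMAN_ROUTES:
            conservative.append(case_id)
    return sorted(unsafe), sorted(conservative)


expected = {"case-1": "escalate", "case-2": "ai_handles", "case-3": "human_review"}
candidate = {"case-1": "ai_handles", "case-2": "human_review", "case-3": "human_review"}
print(boundary_moves(expected, candidate))  # (['case-1'], ['case-2'])
```

Any entry in the first list would block the change; entries in the second are worth reviewing for their cost to the team, but they fail on the safe side of the line.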
Sequencing the work
For a mid-market firm starting out, the sensible order is to begin where the risk is lowest and the value is clear, then work towards the customer-affecting processes as the governance muscle develops. Horizon-scanning and SM&CR evidence collation are good first projects: real time saved, no direct customer impact, and a chance to prove the audit-trail and logging discipline on something forgiving. Surveillance triage and financial-promotions review come next, with their escalation thresholds tuned conservatively. Consumer Duty monitoring and complaints analysis follow, and the attestation-bearing parts of regulatory reporting are approached last and most cautiously.
Throughout, the cheapest place to build the governance wrapper is at the start. Retrofitting audit trails, eval harnesses, and rollback procedures onto a live automation can cost several times the original build, and it usually happens under deadline pressure when the regulator is already asking questions. We have seen firms spend their second year rebuilding what should have been designed in during their first; doing it once, properly, is both cheaper and far less stressful.
Where this lands
The honest summary is that most FCA compliance processes have an automatable half and a human half, and the value comes from automating the first while protecting the second with a governance wrapper the firm can actually evidence. The technology is ready; the discipline around it is what separates a deployment that survives scrutiny from one that becomes a liability. AI governance, in this sense, is not a constraint on automation; it is the thing that makes automation defensible enough to be worth doing.
We work with FCA-regulated firms across financial services to identify which compliance processes to automate, design the governance wrapper around them, and build the automation itself. If you want to map your own compliance functions against the human-machine boundary above, our AI automation work is the place to start. Get in touch to discuss where AI can sensibly take work off your compliance team, and where it should not.
Ready to transform your business with AI?
Book a free strategy session to discuss how Evolve AI can help your organisation harness AI safely and compliantly.