How an AI Model Audit Works for FCA-Regulated Firms
When the FCA asks an authorised firm to explain a decision an AI model made about a customer, the answer cannot be "the model decided." The regulator expects the firm to evidence how the model was built, validated, monitored, and governed, and to do so on demand. An AI model audit is the structured process that produces that evidence. It is the difference between a firm that can defend its use of AI to a supervisor and one that is exposed. This article sets out how an AI model audit works for an FCA-regulated firm, step by step, and how it maps onto the regulatory frameworks already in force across UK financial services.
The audience here is compliance, risk, and technology leaders who need to audit and explain machine-learning models to the regulator, whether those models drive credit decisioning, pricing, fraud detection, suitability assessment, or customer communications. For the wider regulatory context, the pillar piece on what compliance officers need to know about the FCA and AI is the place to start. This piece goes deeper on the audit itself.
What an AI model audit actually is
An AI model audit is a systematic review of a model across its full lifecycle, designed to answer three questions a supervisor will eventually ask. Does the model do what the firm claims it does? Can the firm explain and justify the decisions it produces? And is the firm governing the model with the rigour its impact on customers warrants? The audit is not a one-off technical test. It is an ongoing discipline that generates a documented body of evidence, the kind that underpins the "reasonable steps" defence under the Senior Managers and Certification Regime.
The FCA has not published bespoke AI model risk rules. It regulates AI through existing frameworks: SM&CR for accountability, Consumer Duty for outcomes, operational resilience (PS21/3) for continuity, and the systems and controls requirements in SYSC. For dual-regulated firms, the PRA's supervisory statement SS1/23 on model risk management sets the most developed expectations. Although it was written with traditional and capital models in mind, its five principles (model identification and risk classification, governance, model development, implementation and use, independent validation, and model risk mitigants) translate cleanly onto AI and machine-learning models. A credible AI model audit is built around those principles.
Step one: build and maintain the model inventory
You cannot audit what you have not catalogued. The first step is a complete inventory of every AI and machine-learning model in use across the firm, including models embedded in third-party products and models employees have introduced informally. SS1/23 makes a model inventory an explicit expectation, and the FCA applies the same logic through SYSC: a firm must understand its own systems.
For each model the inventory should record the following.
- Purpose and scope: what the model does, which regulated activity it touches, and what decisions it makes or influences.
- Materiality tier: a risk rating that drives the depth of validation and frequency of review. A model that sets retail credit limits sits in a higher tier than one that drafts internal meeting summaries.
- Inputs and data lineage: the data the model consumes, its source, and any features that could proxy for protected characteristics.
- Ownership: the model owner, the validator, and, critically, the named senior manager accountable for it under SM&CR.
- Lifecycle status: in development, in production, under review, or retired, with version history.
The inventory is the spine of the audit. Everything that follows (validation evidence, monitoring results, change records) hangs off the inventory entry for each model. A firm that cannot produce a current inventory on request has effectively conceded the first point of the audit before it begins.
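As a minimal sketch of what one inventory entry might look like in code, the record below mirrors the fields listed above. The class name, field names, tier labels, and the `gaps` check are all illustrative assumptions rather than anything prescribed by the FCA or SS1/23.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional


class Tier(Enum):
    """Hypothetical materiality tiers; a firm would define its own."""
    HIGH = 1
    MEDIUM = 2
    LOW = 3


class Status(Enum):
    DEVELOPMENT = "in development"
    PRODUCTION = "in production"
    UNDER_REVIEW = "under review"
    RETIRED = "retired"


@dataclass
class InventoryEntry:
    model_id: str
    purpose: str                     # what the model does and decides
    regulated_activity: str          # which regulated activity it touches
    tier: Tier                       # drives validation depth and review frequency
    inputs: List[str]                # data consumed, by feature name
    proxy_risk_features: List[str]   # features that could proxy protected characteristics
    owner: str
    validator: str
    accountable_smf: str             # named senior manager under SM&CR
    status: Status
    version: str
    last_validated: Optional[date] = None

    def gaps(self) -> List[str]:
        """Return the basic governance gaps visible from the record alone."""
        problems = []
        if not self.accountable_smf:
            problems.append("no accountable senior manager")
        if self.owner == self.validator:
            problems.append("validator is not independent of owner")
        if self.status is Status.PRODUCTION and self.last_validated is None:
            problems.append("in production without validation")
        return problems
```

Running `gaps()` across every entry gives a quick, if crude, view of which models would fail the first questions of an audit.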
Step two: validate the model independently
Validation is the technical core of the audit, and the principle that matters most is independence. The team that built the model should not be the team that signs off its fitness. SS1/23 is explicit on this, and the FCA expects an equivalent separation of development from challenge. For a mid-market firm without a large internal validation function, this is often where independent external review earns its place.
A robust validation covers four dimensions.
Accuracy and performance
Does the model perform as claimed against an out-of-sample test set that reflects real production conditions? Validation should report the metrics that matter for the use case, not a single headline accuracy figure that masks poor performance on important sub-populations. For a credit model, that means precision, recall, and calibration assessed across customer segments, not just in aggregate.
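To make the point about sub-populations concrete, here is a small sketch that computes precision and recall per customer segment rather than in aggregate. The function name and the `(segment, y_true, y_pred)` record shape are assumptions for illustration, and calibration is left out to keep the example short.

```python
from collections import defaultdict


def segment_metrics(records):
    """Compute precision and recall per segment.

    records: iterable of (segment, y_true, y_pred) with 0/1 labels.
    A metric is None when its denominator is zero.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for segment, y_true, y_pred in records:
        # "t"/"f" = prediction correct or not; "p"/"n" = predicted class
        key = ("t" if y_true == y_pred else "f") + ("p" if y_pred == 1 else "n")
        counts[segment][key] += 1

    results = {}
    for segment, c in counts.items():
        predicted_pos = c["tp"] + c["fp"]
        actual_pos = c["tp"] + c["fn"]
        results[segment] = {
            "n": sum(c.values()),
            "precision": c["tp"] / predicted_pos if predicted_pos else None,
            "recall": c["tp"] / actual_pos if actual_pos else None,
        }
    return results
```

A model can look acceptable on the pooled figures while recall in one segment is a fraction of recall in another, which is exactly the pattern a single headline number hides.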
Bias and fairness
This is the dimension the FCA scrutinises most closely, because it connects directly to Consumer Duty and the fair treatment of vulnerable customers. Validation must test whether the model produces systematically different outcomes across protected and vulnerable groups, including where it does so through proxy variables the firm did not intend to use. Postcode, for instance, can act as a proxy for ethnicity. The audit should document the fairness metrics chosen, the thresholds applied, and the rationale for both.
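One simple fairness check a validator might document is a comparison of approval rates between groups. The sketch below is one possible metric among many, not a method the FCA prescribes. The function names are hypothetical, and the `min_ratio` default of 0.8 is an illustrative placeholder rather than a regulatory threshold; the audit should record whichever metric and threshold the firm actually chooses, and why.

```python
def approval_rates(decisions):
    """decisions: iterable of (group, approved) pairs, approved being True/False."""
    totals, approved = {}, {}
    for group, was_approved in decisions:
        totals[group] = totals.get(group, 0) + 1
        approved[group] = approved.get(group, 0) + int(was_approved)
    return {group: approved[group] / totals[group] for group in totals}


def disparity_flags(decisions, reference_group, min_ratio=0.8):
    """Compare each group's approval rate with the reference group's rate.

    min_ratio is an assumed, firm-chosen threshold, not a regulatory figure.
    """
    rates = approval_rates(decisions)
    reference_rate = rates[reference_group]
    flags = {}
    for group, rate in rates.items():
        ratio = rate / reference_rate if reference_rate else None
        flags[group] = {
            "rate": rate,
            "ratio": ratio,
            "breach": ratio is not None and ratio < min_ratio,
        }
    return flags
```

The same comparison can be run on a suspected proxy variable, such as postcode area, to see whether outcomes differ along a dimension the firm never intended the model to use.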
Robustness
How does the model behave when inputs drift from the training distribution, when data is missing, or when it encounters cases it has never seen? A model that degrades gracefully and signals low confidence is materially safer than one that produces a confident answer regardless of input quality. Robustness testing probes these edges deliberately rather than waiting for production to find them.
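A hedged sketch of what "degrading gracefully" can mean in practice: a guard that refuses to score when a required input is missing or falls outside the range seen in training, and refers the case to a human instead. The function names, the `training_ranges` structure, and the idea of passing the model as a plain callable are assumptions for illustration.

```python
def guard_inputs(features, training_ranges):
    """Return the reasons, if any, why these inputs should not be scored.

    training_ranges: {feature_name: (min_seen, max_seen)} from the training data.
    """
    reasons = []
    for name, (low, high) in training_ranges.items():
        value = features.get(name)
        if value is None:
            reasons.append(f"{name}: missing")
        elif not (low <= value <= high):
            reasons.append(f"{name}: {value} outside training range [{low}, {high}]")
    return reasons


def score_or_refer(features, training_ranges, model):
    """Score only when the inputs pass the guard; otherwise refer to a human."""
    reasons = guard_inputs(features, training_ranges)
    if reasons:
        return {"decision": "refer to human", "reasons": reasons}
    return {"decision": "scored", "score": model(features)}
```

Robustness testing would then feed deliberately incomplete and out-of-range cases through `score_or_refer` and confirm that none of them comes back with a confident score.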
Adversarial testing
For any model exposed to user input, particularly generative models and large language models, the audit must test resilience to deliberate manipulation: prompt injection, attempts to extract training data, and attempts to coax the model outside its scope. This is where AI model audit and evaluation engineering converge. The structured test sets described in our guide to building an agentic AI eval harness are precisely the artefact a validator uses to evidence adversarial robustness on an ongoing basis, not just at the point of sign-off.
Step three: establish explainability and transparency
Satisfying the FCA on AI model transparency is less about exposing the mathematics and more about being able to justify outcomes in terms a customer, a complaints handler, and a supervisor can understand. The regulator's position is outcome-focused: it cares whether the firm can explain why a particular decision was made and demonstrate that the decision was fair. Transparency operates at two levels, and the audit must address both.
Global transparency is the firm's understanding of how the model behaves overall -which features drive its decisions, where it is reliable, and where it is not. This is documented in model documentation that a non-specialist senior manager can follow. Local explainability is the ability to explain an individual decision -why this customer was declined, why this transaction was flagged. Techniques such as SHAP and LIME generate feature-attribution explanations for individual cases, and for inherently interpretable models the explanation may be the model itself.
A word of caution the audit should record: post-hoc explanation techniques approximate the model's reasoning; they do not reveal it with certainty. A firm that presents a SHAP plot as definitive truth has misunderstood the tool. The honest position, and the one the FCA respects, is that the firm uses these techniques as one input into a governed, human-overseen decision, with documented limitations. Where a use case demands a level of explanation the model cannot support, the right answer is sometimes a simpler, more interpretable model, even at the cost of marginal accuracy.
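To illustrate the interpretable end of that trade-off, the sketch below explains a single decision from a plain linear scorecard, where each feature's contribution is exact rather than approximated. It deliberately avoids SHAP and LIME; the function name, the feature names in the usage note, and the choice of a portfolio-average baseline are all assumptions for illustration.

```python
def explain_linear(weights, intercept, baseline, applicant):
    """Exact per-feature contributions for a linear score.

    Each contribution is weight * (applicant value - baseline value), so the
    contributions sum to the gap between this score and the baseline score.
    """
    contributions = {
        name: weights[name] * (applicant[name] - baseline[name])
        for name in weights
    }
    baseline_score = intercept + sum(weights[n] * baseline[n] for n in weights)
    score = baseline_score + sum(contributions.values())
    # Largest absolute contribution first: the main drivers of this decision.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return {"score": score, "baseline_score": baseline_score, "ranked": ranked}
```

For a hypothetical applicant with more missed payments than the baseline customer, `ranked` would put `missed_payments` at the top, which gives a complaints handler a statement that is true of the model and not just an estimate of it.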
Step four: evidence the audit trail and decision logging
Transparency is meaningless without a record. The audit must confirm that the firm captures, for every material AI-assisted decision, a tamper-evident trail that connects the model's input to the final outcome. This is both a regulatory expectation and the practical foundation for handling complaints, redress, and supervisory enquiries.
A sound decision-logging standard captures the following for each material decision.
- Inputs: the data supplied to the model, with source and timestamp.
- Model version: the exact version and configuration used, so the decision can be reproduced against the model that actually made it.
- Output: the raw model output before any post-processing or human override.
- Explanation: the feature attribution or rationale generated for that decision.
- Human review: who reviewed the output, what they changed, and why.
- Final decision and outcome: the action taken and, where relevant, the subsequent outcome for the customer.
Retention periods should align with the firm's record-keeping obligations and the limitation period for the relevant activity. The audit should test that logs are immutable, access-controlled, and recoverable, not simply that logging is switched on. This is materially easier to evidence when models run within infrastructure the firm controls. Public AI APIs frequently offer limited or short-lived logging that will not survive a decision being challenged eighteen months later. Our Secure AI Platform is built around this requirement, with full logging and retention in the firm's own environment.
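One common way to make a log tamper-evident is a hash chain, where each entry's hash covers both its own content and the previous entry's hash. The sketch below shows the idea using only the standard library. It is an illustration under stated assumptions, not a description of any particular platform: the function names and record fields are hypothetical, and a production system would also need access control, durable storage, and an independently held copy of the latest hash.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body, prev_hash):
    # Canonical JSON so the same record always hashes to the same value.
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_record(log, record):
    """Append a decision record, chaining it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"body": record, "prev_hash": prev_hash, "hash": _digest(record, prev_hash)}
    log.append(entry)
    return entry


def verify(log):
    """Return the index of the first entry that fails verification, or None."""
    prev_hash = GENESIS
    for index, entry in enumerate(log):
        if entry["prev_hash"] != prev_hash:
            return index
        if entry["hash"] != _digest(entry["body"], prev_hash):
            return index
        prev_hash = entry["hash"]
    return None
```

Each `record` would carry the fields listed above (inputs, model version, raw output, explanation, human review, final decision). A hash chain detects after-the-fact edits; it does not prevent them, so the audit still needs to test who can write to the store and whether the chain head is held somewhere the log's administrators cannot alter.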
Step five: ongoing monitoring and drift detection
A model that was validated as fair and accurate at launch does not stay that way. The world changes, customer behaviour changes, and the data the model sees in production diverges from the data it was trained on. The audit must confirm that the firm monitors live models continuously rather than treating validation as a one-time gate. This is the single most common gap we see: firms validate thoroughly at deployment and then stop looking.
Effective monitoring tracks several signals against defined thresholds.
- Data drift: changes in the distribution of input features relative to the training data, which can quietly erode performance.
- Concept drift: changes in the relationship between inputs and the thing being predicted, where the model's assumptions stop holding.
- Performance decay: the model's accuracy or calibration falling below the threshold set at validation.
- Outcome drift: shifts in the outcomes the model produces across customer segments. This is the Consumer Duty early-warning signal that a model has become systematically less fair to a group of customers.
Each signal needs a defined threshold and a defined response: an alert, a human review, a re-validation, or a controlled rollback. Monitoring without thresholds is theatre. Monitoring with thresholds and a documented escalation path is a control the firm can evidence.
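As a sketch of a threshold tied to a response, the example below measures data drift on one feature with the Population Stability Index (PSI), a widely used drift statistic, and maps the result to an action. PSI is one choice among several and is not named in the frameworks discussed here. The cut-offs of 0.1 and 0.25 are common rules of thumb used purely as illustrative defaults, and the function names are hypothetical.

```python
import math


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between two binned distributions.

    expected_counts: per-bin counts from the training (reference) data.
    actual_counts:   per-bin counts from production, using the same bins.
    eps floors empty bins so the logarithm stays defined.
    """
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    value = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        e_pct = max(expected / expected_total, eps)
        a_pct = max(actual / actual_total, eps)
        value += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return value


def response_for(psi_value, review_at=0.1, revalidate_at=0.25):
    """Map a drift score to a documented response (thresholds are assumed)."""
    if psi_value >= revalidate_at:
        return "re-validate"
    if psi_value >= review_at:
        return "human review"
    return "no action"
```

What matters for the audit is less the specific statistic than the pairing: every monitored signal has a number that triggers something, and the something is written down in advance.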
Step six: change management
Models change, whether through retraining, fine-tuning, prompt revisions, or an update to the underlying foundation model pushed by a third-party provider. Each change can alter behaviour in ways that invalidate prior validation, and the audit must confirm that no material change reaches production without going through a formal change-management process. That process should require appropriate re-testing, a record of who approved the change and on what basis, and a rollback path if the change misbehaves.
The provider-driven model update deserves particular attention, because it can change a firm's model behaviour without the firm doing anything. A firm relying on a hosted model must know when the underlying model version changes, and must re-run its validation evidence against the new version before continuing to rely on it. Pinning to a specific model version, where the provider supports it, gives the firm control over when that re-validation happens rather than having behaviour change underneath it.
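A minimal sketch of a release gate that enforces those requirements before a change reaches production. The `ChangeRequest` fields, the `blocking_issues` function, and the idea of keeping a set of validated version identifiers are all assumptions for illustration; how a firm learns which version a provider is actually serving depends on the provider and is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class ChangeRequest:
    model_id: str
    new_version: str                  # e.g. a pinned provider model version
    retest_report: Optional[str]      # reference to the re-testing evidence
    approved_by: Optional[str]
    approval_basis: Optional[str]
    rollback_version: Optional[str]   # version to revert to if the change misbehaves


def blocking_issues(change: ChangeRequest, validated_versions: Set[str]) -> List[str]:
    """Return the reasons this change must not go live; empty means it may."""
    issues = []
    if change.new_version not in validated_versions:
        issues.append("new version has no validation evidence")
    if not change.retest_report:
        issues.append("no re-test report attached")
    if not change.approved_by or not change.approval_basis:
        issues.append("approval or its basis not recorded")
    if not change.rollback_version:
        issues.append("no rollback version defined")
    elif change.rollback_version not in validated_versions:
        issues.append("rollback version has no validation evidence")
    return issues
```

The same check can run on a schedule against the version currently in use, so that a provider-side update shows up as a blocking issue rather than as a surprise in production.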
Step seven: the SM&CR "reasonable steps" evidence
Under the Senior Managers and Certification Regime, accountability for a model cannot be delegated to the model. If an AI system causes customer harm or breaches a rule, the accountable senior manager is answerable for the governance and controls that should have prevented it. The reasonable steps defence is not a verbal assurance that the manager was diligent; it is documented evidence that they engaged with the right questions and put proportionate controls in place.
The AI model audit is what produces that evidence. The audit outputs assemble into a control pack the accountable senior manager reviews and signs: the current model inventory, the independent validation reports, the monitoring dashboards and any breached thresholds, the change log, and the record of decisions taken in response to issues. A senior manager who reviews this pack on a defined cadence, challenges it, and records that challenge has taken reasonable steps. A senior manager relying on informal reassurance from the development team has not.
"The question is never whether the model was perfect. Models fail. The question the FCA is really asking is whether the firm had a structured, documented, independent process for knowing how its models behave, and acted on what that process told it. That process is the audit, and it is the firm's strongest defence."
How an AI model audit maps to a supervisory conversation
It helps to picture the supervisory conversation the audit is designed to win. A supervisor asks how the firm uses AI; the model inventory answers it. The supervisor asks whether a given model is fair to vulnerable customers; the validation and the outcome-drift monitoring answer it. The supervisor asks the firm to explain a specific declined application; the decision log and the local explanation answer it. The supervisor asks who is accountable; the SM&CR mapping answers it. The supervisor asks what happens when the model fails; the operational-resilience fallback procedures and the change-management rollback path answer it.
In each case the audit has already produced the artefact before the question is asked. That is the point. A firm that assembles its evidence in response to a supervisory request looks reactive and exposed. A firm that can hand over a current, structured control pack looks governed, and governed firms are treated very differently by their supervisors.
Proportionality: matching audit depth to model risk
None of this means every model gets the same treatment. The FCA's approach is proportionate, and so should the audit be. A model that drafts internal documents needs light-touch oversight. A model that decides whether a customer gets credit, what they pay, or whether their claim is paid sits at the top tier and warrants the full discipline above: independent validation, fairness testing, decision-level logging, and senior-manager review. The materiality tier in the inventory is what drives this, and getting the tiering right is itself something the audit should test. Over-engineering low-risk models wastes effort; under-governing high-risk ones is where firms get hurt.
For mid-market firms in particular, the practical path is to start with the highest-risk models (those touching customer outcomes or sensitive data) and build the audit discipline around them first, extending it as capacity allows. Evidence of a structured, prioritised approach is far more credible to the FCA than a thin, uniform process applied everywhere.
Bringing it together
An AI model audit for an FCA-regulated firm is not a single event but a repeating cycle: inventory the models, validate them independently, make them explainable, log their decisions, monitor them in production, control their changes, and assemble the evidence the accountable senior manager needs to discharge their SM&CR responsibilities. Done well, it satisfies the regulator on model transparency not through a single document but through a living body of evidence that demonstrates the firm knows how its models behave and acts when they drift.
We work with FCA-regulated firms across UK financial services to design and run AI model audits, build the validation and monitoring infrastructure they depend on, and deploy models within secure, fully logged environments through our Secure AI Platform. If you need to be able to explain and defend your models to the regulator before the regulator asks, get in touch to discuss your firm's position.