AI Governance Evidence Packs: What an Ethics Committee Should Review Before Approval

This article supports Deep Dive B: AI Ethics Committee Simulation, the extra deep-dive hours attached to our CPD Event B: Full-Day AI, Technical Privacy & Emerging Technology Training programme on XpertAcademy. Event B provides the seven-hour CPD programme; the Deep Dive material is there for organisations and learners who want to go one level deeper through practical exercises, real-world evidence packs and an AI ethics committee simulation. Completion and certification are tied to the relevant XpertAcademy learning activity, rather than to reading this article on its own.

An AI ethics committee should not approve an AI-enabled system from a slide deck.

Slides can explain the business case. They can help the committee understand the user journey. They can summarise the supplier's claims. But approval should rest on evidence: data maps, DPIAs or AI impact assessments, model and workflow documentation, bias and fairness review, transparency material, human oversight design, supplier due diligence, cloud and security controls, records and review triggers.

For Deep Dive B, the evidence pack relates to an AI-enabled triage tool in a regulated or essential-service environment. The tool uses online form information, previous interaction history, internal case notes and operational status data to assign a priority score and recommend the next action for staff. The organisation says the tool supports staff decisions rather than replacing them.

That kind of system can affect people even where the final action is taken by a human. Priority scores can influence delay, escalation, service access, complaint handling and how staff interpret a person's situation. The committee should therefore ask for evidence before approval, not reassurance after launch.

Start with the decision the committee is being asked to make

The evidence pack should make the decision clear. Is the committee being asked to approve discovery work, a controlled pilot, a limited live deployment, full deployment, expansion to a new population, or a material change to an existing system?

Those stages need different evidence. A discovery review may focus on purpose, necessity, data sources and obvious red flags. A pilot needs real safeguards, staff guidance, monitoring and clear limits. Full deployment needs much stronger evidence on data protection, fairness, human oversight, supplier controls, operational readiness and review cadence.

For the triage tool, the committee should be wary of vague asks such as "approve AI triage". A better decision request is:

Approve a three-month controlled pilot of the AI-enabled triage recommendation tool for standard service requests only, excluding complaints, safeguarding, urgent hardship and vulnerable-person cases, subject to the conditions in the decision note.

That wording lets the committee match evidence to scope.
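A scoped request can also be written down as structured data, so that later questions such as "was this case type ever approved?" have a definite answer. The sketch below is illustrative only: the class name, field names and case-type labels are assumptions made for this example, not part of any committee template.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionRequest:
    """A scoped approval request (hypothetical structure)."""
    stage: str
    duration_months: int
    included_case_types: frozenset[str]
    excluded_case_types: frozenset[str]

    def covers(self, case_type: str) -> bool:
        # In scope only if explicitly included and not explicitly excluded.
        return (case_type in self.included_case_types
                and case_type not in self.excluded_case_types)


# The example decision request above, expressed as data.
pilot_request = DecisionRequest(
    stage="controlled_pilot",
    duration_months=3,
    included_case_types=frozenset({"standard_service_request"}),
    excluded_case_types=frozenset({
        "complaint", "safeguarding", "urgent_hardship", "vulnerable_person",
    }),
)
```

Anything not explicitly included is treated as out of scope, which mirrors the point that "approve AI triage" is too vague to test evidence against.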

Evidence pack contents

The pack should be concise enough to review, but complete enough to support the decision. The committee does not need every technical artefact in the meeting pack. It does need a clear index and access to the underlying records.

For the Deep Dive scenario, the evidence pack should normally include:

Evidence item	Why the committee needs it	Red flags
Use-case statement	Shows purpose, affected workflow, expected benefit and limits of use.	Purpose described only as efficiency; no out-of-scope uses; no alternative considered.
Data map	Shows data sources, fields, flows, storage, access, retention and parties.	Case notes used without review; unclear operational status data; unknown supplier storage.
DPIA / AI impact assessment	Brings privacy, fairness, transparency, human oversight, security and governance risks together.	DPIA still generic; no affected-person view; risks listed without controls.
Necessity and proportionality note	Explains why AI-enabled scoring is needed and why less intrusive options are not enough.	"Because the vendor offers it" or "because AI is faster" as the main justification.
Anonymisation / pseudonymisation claims	Tests whether data protection obligations have genuinely changed or whether data remains linked or linkable.	Vendor says "anonymised" but still processes case IDs, user IDs or reversible tokens.
Bias and fairness evidence	Shows how unfair outcomes have been assessed, mitigated and monitored.	Testing limited to aggregate accuracy; no group or case-type analysis; no monitoring plan.
Significant effects analysis	Tests whether the score affects access, priority, timing or treatment in a material way.	Project says "advisory only" without analysing staff reliance or workflow pressure.
Transparency and challenge materials	Shows what affected people and staff are told, and how outcomes can be queried.	No public wording; no route to correct bad data or challenge triage.
Human oversight design	Shows how staff review, override, record and escalate the recommendation.	Override exists in theory but is hard, discouraged or not monitored.
Supplier and cloud evidence	Shows hosting, access, subprocessors, security, logs, support access, training-use and change control.	Supplier evidence is marketing material only; no DPA or subprocessor position.
Operational readiness plan	Shows staff training, fallback process, incident route, complaints handling and monitoring ownership.	Training planned after go-live; no fallback if the tool fails or is paused.
Draft decision note	Shows proposed decision, conditions, evidence gaps and review triggers.	Decision written before evidence gaps are resolved.

This is not bureaucracy for its own sake. Each item answers a question the committee should not have to guess at.
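A simple completeness check can keep the evidence index honest before the meeting. The following is a minimal sketch: the item names come from the table above, while the status labels ("final", "draft") and the function name find_gaps are assumptions made for illustration.

```python
# Evidence items taken from the table above.
REQUIRED_ITEMS = [
    "Use-case statement",
    "Data map",
    "DPIA / AI impact assessment",
    "Necessity and proportionality note",
    "Anonymisation / pseudonymisation claims",
    "Bias and fairness evidence",
    "Significant effects analysis",
    "Transparency and challenge materials",
    "Human oversight design",
    "Supplier and cloud evidence",
    "Operational readiness plan",
    "Draft decision note",
]


def find_gaps(evidence_index):
    """Return items that are absent and items present but not yet final."""
    missing = [item for item in REQUIRED_ITEMS if item not in evidence_index]
    not_final = [
        item for item in REQUIRED_ITEMS
        if item in evidence_index and evidence_index[item] != "final"
    ]
    return {"missing": missing, "not_final": not_final}


# Hypothetical submission: four items provided, two still in draft.
submitted = {
    "Use-case statement": "final",
    "Data map": "final",
    "DPIA / AI impact assessment": "draft",
    "Supplier and cloud evidence": "draft",
}
gaps = find_gaps(submitted)
```

The check says nothing about quality. It only tells the committee which questions it would otherwise be guessing at.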

Worked scenario: what the pack reveals

The project team provides a data map. It shows the tool will use online form answers, previous interaction history, internal case notes and operational status data. At first glance, the data sources look reasonable. The service already uses them for manual triage.

On closer review, the committee sees several issues.

The internal case notes contain free text written by staff over several years. Some notes include subjective comments such as "difficult caller", "failed to engage" or "likely to complain". The notes were not written for model input and may reflect inconsistent staff practice. Some include health, hardship or family details that the form did not request.

The previous interaction history includes missed appointments, call outcomes and complaint markers. Those fields may be relevant in some contexts, but they may also penalise people with access needs, unstable housing, language barriers or caring responsibilities.

The operational status data includes backlog level and staff availability. That may help route work, but it could also mean that people's priority changes because the organisation is under pressure rather than because their need has changed.

The supplier says personal data is pseudonymised before model processing. The pack does not yet explain who holds the key, whether case IDs remain stable, whether outputs are linked back to the person, or whether the supplier can see support logs containing identifiers.

The committee does not need to reject the project automatically. It does need to record that the evidence is not yet enough for full deployment.

Data map and data categories

The data map should show the full processing path, not just the input screen.

For the triage tool, it should cover:

online form data submitted by the person;
previous interaction history;
internal case notes and staff comments;
operational status data;
model inputs and outputs;
priority score and recommended next action;
staff override reasons;
logs, monitoring data and audit records;
supplier and cloud stores;
exported records into case management systems.

The map should classify personal data, special category data, inferred data, staff data, vulnerable-person information and confidential operational data. It should also identify where data quality is weak. Historical case notes are often messy. That does not make them unusable, but it does mean the committee needs evidence of review, minimisation and mitigation.
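The classification can be recorded per data source so that weak spots are easy to list. This is a sketch under stated assumptions: the field names, category labels and the example mitigation text are hypothetical, and the flagging rule is one possible reading of "evidence of review, minimisation and mitigation".

```python
from __future__ import annotations

from dataclasses import dataclass

# Categories that the committee may want to see mitigated explicitly (assumed labels).
SENSITIVE = {"special_category", "vulnerable_person"}


@dataclass
class DataMapEntry:
    source: str
    categories: set[str]
    quality_weak: bool = False
    mitigation: str = ""  # empty string means no mitigation recorded


def needs_committee_attention(entries: list[DataMapEntry]) -> list[str]:
    """List sources that are sensitive or low quality and have no recorded mitigation."""
    return [
        e.source for e in entries
        if (e.categories & SENSITIVE or e.quality_weak) and not e.mitigation
    ]


data_map = [
    DataMapEntry("online_form", {"personal"}),
    DataMapEntry("interaction_history", {"personal", "inferred"}, quality_weak=True,
                 mitigation="complaint markers excluded from model input"),
    DataMapEntry("internal_case_notes", {"personal", "special_category", "staff"},
                 quality_weak=True),
    DataMapEntry("operational_status", {"confidential_operational"}),
]
flagged = needs_committee_attention(data_map)
```

In this hypothetical map only the case notes are flagged, because they combine special category content with weak quality and have no recorded mitigation.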

DPIA and AI impact assessment

The DPIA or AI impact assessment should not be a separate paperwork island. It should drive the committee's decision.

The assessment should explain the purpose, lawful basis, necessity, proportionality, data minimisation, fairness risks, transparency position, rights handling, human oversight, security, supplier controls, residual risks and review plan. If the system may affect people in a regulated or essential-service context, the assessment should also address whether the recommendation has practical effects even if a human makes the final decision.

The committee should look for controls that match the risk. A risk entry saying "bias may occur" is not enough. The pack should say what bias is being tested, which groups or case types are considered, what threshold triggers investigation and who owns corrective action.
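The difference between a vague risk entry and a usable one can be made concrete. The sketch below assumes a risk register held as simple dictionaries; the field names and example values are hypothetical and only illustrate the four elements named in the paragraph above.

```python
# The four elements a bias risk entry should state (assumed field names).
REQUIRED_FIELDS = ("bias_tested", "groups", "investigation_threshold", "owner")


def incomplete_fields(risk_entry):
    """Return the required fields that are absent or empty in a risk entry."""
    return [name for name in REQUIRED_FIELDS if not risk_entry.get(name)]


vague_entry = {"description": "Bias may occur"}

specific_entry = {
    "description": "Lower triage accuracy for interpreter-line cases",
    "bias_tested": "accuracy gap against reviewed priority",
    "groups": ["service channel", "language"],
    "investigation_threshold": "gap above 10 percentage points",
    "owner": "Service operations lead",
}

vague_gaps = incomplete_fields(vague_entry)
specific_gaps = incomplete_fields(specific_entry)
```

The vague entry fails on all four fields, while the specific one passes. The committee still has to judge whether the threshold and owner are credible.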

Bias, fairness and significant effects

Accuracy is not the same as fairness. A tool can be accurate on average and still perform poorly for particular groups, channels, languages, case types or vulnerable people.

The evidence pack should explain how the organisation tested for unfair or biased outcomes. That may include analysis by case type, service channel, language, location, protected characteristic where lawfully and appropriately assessed, accessibility need, vulnerability indicator, complaint history or proxy variables. The committee should not demand impossible certainty, but it should demand a credible plan.
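A small worked example shows why aggregate accuracy is not enough. The sketch below uses invented records and treats a later human-reviewed priority as the reference outcome, which is itself an assumption that the committee would need to test. The group labels, the 10-point gap and the function names are hypothetical.

```python
from collections import defaultdict


def overall_accuracy(records):
    """Share of records where the tool's priority matched the reviewed priority."""
    return sum(predicted == reviewed for _, predicted, reviewed in records) / len(records)


def accuracy_by_group(records):
    """Accuracy computed separately for each group label."""
    totals, correct = defaultdict(int), defaultdict(int)
    for group, predicted, reviewed in records:
        totals[group] += 1
        correct[group] += int(predicted == reviewed)
    return {group: correct[group] / totals[group] for group in totals}


def groups_to_investigate(records, max_gap):
    """Groups whose accuracy falls more than max_gap below the overall figure."""
    overall = overall_accuracy(records)
    return sorted(
        group for group, acc in accuracy_by_group(records).items()
        if overall - acc > max_gap
    )


# Invented data: (group, tool priority, reviewed priority).
records = (
    [("web", "high", "high")] * 18
    + [("web", "high", "low")] * 2
    + [("interpreter_line", "low", "low")] * 2
    + [("interpreter_line", "low", "high")] * 3
)
flagged_groups = groups_to_investigate(records, max_gap=0.10)
```

Here the tool is right in 20 of 25 cases overall (80%), but only 2 of 5 interpreter-line cases (40%) against 18 of 20 web cases (90%). With groups this small the numbers are only a prompt for investigation, not proof of bias.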

The significant-effects analysis should be practical. The question is not only whether the system is solely automated. The committee should ask whether the priority score affects timing, escalation, staff attention, access to a service or how the person is treated. If the human review is thin, rushed or culturally discouraged, the effect may be more significant than the design document suggests.

Transparency, challenge and human oversight

The evidence pack should include draft transparency wording for affected people and operational guidance for staff.

Affected people may not need a full technical explanation of the model, but they should receive clear information about AI-assisted triage where appropriate, including what data is used, the purpose of the system, whether staff review recommendations and how to raise concerns or correct information.

Staff need different information. They need to know what the score means, what it does not mean, when to override it, how to record the reason and when to escalate. They should not be left to infer model authority from interface design.

Human oversight evidence should include workflow screenshots or process notes, training material, override logging, review sampling and management expectations. If staff are penalised for taking longer to review or override, the oversight design may fail in practice.
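Override logging and review sampling can be checked with very little tooling. The following is a minimal sketch, assuming a log where each entry records whether staff overrode the recommendation, the reason given and the time spent on review. The field names and figures are invented for illustration.

```python
import random
from statistics import median


def oversight_summary(log):
    """Summarise how often staff override and whether reasons are recorded."""
    overrides = [entry for entry in log if entry["overridden"]]
    return {
        "override_rate": len(overrides) / len(log),
        "overrides_missing_reason": sum(1 for e in overrides if not e["reason"]),
        "median_review_seconds": median(e["review_seconds"] for e in log),
    }


def sample_for_review(log, k, seed):
    """Draw a reproducible random sample of entries for manual quality review."""
    rng = random.Random(seed)
    return rng.sample(log, min(k, len(log)))


# Invented log entries.
override_log = [
    {"case": "A", "overridden": False, "reason": "", "review_seconds": 5},
    {"case": "B", "overridden": False, "reason": "", "review_seconds": 6},
    {"case": "C", "overridden": True, "reason": "", "review_seconds": 40},
    {"case": "D", "overridden": False, "reason": "", "review_seconds": 90},
]
summary = oversight_summary(override_log)
review_sample = sample_for_review(override_log, k=2, seed=1)
```

A very low override rate combined with review times of a few seconds does not prove that oversight has failed, but it is the kind of pattern the committee should ask management to explain.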

Supplier, cloud and records evidence

Supplier evidence should be more than a security brochure.

The committee should expect the DPA, subprocessor list, hosting and transfer position, access controls, support access process, log and retention information, training-use or improvement settings, change-notification process, incident support route and audit or assurance materials.

Where a supplier claims data is anonymised or pseudonymised, the evidence should explain the method and limits. EDPB Opinion 28/2024 is a useful reminder that anonymity claims in AI contexts require case-by-case assessment. The committee should avoid treating labels as conclusions.
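A short example shows why "pseudonymised" is a label to test rather than accept. The sketch below uses a keyed hash (HMAC-SHA256) as one common pseudonymisation technique. It is not a description of any particular supplier's method, and the key, case IDs and token length are invented for illustration.

```python
import hashlib
import hmac


def pseudonymise(case_id: str, key: bytes) -> str:
    """Replace a case ID with a keyed-hash token (truncated for readability)."""
    return hmac.new(key, case_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


key = b"example-key-held-by-someone"  # hypothetical; who holds it is the governance question

# The same case ID always yields the same token, so records stay linkable over time.
token_first_contact = pseudonymise("CASE-1001", key)
token_later_contact = pseudonymise("CASE-1001", key)

# Anyone holding the key and a list of case IDs can rebuild the mapping.
known_case_ids = ["CASE-1001", "CASE-1002"]
lookup = {pseudonymise(case_id, key): case_id for case_id in known_case_ids}
reidentified = lookup[token_first_contact]
```

The tokens are stable and reversible for whoever holds the key, so the data remains personal data in that party's hands. That is why the pack needs to say who holds the key, whether tokens are stable and whether outputs are linked back to the person.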

Records evidence should include the draft decision note, evidence index, conditions tracker, issue log, monitoring plan and review schedule. The aim is that someone can later see what was reviewed, what was missing and why the decision was made.

How this supports the simulation

In Deep Dive B Section 3, learners review decision points and evidence. The simulation is strongest when participants do not simply say that AI governance matters. They should be able to open an evidence pack and identify what is strong, what is weak, what is missing and what decision follows.

For the triage tool, a credible committee may approve a limited pilot with strict conditions. It may pause approval until case-note data quality, human oversight and supplier evidence are improved. It may reject a design that relies on unfair historical proxies. The important point is that the decision should come from evidence, not mood.

That is the working skill: read the pack, find the gap, decide what the gap means.

This article is intended to support the extra learning covered in Deep Dive B: AI Ethics Committee Simulation. The seven-hour CPD programme is covered through Event B on XpertAcademy, with the Deep Dive hours available for organisations and learners who want more applied depth. You can return to CPD Event B and the Deep Dive B materials here: CPD Event B: Full-Day AI, Technical Privacy & Emerging Technology Training.

Sources

Information Commissioner's Office, Guidance on AI and data protection: https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/guidance-on-ai-and-data-protection/
Information Commissioner's Office, Governance and accountability in AI: https://ico.org.uk/for-organisations/advice-and-services/audits/data-protection-audit-framework/toolkits/artificial-intelligence/governance-and-accountability-in-ai/
Information Commissioner's Office, What about fairness, bias and discrimination?: https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/guidance-on-ai-and-data-protection/how-do-we-ensure-fairness-in-ai/what-about-fairness-bias-and-discrimination/
Information Commissioner's Office, Explaining decisions made with AI – what goes into an explanation: https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/explaining-decisions-made-with-artificial-intelligence/part-1-the-basics-of-explaining-ai/what-goes-into-an-explanation/
European Data Protection Board, Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models: https://www.edpb.europa.eu/documents/opinion-of-the-board-art-64/opinion-282024-on-certain-data-protection-aspects-related-to_en
European Commission, Guidelines for providers and deployers of AI high-risk systems: https://digital-strategy.ec.europa.eu/en/policies/guidelines-ai-high-risk-systems
Data Protection Commission, AI, Large Language Models and Data Protection: https://dataprotection.ie/en/dpc-guidance/blogs/AI-LLMs-and-Data-Protection
