Art. 10 EU AI Act: data and data governance for high-risk AI

By GovCompass.ai · Last verified June 2026 · Aligned with the consolidated EU AI Act, including the 2026 Omnibus amendments.

Art. 10 requires that the training, validation, and testing data for high-risk AI systems meets quality criteria: relevant, sufficiently representative, and as free of errors and complete as possible for the intended purpose. It also requires documented data governance practices covering collection, preparation, bias examination, and gap mitigation, and it permits the limited processing of special-category data where strictly necessary to detect and correct bias, under safeguards.

Updated: June 2026

This is an explicit provider obligation under the EU AI Act. It falls on whoever develops the high-risk AI system or places it on the market. Deployers carry a related input-data duty under Art. 26.4.

Introduction: data as the root of most AI risk

Most of the failure modes the EU AI Act is concerned with originate in data. A biased outcome is usually a biased dataset expressed through a model. A privacy exposure is usually data that should not have been collected, retained, or used. A performance failure is often a training set that no longer represents the population the system serves. Art. 10 is the obligation that addresses risk at its source, by setting quality and governance requirements on the data that high-risk AI systems are built and run on.

Art. 10 applies primarily to providers, who develop the system and control its training. But its logic reaches deployers too, because the input data a deployer supplies in operation must meet the conditions the provider specified, a duty that appears separately as the deployer obligation in Art. 26.4.

What the data must be

Art. 10 requires that training, validation, and testing datasets are subject to appropriate data governance and meet quality criteria. The datasets must be:

  • Relevant to the intended purpose of the system.
  • Sufficiently representative of the persons and situations the system will be used on, so the system does not perform well on one group and poorly on another.
  • As free of errors and as complete as possible in view of the intended purpose.
  • Appropriate in their statistical properties, including for the groups the system is intended to affect.

These are not absolute standards. The article qualifies them with "to the best extent possible" and "in view of the intended purpose", which means the provider must make and document a reasoned judgement about what level of quality is adequate for the stakes of the use case, rather than meeting a fixed numerical bar.
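One way to make the representativeness judgement concrete is to compare the share of each group in a dataset with its share in the population the system is meant to serve. The sketch below is illustrative only: the function name `representation_gaps`, the `age_band` field, the reference shares, and the 5-point tolerance are all assumptions, and Art. 10 prescribes no particular metric or threshold.

```python
from collections import Counter


def representation_gaps(records, group_key, reference_shares, tolerance=0.05):
    """Return groups whose dataset share differs from the reference share
    by more than `tolerance` (absolute difference in proportion)."""
    counts = Counter(r[group_key] for r in records)
    total = sum(counts.values())
    gaps = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        if abs(observed - expected) > tolerance:
            gaps[group] = {"observed": round(observed, 3), "expected": expected}
    return gaps


# Hypothetical training records and reference population shares.
records = (
    [{"age_band": "18-39"}] * 70
    + [{"age_band": "40-64"}] * 25
    + [{"age_band": "65+"}] * 5
)
reference = {"18-39": 0.40, "40-64": 0.40, "65+": 0.20}

gaps = representation_gaps(records, "age_band", reference, tolerance=0.05)
for group, detail in gaps.items():
    print(group, detail)
```

The output of a check like this is an input to the documented judgement, not a substitute for it: the provider still has to record why a given gap is or is not acceptable for the intended purpose.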

What the governance must cover

Beyond the quality of the data itself, Art. 10 requires documented data governance and management practices. These cover the design choices and data origin, the collection process and provenance, the preparation operations such as labelling and cleaning, the formulation of assumptions about what the data measures, an assessment of whether the data is available, suitable, and sufficient, and an examination for possible biases that could affect health, safety, or fundamental rights, together with measures to detect, prevent, and mitigate those biases.

This is the governance trail a conformity assessment expects: not just a clean dataset, but a documented account of where it came from, how it was prepared, what was assumed, and how bias was looked for and addressed.

The special-category data provision and the real-time data angle

Art. 10(5) contains an important and often-misread provision. To detect and correct bias, providers may exceptionally process special categories of personal data, the sensitive data the GDPR otherwise restricts, but only where strictly necessary, and under safeguards: the bias cannot be detected by processing other data, the data is subject to technical limits on reuse, security and privacy-preserving measures apply, and the data is deleted once the bias is corrected or its retention period ends.
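The safeguards listed above can be expressed as a pre-processing gate that refuses to proceed while any of them is undocumented. This is a minimal sketch, not a legal test: the class `SpecialCategoryRequest`, its field names, and the function `unmet_safeguards` are hypothetical, and the boolean flags stand in for evidence that a real process would have to review.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SpecialCategoryRequest:
    purpose: str
    necessity_justification: str         # why the processing is strictly necessary
    other_data_insufficient: bool        # bias cannot be detected with other data
    reuse_technically_limited: bool      # technical limits on reuse are in place
    security_and_privacy_measures: bool  # security and privacy-preserving measures apply
    delete_by: Optional[date]            # deletion once bias is corrected or retention ends


def unmet_safeguards(req: SpecialCategoryRequest) -> list:
    """Return the names of safeguards that are not yet satisfied."""
    checks = {
        "documented strict-necessity justification": bool(req.necessity_justification.strip()),
        "bias not detectable with other data": req.other_data_insufficient,
        "technical limits on reuse": req.reuse_technically_limited,
        "security and privacy-preserving measures": req.security_and_privacy_measures,
        "deletion date set": req.delete_by is not None,
    }
    return [name for name, ok in checks.items() if not ok]


# Hypothetical request that is complete except for a deletion date.
request = SpecialCategoryRequest(
    purpose="measure error-rate gap across ethnic groups",
    necessity_justification="Proxy variables did not reveal the disparity.",
    other_data_insufficient=True,
    reuse_technically_limited=True,
    security_and_privacy_measures=True,
    delete_by=None,
)
blockers = unmet_safeguards(request)
print(blockers)
```

Running the gate before any special-category data is loaded keeps the justification and safeguards on record ahead of processing rather than reconstructed afterward.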

This provision is also where the operational angle of data protection at the point of use comes in. For systems that process data in real time, the discipline of minimizing and masking sensitive data before it reaches the model is the operational expression of the same principle: process the least sensitive data necessary, protect what must be processed, and document why. Sensitive data masked or redacted at the input level, before it reaches the model, is a concrete control that serves both the Art. 10 data governance obligation and the GDPR's minimization principle.
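Input-level masking can be sketched as a function that sits between the caller and the model and replaces recognizable identifiers with placeholders. The example below is an assumption-laden illustration: the function `mask_input` and the two regular expressions are hypothetical, they cover only simple email and phone formats, and a production control would need far broader detection than pattern matching.

```python
import re

# Illustrative patterns only; they will miss many real-world formats.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_input(text):
    """Replace matched identifiers with placeholders before the text
    is passed to a model, and count what was masked for the audit trail."""
    masked, counts = text, {}
    for label, pattern in (("EMAIL", EMAIL), ("PHONE", PHONE)):
        masked, n = pattern.subn(f"[{label}]", masked)
        counts[label] = n
    return masked, counts


masked, counts = mask_input("Contact jane.doe@example.com or +31 20 123 4567 today.")
print(masked)
print(counts)
```

Returning the counts alongside the masked text gives the control something to log, so the masking step leaves evidence that it ran without retaining the sensitive values themselves.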

Why it matters

Data governance failures are doubly exposed, because the same dataset can carry both a fairness defect and a privacy defect, and the two are policed by different parts of the law. A training set that over-represents one group creates an Art. 10 quality failure and a fairness risk under the risk management system, while the same set, if it contains personal data that should not have been collected, creates a GDPR exposure. Addressing data governance well closes several risks at once; neglecting it opens several at once.

Governing data quality and governance

The controls treat data as a managed asset with a documented lifecycle, not a raw input that happens to be available.

The core artefact is a data sheet for each dataset, recording its origin and provenance, its size and population characteristics, the preparation and labelling operations applied, the assumptions made, the bias examination performed and its findings, and the known limitations. This sheet becomes part of the technical documentation and is the evidence a conformity assessment examines.
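A data sheet of this kind can be kept as a structured record so that gaps are visible before it is filed. The sketch below mirrors the fields named above. The class `DataSheet`, its field names, and the example values are hypothetical, and neither the Act nor this article mandates a schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DataSheet:
    dataset_name: str
    role: str                        # "training", "validation" or "testing"
    origin_and_provenance: str
    size: int
    population_characteristics: str
    preparation_steps: list = field(default_factory=list)
    assumptions: list = field(default_factory=list)
    bias_examination: str = ""       # what was examined and how
    bias_findings: str = ""          # the result, including "none found"
    known_limitations: list = field(default_factory=list)

    def missing_fields(self):
        """Names of fields left empty, so an incomplete sheet is easy to spot."""
        return [
            name for name, value in asdict(self).items()
            if value is None or value == "" or value == []
        ]


# Hypothetical sheet with the limitations section not yet filled in.
sheet = DataSheet(
    dataset_name="loan-applications-2024",
    role="training",
    origin_and_provenance="Internal CRM export, collected 2021-2024",
    size=120000,
    population_characteristics="Retail applicants, one member state",
    preparation_steps=["deduplication", "manual label review"],
    assumptions=["recorded income reflects actual income"],
    bias_examination="Approval rates compared across age bands",
    bias_findings="Lower approval rate for 65+; reweighting applied",
)
print(sheet.missing_fields())
print(json.dumps(asdict(sheet), indent=2))
```

Serializing the sheet to JSON is one simple way to attach it to the technical documentation and to version it alongside the dataset.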

For systems processing personal data, the data governance controls integrate with the organization's GDPR controls rather than running in parallel: one minimization discipline, one lawful-basis analysis, one retention schedule, applied to the AI data lifecycle. Where special-category data is processed under the Art. 10(5) exception, the strict-necessity justification and the safeguards are documented before processing begins, not reconstructed afterward.

Compliance checklist

  1. Is there a documented data sheet for each training, validation, and testing dataset, covering provenance, preparation, and limitations?
  2. Has each dataset been assessed for relevance, representativeness, error rate, and completeness against the intended purpose, with the judgement documented?
  3. Has each dataset been examined for biases that could affect health, safety, or fundamental rights, with mitigation measures recorded?
  4. Where special-category data is processed to detect or correct bias, is the strict-necessity justification documented and are the Art. 10(5) safeguards in place?
  5. For systems processing personal data in real time, is sensitive data minimized or masked before it reaches the model?
  6. Do the data governance controls integrate with the organization's GDPR controls rather than duplicate them?
Legal references: Art. 10, Art. 9, GDPR

More on Accountability

Art. 12 EU AI Act: record-keeping and logging for high-risk AI

Reference

Art. 12 requires high-risk AI systems to technically allow for the automatic recording of events (logs) over their lifetime. The logging must enable traceability of the system's functioning at a level appropriate to its intended purpose, support post-market monitoring, and help identify situations that may lead to risk or substantial modification. It is a design obligation on the provider that makes the system auditable by construction.

Art. 19 EU AI Act: keeping the automatically generated logs

Reference

Art. 19 requires providers of high-risk AI systems to keep the logs that the system automatically generates (under Art. 12), to the extent those logs are under their control, for a period appropriate to the intended purpose and at least six months unless other law requires longer. It is the retention counterpart to the Art. 12 logging capability, and it works alongside the deployer retention duty in Art. 26.6.

Art. 26.1 EU AI Act: following provider instructions as a deployer

Reference

Art. 26.1 requires deployers to use high-risk AI systems strictly in accordance with the provider's instructions for use. This means using the system only for its intended purpose, within its specified technical configuration, and by qualified users, and documenting that compliance. Deviating from the instructions can shift liability entirely to the deployer.

Art. 26.6 EU AI Act: log retention and audit trail obligations

Reference

Art. 26.6 requires deployers of high-risk AI to retain the system-generated logs for at least six months, unless other law requires longer. The logs are the primary evidence that the system was used in accordance with its instructions.

More on Privacy

Art. 26.4 EU AI Act: input data quality for deployers

Reference

Art. 26.4 requires deployers of high-risk AI, to the extent they exercise control over the input data, to ensure that the data is relevant and sufficiently representative for the system's intended purpose. The deployer is responsible for data quality in operation, even though the provider sets the specifications under Art. 10.

Art. 26.9 EU AI Act: DPIA obligation for high-risk AI

Reference

Art. 26.9 links the EU AI Act to the GDPR: where a data protection impact assessment (DPIA) is required under GDPR Art. 35, deployers of high-risk AI must use the information from the provider's documentation to support that assessment.

Control-level compliance: the EU AI Act as an instrumented system

Analysis

Control-level compliance means satisfying the EU AI Act through engineered, evidenced controls rather than policy documents. The technical articles translate directly into system controls: immutable logs (Art. 12, 19), a kill switch (Art. 14(4)(e)), data masking before the model (Art. 10), configurable block policies (Art. 26), risk scoring and incident reporting within deadline (Art. 9, 73), and workspace isolation with role-based access (Art. 14, 26). Compliance at this level is an instrumented system, not a policy stored as a PDF.

EU AI Act and GDPR: how do the two regulations relate?

Guide

The EU AI Act and the GDPR create overlapping but distinct obligations for AI systems that process personal data. They align on data quality, impact assessments, transparency, and individual rights, but differ in scope, accountability roles, and incident-reporting timelines, so the efficient approach is integrated compliance, such as a combined DPIA/FRIA.