Reference · Accountability · Fairness · Privacy

Art. 10 EU AI Act: data and data governance for high-risk AI

By Michel Venniker · 25 Jun 2026 · Aligned with the consolidated EU AI Act, including the 2026 Omnibus amendments.

Art. 10 requires that the training, validation, and testing data for high-risk AI systems meet quality criteria: relevant, sufficiently representative, and as free of errors and as complete as possible for the intended purpose. It also requires documented data governance practices covering collection, preparation, bias examination, and gap mitigation, and it permits the limited processing of special-category data where strictly necessary to detect and correct bias, under safeguards.

Updated: June 2026

This is an explicit provider obligation under the EU AI Act. It falls on whoever develops or places the high-risk AI system on the market. Deployers carry a related input-data duty under Art. 26.4.

Introduction: data as the root of most AI risk

Most of the failure modes the EU AI Act is concerned with originate in data. A biased outcome is usually a biased dataset expressed through a model. A privacy exposure is usually data that should not have been collected, retained, or used. A performance failure is often a training set that no longer represents the population the system serves. Art. 10 is the obligation that addresses risk at its source, by setting quality and governance requirements on the data that high-risk AI systems are built and run on.

Art. 10 applies primarily to providers, who develop the system and control its training. But its logic reaches deployers too, because the input data a deployer supplies in operation must meet the conditions the provider specified, a duty that appears separately as the deployer obligation in Art. 26.4.

What the data must be

Art. 10 requires that training, validation, and testing datasets are subject to appropriate data governance and meet quality criteria. The datasets must be:

Relevant to the intended purpose of the system.
Sufficiently representative of the persons and situations the system will be used on, so the system does not perform well on one group and poorly on another.
As free of errors and as complete as possible in view of the intended purpose.
Appropriate in their statistical properties, including for the groups the system is intended to affect.

These are not absolute standards. The article qualifies them with "to the best extent possible" and "in view of the intended purpose", which means the provider must make and document a reasoned judgement about what level of quality is adequate for the stakes of the use case, rather than meeting a fixed numerical bar.
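One way to make that reasoned judgement reproducible is to compute a few simple indicators and record them next to the decision. The sketch below is illustrative only: the function names, the reference shares, and the tolerance are assumptions chosen for the example, and Art. 10 does not prescribe any of them. It compares each group's share in a dataset against a reference population and reports the share of missing values per field.

```python
from collections import Counter


def group_shares(records, key):
    """Return each group's share among records that have a value for `key`."""
    values = [r[key] for r in records if r.get(key) is not None]
    total = len(values)
    if total == 0:
        return {}
    return {group: n / total for group, n in Counter(values).items()}


def representativeness_gaps(records, key, reference_shares, tolerance):
    """Groups whose dataset share differs from the reference by more than `tolerance`.

    Positive values mean over-representation, negative values under-representation.
    """
    observed = group_shares(records, key)
    gaps = {}
    for group, expected in reference_shares.items():
        diff = observed.get(group, 0.0) - expected
        if abs(diff) > tolerance:
            gaps[group] = round(diff, 3)
    return gaps


def missing_rate(records, fields):
    """Share of records with a missing (None) value, per field."""
    n = len(records)
    return {f: sum(1 for r in records if r.get(f) is None) / n for f in fields}


# Hypothetical toy data: ten records, skewed towards the youngest age band.
records = (
    [{"age_band": "18-34", "income": 30000}] * 6
    + [{"age_band": "35-54", "income": None}] * 3
    + [{"age_band": "55+", "income": 41000}] * 1
)
# Assumed reference population for the intended purpose (not from any real source).
reference = {"18-34": 0.3, "35-54": 0.4, "55+": 0.3}

gaps = representativeness_gaps(records, "age_band", reference, tolerance=0.05)
missing = missing_rate(records, ["age_band", "income"])
print(gaps)     # {'18-34': 0.3, '35-54': -0.1, '55+': -0.2}
print(missing)  # {'age_band': 0.0, 'income': 0.3}
```

The numbers do not decide anything by themselves. The provider still has to document why a given tolerance and reference population are adequate for the stakes of the use case.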

What the governance must cover

Beyond the quality of the data itself, Art. 10 requires documented data governance and management practices. These cover:

The design choices and the origin of the data.
The collection process and provenance.
The preparation operations, such as labelling and cleaning.
The assumptions made about what the data measures.
An assessment of whether the data is available, suitable, and sufficient.
An examination for possible biases that could affect health, safety, or fundamental rights, together with measures to detect, prevent, and mitigate those biases.
The identification of data gaps and how they are mitigated.

This is the governance trail a conformity assessment expects: not just a clean dataset, but a documented account of where it came from, how it was prepared, what was assumed, and how bias was looked for and addressed.
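A governance trail of this kind can be produced as a by-product of the preparation pipeline rather than written up afterwards. The following is a minimal sketch under stated assumptions: `fingerprint` and `apply_step` are hypothetical helper names, the dataset is a small list of dictionaries, and the log format is invented for illustration. Each preparation step records what was done, why, and a hash of the data before and after.

```python
import hashlib
import json


def fingerprint(rows):
    """SHA-256 over a canonical JSON dump of the rows (key order does not matter)."""
    payload = json.dumps(rows, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def apply_step(rows, log, name, rationale, transform):
    """Run one preparation step and append an entry describing it to `log`."""
    hash_in = fingerprint(rows)  # taken first, in case `transform` mutates its input
    result = transform(rows)
    log.append(
        {
            "step": name,
            "rationale": rationale,
            "rows_in": len(rows),
            "rows_out": len(result),
            "hash_in": hash_in,
            "hash_out": fingerprint(result),
        }
    )
    return result


# Hypothetical raw data with one unlabelled row.
raw = [
    {"text": "a", "label": "x"},
    {"text": "b", "label": None},
    {"text": "c", "label": "y"},
]
prep_log = []
cleaned = apply_step(
    raw,
    prep_log,
    name="drop_unlabelled",
    rationale="Rows without a label cannot be used for supervised training.",
    transform=lambda rows: [r for r in rows if r["label"] is not None],
)
print(prep_log[0]["rows_in"], prep_log[0]["rows_out"])  # 3 2
```

The hashes tie each log entry to a specific version of the dataset, so the documented account and the data it describes can be checked against each other.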

The special-category data provision and the real-time data angle

Art. 10(5) contains an important and often-misread provision. To detect and correct bias, providers may exceptionally process special categories of personal data, the sensitive data the GDPR otherwise restricts, but only where strictly necessary, and under safeguards: the bias cannot be detected by processing other data, the data is subject to technical limits on reuse, security and privacy-preserving measures apply, and the data is deleted once the bias is corrected or its retention period ends.

This provision is also where the operational angle of data protection at the point of use comes in. For systems that process data in real time, the discipline of minimising and masking sensitive data before it reaches the model is the operational expression of the same principle: process the least sensitive data necessary, protect what must be processed, and document why. Sensitive data masked or redacted at the input level, before it reaches the model, is a concrete control that serves both the Art. 10 data governance obligation and the GDPR's minimisation principle.
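As a rough illustration of masking at the input level, the sketch below replaces two kinds of identifiers with typed placeholders before the text would be passed to a model, and counts what was masked so the control leaves evidence. The patterns and the `mask_input` function are assumptions made for this example. They are deliberately simple and would miss many real-world formats, so a production control needs broader, tested detectors.

```python
import re

# Illustrative patterns only; they are not exhaustive detectors.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def mask_input(text):
    """Replace matches with a typed placeholder; return (masked_text, counts)."""
    counts = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts


masked, counts = mask_input(
    "Contact jan@example.com or +31 20 123 4567 about the claim."
)
print(masked)  # Contact [EMAIL] or [PHONE] about the claim.
print(counts)  # {'EMAIL': 1, 'PHONE': 1}
```

Returning the counts alongside the masked text gives the "document why" part of the principle something concrete to record: which categories were detected and masked for each input.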

Why it matters

Data governance failures are doubly exposed, because the same dataset can carry both a fairness defect and a privacy defect, and the two are policed by different parts of the law. A training set that over-represents one group creates an Art. 10 quality failure and a fairness risk under the risk management system, while the same set, if it contains personal data that should not have been collected, creates a GDPR exposure. Addressing data governance well closes several risks at once; neglecting it opens several at once.

Controls for data quality and governance

The controls treat data as a managed asset with a documented lifecycle, not a raw input that happens to be available.

The core artefact is a data sheet for each dataset, recording its origin and provenance, its size and population characteristics, the preparation and labelling operations applied, the assumptions made, the bias examination performed and its findings, and the known limitations. This sheet becomes part of the technical documentation and is the evidence a conformity assessment examines.
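A data sheet can be kept as a structured record so that empty sections are easy to spot before a conformity assessment. The sketch below is one possible shape, not a prescribed template: the `DataSheet` class and its field names are assumptions that mirror the items listed above.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DataSheet:
    """Hypothetical data sheet for one dataset; field names are illustrative."""

    dataset_name: str
    split: str  # "training", "validation" or "testing"
    origin_and_provenance: str = ""
    size: int = 0
    population_characteristics: str = ""
    preparation_operations: list = field(default_factory=list)
    assumptions: list = field(default_factory=list)
    bias_examination: str = ""
    # Record "none found" explicitly; an empty list is treated as not yet filled in.
    bias_findings: list = field(default_factory=list)
    known_limitations: list = field(default_factory=list)

    def missing_sections(self):
        """Names of sections that are still empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


sheet = DataSheet(
    dataset_name="claims-2025",  # hypothetical dataset
    split="training",
    origin_and_provenance="Internal claims system export, January 2025.",
    size=120_000,
)
print(sheet.missing_sections())
```

For this partially completed sheet, `missing_sections()` lists the six sections from `population_characteristics` to `known_limitations`, which is the gap that would need closing before the sheet can serve as evidence.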

For systems processing personal data, the data governance controls integrate with the organisation's GDPR controls rather than running in parallel: one minimisation discipline, one lawful-basis analysis, one retention schedule, applied to the AI data lifecycle. Where special-category data is processed under the Art. 10(5) exception, the strict-necessity justification and the safeguards are documented before processing begins, not reconstructed afterward.
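The "documented before processing begins" rule can be enforced as a gate in the pipeline. The sketch below is a simplified illustration: `SpecialCategoryRecord`, `open_issues`, and `may_process` are hypothetical names, and the fields only paraphrase the safeguards described in this article rather than reproduce the legal text. It also checks only the retention date, so deletion once the bias has been corrected would need a separate trigger.

```python
from dataclasses import dataclass
from datetime import date

# Sections that must contain text before any processing starts (illustrative).
REQUIRED_TEXT_FIELDS = (
    "necessity_justification",
    "why_other_data_insufficient",
    "reuse_limits",
    "security_measures",
)


@dataclass(frozen=True)
class SpecialCategoryRecord:
    """Hypothetical record backing one use of the Art. 10(5) exception."""

    purpose: str
    necessity_justification: str
    why_other_data_insufficient: str
    reuse_limits: str
    security_measures: str
    retention_until: date


def open_issues(record, today):
    """Return the reasons why processing must not start or continue."""
    issues = [
        f"missing: {name}"
        for name in REQUIRED_TEXT_FIELDS
        if not getattr(record, name).strip()
    ]
    if today > record.retention_until:
        issues.append("retention period ended: delete the data")
    return issues


def may_process(record, today):
    """True only when every safeguard is documented and retention has not lapsed."""
    return not open_issues(record, today)


record = SpecialCategoryRecord(
    purpose="bias detection in a credit-scoring model",  # hypothetical use case
    necessity_justification="Error rates per group cannot be estimated without the attribute.",
    why_other_data_insufficient="",  # not yet documented
    reuse_limits="Separate store, no export to training pipelines.",
    security_measures="Pseudonymised, access logged.",
    retention_until=date(2026, 12, 31),
)
issues = open_issues(record, today=date(2026, 7, 1))
print(issues)  # ['missing: why_other_data_insufficient']
```

Because the record is frozen, a completed justification has to be written as a new record rather than edited in place, which keeps the documentation from being quietly reconstructed afterward.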

Compliance checklist

Is there a documented data sheet for each training, validation, and testing dataset, covering provenance, preparation, and limitations?
Has each dataset been assessed for relevance, representativeness, error rate, and completeness against the intended purpose, with the judgement documented?
Has each dataset been examined for biases that could affect health, safety, or fundamental rights, with mitigation measures recorded?
Where special-category data is processed to detect or correct bias, is the strict-necessity justification documented and are the Art. 10(5) safeguards in place?
For systems processing personal data in real time, is sensitive data minimised or masked before it reaches the model?
Do the data governance controls integrate with the organisation's GDPR controls rather than duplicate them?

Legal references: Art. 10 Art. 9 GDPR
