GovCompass

Art. 10 EU AI Act: data and data governance for high-risk AI

By Michel Venniker · Aligned with the consolidated EU AI Act, including the 2026 Omnibus amendments.

Art. 10 requires that the training, validation, and testing data for high-risk AI systems meets quality criteria: relevant, sufficiently representative, and as free of errors and complete as possible for the intended purpose. It also requires documented data governance practices covering collection, preparation, bias examination, and gap mitigation, and it permits the limited processing of special-category data where strictly necessary to detect and correct bias, under safeguards.

Updated: June 2026

This is an explicit provider obligation under the EU AI Act. It falls on the provider: the actor that develops a high-risk AI system (or has it developed) and places it on the market or puts it into service under its own name. Deployers carry a related input-data duty under Art. 26.4.

Introduction: data as the root of most AI risk

Most of the failure modes the EU AI Act is concerned with originate in data. A biased outcome is usually a biased dataset expressed through a model. A privacy exposure is usually data that should not have been collected, retained, or used. A performance failure is often a training set that no longer represents the population the system serves. Art. 10 is the obligation that addresses risk at its source, by setting quality and governance requirements on the data that high-risk AI systems are built and run on.

Art. 10 applies primarily to providers, who develop the system and control its training. But its logic reaches deployers too: to the extent a deployer controls the input data it supplies in operation, that data must be relevant and sufficiently representative for the system's intended purpose as the provider specified it. That duty appears separately as the deployer obligation in Art. 26.4.

What the data must be

Art. 10 requires that training, validation, and testing datasets be subject to appropriate data governance and meet quality criteria. The datasets must be:

  • Relevant to the intended purpose of the system.
  • Sufficiently representative of the persons and situations the system will be used on, so the system does not perform well on one group and poorly on another.
  • As free of errors and as complete as possible in view of the intended purpose.
  • Appropriate in their statistical properties, including for the groups the system is intended to affect.
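The representativeness criterion can be made concrete by comparing each group's share of a dataset with its share of the population the system is meant to serve. The sketch below assumes the provider already has group counts and target population shares; the function name `representation_gaps` and the default tolerance of 5 percentage points are illustrative choices, not values taken from Art. 10, which sets no numerical bar.

```python
def representation_gaps(
    dataset_counts: dict[str, int],
    population_shares: dict[str, float],
    tolerance: float = 0.05,  # assumed threshold; the provider must justify its own
) -> dict[str, float]:
    """Return groups whose share of the dataset differs from their target
    population share by more than `tolerance` (absolute difference)."""
    total = sum(dataset_counts.values())
    if total == 0:
        raise ValueError("dataset is empty")
    gaps: dict[str, float] = {}
    for group, target_share in population_shares.items():
        dataset_share = dataset_counts.get(group, 0) / total
        difference = dataset_share - target_share
        if abs(difference) > tolerance:
            gaps[group] = round(difference, 4)  # positive = over-represented
    return gaps


# Usage: a hypothetical dataset skewed towards younger applicants.
counts = {"18-34": 700, "35-64": 250, "65+": 50}
targets = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}
print(representation_gaps(counts, targets))
```

Which groups to compare, and what tolerance counts as adequate for the stakes of the use case, are themselves judgements the provider should record alongside the result.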

These are not absolute standards. The article qualifies them with "to the best extent possible" and "in view of the intended purpose", which means the provider must make and document a reasoned judgement about what level of quality is adequate for the stakes of the use case, rather than meeting a fixed numerical bar.

What the governance must cover

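Beyond the quality of the data itself, Art. 10 requires documented data governance and management practices. These cover:

  • The relevant design choices.
  • The data collection processes and the origin of the data, including, for personal data, the original purpose of collection.
  • Data preparation operations such as annotation, labelling, cleaning, updating, enrichment, and aggregation.
  • The assumptions made, notably about what the data is supposed to measure and represent.
  • An assessment of the availability, quantity, and suitability of the datasets needed.
  • An examination for possible biases likely to affect health and safety, negatively affect fundamental rights, or lead to discrimination prohibited under Union law, together with measures to detect, prevent, and mitigate them.
  • The identification of data gaps or shortcomings that prevent compliance, and how they are addressed.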
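One narrow way to begin the bias examination is to compare error rates across groups on the validation or testing data. The sketch assumes each evaluated record carries a group label, which may itself be special-category data subject to the Art. 10(5) conditions. Per-group error rate is only one of several fairness metrics, and different metrics can conflict, so the choice of metric belongs in the documented examination; the function names here are hypothetical.

```python
from collections import defaultdict
from collections.abc import Hashable, Iterable


def per_group_error_rates(
    records: Iterable[tuple[Hashable, int, int]],
) -> dict[Hashable, float]:
    """Compute the error rate for each group from (group, y_true, y_pred) triples."""
    errors: dict = defaultdict(int)
    totals: dict = defaultdict(int)
    for group, y_true, y_pred in records:
        totals[group] += 1
        if y_true != y_pred:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


def max_error_gap(rates: dict[Hashable, float]) -> float:
    """Largest difference in error rate between any two groups."""
    return max(rates.values()) - min(rates.values())


# Usage: validation results tagged with a hypothetical group attribute.
results = [
    ("A", 1, 1), ("A", 0, 0), ("A", 1, 0), ("A", 0, 0),
    ("B", 1, 0), ("B", 0, 1), ("B", 1, 1), ("B", 0, 0),
]
rates = per_group_error_rates(results)
print(rates, max_error_gap(rates))
```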

This is the governance trail a conformity assessment expects: not just a clean dataset, but a documented account of where it came from, how it was prepared, what was assumed, and how bias was looked for and addressed.

The special-category data provision and the real-time data angle

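Art. 10(5) contains an important and often-misread provision. To the extent strictly necessary to detect and correct bias, providers may exceptionally process special categories of personal data, the sensitive data the GDPR otherwise restricts, but only under safeguards:

  • The bias cannot be effectively detected and corrected by processing other data, including synthetic or anonymised data.
  • The data is subject to technical limits on reuse and to state-of-the-art security and privacy-preserving measures, including pseudonymisation.
  • Access is strictly controlled and documented, and the data is not transmitted, transferred, or otherwise accessed by other parties.
  • The data is deleted once the bias has been corrected or its retention period ends, whichever comes first.
  • The records of processing activities state why the processing was strictly necessary and why other data could not achieve the objective.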
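The Art. 10(5) conditions lend themselves to a pre-processing gate: record the decision first, refuse to proceed while any safeguard is missing, and compute when deletion falls due. The record type, its field names, and the reduction of each safeguard to a boolean are hypothetical simplifications; in practice each flag would point to evidence rather than stand alone.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class SpecialCategoryDecision:
    """Hypothetical record of an Art. 10(5) decision, completed before processing starts."""
    purpose: str
    necessity_justification: str  # why other data cannot do the job
    alternatives_tried: list[str] = field(default_factory=list)  # e.g. synthetic, anonymised
    reuse_technically_limited: bool = False
    pseudonymised: bool = False
    access_controlled_and_logged: bool = False
    accessible_to_other_parties: bool = True
    retention_end: Optional[date] = None
    bias_corrected_on: Optional[date] = None


def unmet_conditions(d: SpecialCategoryDecision) -> list[str]:
    """List the safeguards not yet in place; an empty list means the gate passes."""
    problems = []
    if not d.necessity_justification.strip():
        problems.append("no strict-necessity justification")
    if not d.alternatives_tried:
        problems.append("no record of other data (e.g. synthetic, anonymised) being tried")
    if not d.reuse_technically_limited:
        problems.append("no technical limits on re-use")
    if not d.pseudonymised:
        problems.append("no pseudonymisation or equivalent measure")
    if not d.access_controlled_and_logged:
        problems.append("access not strictly controlled and documented")
    if d.accessible_to_other_parties:
        problems.append("data accessible to other parties")
    if d.retention_end is None:
        problems.append("no retention end date")
    return problems


def deletion_due(d: SpecialCategoryDecision, today: date) -> bool:
    """Deletion is due at the earlier of bias correction or the end of retention."""
    deadlines = [x for x in (d.bias_corrected_on, d.retention_end) if x is not None]
    return bool(deadlines) and today >= min(deadlines)
```

The defaults are deliberately pessimistic, so a record that has not been filled in fails the gate rather than passing it.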

The same principle has an operational counterpart at the point of use. For systems that process data in real time, minimising and masking sensitive data before it reaches the model puts the principle into practice: process the least sensitive data necessary, protect what must be processed, and document why. Masking or redacting sensitive data at the input level is a concrete control that serves both the Art. 10 data governance obligation and the GDPR's minimisation principle.
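A minimal sketch of input-level masking, assuming free-text inputs and a regular-expression approach. Pattern matching is a stand-in here: real deployments usually need more than regular expressions, and the two patterns below are examples rather than a complete definition of sensitive data. The `mask` function name is hypothetical.

```python
import re

# Illustrative patterns only: they catch common formats of e-mail addresses and
# phone numbers, miss many other identifiers, and can over-match (e.g. some dates).
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def mask(text: str) -> tuple[str, dict[str, int]]:
    """Replace matches with a placeholder before the text is sent to the model.

    Returns the masked text and a count per category, which can be logged as
    evidence that masking ran without logging the sensitive values themselves.
    """
    counts: dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] = n
    return text, counts


masked, counts = mask("Call +31 6 1234 5678 or mail jan@example.com.")
print(masked)  # Call [PHONE] or mail [EMAIL].
print(counts)
```

Logging only the counts, not the matched values, keeps the audit trail itself from becoming a new store of sensitive data.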

Why it matters

Data governance failures are doubly exposed, because the same dataset can carry both a fairness defect and a privacy defect, and the two are policed by different parts of the law. A training set that over-represents one group creates an Art. 10 quality failure and a fairness risk under the risk management system, while the same set, if it contains personal data that should not have been collected, creates a GDPR exposure. Addressing data governance well closes several risks at once; neglecting it opens several at once.

Controls for data quality and governance

The controls treat data as a managed asset with a documented lifecycle, not a raw input that happens to be available.

The core artefact is a data sheet for each dataset, recording its origin and provenance, its size and population characteristics, the preparation and labelling operations applied, the assumptions made, the bias examination performed and its findings, and the known limitations. This sheet becomes part of the technical documentation and is the evidence a conformity assessment examines.
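A data sheet can be kept as a structured record rather than free text, which makes incomplete sheets easy to detect. The fields below mirror the items listed in this section; the class, its field names, and the example dataset are a hypothetical sketch, not a schema required by Art. 10 or Annex IV.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DataSheet:
    """Hypothetical per-dataset data sheet; field names are illustrative."""
    name: str
    split: str  # "training", "validation" or "testing"
    origin: str = ""
    collection_process: str = ""
    size: int = 0
    population: str = ""
    preparation_ops: list[str] = field(default_factory=list)
    assumptions: list[str] = field(default_factory=list)
    bias_examination: str = ""
    known_limitations: list[str] = field(default_factory=list)

    def missing(self) -> list[str]:
        """Names of fields still empty, so gaps surface before an assessor finds them."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


sheet = DataSheet(
    name="credit-applications-2025",  # hypothetical dataset
    split="training",
    origin="Internal loan applications, 2021-2025",
    size=120_000,
)
print(sheet.missing())
```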

For systems processing personal data, the data governance controls integrate with the organisation's GDPR controls rather than running in parallel: one minimisation discipline, one lawful-basis analysis, one retention schedule, applied to the AI data lifecycle. Where special-category data is processed under the Art. 10(5) exception, the strict-necessity justification and the safeguards are documented before processing begins, not reconstructed afterward.

Compliance checklist

  1. Is there a documented data sheet for each training, validation, and testing dataset, covering provenance, preparation, and limitations?
  2. Has each dataset been assessed for relevance, representativeness, error rate, and completeness against the intended purpose, with the judgement documented?
  3. Has each dataset been examined for biases that could affect health, safety, or fundamental rights, with mitigation measures recorded?
  4. Where special-category data is processed to detect or correct bias, is the strict-necessity justification documented and are the Art. 10(5) safeguards in place?
  5. For systems processing personal data in real time, is sensitive data minimised or masked before it reaches the model?
  6. Do the data governance controls integrate with the organisation's GDPR controls rather than duplicate them?
Legal references: Art. 10 · Art. 9 · GDPR

More on Accountability

Art. 12 EU AI Act: record-keeping and logging for high-risk AI

Art. 12 requires high-risk AI systems to technically allow for the automatic recording of events (logs) over their lifetime. The logging must enable traceability of the system's functioning at a level appropriate to its intended purpose, support post-market monitoring, and help identify situations that may lead to risk or substantial modification. It is a design obligation on the provider that makes the system auditable by construction.

Art. 19 EU AI Act: keeping the automatically generated logs

Art. 19 requires providers of high-risk AI systems to keep the logs that the system automatically generates (under Art. 12), to the extent the logs are under their control, for a period appropriate to the intended purpose and of at least six months, unless applicable Union or national law provides otherwise. It is the retention counterpart to the Art. 12 logging capability, and it works alongside the deployer retention duty in Art. 26.6.

Art. 26.1 EU AI Act: following provider instructions as a deployer

Art. 26.1 requires deployers to take appropriate technical and organisational measures to use high-risk AI systems in accordance with the provider's instructions for use. In practice this means using the system only for its intended purpose, within its specified technical configuration, and by qualified users, and documenting that compliance. Deviating from the instructions can shift responsibility to the deployer, and under Art. 25 a deployer that substantially modifies a high-risk system takes on the provider's obligations.

Art. 26.6 EU AI Act: log retention and audit trail obligations

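Art. 26.6 requires deployers of high-risk AI to keep the logs the system automatically generates, to the extent the logs are under their control, for a period appropriate to the intended purpose and of at least six months, unless applicable Union or national law provides otherwise. The logs are the primary evidence that the system was used in accordance with its instructions.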
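The six-month floor in Art. 19 and Art. 26.6 is easy to compute, but month arithmetic needs a convention for dates such as 31 August. The sketch below assumes calendar months, clamping to the last day of a shorter month, plus an optional longer period where other law requires it. Whether this matches how a given authority counts the period is an assumption to check, and the period appropriate to the intended purpose may exceed the floor; both function names are hypothetical.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping to the last day of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def earliest_deletion(logged_on: date, other_law_months: Optional[int] = None) -> date:
    """Earliest date a log may be deleted: six months minimum, longer if other law requires."""
    months = 6 if other_law_months is None else max(6, other_law_months)
    return add_months(logged_on, months)


print(earliest_deletion(date(2026, 8, 31)))  # 2027-02-28
```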

More on Privacy

Art. 26.4 EU AI Act: input data quality for deployers

Art. 26.4 requires deployers of high-risk AI, to the extent they exercise control over the input data, to ensure that the input data is relevant and sufficiently representative in view of the system's intended purpose. The deployer is responsible for data quality in operation, even though the provider sets the specifications under Art. 10.
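Where the deployer controls the input data, part of the Art. 26.4 duty can be automated by checking each input against the ranges the provider documents. The specification below and the `check_input` function are hypothetical. Range checks catch out-of-scope records, but judging whether the input population as a whole stays representative needs aggregate monitoring as well.

```python
from typing import Optional

# Hypothetical input specification, as a provider might state it in its instructions for use.
INPUT_SPEC: dict[str, tuple[float, float]] = {
    "age": (18, 75),
    "income_eur": (0, 500_000),
}


def check_input(record: dict[str, Optional[float]]) -> list[str]:
    """Return the reasons a record falls outside the provider's stated input conditions."""
    issues = []
    for feature, (low, high) in INPUT_SPEC.items():
        value = record.get(feature)
        if value is None:
            issues.append(f"{feature}: missing")
        elif not low <= value <= high:
            issues.append(f"{feature}: {value} outside [{low}, {high}]")
    return issues


print(check_input({"age": 82, "income_eur": 40_000}))  # ['age: 82 outside [18, 75]']
```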

Art. 26.9 EU AI Act: DPIA obligation for high-risk AI

Art. 26.9 links the EU AI Act to the GDPR: where a data protection impact assessment (DPIA) is required under GDPR Art. 35, deployers of high-risk AI must use the information from the provider's documentation to support that assessment.

Control-level compliance: the EU AI Act as an instrumented system

Control-level compliance means satisfying the EU AI Act through engineered, evidenced controls rather than policy documents. The technical articles translate directly into system controls: immutable logs (Art. 12, 19), a kill switch (Art. 14(4)(e)), data masking before the model (Art. 10), configurable block policies (Art. 26), risk scoring and incident reporting within deadline (Art. 9, 73), and workspace isolation with role-based access (Art. 14, 26). Compliance at this level is an instrumented system, not a policy as PDF.
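The immutable-logs control can be approximated in application code with a hash chain, where each entry commits to the hash of the previous one so that any later edit is detectable. This is a sketch under assumptions: it provides tamper evidence, not tamper resistance, so a real deployment would still need write-once storage or an external anchor, and the event fields shown are hypothetical rather than the set Art. 12 requires.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class HashChainedLog:
    """Append-only event log; each entry stores the previous entry's hash."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, event: dict) -> dict:
        body = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "event": json.loads(json.dumps(event)),  # JSON copy, decoupled from caller
            "prev": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and link; False means an entry was altered or removed."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: entry[k] for k in ("ts", "event", "prev")}
            if entry["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


log = HashChainedLog()
log.append({"type": "inference", "model": "v1"})
log.append({"type": "human_override"})
print(log.verify())  # True
```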

EU AI Act and GDPR: how do the two regulations relate?

The EU AI Act and the GDPR create overlapping but distinct obligations for AI systems that process personal data. They align on data quality, impact assessments, transparency, and individual rights, but differ in scope, accountability roles, and incident-reporting timelines, so the efficient approach is integrated compliance, such as a combined DPIA/FRIA.