Art. 26.4 EU AI Act: input data quality for deployers
Art. 26.4 requires deployers of high-risk AI systems to ensure, to the extent they control the input data, that it is relevant and sufficiently representative for the system's intended purpose. The deployer is responsible for input data quality in operation, even though the provider governs training data under Art. 10 and specifies input data requirements in its instructions for use.
Updated: June 2026
Introduction: data quality as a compliance obligation
Art. 26.4 provides that, "to the extent the deployer exercises control over the input data, that deployer shall ensure that input data is relevant and sufficiently representative in view of the intended purpose of the high-risk AI system." In practice, meeting this duty means managing input data along several quality dimensions: relevance, completeness, accuracy and robustness, and security.
This obligation reflects a basic fact about AI systems: garbage in, garbage out. An AI system operating on degraded, biased, or incomplete input data will produce degraded, biased, or incomplete outputs, however well the model was built. To the extent it controls the input data, the deployer bears legal responsibility for ensuring that the data the AI system operates on meets these quality standards.
What data does Art. 26.4 cover?
Art. 26.4 refers to input data. In practice, deployers should apply the same quality discipline to all data used with the system during deployment:
- Operational input data: The data the system processes to produce outputs (e.g. applicant CVs in a screening system, transaction data in a fraud detection system)
- Reference data: Baseline data the system compares against (e.g. historical customer profiles)
- Feedback data: Data fed back to the system to improve or adjust performance during deployment
- Monitoring data: Data used to assess system performance
Training data quality is primarily a provider obligation under Art. 10. However, if a deployer fine-tunes or retrains a model, they assume provider-like obligations for that data.
The four quality dimensions
1. Relevance
Input data must be relevant to the decision the AI system is making. Using demographic proxies that have no causal relationship to the predicted outcome is a relevance failure, and may also implicate GDPR data minimisation requirements.
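One way to operationalise the relevance requirement is to filter each record against the feature list documented by the provider before it reaches the model, and to record which fields were removed. The Python sketch below assumes hypothetical field names (`DOCUMENTED_FEATURES`, `PROXY_FIELDS`) for a CV screening system; in practice, the lists would come from the provider's instructions and the deployer's own proxy analysis.

```python
from dataclasses import dataclass, field

# Hypothetical: features listed in the provider's instructions for use.
DOCUMENTED_FEATURES = {"years_experience", "qualifications", "skills"}

# Hypothetical: fields the deployer has identified as demographic proxies
# with no causal relationship to the predicted outcome.
PROXY_FIELDS = {"postcode", "date_of_birth", "first_name"}


@dataclass
class RelevanceReport:
    kept: dict
    dropped_proxies: list = field(default_factory=list)
    dropped_undocumented: list = field(default_factory=list)


def filter_relevant(record: dict) -> RelevanceReport:
    """Keep only documented features and record every field that was removed."""
    report = RelevanceReport(kept={})
    for name, value in record.items():
        if name in PROXY_FIELDS:
            # Proxy fields are removed even if they also appear in the feature list.
            report.dropped_proxies.append(name)
        elif name in DOCUMENTED_FEATURES:
            report.kept[name] = value
        else:
            report.dropped_undocumented.append(name)
    return report
```

Recording removed fields, rather than dropping them silently, gives auditors evidence that the relevance check actually ran.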
2. Completeness
Data must be sufficiently complete for the system to make valid predictions. Many AI systems perform significantly worse on incomplete data, but do not always signal this clearly. Deployers must understand their system's minimum data requirements (documented in the provider's instructions) and have processes to handle incomplete data submissions.
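A completeness gate can route each submission based on which fields are missing. The sketch below assumes hypothetical required and optional field sets and a tolerance of one missing optional field. Real thresholds should come from the provider's documented minimum data requirements.

```python
from enum import Enum

# Hypothetical minimum data requirements, as they might be transcribed
# from a provider's instructions for use.
REQUIRED_FIELDS = {"applicant_id", "years_experience", "qualifications"}
OPTIONAL_FIELDS = {"skills", "languages", "certifications"}
MAX_MISSING_OPTIONAL = 1  # hypothetical tolerance


class Route(Enum):
    MODEL = "model"
    MANUAL_REVIEW = "manual_review"
    RETURN_FOR_CORRECTION = "return_for_correction"


def is_missing(value) -> bool:
    # None, empty strings and empty collections count as missing; 0 does not.
    return value is None or (hasattr(value, "__len__") and len(value) == 0)


def route_record(record: dict) -> tuple[Route, list[str]]:
    """Decide where a submission goes and report the fields that were missing."""
    missing_required = sorted(f for f in REQUIRED_FIELDS if is_missing(record.get(f)))
    if missing_required:
        # The model must not see records that fail the minimum requirements.
        return Route.RETURN_FOR_CORRECTION, missing_required
    missing_optional = sorted(f for f in OPTIONAL_FIELDS if is_missing(record.get(f)))
    if len(missing_optional) > MAX_MISSING_OPTIONAL:
        return Route.MANUAL_REVIEW, missing_optional
    return Route.MODEL, missing_optional
```

Routing to a human reviewer, rather than scoring the record anyway, keeps incomplete data from silently degrading outputs.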
3. Accuracy and robustness
Data must accurately represent the real-world situation. Stale data, data entry errors, and data transformation errors all degrade accuracy. For high-risk AI, deployers should have input validation processes that catch common data quality errors before they reach the model.
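Input validation of this kind can be written as a function that returns every rule a record violates. Failures can then be logged and corrected instead of passing silently to the model. The rules below (amount range, currency codes, a 24-hour staleness window) are hypothetical examples for a fraud detection input. They are not taken from any provider's instructions.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical validation rules for a fraud detection input record.
MAX_AGE = timedelta(hours=24)        # older records are treated as stale
AMOUNT_RANGE = (0.01, 1_000_000.0)   # plausible transaction amounts
VALID_CURRENCIES = {"EUR", "USD", "GBP"}


def validate_transaction(tx: dict, now: datetime) -> list[str]:
    """Return a list of validation errors; an empty list means the record passes."""
    errors = []
    amount = tx.get("amount")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(amount, (int, float)) or isinstance(amount, bool):
        errors.append("amount: not a number")
    elif not AMOUNT_RANGE[0] <= amount <= AMOUNT_RANGE[1]:
        errors.append(f"amount: {amount} outside {AMOUNT_RANGE}")
    if tx.get("currency") not in VALID_CURRENCIES:
        errors.append(f"currency: unexpected value {tx.get('currency')!r}")
    ts = tx.get("timestamp")
    if not isinstance(ts, datetime) or ts.tzinfo is None:
        errors.append("timestamp: missing or not timezone-aware")
    elif ts > now:
        errors.append("timestamp: in the future")
    elif now - ts > MAX_AGE:
        errors.append("timestamp: stale")
    return errors
```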
4. Security
Input data must be protected against unauthorised access and manipulation. For high-risk AI systems, tampered inputs and, where feedback data is used to adjust the system, data poisoning (the deliberate injection of manipulated data to influence AI outputs) are significant security threats that deployers must address in their information security framework.
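For data received from external sources, one common integrity control is a keyed hash (HMAC) over each payload. This lets the deployer detect records altered in transit. The Python sketch below uses only the standard library `hmac`, `hashlib` and `json` modules. The shared key and the canonical JSON encoding are assumptions that supplier and deployer would need to agree on. An HMAC only proves that the payload was not changed after signing. It does not detect manipulated data that a compromised source signs itself.

```python
import hashlib
import hmac
import json

# Assumption: supplier and deployer share a secret key held in a secrets
# manager. This literal is for illustration only and must not be used.
SHARED_KEY = b"example-key-do-not-use"


def sign_payload(payload: dict, key: bytes = SHARED_KEY) -> str:
    # Canonical JSON (sorted keys, no whitespace) so both sides hash identical bytes.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def verify_payload(payload: dict, signature: str, key: bytes = SHARED_KEY) -> bool:
    expected = sign_payload(payload, key)
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(expected, signature)
```

Records that fail verification would be quarantined and logged as data quality incidents instead of being processed.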
Practical measures
- Automated data validation rules that flag incomplete or out-of-range inputs
- Regular data quality audits comparing input distributions against expected baselines
- Data quality SLAs in supplier contracts for data that comes from external sources
- Drift detection: monitoring whether input data distributions change over time in ways that may degrade model performance
- Documentation of data quality incidents and remediation actions
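Drift detection can be implemented with a distribution comparison such as the Population Stability Index (PSI), one of several common drift metrics. The sketch below compares a current batch of one numeric input feature against a baseline sample using equal-width bins. The bin count and the 0.25 alert threshold are a widely used rule of thumb from credit-risk model monitoring. They are assumptions here, not values set by the AI Act or by any provider.

```python
import math
from collections.abc import Sequence

DRIFT_ALERT_THRESHOLD = 0.25  # rule of thumb, not a regulatory value


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `actual` against the `expected` baseline.

    Bins are equal-width over the baseline range; values outside that range
    are clipped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def shares(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps avoids log(0) and division by zero for empty bins
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    # PSI = sum over bins of (actual - expected) * ln(actual / expected)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_alert(baseline: Sequence[float], current: Sequence[float]) -> bool:
    return psi(baseline, current) > DRIFT_ALERT_THRESHOLD
```

In practice this would run per feature on a schedule, with alerts recorded alongside other data quality incidents.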
Compliance checklist
- Have you documented the minimum data quality requirements for each high-risk AI system (from provider instructions)?
- Is there an automated validation process for input data before it reaches the AI system?
- Is there a process for handling incomplete or low-quality input data?
- Are data quality incidents logged and tracked?
- Is input data security addressed in your information security framework?
- Do you monitor for data drift that could affect AI system performance?