Art. 26.4 EU AI Act: Input Data Quality for Deployers
Updated: June 2026 — full revision to Validai quality standard
Introduction: Data Quality as Compliance Obligation
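Art. 26.4 requires that, "to the extent the deployer exercises control over the input data, that deployer shall ensure that input data is relevant and sufficiently representative in view of the intended purpose of the high-risk AI system." The provision itself names relevance and representativeness. This guide also treats completeness, accuracy, robustness, and security as measures a deployer needs in practice to meet that standard.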
This obligation acknowledges a fundamental truth about AI systems: garbage in, garbage out. An AI system operating on degraded, biased, or incomplete input data will produce degraded, biased, or incomplete outputs, regardless of how well the model was built. To the extent that the deployer controls the input data, the deployer bears legal responsibility for ensuring that this data meets quality standards.
What Data Does Art. 26.4 Cover?
The following categories of data enter a high-risk AI system during deployment and fall within the deployer's data quality measures to the extent the deployer controls them:
- Operational input data: The data the system processes to produce outputs (e.g. applicant CVs in a screening system, transaction data in a fraud detection system)
- Reference data: Baseline data the system compares against (e.g. historical customer profiles)
- Feedback data: Data fed back to the system to improve or adjust performance during deployment
- Monitoring data: Data used to assess system performance
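Training data quality is primarily a provider obligation under Art. 10. However, a deployer that fine-tunes or retrains a model may itself be treated as a provider under Art. 25 where the change amounts to a substantial modification, and in that case takes on the Art. 10 obligations for that data.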
The Four Quality Dimensions
1. Relevance
Input data must be relevant to the decision the AI system is making. Using demographic proxies that have no causal relationship to the predicted outcome is a relevance failure, and may also conflict with the data minimisation principle in Art. 5(1)(c) GDPR.
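One way to back the relevance requirement with a technical control is an allow-list of input fields. The sketch below is illustrative only: the field names and the `APPROVED_FIELDS` set are hypothetical and would in practice come from the provider's instructions and the deployer's own assessment.

```python
# Hypothetical allow-list for a CV screening system. Real field names would
# come from the provider's instructions and the deployer's assessment.
APPROVED_FIELDS = {"years_experience", "qualifications", "skills"}


def filter_relevant(record: dict) -> tuple[dict, list[str]]:
    """Keep only approved fields and report which fields were dropped."""
    kept = {k: v for k, v in record.items() if k in APPROVED_FIELDS}
    dropped = sorted(k for k in record if k not in APPROVED_FIELDS)
    return kept, dropped


record = {"years_experience": 6, "skills": ["sql"], "postcode": "1012", "age": 41}
kept, dropped = filter_relevant(record)
print(kept)     # only the approved fields reach the model
print(dropped)  # dropped field names can be logged for audit
```

Logging the dropped field names, rather than silently discarding them, gives the deployer a record of which upstream sources are sending data the system should not use.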
2. Completeness
Data must be sufficiently complete for the system to make valid predictions. Many AI systems perform significantly worse on incomplete data without signalling this clearly. Deployers must understand their system's minimum data requirements (documented in the provider's instructions for use) and have processes to handle incomplete data submissions.
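A process for handling incomplete submissions could look roughly like the following sketch. The `REQUIRED_FIELDS` tuple and the `manual_review` route are assumptions made for illustration; the real minimum requirements would be taken from the provider's instructions.

```python
from dataclasses import dataclass

# Hypothetical minimum requirements for a fraud detection input.
REQUIRED_FIELDS = ("amount", "currency", "merchant_id", "timestamp")


@dataclass
class CompletenessResult:
    complete: bool
    missing: list


def check_completeness(record: dict) -> CompletenessResult:
    """Treat absent keys, None, and empty strings as missing."""
    missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
    return CompletenessResult(complete=not missing, missing=missing)


def route(record: dict) -> str:
    """Send incomplete records to human review instead of the model."""
    result = check_completeness(record)
    return "model" if result.complete else "manual_review"
```

The point of the design is that an incomplete record never reaches the model unnoticed: it is either scored with all required fields present or diverted to a person.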
3. Accuracy and Robustness
Data must accurately represent the real-world situation. Stale data, data entry errors, and data transformation errors all degrade accuracy. For high-risk AI, deployers should have input validation processes that catch common data quality errors before they reach the model.
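As a minimal sketch of such input validation, the function below checks numeric ranges and data staleness before a record is passed on. The rule values, the `observed_at` field, and the 24-hour limit are placeholders chosen for illustration, and the code assumes timezone-aware datetimes.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical rules for a fraud detection input; the bounds are placeholders.
RANGE_RULES = {"amount": (0.0, 1_000_000.0), "account_age_days": (0, 36_500)}
MAX_AGE = timedelta(hours=24)


def validate(record: dict, now: datetime) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for field, (low, high) in RANGE_RULES.items():
        value = record.get(field)
        if not isinstance(value, (int, float)) or isinstance(value, bool):
            problems.append(f"{field}: not numeric")
        elif not low <= value <= high:
            problems.append(f"{field}: {value} outside [{low}, {high}]")
    observed_at = record.get("observed_at")
    if not isinstance(observed_at, datetime):
        problems.append("observed_at: missing or not a datetime")
    elif now - observed_at > MAX_AGE:
        # Stale data no longer reflects the real-world situation.
        problems.append("observed_at: stale")
    return problems
```

Returning every problem found, instead of stopping at the first, makes the output more useful for the data quality incident log.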
4. Security
Input data must be protected against unauthorised access and manipulation. For high-risk AI systems, data poisoning — the deliberate injection of manipulated data to influence AI outputs — is a significant security threat that deployers must address in their information security framework.
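One building block against manipulation in transit is a keyed integrity check on each input payload. The sketch below uses HMAC-SHA256 from the Python standard library; key distribution and storage are out of scope and simply assumed here. It detects tampering between a trusted source and the AI system, but it does not detect data that was already poisoned at the source.

```python
import hashlib
import hmac


def sign(payload: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the payload."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(payload: bytes, tag: str, key: bytes) -> bool:
    """Compare the expected and received tags in constant time."""
    return hmac.compare_digest(sign(payload, key), tag)


key = b"example-key-for-illustration-only"
payload = b'{"amount": 10}'
tag = sign(payload, key)
print(verify(payload, tag, key))            # unchanged payload
print(verify(b'{"amount": 99}', tag, key))  # altered payload
```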
Practical Measures
- Automated data validation rules that flag incomplete or out-of-range inputs
- Regular data quality audits comparing input distributions against expected baselines
- Data quality SLAs in supplier contracts for data that comes from external sources
- Drift detection: monitoring whether input data distributions change over time in ways that may degrade model performance
- Documentation of data quality incidents and remediation actions
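Drift detection from the list above can be sketched with the Population Stability Index (PSI), one of several possible drift measures and not one the regulation prescribes. The function below compares a current sample of a single numeric feature against a baseline sample; the bin count and the small floor for empty bins are illustrative choices.

```python
import math


def psi(baseline: list[float], current: list[float], bins: int = 10) -> float:
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # avoid zero width for a constant baseline

    def shares(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Clamp values outside the baseline range into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps the logarithm defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    b, c = shares(baseline), shares(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))


baseline = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]
print(psi(baseline, baseline))  # identical samples
print(psi(baseline, shifted))   # shifted sample scores much higher
```

A commonly cited rule of thumb treats PSI values above roughly 0.25 as a significant shift, but that is an industry convention and not a legal threshold. Any alert level should be set per system and documented.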
Compliance Checklist
- Have you documented the minimum data quality requirements for each high-risk AI system (from provider instructions)?
- Is there an automated validation process for input data before it reaches the AI system?
- Is there a process for handling incomplete or low-quality input data?
- Are data quality incidents logged and tracked?
- Is input data security addressed in your information security framework?
- Do you monitor for data drift that could affect AI system performance?