Article 10: Data and Data Governance
Article 10 sets data governance rules for the training, validation, and testing data used by high-risk AI systems. Providers must ensure datasets are relevant, sufficiently representative, and, to the best extent possible, free of errors and complete in view of the intended purpose—including appropriate statistical properties for the persons or groups the system is intended to be used on, where relevant.
Who does this apply to?
- Providers of high-risk AI systems that rely on data-driven training or tuning
Scenarios
- A hiring model is trained on five years of successful hires from one country only, then deployed EU-wide.
- A provider cannot access the sensitive labels needed for bias testing.
Core data principles
Training, validation, and testing data sets must meet the criteria in Article 10(2)–(4):
- Relevant and sufficiently representative for the intended purpose and geographical context
- As free of errors and complete as possible in light of the intended purpose
- Appropriate statistical properties for persons or groups—pay attention to outcomes for protected characteristics where relevant
- Examine for possible biases that could affect health, safety, fundamental rights, or lead to discrimination
You must be able to justify these choices in the technical documentation; a lightweight check like the sketch below can feed that evidence.
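As a starting point, much of that evidence can come from simple descriptive statistics. The sketch below is a minimal example, assuming a pandas DataFrame with placeholder "group" and "label" columns standing in for whatever protected attribute and target your documentation defines; real pipelines would use your own schema.

```python
# Minimal sketch: per-group representation and outcome-rate summary for a
# training set. "group" and "label" are placeholder column names, not a
# prescribed schema.
import pandas as pd

def group_summary(df: pd.DataFrame, group_col: str = "group",
                  label_col: str = "label") -> pd.DataFrame:
    """Each group's share of the data and its positive-label rate."""
    summary = df.groupby(group_col).agg(
        n=(label_col, "size"),
        positive_rate=(label_col, "mean"),
    )
    summary["share"] = summary["n"] / summary["n"].sum()
    return summary.sort_values("share", ascending=False)

if __name__ == "__main__":
    df = pd.DataFrame({
        "group": ["A", "A", "A", "B", "B", "C"],
        "label": [1, 0, 1, 0, 0, 1],
    })
    print(group_summary(df))
```

Large gaps between a group's share in the data and its share in the deployment population, or sharply divergent positive rates, are exactly the kind of finding the bias examination in Article 10(2)(f) expects you to surface and document.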
Special categories of data
Processing of special categories of personal data under GDPR Article 9 is tightly constrained. Article 10(5) allows narrow processing for bias detection and correction when other means are insufficient—subject to suitable safeguards for rights and freedoms, including technical limitations, pseudonymisation, and security. Coordinate with your DPO.
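Article 10(5)(b) names pseudonymisation among the expected safeguards. A minimal sketch, assuming a project-level secret salt held in a secrets manager rather than in code, replaces direct identifiers with keyed HMAC-SHA256 pseudonyms before the data reaches the bias-audit environment:

```python
# Minimal sketch of pseudonymisation before a bias audit. The salt value and
# record fields are illustrative assumptions; load the salt from a secrets
# manager in practice.
import hmac
import hashlib

SECRET_SALT = b"load-from-a-secrets-manager"  # placeholder, never hard-code

def pseudonymise(identifier: str) -> str:
    """Stable, non-reversible pseudonym for a direct identifier."""
    return hmac.new(SECRET_SALT, identifier.encode("utf-8"),
                    hashlib.sha256).hexdigest()

record = {"subject_id": "patient-042", "ethnicity": "X", "outcome": 1}
record["subject_id"] = pseudonymise(record["subject_id"])
print(record)
```

Keyed hashing, rather than a bare hash, resists dictionary attacks on guessable identifiers; re-identification then requires access to the salt, which can be restricted and logged in line with Article 10(5)(c).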
Record-keeping linkage
Data governance decisions should be traceable in logs and documentation required under Article 11 (technical documentation) and Article 12 (record-keeping).
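One way to make those decisions traceable is an append-only decision log that the Article 11 documentation and Article 12 records can reference. The sketch below is illustrative only: the JSON Lines format, file location, and field names are our own assumptions, loosely mirroring the Article 10(2) points.

```python
# Minimal sketch of an append-only data-governance log (JSON Lines).
# All names and the on-disk format are assumptions for illustration.
import json
import datetime
from pathlib import Path

LOG_PATH = Path("data_governance_log.jsonl")  # hypothetical location

def log_decision(dataset: str, article_10_2_point: str, decision: str,
                 rationale: str) -> None:
    """Append one timestamped governance decision to the log."""
    entry = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "dataset": dataset,
        "point": article_10_2_point,   # e.g. "(f) bias examination"
        "decision": decision,
        "rationale": rationale,
    }
    with LOG_PATH.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

log_decision("hiring_train_v3", "(f) bias examination",
             "reweighted country B records",
             "under-representation found in group summary")
```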
How Article 10 connects to Articles 6–9, Section 2, and beyond
- Article 8 — Data rules are part of the Section 2 requirements package that Article 8 frames.
- Article 9 — Testing and metrics in Article 9 should be fed by defensible dataset choices under Article 10.
- Article 7 — If Annex III scope shifts, re-validate data representativeness for the new harm context.
- Article 6 + Annex III — Data governance should mirror the intended purpose and geography of the high-risk use case.
- Article 15 — Accuracy and robustness evidence ties back to dataset quality.
- Article 113 — Sets the dates from which these data governance obligations apply.
Official wording: Article 10 — Data and data governance
The following reproduces Article 10 in full from the English consolidated text of Regulation (EU) 2024/1689.
1. High-risk AI systems which make use of techniques involving the training of AI models with data shall be developed on the basis of training, validation and testing data sets that meet the quality criteria referred to in paragraphs 2 to 5 whenever such data sets are used.
2. Training, validation and testing data sets shall be subject to data governance and management practices appropriate for the intended purpose of the high-risk AI system. Those practices shall concern in particular:

(a) the relevant design choices;

(b) data collection processes and the origin of data, and in the case of personal data, the original purpose of the data collection;

(c) relevant data-preparation processing operations, such as annotation, labelling, cleaning, updating, enrichment and aggregation;

(d) the formulation of assumptions, in particular with respect to the information that the data are supposed to measure and represent;

(e) an assessment of the availability, quantity and suitability of the data sets that are needed;

(f) examination in view of possible biases that are likely to affect the health and safety of persons, have a negative impact on fundamental rights or lead to discrimination prohibited under Union law, especially where data outputs influence inputs for future operations;

(g) appropriate measures to detect, prevent and mitigate possible biases identified according to point (f);

(h) the identification of relevant data gaps or shortcomings that prevent compliance with this Regulation, and how those gaps and shortcomings can be addressed.

3. Training, validation and testing data sets shall be relevant, sufficiently representative, and to the best extent possible, free of errors and complete in view of the intended purpose. They shall have the appropriate statistical properties, including, where applicable, as regards the persons or groups of persons in relation to whom the high-risk AI system is intended to be used. Those characteristics of the data sets may be met at the level of individual data sets or at the level of a combination thereof.

4. Data sets shall take into account, to the extent required by the intended purpose, the characteristics or elements that are particular to the specific geographical, contextual, behavioural or functional setting within which the high-risk AI system is intended to be used.

5. To the extent that it is strictly necessary for the purpose of ensuring bias detection and correction in relation to the high-risk AI systems in accordance with paragraph 2, points (f) and (g) of this Article, the providers of such systems may exceptionally process special categories of personal data, subject to appropriate safeguards for the fundamental rights and freedoms of natural persons. In addition to the provisions set out in Regulations (EU) 2016/679 and (EU) 2018/1725 and Directive (EU) 2016/680, all the following conditions must be met in order for such processing to occur:

(a) the bias detection and correction cannot be effectively fulfilled by processing other data, including synthetic or anonymised data;

(b) the special categories of personal data are subject to technical limitations on the re-use of the personal data, and state-of-the-art security and privacy-preserving measures, including pseudonymisation;

(c) the special categories of personal data are subject to measures to ensure that the personal data processed are secured, protected, subject to suitable safeguards, including strict controls and documentation of the access, to avoid misuse and ensure that only authorised persons have access to those personal data with appropriate confidentiality obligations;

(d) the special categories of personal data are not to be transmitted, transferred or otherwise accessed by other parties;

(e) the special categories of personal data are deleted once the bias has been corrected or the personal data has reached the end of its retention period, whichever comes first;

(f) the records of processing activities pursuant to Regulations (EU) 2016/679 and (EU) 2018/1725 and Directive (EU) 2016/680 include the reasons why the processing of special categories of personal data was strictly necessary to detect and correct biases, and why that objective could not be achieved by processing other data.

6. For the development of high-risk AI systems not using techniques involving the training of AI models, paragraphs 2 to 5 apply only to the testing data sets.
Recitals (preamble) on EUR-Lex
The recitals in the same consolidated AI Act on EUR-Lex contextualise data quality, bias, and fundamental rights safeguards for high-risk AI. Use the official preamble on EUR-Lex—do not rely on unofficial recital lists without checking sequence and wording against the authentic text.
Compliance checklist
- Inventory all datasets (train/val/test), sources, licences, and retention periods.
- Document representativeness gaps and mitigations (reweighting, augmentation, collection fixes).
- Run bias and robustness evaluations aligned with the intended purpose and Annex III harms.
- Align Article 10(5) processing with GDPR Article 9 grounds and organisational policies.
- Version the data snapshots used for each model release and tie them to the conformity documentation (see the sketch below).
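For that last item, content-hashing the exact files behind a release gives an auditable fingerprint. A minimal sketch follows; the directory layout, the .csv glob, the release label, and the manifest format are all illustrative assumptions.

```python
# Minimal sketch: fingerprint the data files behind a model release so the
# snapshot can be cited in conformity documentation. Paths are illustrative.
import hashlib
import json
from pathlib import Path

def sha256_file(path: Path) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def snapshot_manifest(data_dir: str, release: str) -> dict:
    """Map every CSV under data_dir to its content hash for one release."""
    files = sorted(Path(data_dir).rglob("*.csv"))
    return {
        "release": release,
        "files": {str(p): sha256_file(p) for p in files},
    }

manifest = snapshot_manifest("datasets/hiring_v3", "model-2025.06")
Path("manifest-model-2025.06.json").write_text(json.dumps(manifest, indent=2))
```

Storing the manifest alongside the Annex IV technical documentation lets you prove later exactly which data produced a given model.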
See how your data story fits high-risk requirements—start the free assessment.
Related Articles
Article 6: Classification Rules for High-Risk Systems
Article 7: Amendments to Annex III
Article 8: Compliance with the requirements
Article 9: Risk Management System
Article 11: Technical Documentation
Article 12: Record-keeping
Article 13: Transparency and provision of information to deployers
Article 15: Accuracy, robustness and cybersecurity
Annex III: High-Risk AI System Areas
Annex IV: Technical Documentation for High-Risk AI Systems
Article 113: Entry into Force and Application Dates
Related annexes
- Annex IV — Technical documentation
Frequently asked questions
Do we need new consent for training data?
The AI Act does not replace GDPR. Lawful basis, transparency, and purpose limitation still come from GDPR; Article 10 tells you what quality and governance you must demonstrate for high-risk systems.
What about synthetic data?
Synthetic data can help with representativeness or privacy, but you must validate that it preserves realistic failure modes and does not hide biases introduced by the simulator.
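One lightweight screening step is to compare real and synthetic feature distributions statistically. The sketch below uses a two-sample Kolmogorov-Smirnov test from scipy on stand-in data; a single univariate test is only a first check, not full validation of failure modes.

```python
# Minimal sketch: screen one feature for real-vs-synthetic distribution
# shift with a two-sample KS test. The data here are synthetic stand-ins.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
real = rng.normal(loc=0.0, scale=1.0, size=2_000)       # stand-in real feature
synthetic = rng.normal(loc=0.1, scale=1.2, size=2_000)  # stand-in synthetic

stat, p_value = ks_2samp(real, synthetic)
print(f"KS statistic={stat:.3f}, p={p_value:.4f}")
if p_value < 0.01:
    print("Distributions differ: investigate before relying on the synthetic set.")
```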
Can providers use special category personal data for bias detection under Article 10?
Yes, under strict conditions. Article 10(5) permits processing of special categories of personal data (e.g. ethnicity, health) to the extent strictly necessary for bias detection and correction, subject to appropriate safeguards including pseudonymisation, technical limitations on re-use, and compliance with the GDPR and the Law Enforcement Directive (LED).