Article 10: Data and Data Governance
Article 10 sets data governance rules for the training, validation, and testing data used by high-risk AI systems. Providers must ensure datasets are relevant, sufficiently representative, and, to the best extent possible, free of errors and complete in view of the intended purpose—including appropriate statistical properties for the persons or groups the system is intended to be used on, where relevant.
Who does this apply to?
- Providers of high-risk AI systems that rely on data-driven training or tuning
Scenarios
- A hiring model is trained on five years of successful hires from one country only, then deployed EU-wide.
- A provider cannot access the sensitive labels needed for bias testing.
Core data principles
Training, validation, and testing data sets must meet the criteria in Article 10(2)–(4):
- Relevant and sufficiently representative for the intended purpose and geographical context
- As free of errors and complete as possible in light of the intended purpose
- Appropriate statistical properties for persons or groups—pay attention to outcomes for protected characteristics where relevant
- Examine for possible biases that could affect health, safety, fundamental rights, or lead to discrimination
You must be able to justify these choices in the technical documentation; a lightweight check like the sketch below can feed that evidence.
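As a starting point, much of that evidence can come from simple descriptive statistics. The sketch below is a minimal example, assuming a pandas DataFrame with placeholder "group" and "label" columns standing in for whatever protected attribute and target your documentation defines; real pipelines would use your own schema.

```python
# Minimal sketch: per-group representation and outcome-rate summary for a
# training set. "group" and "label" are placeholder column names, not a
# prescribed schema.
import pandas as pd

def group_summary(df: pd.DataFrame, group_col: str = "group",
                  label_col: str = "label") -> pd.DataFrame:
    """Each group's share of the data and its positive-label rate."""
    summary = df.groupby(group_col).agg(
        n=(label_col, "size"),
        positive_rate=(label_col, "mean"),
    )
    summary["share"] = summary["n"] / summary["n"].sum()
    return summary.sort_values("share", ascending=False)

if __name__ == "__main__":
    df = pd.DataFrame({
        "group": ["A", "A", "A", "B", "B", "C"],
        "label": [1, 0, 1, 0, 0, 1],
    })
    print(group_summary(df))
```

Large gaps between a group's share in the data and its share in the deployment population, or sharply divergent positive rates, are exactly the kind of finding the bias examination in Article 10(2)(f) expects you to surface and document.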
Special categories of data
Processing of special categories of personal data under GDPR Article 9 is tightly constrained. Article 10(5) allows narrow processing for bias detection and correction when other means are insufficient—subject to suitable safeguards for rights and freedoms, including technical limitations, pseudonymisation, and security. Coordinate with your DPO.
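Article 10(5)(b) names pseudonymisation among the expected safeguards. A minimal sketch, assuming a project-level secret salt held in a secrets manager rather than in code, replaces direct identifiers with keyed HMAC-SHA256 pseudonyms before the data reaches the bias-audit environment:

```python
# Minimal sketch of pseudonymisation before a bias audit. The salt value and
# record fields are illustrative assumptions; load the salt from a secrets
# manager in practice.
import hmac
import hashlib

SECRET_SALT = b"load-from-a-secrets-manager"  # placeholder, never hard-code

def pseudonymise(identifier: str) -> str:
    """Stable, non-reversible pseudonym for a direct identifier."""
    return hmac.new(SECRET_SALT, identifier.encode("utf-8"),
                    hashlib.sha256).hexdigest()

record = {"subject_id": "patient-042", "ethnicity": "X", "outcome": 1}
record["subject_id"] = pseudonymise(record["subject_id"])
print(record)
```

Keyed hashing, rather than a bare hash, resists dictionary attacks on guessable identifiers; re-identification then requires access to the salt, which can be restricted and logged in line with Article 10(5)(c).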
Record-keeping linkage
Data governance decisions should be traceable in logs and documentation required under Article 11 (technical documentation) and Article 12 (record-keeping).
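One way to make those decisions traceable is an append-only decision log that the Article 11 documentation and Article 12 records can reference. The sketch below is illustrative only: the JSON Lines format, file location, and field names are our own assumptions, loosely mirroring the Article 10(2) points.

```python
# Minimal sketch of an append-only data-governance log (JSON Lines).
# All names and the on-disk format are assumptions for illustration.
import json
import datetime
from pathlib import Path

LOG_PATH = Path("data_governance_log.jsonl")  # hypothetical location

def log_decision(dataset: str, article_10_2_point: str, decision: str,
                 rationale: str) -> None:
    """Append one timestamped governance decision to the log."""
    entry = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "dataset": dataset,
        "point": article_10_2_point,   # e.g. "(f) bias examination"
        "decision": decision,
        "rationale": rationale,
    }
    with LOG_PATH.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

log_decision("hiring_train_v3", "(f) bias examination",
             "reweighted country B records",
             "under-representation found in group summary")
```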
How Article 10 connects to Articles 6–9, Section 2, and beyond
- Article 8 — Data rules are part of the Section 2 requirements package that Article 8 frames.
- Article 9 — Testing and metrics in Article 9 should be fed by defensible dataset choices under Article 10.
- Article 7 — If Annex III scope shifts, re-validate data representativeness for the new harm context.
- Article 6 + Annex III — Data governance should mirror the intended purpose and geography of the high-risk use case.
- Article 15 — Accuracy and robustness evidence ties back to dataset quality.
- Article 113 — Sets the dates from which these data governance obligations apply.
Official wording: Article 10 — Data and data governance
The following reproduces Article 10 in full from the English consolidated text of Regulation (EU) 2024/1689.
1. High-risk AI systems which make use of techniques involving the training of AI models with data shall be developed on the basis of training, validation and testing data sets that meet the quality criteria referred to in paragraphs 2 to 5 whenever such data sets are used.
2. Training, validation and testing data sets shall be subject to data governance and management practices appropriate for the intended purpose of the high-risk AI system. Those practices shall concern in particular:

(a) the relevant design choices;

(b) data collection processes and the origin of data, and in the case of personal data, the original purpose of the data collection;

(c) relevant data-preparation processing operations, such as annotation, labelling, cleaning, updating, enrichment and aggregation;

(d) the formulation of assumptions, in particular with respect to the information that the data are supposed to measure and represent;

(e) an assessment of the availability, quantity and suitability of the data sets that are needed;

(f) examination in view of possible biases that are likely to affect the health and safety of persons, have a negative impact on fundamental rights or lead to discrimination prohibited under Union law, especially where data outputs influence inputs for future operations;

(g) appropriate measures to detect, prevent and mitigate possible biases identified according to point (f);

(h) the identification of relevant data gaps or shortcomings that prevent compliance with this Regulation, and how those gaps and shortcomings can be addressed.

3. Training, validation and testing data sets shall be relevant, sufficiently representative, and to the best extent possible, free of errors and complete in view of the intended purpose. They shall have the appropriate statistical properties, including, where applicable, as regards the persons or groups of persons in relation to whom the high-risk AI system is intended to be used. Those characteristics of the data sets may be met at the level of individual data sets or at the level of a combination thereof.

4. Data sets shall take into account, to the extent required by the intended purpose, the characteristics or elements that are particular to the specific geographical, contextual, behavioural or functional setting within which the high-risk AI system is intended to be used.

5. To the extent that it is strictly necessary for the purpose of ensuring bias detection and correction in relation to the high-risk AI systems in accordance with paragraph 2, points (f) and (g) of this Article, the providers of such systems may exceptionally process special categories of personal data, subject to appropriate safeguards for the fundamental rights and freedoms of natural persons. In addition to the provisions set out in Regulations (EU) 2016/679 and (EU) 2018/1725 and Directive (EU) 2016/680, all the following conditions must be met in order for such processing to occur:

(a) the bias detection and correction cannot be effectively fulfilled by processing other data, including synthetic or anonymised data;

(b) the special categories of personal data are subject to technical limitations on the re-use of the personal data, and state-of-the-art security and privacy-preserving measures, including pseudonymisation;

(c) the special categories of personal data are subject to measures to ensure that the personal data processed are secured, protected, subject to suitable safeguards, including strict controls and documentation of the access, to avoid misuse and ensure that only authorised persons have access to those personal data with appropriate confidentiality obligations;

(d) the special categories of personal data are not to be transmitted, transferred or otherwise accessed by other parties;

(e) the special categories of personal data are deleted once the bias has been corrected or the personal data has reached the end of its retention period, whichever comes first;

(f) the records of processing activities pursuant to Regulations (EU) 2016/679 and (EU) 2018/1725 and Directive (EU) 2016/680 include the reasons why the processing of special categories of personal data was strictly necessary to detect and correct biases, and why that objective could not be achieved by processing other data.

6. For the development of high-risk AI systems not using techniques involving the training of AI models, paragraphs 2 to 5 apply only to the testing data sets.
Recitals (preamble) on EUR-Lex
The recitals in the same consolidated AI Act on EUR-Lex contextualise data quality, bias, and fundamental rights safeguards for high-risk AI. Use the official preamble on EUR-Lex—do not rely on unofficial recital lists without checking sequence and wording against the authentic text.
Compliance checklist
- Inventory all datasets (train/val/test), sources, licences, and retention periods.
- Document representativeness gaps and mitigations (reweighting, augmentation, collection fixes).
- Run bias and robustness evaluations aligned with the intended purpose and Annex III harms.
- Align Article 10(5) processing with GDPR Article 9 grounds and organisational policies.
- Version the data snapshots used for each model release and tie them to the conformity documentation (see the sketch below).
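For that last item, content-hashing the exact files behind a release gives an auditable fingerprint. A minimal sketch follows; the directory layout, the .csv glob, the release label, and the manifest format are all illustrative assumptions.

```python
# Minimal sketch: fingerprint the data files behind a model release so the
# snapshot can be cited in conformity documentation. Paths are illustrative.
import hashlib
import json
from pathlib import Path

def sha256_file(path: Path) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def snapshot_manifest(data_dir: str, release: str) -> dict:
    """Map every CSV under data_dir to its content hash for one release."""
    files = sorted(Path(data_dir).rglob("*.csv"))
    return {
        "release": release,
        "files": {str(p): sha256_file(p) for p in files},
    }

manifest = snapshot_manifest("datasets/hiring_v3", "model-2025.06")
Path("manifest-model-2025.06.json").write_text(json.dumps(manifest, indent=2))
```

Storing the manifest alongside the Annex IV technical documentation lets you prove later exactly which data produced a given model.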
See how your data story fits high-risk requirements—start the free assessment.
Related Articles
Article 6: Classification Rules for High-Risk Systems
Article 7: Amendments to Annex III
Article 8: Compliance with the requirements
Article 9: Risk Management System
Article 11: Technical Documentation
Article 12: Record-keeping
Article 13: Transparency and provision of information to deployers
Article 15: Accuracy, robustness and cybersecurity
Annex III: High-Risk AI System Areas
Annex IV: Technical Documentation for High-Risk AI Systems
Article 113: Entry into Force and Application Dates
Related annexes
- Annex IV — Technical documentation
Frequently asked questions
Do we need new consent for training data?
The AI Act does not replace GDPR. Lawful basis, transparency, and purpose limitation still come from GDPR; Article 10 tells you what quality and governance you must demonstrate for high-risk systems.
What about synthetic data?
Synthetic data can help with representativeness or privacy, but you must validate that it preserves realistic failure modes and does not hide biases introduced by the simulator.
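One lightweight screening step is to compare real and synthetic feature distributions statistically. The sketch below uses a two-sample Kolmogorov-Smirnov test from scipy on stand-in data; a single univariate test is only a first check, not full validation of failure modes.

```python
# Minimal sketch: screen one feature for real-vs-synthetic distribution
# shift with a two-sample KS test. The data here are synthetic stand-ins.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
real = rng.normal(loc=0.0, scale=1.0, size=2_000)       # stand-in real feature
synthetic = rng.normal(loc=0.1, scale=1.2, size=2_000)  # stand-in synthetic

stat, p_value = ks_2samp(real, synthetic)
print(f"KS statistic={stat:.3f}, p={p_value:.4f}")
if p_value < 0.01:
    print("Distributions differ: investigate before relying on the synthetic set.")
```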
Can providers use special category personal data for bias detection under Article 10?
Yes, under strict conditions. Article 10(5) permits processing of special categories of personal data (e.g. ethnicity, health) to the extent strictly necessary for bias detection and correction, subject to appropriate safeguards including pseudonymisation, technical limitations on re-use, and compliance with the GDPR and the Law Enforcement Directive (LED).