Article 10, Data and data governance, Artificial Intelligence Act (Proposal 25.11.2022)
1. High-risk AI systems which make use of techniques involving the training of models with data shall be developed on the basis of training, validation and testing data sets that meet the quality criteria referred to in paragraphs 2 to 5.
2. Training, validation and testing data sets shall be subject to appropriate data governance and management practices.
Those practices shall concern in particular,
(a) the relevant design choices;
(b) data collection processes;
(c) relevant data preparation processing operations, such as annotation, labelling, cleaning, enrichment and aggregation;
(d) the formulation of relevant assumptions, notably with respect to the information that the data are supposed to measure and represent;
(e) a prior assessment of the availability, quantity and suitability of the data sets that are needed;
(f) examination in view of possible biases that are likely to affect health and safety of natural persons or lead to discrimination prohibited by Union law;
(g) the identification of any possible data gaps or shortcomings, and how those gaps and shortcomings can be addressed.
3. Training, validation and testing data sets shall be relevant, representative, and to the best extent possible, free of errors and complete. They shall have the appropriate statistical properties, including, where applicable, as regards the persons or groups of persons on which the high-risk AI system is intended to be used. These characteristics of the data sets may be met at the level of individual data sets or a combination thereof.
4. Training, validation and testing data sets shall take into account, to the extent required by the intended purpose, the characteristics or elements that are particular to the specific geographical, behavioural or functional setting within which the high-risk AI system is intended to be used.
5. To the extent that it is strictly necessary for the purposes of ensuring bias monitoring, detection and correction in relation to the high-risk AI systems, the providers of such systems may process special categories of personal data referred to in Article 9(1) of Regulation (EU) 2016/679, Article 10 of Directive (EU) 2016/680 and Article 10(1) of Regulation (EU) 2018/1725, subject to appropriate safeguards for the fundamental rights and freedoms of natural persons, including technical limitations on the re-use and use of state-of-the-art security and privacy-preserving measures, such as pseudonymisation, or encryption where anonymisation may significantly affect the purpose pursued.
6. For the development of high-risk AI systems not using techniques involving the training of models, paragraphs 2 to 5 shall apply only to the testing data sets.
Important note: This is not the final text of the Artificial Intelligence Act. This is the text of the proposal from the Council of the European Union (25.11.2022).
The Articles of the EU Artificial Intelligence Act, proposal from the Council of the European Union (25.11.2022):