France's Data Protection Authority (the “Commission Nationale de l'Informatique et des Libertés” or “CNIL”) has issued comprehensive recommendations intended to assist businesses developing artificial intelligence ("AI") systems in ensuring compliance with the EU's General Data Protection Regulation ("GDPR"). These recommendations are designed to help AI professionals and organizations balance innovation with the protection of individual rights, particularly when personal data is involved in AI model training and development.
Scope and Applicability
The CNIL’s guidance applies to all AI systems that process personal data, including machine learning models, general-purpose AI, and systems that learn from usage data. The recommendations focus on the development phase—covering system design, database creation, and model training—rather than deployment, and should therefore be applied as a preemptive measure before an AI product is trained or put into actual use.
Key Steps for GDPR Compliance in AI Development
The recommendations are organized into several elements that together lay out a pathway for collecting data to be fed to an AI model, establishing the basis for using that data in connection with an AI system, and implementing related protections. For each of the elements below, the CNIL has published a fact sheet providing further information and guidance.
- Define a Clear Purpose: Every AI system using personal data must have a well-defined, explicit, and legitimate objective. This purpose should be established at the project’s outset and guide the selection and use of data, ensuring only necessary data is processed.
- Determine Responsibilities: Organizations must clarify their role as either data controllers (deciding the “why” and “how” of data use) or data processors (acting on behalf of a controller). Joint controllers must define their respective obligations, often through contractual agreements.
- Establish a Legal Basis: Processing personal data for AI development requires a valid legal basis under the GDPR. Common bases include legitimate interest (for private entities), public interest (for public bodies), consent, or contractual necessity. The chosen basis affects both obligations and the rights of data subjects.
- Data Minimization: Only data that is adequate, relevant, and necessary for the defined purpose should be collected and used. This principle is especially critical for sensitive data. Organizations are encouraged to use synthetic or anonymized data where possible and to regularly review and update data sets to maintain relevance and accuracy.
- Retention and Deletion: Personal data must not be stored indefinitely. Retention periods should be defined based on the purpose of processing, and data should be deleted or archived when no longer needed. Extended retention may be justified for audit or bias measurement purposes, but must be accompanied by enhanced security.
- Transparency and Information: Individuals must be informed about how their data is used, the purpose of processing, and their rights. Information should be accessible, clear, and provided in a timely manner. For large-scale or indirect data collection (e.g., web scraping), general information notices may suffice if individual notification is disproportionate.
- Facilitating Data Subject Rights: Mechanisms must be in place to allow individuals to exercise their rights (access, rectification, erasure, objection, etc.) over both the training data and the AI model, unless the model is truly anonymized. Organizations should anticipate technical challenges in identifying data subjects and provide clear communication about the feasibility of rights exercise.
- Security Measures: Robust security controls are required to protect personal data throughout the AI system’s lifecycle. This includes access controls, encryption, data partitioning, and regular security audits. Special attention should be paid to vulnerabilities in software, interfaces, and backups.
- Risk Assessment and DPIA: CNIL strongly recommends that businesses perform a Data Protection Impact Assessment ("DPIA"), especially for high-risk AI systems. The DPIA should identify and mitigate risks related to data confidentiality, misuse, discrimination, and other ethical concerns.
Conclusion
The CNIL’s recommendations provide a structured approach for integrating GDPR principles into AI development and speak to the ongoing interplay between the data privacy concerns of the GDPR and the business development goals set forth in the EU AI Act. By defining clear purposes, minimizing data use, ensuring transparency, and implementing strong security and governance measures, organizations can foster responsible AI innovation while safeguarding individual rights.