GDPR Compliance for AI Companies: Getting Started

A practical GDPR compliance guide for AI companies — covering lawful bases, training data, automated decisions, DPIAs, and AI Act alignment.

Legalithm Team · 18 min read

Getting Started with GDPR Compliance for AI: What Every AI Company Must Know

The General Data Protection Regulation (GDPR) has been enforceable since May 2018, but AI-specific enforcement is accelerating. In 2025 alone, EU data protection authorities issued over EUR 1.5 billion in fines, with several major actions targeting algorithmic decision-making, facial recognition, and AI training data practices.

If you build, deploy, or integrate AI systems that process personal data of EU residents, the GDPR applies to you — and with the EU AI Act now layering additional obligations on top, getting GDPR compliance right for AI is no longer optional. It is foundational.

This guide translates the GDPR's core requirements into practical terms for AI teams: data scientists, product managers, MLOps engineers, and the compliance professionals supporting them.

TL;DR — GDPR compliance essentials for AI companies

  • Every AI system processing personal data needs a lawful basis (Article 6) — consent, legitimate interest, or another of the six grounds.
  • AI training on personal data requires a GDPR-compliant approach: lawful collection, purpose limitation, data minimisation, and storage limitation all apply.
  • Automated decision-making with legal or similarly significant effects triggers Article 22 — individuals have the right not to be subject to solely automated decisions and to receive meaningful information about the logic involved.
  • A Data Protection Impact Assessment (DPIA) is mandatory for high-risk processing, which includes most AI systems that profile individuals or make automated decisions.
  • Special category data (race, health, religion, sexual orientation) requires explicit consent or another Article 9 exception before it can be processed — even for bias detection.
  • The GDPR and the AI Act are complementary. Organisations must comply with both where they overlap. A DPIA does not replace an AI Act risk management system.
  • Appointing a Data Protection Officer (DPO) is mandatory for organisations whose core activities involve large-scale systematic monitoring of individuals or processing of special category data.

GDPR fundamentals for AI teams

The six lawful bases (Article 6)

Every act of personal data processing — including collection, storage, use for model training, inference, and output generation — requires one of six lawful bases:

| Lawful basis | When it applies to AI | Practical considerations |
| --- | --- | --- |
| Consent (Art. 6(1)(a)) | User explicitly agrees to their data being used for a specified AI purpose | Must be freely given, specific, informed, unambiguous. Difficult to rely on for model training at scale because withdrawal rights create operational challenges. |
| Contract (Art. 6(1)(b)) | Processing is necessary to perform a contract with the data subject | Applies to AI features integral to a contracted service (e.g., personalised recommendations in a subscription product). Cannot stretch to cover all AI processing loosely connected to the service. |
| Legal obligation (Art. 6(1)(c)) | Processing required by law | Relevant for AI systems performing regulatory screening, AML checks, or statutory reporting. |
| Vital interests (Art. 6(1)(d)) | Protecting someone's life | Narrow. Potentially relevant for emergency medical AI. |
| Public interest (Art. 6(1)(e)) | Processing for a task carried out in the public interest | Available to public authorities and entities performing delegated public functions. |
| Legitimate interest (Art. 6(1)(f)) | Processing necessary for a legitimate interest, balanced against data subjects' rights | The most commonly invoked basis for AI. Requires a documented Legitimate Interest Assessment (LIA). Not available to public authorities for core tasks. |

Real-world example: A cybersecurity startup trains an anomaly-detection model on network log data that contains employee IP addresses and behavioural patterns. The company relies on legitimate interest (Art. 6(1)(f)), documenting in a Legitimate Interest Assessment that the security purpose outweighs the minimal privacy impact, given that the data is pseudonymised before model training and employees are informed through the privacy policy.

Data subject rights relevant to AI

The GDPR grants eight rights to individuals. Several are particularly challenging for AI systems:

  1. Right to be informed (Arts. 13–14) — You must explain that AI is being used, what data feeds it, and why. See the privacy policy checklist for AI companies.
  2. Right of access (Art. 15) — Individuals can request a copy of their personal data, including data derived by AI (e.g., inferred scores, classifications).
  3. Right to rectification (Art. 16) — If an AI system produces an incorrect output based on faulty personal data, the individual can demand correction.
  4. Right to erasure (Art. 17) — The "right to be forgotten" can be technically complex for AI: removing data from a trained model may require retraining.
  5. Right to restrict processing (Art. 18) — Individuals can halt AI processing of their data while a dispute is resolved.
  6. Right to data portability (Art. 20) — Data subjects can request their data in a structured, machine-readable format.
  7. Right to object (Art. 21) — Individuals can object to processing based on legitimate interest, including profiling for direct marketing.
  8. Rights related to automated decision-making (Art. 22) — See the dedicated section below.

DPIA requirements for AI

A Data Protection Impact Assessment is mandatory under Article 35 when processing is "likely to result in a high risk to the rights and freedoms of natural persons." This explicitly includes:

  • Systematic and extensive profiling with significant effects.
  • Large-scale processing of special category data.
  • Systematic monitoring of publicly accessible areas.

In practice, most AI systems that make decisions about individuals or profile them at scale will require a DPIA. National data protection authorities have published their own lists of processing operations requiring DPIAs, and AI-based profiling appears on nearly all of them.

A DPIA must include:

  1. A systematic description of the processing, its purposes, and the legitimate interest pursued.
  2. An assessment of the necessity and proportionality of the processing.
  3. An assessment of the risks to the rights and freedoms of data subjects.
  4. The measures to address those risks, including safeguards, security, and mechanisms to ensure data protection.

Real-world example: A health-tech company uses a machine-learning model to triage patient messages and assign urgency scores. The processing involves health data (special category) at scale and produces automated classifications with significant effects on individuals. A DPIA is mandatory and must address risks such as misclassification, bias against certain demographics, and the availability of human review.


How the GDPR applies to AI training data

AI model training raises specific GDPR challenges that traditional data processing does not:

Purpose limitation

Training data collected for one purpose cannot be reused for an incompatible purpose without a new lawful basis. A dataset collected for product analytics cannot automatically be repurposed to train a recommendation engine.

The GDPR's "compatibility test" (Article 6(4)) considers: the link between purposes, the context of collection, the nature of the data, possible consequences, and the existence of safeguards such as pseudonymisation. Documenting this assessment is essential.

Data minimisation and AI

Article 5(1)(c) requires that data be "adequate, relevant and limited to what is necessary." This creates a genuine tension with machine learning, where model performance often improves with more data.

Practical approaches to reconciling data minimisation with ML:

  • Feature selection — Use statistical methods to identify which features actually contribute to model performance, and drop those that do not.
  • Synthetic data — Generate synthetic datasets that preserve statistical properties without containing real personal data.
  • Federated learning — Train models across decentralised data sources without centralising raw personal data.
  • Differential privacy — Add mathematical noise to training data so individual records cannot be reverse-engineered from the model.
  • Aggregation and pseudonymisation — Aggregate data before training where feasible, or pseudonymise so identifiers are separated from features.
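The pseudonymisation approach above can be sketched in a few lines: direct identifiers are replaced with salted hashes, and the salt plus the reverse-lookup table are kept separate from the training features. This is an illustrative sketch, not a prescribed schema — field names like `user_id` and the salt-handling are assumptions for the example, and in production the lookup table would live in a separately secured store.

```python
import hashlib
import secrets

def pseudonymise(records, id_field="user_id"):
    """Replace direct identifiers with salted hashes. The salt and the
    pseudonym -> identity lookup must be stored separately from the
    training data (illustrative sketch; field names are assumptions)."""
    salt = secrets.token_hex(16)   # kept apart from the dataset
    lookup = {}                    # pseudonym -> original identifier
    out = []
    for rec in records:
        original = str(rec[id_field])
        pseudonym = hashlib.sha256((salt + original).encode()).hexdigest()[:16]
        lookup[pseudonym] = original
        clean = dict(rec)
        clean[id_field] = pseudonym
        out.append(clean)
    return out, lookup

records = [{"user_id": "alice@example.com", "requests_per_min": 42}]
pseudo, lookup = pseudonymise(records)
```

Note that, as the "common mistakes" section below stresses, the result is still personal data: whoever holds the lookup table can re-identify individuals, so GDPR continues to apply.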

Storage limitation

Training data cannot be retained indefinitely. Define and document a retention period. If the model has been trained and the raw training data is no longer needed, delete it — or anonymise it beyond the point of re-identification.
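A retention policy is only real if something enforces it. A minimal sketch, assuming each training record carries a `collected_at` timestamp and a single illustrative 365-day retention period (both assumptions — your documented policy sets the actual values):

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=365)  # illustrative; use your documented period

def expired(record, now=None):
    """True if a record has outlived the documented retention period."""
    now = now or datetime.now(timezone.utc)
    return now - record["collected_at"] > RETENTION

def purge(dataset, now=None):
    """Split a dataset into records to keep and records to delete or anonymise."""
    keep = [r for r in dataset if not expired(r, now)]
    drop = [r for r in dataset if expired(r, now)]
    return keep, drop
```

In practice this would run as a scheduled job, with the `drop` set either deleted or passed through an anonymisation step.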

Accuracy

Training data must be accurate. Inaccurate training data can lead to biased or incorrect model outputs, which in turn can produce decisions that harm individuals. Implement data quality checks, deduplication, and validation pipelines.

Real-world example: An e-commerce company scrapes publicly available product reviews to train a sentiment analysis model. The reviews contain usernames that can be linked to real identities. The company must assess whether it has a lawful basis for this processing, whether the individuals were informed (Art. 14 applies when data is not collected directly from the data subject), and whether the processing is proportionate to the purpose.

Automated decision-making under Article 22

Article 22 provides individuals the right not to be subject to a decision based solely on automated processing — including profiling — that produces legal effects or similarly significant effects.

When Article 22 applies

Three conditions must all be met:

  1. The decision is based solely on automated processing (no meaningful human involvement).
  2. The processing includes profiling or another form of automated analysis.
  3. The decision produces legal effects (e.g., contract termination, benefits denial) or similarly significant effects (e.g., credit refusal, job application rejection).
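The three-condition test lends itself to a simple gating check in a decision pipeline: when all three conditions hold, the decision is routed to human review rather than auto-executed. This is a hypothetical sketch — the flag names and routing targets are illustrative, and the real judgment calls (is human involvement "meaningful"? is the effect "significant"?) are legal assessments, not booleans.

```python
def article_22_applies(solely_automated: bool,
                       involves_profiling: bool,
                       legal_or_significant_effect: bool) -> bool:
    """Article 22 is engaged only when all three conditions hold."""
    return solely_automated and involves_profiling and legal_or_significant_effect

def route_decision(decision, flags):
    """Illustrative safeguard: queue Article 22-triggering decisions
    for human review instead of executing them automatically."""
    if article_22_applies(**flags):
        return "queue_for_human_review"
    return "auto_approve_pipeline"
```

A gate like this makes the Article 22 analysis auditable: the flags become part of the decision log rather than an undocumented assumption.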

Exceptions

Automated decisions are permitted if:

  • The decision is necessary for a contract between the data subject and the controller.
  • The decision is authorised by EU or Member State law with suitable safeguards.
  • The data subject has given explicit consent.

Even where an exception applies, the controller must implement suitable safeguards including the right to obtain human intervention, to express a point of view, and to contest the decision.

The right to explanation

While Article 22 does not explicitly use the phrase "right to explanation," Recitals 63 and 71 and the Article 29 Working Party guidance establish that data subjects have the right to "meaningful information about the logic involved" in automated decisions. This means:

  • Explain the key factors the model considers.
  • Explain the general logic (not necessarily the full source code or model architecture).
  • Explain the significance and envisaged consequences of the processing.

For AI teams, this means building explainability into the system from the design phase — whether through interpretable model architectures, SHAP values, LIME explanations, or decision-factor summaries.
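For a linear or additive scoring model, a decision-factor summary can be as simple as ranking per-feature contributions (weight × value). This sketch assumes illustrative weights and feature names; for non-linear models you would substitute SHAP or LIME attributions, but the output shape — a ranked list of factors — stays the same.

```python
def decision_factors(weights, features, top_n=3):
    """Rank features by |weight * value| for an additive scoring model.
    Weights and feature names below are illustrative, not real pricing data."""
    contributions = {name: weights[name] * value
                     for name, value in features.items()}
    return sorted(contributions.items(),
                  key=lambda kv: abs(kv[1]), reverse=True)[:top_n]

weights = {"claim_history": 0.8, "postcode_risk": 0.3, "vehicle_age": -0.1}
features = {"claim_history": 2.0, "postcode_risk": 1.5, "vehicle_age": 4.0}
top = decision_factors(weights, features, top_n=1)
```

The ranked list is what feeds the human-readable explanation ("your claim history was the main factor"), satisfying "meaningful information about the logic involved" without disclosing the full model.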

Real-world example: An insurance company uses a gradient-boosted model to set premium prices. A customer whose premium increased significantly requests an explanation. Under Article 22 and Recital 71, the company must be able to explain which factors (e.g., claim history, postcode, vehicle type) most influenced the pricing decision — not simply state that "the algorithm decided."

Consent vs. legitimate interest for AI training data

Which lawful basis should you rely on when training models on user data? This is one of the most debated questions in AI data protection. Each basis has trade-offs:

| Factor | Consent | Legitimate interest |
| --- | --- | --- |
| Control | Data subject has full control, including withdrawal | Controller controls, subject to balancing test |
| Scalability | Difficult at training-data scale (millions of records) | More practical for large datasets, but requires documented LIA |
| Withdrawal risk | Withdrawal may require model retraining or deletion of derived data | No withdrawal right, but data subjects can object under Art. 21 |
| Transparency burden | Must explain purpose at time of consent collection | Must explain in privacy notice; LIA must be available on request |
| Regulatory perception | Seen as gold standard by DPAs | Accepted if balancing test is thorough and documented |

Practical recommendation: For AI training data sourced directly from users (e.g., platform interaction data), legitimate interest is often more appropriate than consent if the use is within reasonable user expectations and a thorough LIA is documented. For sensitive data or unexpected secondary uses, consent provides stronger legal footing.

Bias detection and special category data

AI fairness and bias detection create a GDPR paradox: to detect and mitigate bias in model outputs across protected groups, you often need to process special category data (race, ethnicity, health status, religion, sexual orientation, political opinions) — which is prohibited under Article 9(1) unless an exception applies.

Available exceptions for bias detection include:

  • Explicit consent (Art. 9(2)(a)) — Individuals explicitly agree to their sensitive data being used for fairness analysis.
  • Substantial public interest (Art. 9(2)(g)) — Some Member States have enacted laws recognising anti-discrimination monitoring as a substantial public interest. Check your national implementation.
  • Statistical and research purposes (Art. 9(2)(j)) — Processing for statistical purposes with appropriate safeguards, including pseudonymisation and data minimisation.

Best practices:

  1. Use aggregated, anonymised demographic data for bias auditing where possible.
  2. If individual-level sensitive data is needed, pseudonymise it and restrict access to the bias-testing team.
  3. Document the necessity and proportionality of processing sensitive data for bias detection in your DPIA.
  4. Delete or anonymise the sensitive data once the bias audit is complete.
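When aggregated group-level data is available, a common first-pass bias audit is a selection-rate comparison such as the "four-fifths" heuristic: flag any group whose selection rate falls below 80% of the best-performing group's rate. A minimal sketch, using invented group labels and counts — this is a screening heuristic, not a legal standard under the GDPR or EU non-discrimination law:

```python
def selection_rates(outcomes):
    """outcomes: {group: (selected, total)} -> selection rate per group."""
    return {g: sel / total for g, (sel, total) in outcomes.items()}

def four_fifths_check(outcomes):
    """True per group if its selection rate is at least 80% of the
    highest group's rate; False flags potential disparate impact."""
    rates = selection_rates(outcomes)
    best = max(rates.values())
    return {g: r / best >= 0.8 for g, r in rates.items()}

audit = {"group_a": (80, 100), "group_b": (50, 100)}  # illustrative counts
result = four_fifths_check(audit)
```

Because this works on aggregate counts per group, it fits the first best practice above: the audit itself need not touch individual-level special category data.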

Cross-border transfers of training data

AI training data often originates in multiple jurisdictions. The GDPR restricts transfers of personal data outside the EU/EEA (Chapter V):

Transfer mechanisms

| Mechanism | Use case |
| --- | --- |
| Adequacy decision | Transfer to a country the Commission has deemed adequate (e.g., UK, Japan, Republic of Korea) or, under the EU-US Data Privacy Framework, to certified US organisations |
| Standard Contractual Clauses (SCCs) | Transfer to countries without adequacy, using EU-approved contractual templates |
| Binding Corporate Rules (BCRs) | Intra-group transfers within a multinational, approved by a supervisory authority |
| Derogations (Art. 49) | Explicit consent, contractual necessity, or important public interest — limited, case-by-case |

Real-world example: A Berlin-based AI company sends annotated training data to a labelling team in the Philippines. No adequacy decision covers the Philippines, so the company must execute SCCs with the data processor and conduct a Transfer Impact Assessment to evaluate whether Philippine law provides adequate protection in practice.

Cloud and infrastructure considerations

AI workloads commonly run on cloud infrastructure operated by US hyperscalers (AWS, Google Cloud, Azure). Ensure:

  • The cloud provider has EU data residency options and data is processed within the EEA.
  • If data leaves the EEA, appropriate transfer mechanisms (typically SCCs) are in place.
  • The provider's data processing agreement (DPA) meets GDPR Article 28 requirements.

DPO requirements for AI companies

A Data Protection Officer must be appointed if:

  1. The organisation is a public authority (except courts acting in judicial capacity).
  2. Core activities consist of processing operations requiring regular and systematic monitoring of individuals on a large scale.
  3. Core activities consist of large-scale processing of special category data.

Most AI companies whose core product involves profiling, behavioural analytics, biometric processing, or health data will meet one of these thresholds. Even if not strictly required, appointing a DPO is considered best practice and signals regulatory maturity.

The DPO must be independent, report directly to the highest management level, and cannot be dismissed or penalised for performing their tasks. They can be an internal employee or an external service provider.

GDPR compliance checklist for AI companies

Use this checklist to assess your current state:

Data inventory and mapping

  • All personal data processed by AI systems is documented in a Record of Processing Activities (ROPA) under Article 30.
  • Data flows are mapped end-to-end: collection, storage, processing (including model training), output, sharing, and deletion.
  • Third-party data sources (APIs, scraped data, purchased datasets) are identified and their lawful basis documented.
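A ROPA entry is ultimately a structured record, and keeping it as structured data (rather than a free-form document) makes it queryable and diff-able as processing changes. A minimal sketch of one possible shape — the field names are illustrative, not the Article 30 statutory list verbatim:

```python
from dataclasses import dataclass, field

@dataclass
class ProcessingActivity:
    """Minimal illustrative ROPA entry (Art. 30); adapt fields to your register."""
    name: str
    purpose: str
    lawful_basis: str                 # e.g. "Art. 6(1)(f) legitimate interest"
    data_categories: list
    recipients: list = field(default_factory=list)
    retention_period: str = ""
    transfers_outside_eea: bool = False
    security_measures: list = field(default_factory=list)

ropa = [ProcessingActivity(
    name="anomaly-detection model training",
    purpose="network security monitoring",
    lawful_basis="Art. 6(1)(f) legitimate interest",
    data_categories=["pseudonymised IP addresses", "behavioural patterns"],
    retention_period="12 months after model release",
)]
```

A register in this form can be filtered programmatically, e.g. to list every activity relying on legitimate interest and confirm each has an LIA on file.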

Lawful basis

  • A lawful basis is identified and documented for every processing activity involving personal data.
  • Where legitimate interest is relied upon, a Legitimate Interest Assessment is completed and available.
  • Consent mechanisms meet GDPR standards (freely given, specific, informed, unambiguous, withdrawable).

Transparency

  • Privacy policy covers AI-specific processing, including automated decision-making and profiling. See the privacy policy checklist.
  • Individuals are informed about AI processing at or before the point of data collection.
  • Where data was not obtained from the data subject (Art. 14), notice is provided within one month.

Data subject rights

  • Processes are in place to respond to access, rectification, erasure, portability, restriction, and objection requests within one month.
  • There is a process to provide meaningful information about automated decisions under Article 22.
  • Human review is available for automated decisions with legal or significant effects.

DPIAs and risk management

  • DPIAs are completed for all high-risk AI processing activities.
  • DPIAs are reviewed and updated when processing changes materially.
  • DPIA outputs are integrated with AI Act risk management where applicable.

Data quality and minimisation

  • Training data is reviewed for accuracy, relevance, and representativeness.
  • Features that do not contribute to model performance are removed.
  • Retention periods are defined and enforced for training data, inference logs, and model outputs.

Security

  • Appropriate technical and organisational measures are implemented (Art. 32): encryption, pseudonymisation, access controls, incident response.
  • Model security addresses risks of data extraction, model inversion, and adversarial attacks.
  • Data breach notification procedures are in place (72-hour reporting to supervisory authority under Art. 33).

Bias and fairness

  • Bias detection and mitigation processes are in place, with documented lawful basis for any special category data processing.
  • Model outputs are monitored for disparate impact across protected groups.

International transfers

  • All cross-border data transfers are covered by an appropriate mechanism (adequacy, SCCs, BCRs).
  • Transfer Impact Assessments are documented for transfers to non-adequate countries.
  • Cloud and infrastructure providers have GDPR-compliant DPAs and EU data-residency options.

Governance

  • A DPO is appointed if required (or voluntarily as best practice).
  • Staff handling personal data receive regular GDPR and AI-specific training.
  • Data processing agreements are in place with all processors (Art. 28).

Coordinating GDPR and AI Act compliance

The GDPR and the EU AI Act are complementary regulations with overlapping requirements. Coordinating them avoids duplication and gaps:

| GDPR requirement | AI Act equivalent | Integration opportunity |
| --- | --- | --- |
| DPIA (Art. 35) | Risk management system (Art. 9) + FRIA (Art. 27) | Conduct a joint risk assessment covering data protection, fundamental rights, and AI-specific risks |
| Data quality (Art. 5(1)(d)) | Data governance (Art. 10) | Align data quality frameworks for both regulations |
| Transparency (Arts. 13–14) | Transparency obligations (Art. 50) | Single privacy notice covering both GDPR and AI Act disclosures |
| Automated decision-making (Art. 22) | Human oversight (Art. 14) | Unified human-in-the-loop process that satisfies both |
| Security (Art. 32) | Cybersecurity (Art. 15) | Single security framework addressing both |
| Record-keeping (Art. 30) | Logging (Art. 12) | Integrated record-keeping system |

For a detailed comparison, see EU AI Act vs. GDPR: differences and overlap.

Common mistakes in GDPR compliance for AI

  1. Treating model outputs as non-personal data. If an AI system generates a credit score, risk classification, or behavioural profile linked to an identifiable individual, that output is personal data under the GDPR.
  2. Relying on consent when legitimate interest is more appropriate (and vice versa). Consent gives users maximum control but is brittle at training-data scale. Legitimate interest is more practical but requires a thorough, documented balancing test.
  3. Ignoring Article 14 for scraped or third-party data. When personal data is not collected directly from the data subject, Article 14 requires providing a privacy notice within one month of obtaining the data — a requirement many AI companies miss.
  4. Conducting a DPIA as a one-time checkbox exercise. DPIAs must be reviewed and updated when processing changes. A DPIA done at model v1 may not cover the risks introduced in v3.
  5. Assuming pseudonymisation equals anonymisation. Pseudonymised data is still personal data under the GDPR. Only truly anonymised data (where re-identification is not reasonably possible) falls outside GDPR scope.
  6. Failing to plan for erasure requests. If a data subject requests deletion and their data was used to train a model, you need a strategy — whether that is retraining, machine unlearning, or documenting why erasure is technically disproportionate under the available exceptions.
  7. Not documenting the lawful basis before starting to process. The GDPR requires you to determine and document your lawful basis before processing begins. Retroactive justification is a compliance failure.

Frequently asked questions

Do I need GDPR compliance if my AI system does not process personal data?

If your AI system genuinely does not process any personal data — for example, a weather-prediction model trained exclusively on meteorological sensor data — the GDPR does not apply to that specific processing. However, be careful: data that appears non-personal may become personal in context (e.g., location data combined with timestamps can identify individuals).

Can I use publicly available data to train my AI model?

Data being publicly available does not exempt it from the GDPR. You still need a lawful basis, must provide a privacy notice (Art. 14), and must respect data subject rights. The fact that data was made public by the data subject may support a legitimate interest argument, but this must be assessed case by case.

How do I handle right-to-erasure requests for data already used in model training?

This is one of the hardest GDPR challenges for AI. Options include: retraining the model without the individual's data, applying machine-unlearning techniques, documenting that the data has been anonymised to the point where it is no longer personal data within the model weights, or invoking an exception where erasure would render the processing impossible or seriously impair the achievement of a research purpose (Art. 17(3)(d)).
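Whichever of those options you choose, you first need to know which model versions a given data subject's records fed into. A hypothetical training-data manifest — names and structure are illustrative — makes erasure requests tractable by mapping subjects to affected model versions:

```python
class TrainingManifest:
    """Illustrative sketch: track which data subjects' records were used
    to train which model version, so an erasure request can identify
    every model that needs retraining, unlearning, or a documented exception."""

    def __init__(self):
        self.by_model = {}  # model_version -> set of subject ids

    def record_training(self, model_version, subject_ids):
        self.by_model.setdefault(model_version, set()).update(subject_ids)

    def models_affected_by(self, subject_id):
        return [v for v, ids in self.by_model.items() if subject_id in ids]

m = TrainingManifest()
m.record_training("v1", {"s1", "s2"})
m.record_training("v2", {"s2", "s3"})
```

An erasure request from subject `s2` would surface both `v1` and `v2` for review, turning an otherwise open-ended obligation into a bounded engineering task.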

Does GDPR Article 22 apply to AI-assisted (not fully automated) decisions?

Article 22 applies to decisions based solely on automated processing. If there is meaningful human involvement — where a human genuinely reviews the AI output and exercises independent judgment before the decision is made — Article 22 does not apply. However, rubber-stamping an AI recommendation does not constitute meaningful human involvement.

How should I coordinate my DPIA with the AI Act's risk management requirements?

Build a single integrated risk assessment that covers both GDPR DPIA requirements and AI Act Article 9 risk management. The DPIA focuses on data protection risks; the AI Act risk management system covers broader safety, fundamental rights, and technical risks. A joint process avoids duplication and ensures consistency. See the complete AI Act guide for the full regulatory picture.

Is a Legitimate Interest Assessment the same as a DPIA?

No. A Legitimate Interest Assessment (LIA) justifies your choice of lawful basis under Article 6(1)(f). A DPIA assesses the overall risk of the processing activity to data subjects' rights and freedoms. If you rely on legitimate interest for high-risk AI processing, you will likely need both a documented LIA and a DPIA.

Next steps

GDPR compliance is one half of the regulatory picture for AI companies. To understand how the AI Act adds to your obligations — and where the two frameworks overlap — run the free AI Act assessment and review the complete AI Act guide.

Legalithm is an AI-assisted compliance workflow tool — not legal advice. Final compliance decisions should be reviewed by qualified legal counsel.
