The business wants AI agents. The use cases look compelling. The technology is evolving faster than most organisations can comfortably track. Then someone from QA or regulatory asks the question that matters most: 'Can we actually do this in a GxP environment?'
With EU GMP Annex 22, regulators have made the answer clearer than before. AI is allowed in regulated environments, but only under specific conditions. The challenge is understanding where AI fits, where it doesn’t, and what compliant design looks like in practice.
Not all AI belongs in the same category
One of the biggest mistakes in pharma AI discussions is treating AI as a single category. In reality, the level of control required depends entirely on intended use. A practical way to think about this is through three categories.
1. General AI agents
These operate outside GxP processes. They automate tasks, coordinate tools, and make autonomous decisions within defined boundaries. This is where AI currently has the most freedom and where organisations can create fast efficiency gains, especially in support functions that do not affect regulated records or decisions. From a compliance perspective, these use cases sit outside the scope of Annex 22’s strictest requirements.
2. GxP‑adjacent or GxP‑supporting AI
This is where constraints become much stricter. AI can assist, draft, summarize, or route work, but it cannot make autonomous GxP decisions. Human review at defined control points is mandatory. Outputs must remain traceable, predictable, and linked to approved data sources. Large language models can support these workflows, but not without safeguards around them.
3. Deterministic automations
Deterministic automations are predefined sequences of steps and rules with no AI involved. When organisations map out AI use cases carefully, they often realise that some problems are better solved with automation than with AI. In GxP contexts, that is not a compromise. Deterministic behaviour is fully predictable, and predictability is exactly what many regulated processes require.
This three‑tier model forces the right question to the front, before any technology is selected: what is the intended use, and how much control does the process genuinely require?
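As a rough illustration, the tiering question can be written down as a small decision rule. This is a simplified sketch, not a regulatory classification: the two flags (`affects_gxp`, `needs_ai`) and all names are assumptions made for the example, and a real assessment would weigh far more than two yes/no answers.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    GENERAL_AI_AGENT = "general AI agent"
    GXP_SUPPORTING_AI = "GxP-supporting AI with human review"
    DETERMINISTIC_AUTOMATION = "deterministic automation"


@dataclass(frozen=True)
class UseCase:
    name: str
    affects_gxp: bool  # assumption: touches regulated records or decisions
    needs_ai: bool     # assumption: fixed rules alone cannot do the task


def classify(use_case: UseCase) -> Tier:
    """Map a use case to one of the three tiers based on intended use."""
    # If fixed rules are enough, prefer automation regardless of GxP impact.
    if not use_case.needs_ai:
        return Tier.DETERMINISTIC_AUTOMATION
    # AI inside or next to a GxP process: assist only, with human review.
    if use_case.affects_gxp:
        return Tier.GXP_SUPPORTING_AI
    # AI outside GxP processes has the most freedom.
    return Tier.GENERAL_AI_AGENT
```

The order of the checks reflects the point made above: intended use is asked first, and AI is only considered once fixed rules have been ruled out.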
A real example: the Quality Impact Recall Agent
To make this concrete, let's walk through a use case we've actually built: a recall orchestration agent.
A recall workflow is a useful example because it is time-critical, high-risk, and heavily regulated.
The first steps, trigger detection and traceability analysis, should be deterministic automations. A failed quality result triggers the workflow, which traces affected batches, inventory, production runs, and customers. This process needs complete reliability, so traditional automation is the right choice.
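To show what "deterministic" means here, the traceability step can be sketched as a plain graph traversal with no model involved. The data shapes below (`consumed_by`, `shipped_to`) and the batch identifiers are hypothetical; in practice this information would come from ERP or traceability records.

```python
from collections import deque
from typing import Dict, List, Tuple


def trace_affected(
    failed_batch: str,
    consumed_by: Dict[str, List[str]],  # batch -> batches produced from it
    shipped_to: Dict[str, List[str]],   # batch -> customers that received it
) -> Tuple[List[str], List[str]]:
    """Return all affected batches and customers for a failed batch."""
    affected = {failed_batch}
    queue = deque([failed_batch])
    # Breadth-first walk through everything produced from the failed batch.
    while queue:
        batch = queue.popleft()
        for downstream in consumed_by.get(batch, []):
            if downstream not in affected:
                affected.add(downstream)
                queue.append(downstream)
    customers = {c for b in affected for c in shipped_to.get(b, [])}
    # Sorting makes the output identical on every run for the same input.
    return sorted(affected), sorted(customers)


batches, customers = trace_affected(
    "RM-1",
    consumed_by={"RM-1": ["B-1", "B-2"], "B-1": ["B-3"]},
    shipped_to={"B-2": ["Acme"], "B-3": ["Beta", "Acme"]},
)
print(batches)    # ['B-1', 'B-2', 'B-3', 'RM-1']
print(customers)  # ['Acme', 'Beta']
```

The same input always yields the same output, which is the property this stage of the workflow depends on.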
AI becomes useful in the next stage: drafting communications. An LLM can prepare customer notifications based on ERP and traceability data, while mandatory fields such as batch numbers and affected products remain fixed. The draft then moves into a human review workspace before anything is sent.
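One way to keep mandatory fields fixed is to render them from the traceability data in code and let the LLM contribute only the narrative text. The sketch below assumes this split. The batch number format, the `RecallFacts` type, and the function names are illustrative assumptions, not part of any specific product.

```python
import re
from dataclasses import dataclass
from typing import Set, Tuple

# Assumption: batch numbers look like "B-1234". Adjust to the real format.
BATCH_PATTERN = re.compile(r"\bB-\d{4}\b")


@dataclass(frozen=True)
class RecallFacts:
    product: str
    batch_numbers: Tuple[str, ...]


def unexpected_batches(narrative: str, facts: RecallFacts) -> Set[str]:
    """Batch numbers mentioned in the draft that are not in the source data."""
    return set(BATCH_PATTERN.findall(narrative)) - set(facts.batch_numbers)


def assemble_notification(facts: RecallFacts, narrative: str) -> str:
    """Combine fixed fields (from data) with LLM-drafted narrative text."""
    unknown = unexpected_batches(narrative, facts)
    if unknown:
        raise ValueError(f"Draft mentions unknown batches: {sorted(unknown)}")
    # The header is never written by the model, only rendered from the facts.
    header = (
        f"Affected product: {facts.product}\n"
        f"Affected batches: {', '.join(facts.batch_numbers)}"
    )
    return f"{header}\n\n{narrative.strip()}"
```

A draft that passes this check still goes to the human review workspace. The check only catches one class of error before a reviewer sees it.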
AI can also help draft impact assessments or proposed actions based on policies and available data. This only works if outputs remain tied to approved source data and the model cannot invent information. Retrieval-Augmented Generation (RAG) is important here because it grounds outputs in company-specific data rather than the model’s general training.
The most critical step is human approval. A qualified person reviews the proposal, makes the decision, and approves it in the validated system of record. The approval must be identity-linked, time-stamped, and documented with rationale.
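The three approval properties named above can be made concrete as a record that cannot be created without them. This is a minimal sketch with hypothetical field names. A validated system of record would add authentication, electronic signature handling, and tamper-evident storage, none of which is shown here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)  # frozen: the record cannot be modified after creation
class Approval:
    proposal_id: str
    approver_id: str   # identity-linked
    decision: str      # "approved" or "rejected"
    rationale: str     # documented reasoning
    decided_at: datetime  # time-stamped, in UTC


def record_approval(
    proposal_id: str,
    approver_id: str,
    decision: str,
    rationale: str,
    now: Optional[datetime] = None,
) -> Approval:
    """Create an approval record, refusing incomplete entries."""
    if decision not in {"approved", "rejected"}:
        raise ValueError("decision must be 'approved' or 'rejected'")
    if not approver_id.strip():
        raise ValueError("approver identity is required")
    if not rationale.strip():
        raise ValueError("rationale is required")
    timestamp = now or datetime.now(timezone.utc)
    return Approval(proposal_id, approver_id, decision, rationale.strip(), timestamp)
```

Making the rationale mandatory at creation time means an approval without documented reasoning simply cannot exist in the data.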
Once approved, execution becomes deterministic again: notifications are sent, ERP or EQMS records are updated, actions are logged, and downstream workflows are triggered in a validated process.
Across the workflow, the pattern stays consistent: AI prepares. Humans decide. Systems execute.
That is the core design principle for compliant GxP AI workflows.
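The principle can be expressed as a workflow in which the execution steps are unreachable without a positive human decision. The sketch below is illustrative only: the `prepare`, `decide`, and `execute_steps` callables are placeholders for an LLM drafting step, a human review workspace, and validated automations respectively, and all names are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Proposal:
    proposal_id: str
    draft: str


@dataclass(frozen=True)
class Decision:
    proposal_id: str
    approver_id: str
    approved: bool


def run_workflow(
    prepare: Callable[[], Proposal],                     # AI prepares
    decide: Callable[[Proposal], Decision],              # humans decide
    execute_steps: Sequence[Callable[[Proposal], str]],  # systems execute
) -> List[str]:
    """Run the workflow and return a log of what happened."""
    proposal = prepare()
    decision = decide(proposal)
    if decision.proposal_id != proposal.proposal_id:
        raise ValueError("decision does not belong to this proposal")
    if not decision.approved:
        # Without approval, no execution step is ever called.
        return [f"{proposal.proposal_id}: rejected by {decision.approver_id}"]
    # Approved: run the fixed steps in their fixed order and log each one.
    return [step(proposal) for step in execute_steps]
```

The model never holds a reference to the execution steps, so it has no code path by which it could trigger them on its own.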
The reality of LLMs in GMP environments
One question comes up constantly: can LLMs be used in GxP processes at all? Yes, but within limits.
Annex 22 makes it clear that critical GMP decisions require deterministic and explainable control. LLMs are probabilistic systems, which means they should not act as autonomous decision-makers in critical workflows. They can still create value in preparation layers: drafting communications, summarizing records, supporting investigations, or proposing actions for human review.
RAG can reduce risk by grounding outputs in company policies and validated data sources, but it does not make an LLM deterministic. Explainability also remains a challenge. Regulators increasingly expect companies to explain how AI-generated outputs were produced, and current LLMs are still difficult to fully interpret. For now, the safest design keeps LLMs out of final decision-making and execution layers.
A practical way to assess where you stand
The GxP AI Readiness Checklist covers 50 questions across every domain that matters for regulated AI: intended use, validation documentation, human-in-the-loop controls, audit trails, model versioning, cybersecurity, and more. It's designed to be worked through before you build, not after, so that gaps show up in your planning rather than in an inspection.
If you’re mapping real AI use cases against these GxP boundaries, the next question becomes ownership: who defines intended use, who signs off, and who maintains control over time. That’s what we cover next. Read: Who Owns AI in Pharma? Why Governance Must Be Cross‑Functional.