Guest Column | October 3, 2025

Trust But Verify: Validating AI In Pharma's GxP World

By Sindhuri Korrapati, Intuitive Surgical


The pharmaceutical manufacturing industry is undergoing a significant transformation as advancements in digital technologies reshape traditional practices. The integration of artificial intelligence (AI) into good practice (GxP) environments is enabling more robust process monitoring, predictive quality control, and data-driven decision-making across the product life cycle. AI-driven models can analyze large, complex data sets in real time, allowing manufacturers to detect deviations earlier, optimize production parameters, and reduce batch failures while maintaining strict regulatory compliance. These innovations not only enhance operational efficiency but also strengthen patient safety by ensuring consistent product quality. As regulatory agencies increasingly recognize the role of advanced analytics, companies are focusing on building validated, transparent, and auditable AI systems that align with GxP principles, paving the way for smarter and more resilient manufacturing processes.

Prior to the introduction of the computer software assurance (CSA) risk-based approach, traditional validation meant binders filled with test cases and signatures neatly inked in blue. Today, as artificial intelligence infiltrates almost every GxP corner, from aseptic line inspections to clinical data classification, the binders have given way to dashboards, pipelines, and neural networks. What hasn’t changed is our duty to demonstrate that systems are fit for intended use, protect patients, and ensure data integrity. That is the essence of AI validation in regulated pharmaceutical industries.

Regulators worldwide now expect us to extend our validation lens beyond traditional software models into the world of models and data sets. The European Medicines Agency (EMA)’s reflection paper1 stresses transparency, human oversight, and governance across the life cycle. The FDA’s draft framework on AI credibility2 and its manufacturing discussion paper3 both underline data lineage, traceability, and ongoing performance monitoring. Together with GAMP 5 (Second Edition),4 ISPE’s AI Guide,5 Annex 11,6 Part 11,7 and ICH Q9(R1),8 we already have enough guidance to act responsibly.

What AI Validation Really Means

Validation in pharma has always been about showing, through evidence, that a system consistently meets requirements in its operational environment and is fit for its intended use. The shift with AI is not in principle but in practice. Risk is no longer only in software logic; it resides in data quality, bias, drift, and the opacity of models. Amid these changes, Annex 11 and Part 11 still apply, but we must now extend their controls into model training pipelines, cloud platforms, and retraining events.

A Practical Risk-Based Blueprint

Successfully deploying AI systems in GxP environments and in clinical operations depends on applying familiar GxP principles in new contexts. A structured approach can look like this:

  1. Define Intended Use and Context of Use (COU). When integrating AI functions into GxP systems, it is essential to explicitly link the AI’s intended use to the type of risk it introduces, whether the risk directly impacts patient safety, product performance, or the integrity of underlying data. This linkage provides the foundation for determining the appropriate level of scrutiny and validation.

For example, AI functions that generate outputs directly used in patient diagnosis, prognosis, or therapeutic decision-making inherently carry a high patient risk. In such cases, validation must be as rigorous as that required for high-risk medical devices, including comprehensive clinical evaluation, stress testing across diverse patient populations, and ongoing performance monitoring.

Conversely, AI functions embedded in supporting workflows, such as optimizing inventory management, automating documentation, or enhancing user interfaces, primarily influence product efficiency or user experience rather than patient safety. These functions still warrant validation, but the scale and depth of testing can be proportionate to their potential impact.

By explicitly mapping AI function to the domain of risk, validation can be tiered and proportionate, balancing patient protection, regulatory compliance, and development efficiency. This risk-proportionate framework not only ensures safety and trust but also avoids overburdening low-risk applications with unnecessary validation requirements.
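To make this tiering concrete, the short Python sketch below shows one way a context-of-use record could be captured as a controlled, reviewable artifact and mapped to a validation tier. The risk domains, tier labels, and field names are illustrative assumptions, not terms drawn from any regulation or guidance.

```python
from dataclasses import dataclass

# Hypothetical risk domains and validation tiers, for illustration only.
RISK_TO_TIER = {
    "patient_safety": "Tier 1: full validation (clinical evaluation, stress testing, ongoing monitoring)",
    "product_quality": "Tier 2: focused validation against product impact",
    "data_integrity": "Tier 2: focused validation with data integrity controls",
    "operational_support": "Tier 3: proportionate, streamlined assurance",
}

@dataclass
class ContextOfUse:
    """A minimal, auditable record linking an AI function to its intended use and risk domain."""
    function_name: str
    intended_use: str
    risk_domain: str  # one of the keys in RISK_TO_TIER

    def validation_tier(self) -> str:
        # Default to the most rigorous tier when the risk domain is unrecognized.
        return RISK_TO_TIER.get(self.risk_domain, RISK_TO_TIER["patient_safety"])

if __name__ == "__main__":
    cou = ContextOfUse(
        function_name="visual-inspection-classifier",
        intended_use="Flag suspect vials on an aseptic line for human review",
        risk_domain="product_quality",
    )
    print(cou.function_name, "->", cou.validation_tier())
```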

  2. Treat Data as a Controlled Asset. Ensuring trustworthy AI in biomedical and healthcare applications requires treating data with the same rigor as any regulated product component. A core principle is that data quality directly determines model quality. Flawed, incomplete, or poorly managed data inevitably translates into unreliable or biased AI outputs.

To achieve this, the ALCOA+ principles (Attributable, Legible, Contemporaneous, Original, Accurate, plus Complete, Consistent, Enduring, and Available) must be systematically applied to every stage of the data life cycle. Each record should be attributable to its source, legible and interpretable by both humans and machines, and contemporaneous to the event it describes. Data integrity further depends on preserving original records while ensuring accuracy, completeness, and consistency over time. For AI specifically, the "enduring" and "available" principles extend beyond static recordkeeping: they demand that training data sets, annotations, and preprocessing steps remain retrievable and auditable across the entire model life cycle.
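As one way to make these principles operational for training data, the sketch below builds a simple hash-based manifest that records who prepared a data set, when it was captured, and exactly which files it contains, so any later change is detectable. The directory layout and field names are hypothetical; this is a minimal illustration, not a prescribed format.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file so any later modification is detectable."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(dataset_dir: Path, source: str, curator: str) -> dict:
    """Capture who prepared the data, when, and exactly which files make up this data set version."""
    return {
        "source": source,                                        # attributable: where the data came from
        "curator": curator,                                      # attributable: who prepared it
        "captured_at": datetime.now(timezone.utc).isoformat(),   # contemporaneous
        "files": {
            str(p.relative_to(dataset_dir)): sha256_of(p)        # original and accurate
            for p in sorted(dataset_dir.rglob("*")) if p.is_file()
        },
    }

if __name__ == "__main__":
    dataset_dir = Path("datasets/lot_release_v3")  # hypothetical curated data set location
    if dataset_dir.exists():
        manifest = build_manifest(dataset_dir, source="LIMS export", curator="QA data steward")
        Path("datasets/lot_release_v3.manifest.json").write_text(json.dumps(manifest, indent=2))
```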

  3. Specify Model Design Inputs. For AI systems, it is critical to define performance targets, acceptable error thresholds, and potential failure modes early in development. Establishing these parameters at the outset ensures that the validation strategy is aligned with clinical relevance, regulatory expectations, and patient safety considerations.

Acceptable error levels require careful calibration. No model is error-free, but the tolerance for error depends on the clinical context. For instance, a model that triages radiology images for urgent findings may require near-zero false negatives, while modest false positives may be tolerable if they only increase workload without endangering patients. Explicitly defining these trade-offs at the design stage provides clarity for both developers and regulators and prevents retrospective rationalization of poor performance.
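Such trade-offs can be written down as machine-checkable acceptance criteria and evaluated during validation. The sketch below assumes a hypothetical triage model and purely illustrative thresholds; real limits must come from the documented context of use and risk assessment.

```python
# Illustrative acceptance criteria for a hypothetical triage model.
# Thresholds are examples only; real limits come from the documented COU and risk assessment.
ACCEPTANCE_CRITERIA = {
    "false_negative_rate_max": 0.005,  # near-zero missed urgent findings
    "false_positive_rate_max": 0.15,   # workload impact tolerated up to this level
}

def evaluate_against_criteria(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compare observed confusion-matrix counts against the predefined acceptance criteria."""
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "false_negative_rate": fnr,
        "false_positive_rate": fpr,
        "meets_criteria": (
            fnr <= ACCEPTANCE_CRITERIA["false_negative_rate_max"]
            and fpr <= ACCEPTANCE_CRITERIA["false_positive_rate_max"]
        ),
    }

if __name__ == "__main__":
    print(evaluate_against_criteria(tp=498, fp=120, tn=880, fn=2))
```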

  4. Maintain Training Discipline. To ensure the credibility and reproducibility of AI model evaluation, it is essential to lock data set splits, document training conditions, and protect test sets from contamination (a minimal sketch of one way to lock splits follows this list). Once training, validation, and test partitions are established, these splits must remain fixed and version-controlled to prevent inadvertent data leakage that could inflate performance estimates. Training conditions, including preprocessing steps, feature engineering methods, and hardware/software environments, should be meticulously documented, enabling others to replicate results or identify sources of variability. Equally critical is safeguarding the test set: it should serve as a true proxy for unseen data and must never be reused for iterative model tuning, as this risks overfitting to the benchmark rather than reflecting real-world generalizability. Together, these practices preserve the integrity of model evaluation and provide a defensible foundation for both scientific reporting and regulatory submission.
  5. Apply Classic System Controls. Robust governance of AI systems in healthcare and life sciences requires that access management, audit trails, backup/restore procedures, and change control remain nonnegotiable elements of compliance. Access to data, models, and infrastructure must be role-based and tightly controlled to prevent unauthorized modification or misuse, thereby safeguarding both patient confidentiality and system integrity. Comprehensive audit trails should capture all interactions with data and models, including creation, modification, and deletion events, so that every action is fully attributable and can be reconstructed in the event of investigation. Reliable backup and restore mechanisms ensure continuity of operations, protecting against data loss from hardware failures, cyberattacks, or accidental corruption.
  6. Monitor Performance Continuously. The integration of dashboards and automated alerts into AI oversight frameworks provides a structured mechanism for continuous monitoring and rapid response. When linked to established SOPs, these tools transform raw performance metrics into actionable signals that can be systematically addressed. Dashboards should track key indicators such as model accuracy, drift, data quality, and system uptime in near real time, enabling stakeholders to visualize deviations from expected behavior. Automated alerts, when properly calibrated, ensure that anomalies trigger predefined escalation pathways embedded within SOPs, minimizing reliance on ad hoc decision-making. Coupling these alerts with CAPA workflows closes the loop: deviations are not only corrected promptly but also analyzed for root causes, and preventive measures are implemented to reduce recurrence. This integration of monitoring technology with formal quality systems strengthens both regulatory compliance and scientific rigor, ensuring AI systems remain safe, reliable, and auditable throughout their life cycle.
  7. Invest in People and Governance. The safe and effective deployment of AI depends not only on technical rigor but also on robust human and organizational frameworks. User training is essential to ensure that clinicians, researchers, and operational staff understand both the capabilities and the limitations of AI outputs, including appropriate interpretation of uncertainty and confidence measures. Equally important are override protocols, which must be clearly defined to preserve human authority in decision-making, particularly in high-stakes clinical contexts where erroneous AI recommendations could endanger patient safety. These protocols should specify when and how human operators may suspend, correct, or bypass AI-generated outputs, ensuring accountability and transparency. Finally, the establishment of AI governance boards provides a structured forum for oversight, bringing together multidisciplinary expertise from clinical practice, data science, ethics, quality assurance, and regulatory affairs. Such boards are tasked with reviewing performance, adjudicating risks, and guiding policy, thereby embedding AI systems within a culture of continuous improvement and ethical responsibility.
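Returning to point 4 above, the sketch below illustrates one possible way to lock data set splits so they are deterministic and verifiable: a fixed seed, persisted record identifiers, and a fingerprint of the split that can be version-controlled and cited in the validation report. It is a minimal illustration under assumed identifiers, not a prescribed method.

```python
import hashlib
import json
import random
from pathlib import Path

def lock_splits(record_ids: list[str], seed: int = 20240101,
                fractions: tuple[float, float, float] = (0.7, 0.15, 0.15)) -> dict:
    """Deterministically partition record IDs into train/validation/test and return the splits."""
    ids = sorted(record_ids)            # sort first so the split does not depend on input order
    random.Random(seed).shuffle(ids)    # fixed seed makes the shuffle reproducible
    n = len(ids)
    n_train = int(n * fractions[0])
    n_val = int(n * fractions[1])
    return {
        "seed": seed,
        "train": ids[:n_train],
        "validation": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],  # held out; never reused for iterative tuning
    }

def split_fingerprint(splits: dict) -> str:
    """Hash the full split definition so any later change is immediately visible."""
    return hashlib.sha256(json.dumps(splits, sort_keys=True).encode()).hexdigest()

if __name__ == "__main__":
    splits = lock_splits([f"record-{i:04d}" for i in range(1000)])
    Path("splits_v1.json").write_text(json.dumps(splits, indent=2))
    print("split fingerprint:", split_fingerprint(splits))
```

Recording the fingerprint in the validation report means that any later regeneration or alteration of the splits is immediately detectable during review.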

The Hard Truth: Challenges In AI Validation

Despite robust frameworks, applying them to AI introduces unique hurdles. These are the challenges repeatedly encountered on projects, with potential solutions:

1. Data Integrity and Bias

Challenge: AI models inherit flaws from their data. Nonrepresentative data sets can lead to hidden bias across lots, sites, or patient subgroups. Inconsistent curation practices create lineage gaps.
Solution: Enforce rigorous data set governance, and treat data sets like GxP-controlled configuration items. Document lineage, apply stratified sampling, and audit for representativeness. Involve statisticians early to design balanced data sets.
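A lightweight representativeness audit can be as simple as comparing subgroup proportions in the training set against the full curated data set and flagging material gaps. The subgroup labels and tolerance below are hypothetical; the real thresholds and sampling design should come from the statisticians involved early in the project.

```python
from collections import Counter

def subgroup_proportions(labels: list[str]) -> dict:
    """Return the fraction of records in each subgroup (site, lot, demographic, etc.)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}

def flag_imbalances(full_set: list[str], training_set: list[str], tolerance: float = 0.05) -> list[str]:
    """Flag subgroups whose share of the training set differs from the full data set by more than the tolerance."""
    full = subgroup_proportions(full_set)
    train = subgroup_proportions(training_set)
    return [
        group for group in full
        if abs(full[group] - train.get(group, 0.0)) > tolerance
    ]

if __name__ == "__main__":
    # Hypothetical site labels attached to each record.
    full = ["site_A"] * 500 + ["site_B"] * 300 + ["site_C"] * 200
    train = ["site_A"] * 450 + ["site_B"] * 200 + ["site_C"] * 50
    print("under/over-represented subgroups:", flag_imbalances(full, train))
```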

2. Model Drift

Challenge: Unlike static systems, AI models degrade when input data distributions shift. Drift may go unnoticed until performance collapses.
Solution: Deploy automated drift detection pipelines. Define clear retraining triggers (e.g., drop in recall by X%). Preauthorize retraining SOPs with acceptance gates and change control. Maintain locked models unless drift management is essential.
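As an illustration of an automated drift check, the sketch below compares a monitored input feature against its training-time reference distribution using a two-sample Kolmogorov-Smirnov test from SciPy, and applies a simple recall-based retraining trigger. The thresholds are illustrative placeholders for values that belong in the monitoring SOP.

```python
import numpy as np
from scipy.stats import ks_2samp

# Illustrative thresholds; the real values belong in the monitoring SOP.
DRIFT_P_VALUE = 0.01          # flag drift when distributions differ at this significance level
RECALL_DROP_TRIGGER = 0.05    # retraining trigger: recall falls 5 points below baseline

def check_feature_drift(reference: np.ndarray, current: np.ndarray) -> dict:
    """Compare the live distribution of a feature with its training-time reference distribution."""
    statistic, p_value = ks_2samp(reference, current)
    return {"ks_statistic": float(statistic), "p_value": float(p_value),
            "drift_detected": p_value < DRIFT_P_VALUE}

def retraining_required(baseline_recall: float, current_recall: float) -> bool:
    """Apply the predefined retraining trigger from the SOP."""
    return (baseline_recall - current_recall) >= RECALL_DROP_TRIGGER

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    reference = rng.normal(loc=0.0, scale=1.0, size=5000)   # training-time feature values
    current = rng.normal(loc=0.4, scale=1.0, size=5000)     # shifted production values
    print(check_feature_drift(reference, current))
    print("retrain:", retraining_required(baseline_recall=0.97, current_recall=0.90))
```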

3. Supplier Transparency

Challenge: Many vendors sell AI as “proprietary,” with minimal disclosure. In GxP, insufficient visibility makes validation impossible.
Solution: Demand supplier assurance and validation-ready documentation. Establish clear supplier quality agreements for visibility into vendor documentation. If transparency is denied, relegate the model to non-GxP functions or switch providers. Use Annex 11 supplier guidance as your leverage.

4. Cloud Complexity

Challenge: AI often runs on cloud platforms with distributed pipelines. Shared responsibility can blur boundaries of accountability.
Solution: Define clear roles in supplier agreements. Document data residency, access, and cybersecurity measures. Qualify vendors and keep independent evidence of control effectiveness.

5. Scaling Validation Effort

Challenge: Teams either over-engineer validation, causing paralysis, or under-engineer it, creating compliance risk.
Solution: Apply a proportionate, risk-based approach per ICH Q9(R1).8 Focus effort where patient/product impact is highest, while simplifying validation for low-risk, supportive AI functions.

6. Evolving Regulatory Expectations

Challenge: Guidance is emerging but still maturing. Companies fear validating under today’s standards only to be outdated tomorrow.
Solution: Anchor decisions to existing well-established principles (Annex 11,6 GAMP) and treat emerging guidances as directional. Document rationales and demonstrate continuous improvement. Regulators reward transparency and intent.

Documentation That Wins Audits

Inspectors respond well when you provide:

  • data inventories with lineage
  • model cards with COU, metrics, and limitations (a minimal example follows this list)
  • validation reports linked to acceptance criteria
  • monitoring dashboards tied to SOPs
  • change control records for retraining events
  • classic system controls (access, audit trail, backup/restore evidence).
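A model card does not need to be elaborate to be audit-ready; a small structured record covering COU, metrics, and limitations often suffices. The fields and values below are an illustrative minimum for a hypothetical inspection model, not a mandated template.

```python
import json

# Illustrative model card structure; field names and values are examples only.
model_card = {
    "model_name": "vial-inspection-classifier",
    "version": "1.3.0",
    "context_of_use": "Flag suspect vials for human review on an aseptic filling line",
    "risk_classification": "product quality (indirect patient impact)",
    "training_data": {"dataset_id": "lot_release_v3", "manifest_sha256": "<recorded at release>"},
    "performance": {"recall": 0.97, "precision": 0.91, "evaluation_date": "2025-06-30"},
    "acceptance_criteria_met": True,
    "known_limitations": [
        "Not evaluated on amber glass vials",
        "Performance degrades below 200 lux illumination",
    ],
    "monitoring": {"dashboard": "QA-ML-01", "review_frequency": "monthly"},
}

print(json.dumps(model_card, indent=2))
```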

ISPE’s AI Guide5 provides practical examples you can adapt.

The Road Ahead

The direction is clear: AI in pharma is here to stay. But responsible adoption depends on building trust through structured, risk-based validation.

  1. Establish an AI inventory of all models touching GxP decisions; label COU, risk, data owners, and monitoring status.
  2. Close the data governance gap through ALCOA+ checks, lineage capture, and access control on training/validation data sets.
  3. Set up a lightweight AI governance forum with QA and business owners; mandate COU/acceptance criteria before development starts.
  4. Implement a retraining SOP with triggers, acceptance metrics, and release approvals.
  5. Pilot an end-to-end machine learning operations (MLOps) pipeline (even for a low-risk use case) that auto-captures artifacts and audit trails; a minimal sketch of that idea follows this list.
  6. Educate teams on the current regulatory direction and your regional data integrity rules.
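For step 5, one simple pattern is to have every pipeline stage append a hash-chained entry to an audit log so each artifact and action is attributable and can be reconstructed later. The sketch below is a minimal illustration of that idea, with hypothetical stage and file names, rather than a full MLOps implementation.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

AUDIT_LOG = Path("pipeline_audit_log.jsonl")

def record_step(stage: str, performed_by: str, artifacts: dict) -> dict:
    """Append a hash-chained audit entry for one pipeline stage (data prep, training, evaluation, release)."""
    previous = AUDIT_LOG.read_text().splitlines()[-1] if AUDIT_LOG.exists() and AUDIT_LOG.stat().st_size else ""
    entry = {
        "stage": stage,
        "performed_by": performed_by,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "artifacts": artifacts,  # e.g., file names or their SHA-256 digests
        "previous_entry_sha256": hashlib.sha256(previous.encode()).hexdigest(),
    }
    with AUDIT_LOG.open("a") as fh:
        fh.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry

if __name__ == "__main__":
    record_step("data_preparation", "svc-mlops-pipeline", {"dataset_manifest": "lot_release_v3.manifest.json"})
    record_step("model_training", "svc-mlops-pipeline", {"model_weights": "model_v1.pt"})
```

Because each entry hashes the one before it, any retroactive edit to the log breaks the chain and is visible on inspection.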

Look for opportunities to improve in every challenge. If we apply critical thinking, embed risk-based controls, and document transparently, AI will not be a regulatory burden but a driver of both compliance and innovation.

In pharma, smart models and safe medicines must go hand in hand. AI validation is how we ensure that promise.

References

  1. European Medicines Agency. Reflection paper on the use of Artificial Intelligence (AI) in the medicinal product life cycle. 2024/2025. EMA/CHMP/CVMP/83833/2023.
  2. U.S. FDA. Considerations for the Use of Artificial Intelligence to Support Regulatory Decision-Making for Drug and Biological Products (Draft Guidance). January 2025.
  3. U.S. FDA/CDER. Artificial Intelligence in Drug Manufacturing (Discussion Paper). March 2023.
  4. ISPE. GAMP 5 Guide: A Risk-Based Approach to Compliant GxP Computerized Systems (Second Edition). July 2022.
  5. ISPE. GAMP Guide: Artificial Intelligence. 2025 (and earlier editions).
  6. European Commission. EudraLex Volume 4, Annex 11: Computerised Systems. 2011 (and ongoing revision).
  7. U.S. FDA. Part 11, Electronic Records; Electronic Signatures — Scope and Application (Guidance for Industry). 2003; web page updated 2018.
  8. ICH. Q9(R1) Quality Risk Management (Guidance for Industry). 2023/2025.

About The Author:

Sindhuri Korrapati provides quality oversight for IT and digital applications at Intuitive Surgical and has over a decade of experience as a quality professional in the medical device and pharmaceutical industries. She is a proven leader in implementing computer software assurance (CSA) for IT GxP systems and has strong knowledge of CAPA management, software validation, equipment qualification, and test method validation.