Moving Beyond Human Error In Biopharma Investigations And CAPA Programs
A conversation with John Wilkes (AstraZeneca), Clifford Berry (Takeda), Amy D. Wilson, Ph.D. (Biogen), and Jim Morris (NSF Health Sciences)
This article is the second part of a two-part roundtable Q&A on the topic of human performance in pharmaceutical operations. Part 1 evaluated the underpinnings of human performance and provided advice to those individuals managing rapid production scale-up to support COVID-19 production demand. Here in Part 2, we consider human performance in the context of investigation and CAPA programs. The subject matter experts participating in this Q&A are:
- John Wilkes is the human performance lead for biologics at AstraZeneca. He has more than 25 years of industry experience spanning manufacturing operations, operational excellence, quality systems, and quality control.
- Clifford Berry is the head of business excellence for Takeda at its Massachusetts Biologics Operations site. He has been a human and organizational performance practitioner since 1999, with experience in commercial nuclear electric generation, electric transmission and distribution, and biopharma.
- Amy D. Wilson, Ph.D., is the global human performance lead for Biogen. She has more than 20 years of experience in biopharma manufacturing, focusing on human and organizational performance, operational excellence, risk management, and technical training.
- Jim Morris is an executive director at NSF Health Sciences. He has more than 30 years of pharmaceutical operations experience in quality and manufacturing and frequently leads consulting and training projects in investigation and CAPA management.
The participants provided responses to each question independently. As noted in Part 1, there is consistency in their advice, which suggests that a new mindset is required when conducting investigations and improving human performance.
In Part 1 it was stated that a new mindset is needed that characterizes people as a necessary driver of success rather than as a variable that can cause failure through error. What advice can you offer when investigating a deviation that involves human error?
Amy D. Wilson, Ph.D., Biogen: A great first step is to recognize that human error is always a symptom, a starting point for an investigation, and never the answer or cause. When investigating a deviation that involves human error, I would advise first clearing your mind of “if only they had…” and “they should have…” thinking. With your mind open, meet with the people directly involved in what happened and seek to understand how things unfolded. Ask questions like “Tell me about your experiences with this situation/process,” “When things go well, what does this look like?” and “What is unpredictable about this process?” You may be surprised at what you learn. In every event that appears to be a simple story of someone not doing or forgetting something, there is rich context that tells us much more about how operations actually work. Several cause analysis approaches are helpful for human-related deviations; my suggestions would be to apply event and causal factor charting and learning teams.
John Wilkes, AstraZeneca: Reframe the objective from investigating a failure to learning about work. Be less interested in the “error.” Become more interested in the context of work as done and the performance of the systems and processes in place to manage the risk of the error that was observed. Remember, context influences worker behavior, while systems and processes determine operational outcomes. Rather than searching for a couple of “root causes” using simple problem-solving tools such as Five Whys or speculative methods such as fishbone diagrams to explain the error, focus instead on building a description of how work is normally performed successfully. This is not a picture of “what happened” and “why it happened,” but rather a picture of “what was happening and how it was happening.” Use that understanding to identify opportunities to improve the quality of the systems and processes used to perform work. Doing so improves the effectiveness of performance risk management and the frequency of successful outcomes.
Clifford Berry, Takeda: The term “human error” is a label that describes the outcome of an action rather than a reason for the outcome. When a person uses the term human error to describe an undesired outcome, it is a sign that the person does not understand how work that goes well most of the time ended in failure this time. If an organization is landing on human error as a cause, including the institutionalized practice of assigning blame for operational failures to frontline staff through the use of deviation management system cause codes, stop it, or accept that the energy expended on operational improvement will be mostly wasted. Rather, consider failure an occasional outcome of imperfect complex systems and focus on learning about the context of the work and the conditions in the workplace and organization that need to be adjusted in order to improve reliability and enhance operational resilience.
Jim Morris, NSF Health Sciences: Drill down on the underlying contributing factors for error. This discipline will help drive investigations to arrive at the actual root cause(s) of an error and therefore help identify CAPAs that will make a difference. Furthermore, recurring contributing factors that point to organizational weaknesses, such as poor change management or poor training, should be addressed. These are indicators of persistent systemic weaknesses the organization must confront to get ahead of avoidable recurring failures attributed to “human error.” Best advice: remove “human error” from your vocabulary and consider the “error” to be the starting point of your investigation.
Takeaways For Improvement:
- Keep in mind that with every event there is rich context that tells us much more about how operations actually work.
- Become less interested in the error and more interested in the systems and processes in place to manage the risk of error occurring and its impact.
- If an organization is landing on human error as a cause code — stop it!
- Recurring contributing factors point to organizational weaknesses that must be addressed.
Why do you think CAPAs often fail to address the underlying root cause(s) when an investigation involves human error?
Wilson: In my opinion, there are two primary reasons why CAPAs often fail to address the underlying causes when investigations involve human error. The first reason is that the “human error” investigation is incomplete and focuses too much on human behavior. When investigation conclusions stop at human behavior, the actions identified will have limited impact. The second reason is that CAPAs addressing underlying causes are often harder to do. Underlying causes exist at the organizational level and often require changes in business processes, sometimes changes in the electronic systems that support operations, and sometimes changes in work practices. These types of efforts typically require more resources and take longer to implement.
Berry: Use of the term “root cause” by organizations is indicative of the mental model that puts them into the CAPA failure trap. When undesired outcomes occur, there are often multiple conditions that must be changed in order to improve future operational performance. The belief that there is a single magical cause that can be fixed by a single corrective action is largely a fallacy. It is a fallacy that is leaned on by organizations to close investigations quickly and apply simplistic fixes that target symptoms. The word “underlying” is also part of that trap, leading people to believe that this single magical cause is hidden and can be found using a reductionist approach by going “down and in,” when in reality the conditions that need improvement are in plain sight using a systems approach by going “up and out” and learning about the workplace local factors and system/organizational factors.
Morris: All too often, investigations fail to arrive at the underlying causes and, as a result, corrective and preventive actions fail to hit the mark. Improving root cause analysis and investigative technique is therefore the first and most fundamental step. Second, once the underlying causes are well understood, it is important to apply a principle such as the CAPA hierarchy to ensure that the actions taken are sufficiently robust to prevent recurrence. The ideal CAPA will eliminate the failure mode entirely, for instance, by automating an activity or replacing the step with one that is less prone to variation.
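As a rough illustration of the CAPA hierarchy Morris describes, the short Python sketch below ranks candidate actions from most robust (eliminating the failure mode) to least robust (retraining and awareness). The tier names, record structure, and example actions are illustrative assumptions, not a prescribed standard.

```python
# Illustrative sketch of a CAPA hierarchy: candidate actions are ranked
# from most robust (eliminate the failure mode) to least robust
# (retraining/awareness). Tier names and examples are assumptions.
CAPA_HIERARCHY = [
    "eliminate",       # remove the failure mode entirely (e.g., automate the step)
    "substitute",      # replace the step with one less prone to variation
    "engineer",        # physical or system controls (interlocks, forcing functions)
    "administrative",  # procedures, checklists, signage
    "retrain",         # awareness and training; rarely sufficient on its own
]

def rank_capas(capas):
    """Sort proposed CAPAs so the most robust control types come first."""
    return sorted(capas, key=lambda c: CAPA_HIERARCHY.index(c["control_type"]))

proposed = [
    {"action": "Re-brief operators on SOP step 4.2", "control_type": "retrain"},
    {"action": "Automate the buffer addition step", "control_type": "eliminate"},
    {"action": "Add a barcode check before dispensing", "control_type": "engineer"},
]

for capa in rank_capas(proposed):
    print(f'{capa["control_type"]:>14}: {capa["action"]}')
```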
Wilkes: The simplest reason is these investigations often stop short — going no deeper than calling out attention deficits or failures to follow procedure as explanations for error. While compelling investigators to “go deeper” seems like a reasonable countermeasure, the effectiveness of this response will be limited until there is a fundamental shift in the mindset of the organization toward root cause, error, and event learning. There is no such thing as an underlying root cause of human error, and human performance is complex. When one considers the multitude of factors that create conditions for error and its consequences, it should be clear that speedy investigations limit learning opportunities and quick fixes will not be effective in changing or sustaining performance improvement.
Takeaways For Improvement:
- Underlying causes are often at the organizational level, which are more difficult to address.
- When undesired outcomes occur, there are multiple conditions that must be changed in order to improve operational performance.
- Select CAPAs that will eliminate the failure mode entirely (where possible).
- Speedy investigations limit learning opportunities.
Some pharma sites experience high numbers of deviations that require investigation. How is an approach that requires a deeper understanding of the context of work and multiple systemic contributors to failure achievable?
Berry: Pharma can learn something from the commercial nuclear electric generating industry. The commercial nuclear industry uses low-threshold/high-volume open reporting systems for problem identification and resolution. Single-unit nuclear reactor sites typically see between 2,000 and 4,000 open reports annually. These open reports include significant adverse events, adverse events, defects, near-misses, and risks. The industry uses a risk- and severity-based approach to determine how each open report will be dispositioned. Some open reports require a thorough and intensive investigation, some receive a less intensive investigation, and some will get corrections and/or corrective actions and no requirement to identify causes. The volume of deviations at typical pharma manufacturing sites is much lower than that of commercial nuclear, so while we do not face the same challenge, we can still learn from the nuclear industry. Some pharma companies have already learned and built deviation management systems that use a risk-based approach to determine the required level of investigation. In addition to developing a risk-based approach, pharma sites that have high numbers of deviations will benefit by developing a dedicated team of investigators who are provided training in both human performance and advanced investigation methods. The investigation model where floor and lab supervisors are assigned as investigators will not lead to the learning necessary to reduce the rate, severity, and recurrence of deviations. A more effective model is one where dedicated, skilled investigators collaborate with floor and lab supervisors to perform deviation investigations.
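As a rough sketch of how the risk- and severity-based disposition Berry describes might be encoded, consider the following Python example. The scoring scales, thresholds, tier names, and sample reports are hypothetical assumptions; a real quality system would define these in validated procedures.

```python
# Hypothetical risk-based triage of open reports/deviations: a simple
# severity x likelihood score maps each report to a disposition tier.
def disposition(severity, likelihood):
    """Map severity and likelihood ratings (each 1-5) to an investigation tier."""
    risk = severity * likelihood
    if risk >= 15:
        return "full investigation"      # thorough, intensive causal analysis
    elif risk >= 6:
        return "focused investigation"   # less intensive review
    return "correct and trend"           # correction/CAPA only, no cause analysis

reports = [
    ("missed line clearance, product impact possible", 5, 3),
    ("label defect caught at final inspection", 3, 2),
    ("near-miss: wrong buffer staged, caught before use", 2, 2),
]

for description, sev, lik in reports:
    print(f"{disposition(sev, lik):>21}: {description}")
```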
Wilson: Having a high number of deviations that are challenging to manage is a great indicator that a systems approach is needed. To get started, there are three things that I would suggest. First, target specific deviations for deeper analysis by applying a risk-based approach. Identify the deviations that elucidate bigger issues, for example, those that highlight the possibility of product loss, gaps in quality management systems, gaps in management of product that leaves your control, or gaps in critical controls. By applying a systems approach to those, you will likely solve some problems that may be contributing to the higher number of deviations in general. Second, I would implement learning teams as a standard practice in response to deviations. Learning teams help you learn a lot quickly and are a fast way to gather information about context and multiple systemic contributors while still satisfying the general compliance requirement to understand causes. Third, I would suggest paying close attention to the CAPAs being generated from the totality of deviations. If you have a high number of deviations, you may also be generating a high number of CAPAs that will not actually prevent recurrence or address underlying problems. As you begin to apply a systems approach to higher-risk deviations, ensure that each CAPA you create is directly tied to an identified cause, as sketched below. This will ensure that the effort you expend to correct and prevent also yields the desired results.
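A simple automated check can support Wilson's last point: flag any CAPA that is not linked to an identified cause before it is approved. The record structure and identifiers below are hypothetical illustrations, not a specific eQMS schema.

```python
# Hypothetical check that every CAPA references an identified cause.
causes = {
    "C-001": "ambiguous instruction in SOP step 4.2",
    "C-002": "no independent verification of line setup",
}

capas = [
    {"id": "CAPA-10", "action": "Rewrite SOP step 4.2", "cause_id": "C-001"},
    {"id": "CAPA-11", "action": "Retrain all operators", "cause_id": None},  # unlinked
]

for capa in capas:
    if capa["cause_id"] not in causes:
        print(f'{capa["id"]} ({capa["action"]!r}) is not linked to an identified cause')
```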
Wilkes: In one word, discipline. In my opinion, this approach starts with how the quality system defines deviations. Leveraging first principles of GMP, there must be good organizational rigor and discipline about what is considered to present risk to product quality and patient safety. This is the domain of deviations, and the need for their investigation is without dispute. I like to say, “all deviations are problems, but not all problems are deviations.” This statement provides good delineation between the organization’s compliance obligations and where it has flexibility for reactive operational learning. There will always be tension within the organization between production activities and learning. As a result, the organization must be further disciplined in deciding which problems present real value for learning over those that are more suitable for an exercise of occurrence documentation and correction. The criteria that determine that learning value will likely differ between organizations. However, the following are some suggested criteria to distinguish problems with potential learning value: unique or novel problems, problems with significant failures in critical controls, problems with low consequence where critical controls were missing or poorly understood, and frequently occurring problems within a specific process or system.
Morris: I agree that the in-depth analysis described earlier is not needed with every investigation. In my view, it comes down to triaging investigations on the basis of risk; in other words, determining which investigations present the greatest risk to product quality and patient safety and devoting sufficient effort to arrive at root cause(s) and CAPAs that will prevent recurrence. Investigations considered to present less risk warrant attention but not the same in-depth understanding, until a pattern of recurrence is established. Once a pattern is observed, investigators must treat the event as a more significant issue and use rigorous root cause analysis to identify the contributing factors and determine which factors, if eliminated, replaced, or altered, will reduce the probability of recurrence.
Takeaways For Improvement:
- Apply a risk-based approach to determine the required level of investigation.
- Develop a dedicated team of investigators who are provided training in both human performance and advanced investigation methods.
- Implement learning teams to gather information about context and multiple systemic contributors.
- Determine which problems present real value for operational learning.
Robust investigation and root cause analysis should lead to improvements in human performance. What is the most common obstacle to achieving human performance improvement that you have observed?
Berry: There are two significant obstacles that organizations must monitor and work to improve: the inability to effectively apply systems thinking principles when learning about our imperfect complex systems, and insufficient psychological safety, which inhibits openness and trust in operational learning. Applying a mental model like the sharp-end/blunt-end complex system model, as presented in the book Behind Human Error, is essential to avoiding linear, simplistic thinking when trying to understand all the workplace local factors and system/organizational factors that influence operational performance. Using such a mental model in investigations can be the difference between adding a note to an SOP, which creates more clutter and improves nothing, and redesigning the work so that the risk is entirely removed or substantially mitigated through the use of human factors and ergonomics. To enhance psychological safety, leaders must ask how rather than why, frame the work as a learning problem rather than an execution problem, speak about the workplace in terms of volatility, uncertainty, complexity, and ambiguity (VUCA), acknowledge their own fallibility in conversations, and model curiosity. A workplace with high psychological safety enables people to speak up without fear and share bad news with leaders.
Morris: Insufficient planning is a major obstacle to human performance improvement inside companies. It is all too common to rush new activities, products, and test methods into an operation while failing to build in time for familiarization and training. As a result, we often rely on the heroic efforts of our employees to get work done despite the obstacles. Building training steps into our work plans needs to become part of the embedded project planning routine.
Wilkes: Adherence to a mindset that views workers and their errors as the source of problems within operations, and a weak commitment to operational learning. The former drives the weakness in the latter. Rather than looking at workers as a source of variability (error) in operations that must be controlled, workers should be seen as a source of needed adaptive capacity for successful operations. This places high importance on operational learning, the feedback received from workers. This operational learning may be reactive, as in the case of an investigation, or it may be proactive. Proactive operational learning practices include after-action reviews, post-job learning activities, open reporting practices, and work observations. Operations with robust learning practices act on this information with urgency to improve not only the performance context of work but also the defenses that mitigate the consequences of error. The best operations learn quickly, share broadly, and can effect this improvement before the next operation occurs.
Wilson: Human performance improvement often represents a new mental model for people in terms of human error, how we learn from what goes wrong, and how we support desired actions and outcomes. For that reason, the most common obstacle in achieving human performance improvement is leaders who fall back on what they are used to — retraining, making people aware, and discipline. The other common obstacle we encounter is related to time. Especially where there are schedules and deadlines to meet, there is often a feeling that there is not time to plan, not time to take a step back and try to anticipate and manage potential risk, and not time to learn. Building greater capacity to anticipate pays off in the long term, but this is often harder to measure since you don’t always know what you prevented.
Takeaways For Improvement:
- Organizations must learn to apply systems-based thinking in order to improve human and operational performance.
- Embed familiarization time and training into project planning.
- Renewed commitment to operational learning.
- Building greater capacity to anticipate and manage potential risk.
Conclusion
Human performance improvement requires special focus and deliberate action by management to move from an old construct, in which investigating error means investigating human failure, to a new construct, in which error is a symptom of a problem to be understood, linked to the context in which it took place, and treated as the starting point of the investigation, not its conclusion. Systems-based thinking at an organizational level is required to achieve improvements in human performance. And, importantly, operational learning should take on greater meaning, with companies and unit operations seeking to improve not just human performance but overall performance.