Article | August 12, 2016

How Big Data Is Transforming Pharmaceutical Manufacturing

Source: Pharmaceutical Online
By Jack Schmidt, SAP Life Science Industry Director

By using in-memory computing to leverage rich datasets, drug manufacturers can optimize end-to-end quality and increase development efficiency.

The healthcare industry is exploding with data. Pharmaceutical manufacturers, research centers, chemists, molecular and cell biologists, and biotech firms are all collecting more drug-related data than ever before. Computers, drug manufacturing systems, and automated Internet of Things (IoT) sensors on the factory floor add to the total, generating massive amounts of machine-produced data every second. In fact, more data has been created since 2003 than in all of previous recorded history.1

At pharmaceutical companies, R&D and quality control teams are tasked with examining data at every stage of the manufacturing process, from the time raw materials arrive to the time the product is packaged and sent to distribution. As a result, over the lifecycle of a drug, the manufacturer ends up collecting and storing petabytes of data.

With the power of in-memory computing technology and interconnected and automated systems, manufacturers now have the ability to analyze these massive amounts of quality, environmental, and IoT-generated factory data. Tapping into this Big Data allows companies to build end-to-end process controls, resulting in higher-quality products, more predictability, more efficient manufacturing, and faster time to market.

Reducing Drug Development Costs

One of the factors driving smart data usage among pharmaceutical companies is the hope that data analytics can reduce the exorbitant cost of drug development. An analysis published by Forbes revealed a set of grim numbers on current drug development costs, stating that a company working on a single drug can expect to spend $350 million to get that product to market. And partly because so many drugs fail to ever reach the market, a large pharmaceutical company working on multiple drug candidates simultaneously can spend $5 billion for each new successful medicine.2

To understand how to reach the payoffs in development, let’s look at where we currently stand with harnessing drug manufacturing data. Where is the useful data stored? The short answer is everywhere.

The volume of data collected over the biopharmaceutical production lifecycle is enormous, and it is often held in disparate, siloed systems. In some cases, pieces of critical data are still gathered and stored in paper-based logs, such as information recorded in lab notebooks at an R&D facility or batch records maintained by manufacturing teams.

However, those processes are changing as manufacturers move toward more efficient digital data collection and in-memory computing platforms. As the FDA continues to emphasize the importance of manufacturers undertaking continued process verification (CPV) as an integral part of the process validation lifecycle, manufacturers’ data collection is critical. CPV provides the manufacturer with assurance that a validated process remains in a validated state during the routine manufacturing phase of the product lifecycle. CPV includes preparation of a written plan for monitoring the manufacturing process, documentation of the data collected, analysis of the data, and the actions taken based on the results of monitoring the process.
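The monitoring step of CPV can be pictured with a small Python sketch. It assumes a simple three-sigma control chart, which is one common way to watch a validated process and is not something the FDA or this article prescribes; the assay values, function names, and limits are all hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline, sigmas=3.0):
    """Return (lower, center, upper) limits computed from baseline batch values."""
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation
    return center - sigmas * spread, center, center + sigmas * spread


def out_of_limits(values, limits):
    """Return the indices of values that fall outside the control limits."""
    lower, _, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical assay results (% of label claim) from process validation batches.
baseline = [99.8, 100.1, 100.0, 99.9, 100.2, 100.0, 99.7, 100.3]
limits = control_limits(baseline)

# Hypothetical routine-production batches; the third one drifts low.
routine = [100.0, 99.9, 98.6, 100.1]
flagged = out_of_limits(routine, limits)
print(flagged)  # indices of batches that need investigation
```

In a real CPV program, the written plan would define which attributes are tracked, how limits are set, and what action a flagged batch triggers; the sketch covers only the analysis step.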

Data Deluge From The Factory Floor

Gathering all the information in the manufacturing ecosystem into a usable, interoperable platform is a team effort. Process historian systems on the factory floor tag and track each batch of raw materials. Maintenance systems detail operational equipment and calibration settings. Building management equipment captures environmental stats that include air pressure, temperature, and other atmospheric readings in multiple locations at each plant. Automated IoT sensors can send data by the minute — or second — and also be programmed to send critical alerts or triggers if a temperature shift or other environmental concern arises.
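As a rough illustration of the alerting described above, the following Python sketch checks a stream of temperature readings against an allowed band. The sensor name, timestamps, readings, and the 2 to 8 degrees Celsius band are assumptions chosen for the example, not values from any particular plant or system.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    sensor_id: str
    timestamp: str  # ISO 8601 text, kept as a string for simplicity
    celsius: float


def excursions(readings, low, high):
    """Return the readings whose temperature is outside the [low, high] band."""
    return [r for r in readings if not (low <= r.celsius <= high)]


# Hypothetical minute-by-minute readings from one cold-room sensor.
readings = [
    Reading("cold-room-1", "2016-08-01T10:00:00", 4.1),
    Reading("cold-room-1", "2016-08-01T10:01:00", 4.3),
    Reading("cold-room-1", "2016-08-01T10:02:00", 8.7),
]

alerts = excursions(readings, low=2.0, high=8.0)
for r in alerts:
    print(f"ALERT {r.sensor_id} {r.timestamp}: {r.celsius} C out of range")
```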

The in-memory computing platform has to amass structured and unstructured information from all these disparate systems; some of that data is machine-readable and some is not. The persistent challenge for the platform is to normalize the deluge of data to make it actionable, so manufacturers can align and analyze data sources in a reasonable, cost-effective manner and, most importantly, search for nonconformance issues.
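Normalization of this kind can be sketched in a few lines of Python. The two source formats below are invented for illustration: a historian record that reports Fahrenheit under a "lot" key, and a building management record that reports Celsius as text under a "batch" key. Real systems will differ, and the field names here are assumptions.

```python
def from_historian(rec):
    """Map a hypothetical process-historian record to the common schema."""
    return {
        "batch_id": rec["lot"].strip().upper(),
        "celsius": round((rec["temp_f"] - 32) * 5 / 9, 2),  # Fahrenheit to Celsius
        "source": "historian",
    }


def from_building_mgmt(rec):
    """Map a hypothetical building-management record to the common schema."""
    return {
        "batch_id": rec["batch"].strip().upper(),
        "celsius": float(rec["temp_c"]),  # arrives as text in this example
        "source": "building_mgmt",
    }


records = [
    from_historian({"lot": " b-102 ", "temp_f": 39.2}),
    from_building_mgmt({"batch": "B-102", "temp_c": "4.4"}),
]

# Once identifiers and units agree, records can be aligned per batch.
by_batch = {}
for r in records:
    by_batch.setdefault(r["batch_id"], []).append(r)

print(sorted(by_batch))
```

Once identifiers and units agree, data from different systems can be lined up against the same batch and searched for nonconformance.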

With older computing architectures, looking for data anomalies at the extremes of the manufacturing control parameters could take months. Storage and memory limits often meant that researchers could only evaluate processing for one or two batches of materials at a time. Without a clear picture of the end-to-end processes, QA teams had to work harder — and longer — to synthesize the important pieces into actionable manufacturing changes.

Locating Variances

Data-gathering plays a critical role in the development of vaccines. Because many vaccines contain attenuated viruses, they have to be handled under precise conditions during every phase of manufacturing. Components may have to be stored at exact temperatures for a year or more, with no deviation from a regulator-approved manufacturing process. If nonconformances happen, the materials may have to be discarded, which can mean millions of dollars in lost revenue. Discarding entire batches can also cause major setbacks when vaccine development is needed quickly, which is currently the case with the Zika virus outbreak.

Merck & Co. faced a vaccine development dilemma a few years ago, as it experienced nonconformance issues in manufacturing that led to higher-than-usual discard rates. After implementing several different data-collection methods to identify the root cause of batch nonconformance issues, the Merck team was able to optimize analysis by using the open-source software framework Hadoop and collecting all the vaccine development and production information into a data lake.

With this process, Merck was able to come up with conclusive answers about production yield variance within just three months, according to Information Week. Merck performed 15 billion calculations and more than 5.5 million batch-to-batch comparisons, which helped researchers discover that certain characteristics in its fermentation phase of vaccine production were closely tied to yield in a final purification step.3
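To give a feel for what batch-to-batch comparison involves, here is a toy Python sketch. It is not Merck's method, which the article does not describe in detail; the batch names, the fermentation metric, and the yield figures are made up, and a plain Pearson correlation stands in for whatever analysis was actually run.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical batches: (fermentation metric, final purification yield in %).
batches = {
    "B1": (6.8, 71.0),
    "B2": (7.0, 74.0),
    "B3": (7.2, 78.0),
    "B4": (7.4, 80.0),
    "B5": (7.1, 75.0),
}

# Every unordered pair of batches is one batch-to-batch comparison.
pairs = list(combinations(batches, 2))
print(len(pairs))  # n * (n - 1) / 2 pairs for n batches

fermentation = [v[0] for v in batches.values()]
purification_yield = [v[1] for v in batches.values()]
r = pearson(fermentation, purification_yield)
print(round(r, 3))  # a value near 1 suggests the two phases move together
```

The number of pairs grows quadratically with the number of batches, which is one reason comparisons at this scale call for distributed or in-memory processing.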

Quality By Design

Merck is the first of many examples. The opportunity to tap into Big Data and build efficiencies in drug manufacturing is only just beginning. The more we understand what’s happening in the manufacturing process, the easier it will be to incorporate the right controls that can improve the end-to-end lifecycle of biopharmaceutical production — speeding development, improving quality, and reducing costs. The long-term gains from using Big Data solutions are tremendous, and the entire healthcare ecosystem will ultimately realize the benefits.


  1. "The Human Face of Big Data" (Trailer), PBS, February 24, 2016.
  2. "The Cost Of Creating A New Drug Now $5 Billion, Pushing Big Pharma To Change," Forbes, August 11, 2013.
  3. "Merck Optimizes Manufacturing with Big Data Analytics," Information Week, April 2, 2014.

About The Author

Jack Schmidt, SAP life science industry director, started his career at Johnson & Johnson (J&J) and has spent his entire 25-year career in the life science industry, working in pharmaceuticals, biotech, and medical devices.

At J&J, he worked in strategic planning, developing optimization algorithms to improve supply chain planning metrics, as well as a supply chain performance management model. He also led the greenfield start-up of a pharmaceutical manufacturing and packaging plant, which employed a lean production philosophy and high-performance work team concepts.

At SAP, Jack is responsible for identifying emerging industry trends and guiding SAP solution investments in these innovation areas. This includes automating and extending enterprise business processes through new capabilities such as pharma supply chain serialization, patient engagement solutions, and predictive analytics for actionable insight.