GlaxoSmithKline (GSK) is a science-led global healthcare company with a special purpose: to help people do more, feel better and live longer. For a pharmaceutical company, creating a new drug can take anywhere from 8 to 20 years. GSK wanted to shorten that development time by getting more value from its data.
To do this, it set out to create a Data Center of Excellence that would accelerate the delivery of clean data from thousands of data sources to more than 10,000 data scientists worldwide working in research and development (R&D), and so shorten time to market.
"In the pharma industry, it is very important for us to trace our data. Keeping track through StreamSets logs was a huge benefit for us because moving, transforming, and tracing the data brought high efficiency with compliance and scaling data engineering."
- Arun Reddipalli, former Sr Director R&D Data Platform, GSK
To launch the Data Center of Excellence around data delivery and DataOps, GSK needed to bring R&D data into a single system quickly and efficiently so it could rapidly scale up and make use of its data sources.
This meant bringing siloed data together into a primary data and information platform where users across the enterprise could consume all the data in different ways.
The GSK team is responsible for dynamically scaling its data flows to meet the demands of new data sources. The team has evolved its data practices to automate parts of data acquisition and delivery using bot-driven pipelines.
Using StreamSets, the GSK team has automated its data pipelines and data drift handling, gaining the flexibility to push technology boundaries without interrupting the critical flow of self-service data to scientists.