What is data drift?
Learn about the unexpected and undocumented changes to data structure, semantics, and infrastructure, and the impact they can have on your business.
Data drift defined
The explosion of data sources and rapid innovation in advanced analytics, data science, AI, and machine learning have fundamentally changed the scale and pace of data integration. The added pressure to make decisions quickly in unprecedented times means that your organization is more dependent on fresh, reliable data than ever before.
Data drift is unexpected and undocumented changes to data structure, semantics, and infrastructure that result from modern data architectures. Data drift breaks processes and corrupts data, but it can also reveal new opportunities for data use.
A modern approach to data integration helps you accelerate your digital transformation and manage uncertainty by enabling you to harness data drift instead of fighting against it.
Accelerating data transformation with DataOps
The vast majority of the business logic that drives the modern enterprise resides in the integration of thousands of small, specialized applications across multiple platforms. These integrations have become the most vulnerable points in modern business operations. Yet traditional data integration processes and platforms ignore a simple fact: modern data semantics and structures change frequently.
The practice of DataOps assumes that change is constant. By architecting for change, you not only manage data drift, you begin to harness its power, responding to change and adapting to new business conditions and innovations with speed and agility.
7 ways to handle data drift and embrace change
1. Reduce dependency on custom code
When custom-coded pipelines break, your first challenge is finding the problem; your second is finding someone who can fix it. Smart data pipelines built with code-free tools alert you to change and can be maintained by anyone on your team. Where custom code is still needed, package it for reuse and porting between pipelines.
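As a rough illustration of packaging custom logic for reuse, the sketch below (plain Python, with invented field and function names) wraps a one-off transformation in a small, documented function that any pipeline or team member can import instead of re-implementing it inline.

# Hypothetical example: a one-off transformation packaged as a reusable,
# documented function instead of inline custom code buried in a pipeline.
from datetime import datetime, timezone

def normalize_timestamp(record: dict, field: str = "event_time") -> dict:
    """Convert a Unix-epoch or ISO-8601 timestamp field to UTC ISO-8601."""
    value = record.get(field)
    if value is None:
        return record  # leave records without the field untouched
    if isinstance(value, (int, float)):
        parsed = datetime.fromtimestamp(value, tz=timezone.utc)
    else:
        parsed = datetime.fromisoformat(str(value)).astimezone(timezone.utc)
    record[field] = parsed.isoformat()
    return record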
2. Minimize schema specification
Take an intent-driven approach to designing and building pipelines. By specifying conditions and transformations only for the fields your analysis actually needs, your dataflows become more reliable, more robust, and easier to diagnose.
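A minimal sketch of that idea in plain Python (the field names are invented): the pipeline declares only the handful of fields the analysis depends on and passes everything else through untouched, so new or renamed columns elsewhere in the record don't break it.

# Minimal sketch: declare only the fields the analysis depends on,
# pass every other field through so unrelated schema changes don't break the flow.
REQUIRED_FIELDS = {"order_id", "amount"}  # hypothetical analysis fields

def transform(record: dict) -> dict:
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"record missing required fields: {missing}")
    out = dict(record)                    # pass-through: keep unknown fields
    out["amount"] = float(out["amount"])  # only touch what the analysis needs
    return out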
3. Require fully instrumented pipelines
End-to-end instrumentation of streaming data gives you a window into performance as you contend with change. By design, pipelines should capture operational details across the entire architecture while adding minimal overhead.
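One lightweight way to approximate that kind of instrumentation, sketched here in plain Python with invented stage names, is to wrap each stage so it reports record counts and elapsed time without changing the stage logic itself.

# Sketch: wrap each pipeline stage to record throughput and latency
# without modifying the stage logic itself.
import time
from collections import defaultdict

metrics = defaultdict(lambda: {"records": 0, "seconds": 0.0})

def instrumented(stage_name, stage_fn):
    def wrapper(records):
        start = time.monotonic()
        out = [stage_fn(r) for r in records]
        metrics[stage_name]["records"] += len(out)
        metrics[stage_name]["seconds"] += time.monotonic() - start
        return out
    return wrapper

# Usage (hypothetical stages):
# clean = instrumented("clean", clean_record)
# enrich = instrumented("enrich", enrich_record)
# print(dict(metrics))  # per-stage counts and timings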
4. Decouple data pipelines from infrastructure
Large lockstep upgrades have been replaced by ongoing, incremental changes to individual components. Data integration that avoids tight coupling between systems produces pipelines that are more resilient to change and future-proofs your design work.
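One common way to avoid that coupling, shown here as a hedged sketch with invented environment-variable names, is to keep connection details out of the pipeline logic entirely and resolve them from configuration at runtime, so a storage or platform upgrade only changes the configuration.

# Sketch: resolve infrastructure endpoints from configuration at runtime
# so the pipeline logic never hard-codes a specific system or version.
import os

def load_sink_config() -> dict:
    # Hypothetical variable names; any configuration store would work.
    return {
        "kind": os.environ.get("SINK_KIND", "object_store"),
        "endpoint": os.environ.get("SINK_ENDPOINT", "https://storage.example.com"),
        "bucket": os.environ.get("SINK_BUCKET", "analytics-landing"),
    }

def write_batch(records, sink=None):
    sink = sink or load_sink_config()
    # The pipeline only knows it has "a sink"; swapping platforms or
    # upgrading a component means changing configuration, not code.
    print(f"writing {len(records)} records to {sink['kind']} at {sink['endpoint']}")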
5. Build data pipelines for intent instead of semantics
Monitoring data for patterns as it flows through your infrastructure gives you insight into changes in meaning. You can catch errors before they pollute downstream analysis, prevent sharing of sensitive data, and even unlock new opportunities for data use.
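As a rough illustration (the regex and threshold are invented, not a prescribed rule set), a pipeline can profile values as they flow through and raise a flag when a field starts to look like sensitive data or its values shift in meaning.

# Rough sketch: profile field values in flight and flag apparent
# semantic changes, such as a field that starts carrying email addresses.
import re

EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def flag_semantic_drift(records, field, threshold=0.5):
    values = [str(r.get(field, "")) for r in records]
    if not values:
        return None
    email_like = sum(1 for v in values if EMAIL.fullmatch(v)) / len(values)
    if email_like >= threshold:
        return f"{field}: {email_like:.0%} of values look like email addresses"
    return None

# warning = flag_semantic_drift(batch, "user_ref")
# if warning: send_alert(warning)  # hypothetical alerting hook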
6. Assume multiple platforms
With data stored across different environments, you will need an architecture for workload portability. Your data integration system should provide visibility across cloud platforms and enable you to operate and synchronize across environments effectively.
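A minimal sketch of workload portability, with invented class and platform names: pipeline code targets one small interface, and each environment supplies its own implementation, so the same flow can run anywhere.

# Minimal sketch: pipelines depend on one small interface; each
# platform supplies its own implementation, keeping workloads portable.
from typing import Iterable, Protocol

class Sink(Protocol):
    def write(self, records: Iterable[dict]) -> None: ...

class LocalFileSink:
    def __init__(self, path: str):
        self.path = path
    def write(self, records):
        with open(self.path, "a", encoding="utf-8") as f:
            for r in records:
                f.write(f"{r}\n")

class CloudSink:  # hypothetical stand-in for any cloud object store client
    def __init__(self, bucket: str):
        self.bucket = bucket
    def write(self, records):
        print(f"uploading {len(list(records))} records to {self.bucket}")

def run_pipeline(records, sink: Sink):
    sink.write(records)  # identical pipeline code on every platform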
7. Keep your focus on what matters
Smart data pipelines detect and handle schema, semantic, and infrastructure drift. They automatically adjust to small changes, send alerts based on rules, and highlight emerging design patterns. A decoupled architecture like StreamSets abstracts away the “how” of implementation, so you can focus on the “what” of your data. Instead of fighting data drift, you can embrace change.
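To make the detect-and-handle behavior concrete, here is a hedged sketch (the expected-schema contents and rules are invented, not StreamSets' implementation): compare each incoming record to the expected schema, absorb additive changes automatically, and alert on anything that would break downstream analysis.

# Hedged sketch of drift handling: tolerate additive schema changes
# automatically and raise alerts when expected fields disappear.
EXPECTED_FIELDS = {"order_id", "amount", "currency"}  # hypothetical contract

def handle_schema_drift(record: dict) -> dict:
    fields = set(record)
    missing = EXPECTED_FIELDS - fields
    added = fields - EXPECTED_FIELDS
    if missing:
        # Breaking change: alert rather than silently corrupting analysis.
        raise ValueError(f"schema drift: missing fields {sorted(missing)}")
    if added:
        # Additive change: absorb it, but note the new fields for review.
        print(f"note: new fields observed {sorted(added)}")
    return record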