The best way to understand something is through concrete examples. I’ve put together six examples of data pipelines that represent patterns we commonly see among our customers, and that data engineers regularly encounter in production regardless of the tooling involved.
Use these patterns as a starting point for your own data integration project, or recreate them for practice in architecting data pipelines.
Change Data Capture pipeline
Migration pipeline from an on-premises database to a cloud warehouse
Streaming pipeline from Kafka to Elasticsearch on AWS
Reverse ETL pipeline
Pipeline using a fragment
Pipeline from local file storage to local file storage
You’re probably wondering why a pattern like this exists. You might be surprised to learn that pipelines with a local origin and a local destination are quite common. StreamSets handles file storage particularly well because it is schema-agnostic: it can ingest files whose columns are new or appear in a different order, across many different files, without any intervention.
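To make that behavior concrete, here is a minimal Python sketch, not StreamSets internals, just an illustration of schema-agnostic ingestion. The directory name `./landing` and the function `ingest_csv_files` are hypothetical; the point is that keying values by header name makes column order and new columns a non-event:

```python
import csv
from pathlib import Path

def ingest_csv_files(input_dir):
    """Read every CSV in input_dir, tolerating new or reordered columns.

    csv.DictReader keys each value by its header name, so column order
    never matters, and a file that adds a column simply yields records
    with an extra key -- no pipeline changes required.
    """
    records = []
    for path in sorted(Path(input_dir).glob("*.csv")):
        with path.open(newline="") as f:
            records.extend(csv.DictReader(f))
    return records

# Files with columns (id,name), (name,id), and (id,name,email)
# all flow through unchanged.
rows = ingest_csv_files("./landing")
```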
Once you build your pipeline, whatever lands in your origin flows through to your destination without you having to touch it again. StreamSets also makes changing the file format easy: it’s just a drop-down menu, so you can quickly convert many CSV files to JSON, Avro, plain text, and so on.
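A similarly hedged sketch of the format-conversion idea: retargeting the destination format comes down to swapping the writer, which is roughly what the drop-down does in the pipeline UI. The records and the `write_json_lines` helper here are illustrative, not part of any StreamSets API:

```python
import json

# Hypothetical records -- e.g., the output of the CSV ingest sketch above.
records = [
    {"id": "1", "name": "Ada"},
    {"id": "2", "name": "Grace", "email": "grace@example.com"},
]

def write_json_lines(records, out_path):
    """Write each record as one JSON object per line (JSON Lines).

    Converting to another format (Avro, plain text, ...) means swapping
    only this writer; the ingest side is untouched.
    """
    with open(out_path, "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

write_json_lines(records, "records.jsonl")
```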