“With StreamSets and AWS, we’re able to consolidate distributed and isolated data and present it in a dashboard for better decision-making. This data became especially critical during the Covid period, as it impacted overall ridership.”
SamTrans IT spans several teams working in a decentralized ecosystem. Team member skill sets, use cases, data sources, and infrastructure vary widely between agencies, departments, and vendor solutions. This led to disparate data integration processes that were bespoke, non-sharable, and prone to failure.
When samTrans initiated a business intelligence reporting project to better understand ridership trends, the team knew it needed to consolidate distributed and isolated data through more streamlined processes. This would allow for more self-service and rapid development of data analytics.
The first step samTrans IT took in supporting the decentralized organization and streamlining processes was finding a common data integration platform that was easy to use and could support any data source and infrastructure environment, whether on-premises, hybrid, or in the cloud.
The team chose StreamSets for its ability to support samTrans' multiple on-premises source environments (Oracle, SQL Server, CSV, etc.) and cloud destinations like PostgreSQL and AWS S3. Using StreamSets, the team created multiple smart data pipelines that gathered data from previously isolated sources and landed it reliably and continually in S3 and AWS PostgreSQL.
StreamSets is helping samTrans on its DataOps journey. Managing and ingesting data from various silos into the cloud used to be impossible; that's no longer the case. Having one unified tool with a graphical UI that lets the organization easily interact with and manage multiple sources and destinations is a big step toward streamlining its processes.
StreamSets has also helped feed data to samTrans' AWS partners in real time, especially ridership data that management uses for decisions about budgets and lowering costs. As the team processes high volumes of data, it has begun introducing a new cloud data warehouse and will subsequently venture into AI-related technologies when it moves on to analytics.