April 12, 2018

The Cure for Chaos: Automating Your Data Pipeline

Zak Pines 


The number of SaaS applications companies use has exploded. Most SMBs use at least a dozen, which means customer data are spread across systems and departments.

So while customers can engage with businesses through more channels than ever (websites, sales reps, support reps, web forms, chatbots, support tickets, and social media, to name a few), the data from these interactions are stored in disparate systems, which makes it difficult to use the data in a unified way.

For example, if you use Salesforce for CRM, Marketo for marketing automation, and Zendesk for support, a single customer might have a separate contact record in each of the three applications. Or maybe it's Microsoft Dynamics, HubSpot, FreshDesk and QuickBooks. Or NetSuite, Pardot, HelpScout and Jira.

Having separate contact records is risky. Because every system has its own data model, the three contact records follow three different schemas. In one system, the contact may be associated with opportunities, tasks, and activities; in another, with tickets, events, and products; in the last, with orders, marketing, and campaigns. The danger is that you're missing the full picture, and not just for one contact but for every contact.

Traditionally, to get trusted, consolidated reporting, organizations have had to manually extract, transform, and load (ETL) data from these sources to feed reports, analytics, and business intelligence (BI) tools. Doing so requires building an API connection for each data source, modeling the data, and setting up a warehouse.

This process is highly technical. ETL is more than just moving data from one endpoint to another. Developers must learn how to authenticate against each API, then work through an arduous process to merge the datasets. Getting data into the right format means eliminating duplicates, resolving conflicts, and standardizing each system's own data model into a single schema. And this is not just a flat data file: it is an entire dataset of object relationships across systems (e.g., the link between contact and account) that must be accessible from one trusted location, merged with all the opportunities, activities, tasks, and tickets associated with those records.
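
To make the merge step concrete, here is a minimal Python sketch of deduplication and conflict resolution. It assumes the records have already been mapped to common field names, matches people by normalized email address, and resolves conflicts with a "most recently updated non-empty value wins" rule. The sample records, field names, and that rule are hypothetical choices for illustration, not how any particular product works.

```python
from datetime import datetime

def merge_contacts(records):
    """Merge records that describe the same person, keyed by normalized email.

    Assumed conflict rule: the most recently updated non-empty value wins.
    """
    merged = {}
    # Oldest first, so values from newer records overwrite older ones.
    for rec in sorted(records, key=lambda r: r["updated"]):
        key = rec["email"].strip().lower()  # "Ana@Example.com " -> "ana@example.com"
        target = merged.setdefault(key, {"email": key, "sources": []})
        target["sources"].append(rec["source"])
        for field, value in rec.items():
            if field in ("email", "source", "updated"):
                continue  # bookkeeping fields, handled above
            if value not in (None, ""):  # blanks never overwrite real values
                target[field] = value
    return list(merged.values())

# Hypothetical records for one customer, exported from three systems.
records = [
    {"source": "crm", "email": "Ana@Example.com", "updated": datetime(2018, 3, 1),
     "phone": "555-0100", "title": "CFO"},
    {"source": "marketing", "email": "ana@example.com ", "updated": datetime(2018, 4, 2),
     "phone": "", "title": "Chief Financial Officer"},
    {"source": "support", "email": "ana@example.com", "updated": datetime(2018, 1, 5),
     "phone": "555-0199", "title": None},
]
merged = merge_contacts(records)
```

The three records collapse into one contact whose phone comes from the CRM (the newest non-empty phone) and whose title comes from the marketing system. Real pipelines often use other rules, such as trusting a designated system for particular fields.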

As every agile shop knows, developer time is precious. Writing scripts and pushing out config files to generate reports steals time away from developing new products. Faced with this opportunity cost, organizations lean on business analysts, who spend endless time exporting database logs, mending and melding .csv files, and formatting pivot tables.

So while the explosion in SaaS has been a boon for productivity, it has also obstructed analysis of even the most basic customer metrics. Executives who want reliable customer acquisition cost (CAC), lifetime value (LTV), daily active users (DAU), annual recurring revenue (ARR), and churn data might wait months just to get the data ready for analytics. Even after spending hundreds of thousands of dollars, their dashboards may be using data that is weeks old. Values for the same field can differ from one system to the next, and formats can clash: one SaaS system may format dates as DD-MM-YYYY and another as YYYY-MM-DD, all of which slows down getting insights from the data.
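
Normalizing those dates is mechanical once you know each source's format. Below is a minimal Python sketch, assuming each source declares its format up front; the source names are hypothetical. Guessing is unsafe, because a value like 02-04-2018 reads as 2 April or February 4 depending on convention.

```python
from datetime import date, datetime

# Hypothetical sources and the date format each one is known to use.
DATE_FORMATS = {
    "crm": "%d-%m-%Y",      # DD-MM-YYYY
    "support": "%Y-%m-%d",  # YYYY-MM-DD
}

def normalize_date(raw: str, source: str) -> date:
    """Parse a date string using the format declared for its source system."""
    return datetime.strptime(raw.strip(), DATE_FORMATS[source]).date()

a = normalize_date("02-04-2018", "crm")      # DD-MM-YYYY input
b = normalize_date("2018-04-02", "support")  # YYYY-MM-DD input
```

Both calls yield the same `date`. Storing a real date (or an ISO 8601 string such as 2018-04-02) means every downstream query sorts and compares dates the same way.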

Automate Your Data Pipeline

Fortunately, consolidating data is much easier than it used to be. The key is automating your data pipeline using a modernized approach to ETL and warehousing.

Modern ETL, however, isn't really ETL at all. There is no laborious extraction, transformation, and loading of data. Instead, your data pipeline continuously streams data from your applications, unifies it, and delivers it as an on-demand cloud data warehouse for analytics and reporting.
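
How "continuous streaming" is implemented varies by product, and the approach here is one possibility, not the article's. A common way to approximate continuous sync against SaaS APIs is incremental polling with a high-water mark: each run asks only for records changed since the last timestamp it saw and upserts them into the warehouse. Here is a minimal Python sketch, with a hypothetical in-memory `fake_fetch` standing in for a real API.

```python
from datetime import datetime
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

def incremental_sync(fetch_changed_since: Callable[[datetime], List[Record]],
                     warehouse: Dict[str, Record],
                     cursor: datetime) -> datetime:
    """Upsert records changed after `cursor` and return the advanced cursor."""
    changed = fetch_changed_since(cursor)
    for rec in changed:
        warehouse[rec["id"]] = rec  # insert new ids, overwrite edited ones
    if changed:
        cursor = max(rec["updated"] for rec in changed)  # new high-water mark
    return cursor

# Hypothetical source system standing in for a SaaS API.
source: List[Record] = [
    {"id": "c1", "name": "Ana", "updated": datetime(2018, 4, 1, 9)},
    {"id": "c2", "name": "Bo", "updated": datetime(2018, 4, 1, 10)},
]

def fake_fetch(since: datetime) -> List[Record]:
    # Strict ">" assumes timestamps are unique; a real connector might re-read
    # the boundary timestamp and rely on the upsert to absorb duplicates.
    return [r for r in source if r["updated"] > since]

warehouse: Dict[str, Record] = {}
cursor = incremental_sync(fake_fetch, warehouse, datetime.min)  # first run loads everything
source[0] = {"id": "c1", "name": "Ana P.", "updated": datetime(2018, 4, 1, 11)}  # edit upstream
cursor = incremental_sync(fake_fetch, warehouse, cursor)  # later run picks up only the edit
```

Because each run moves only the changes, runs can be scheduled frequently. That frequency is what keeps the warehouse close to real time without full reloads.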

In this new world, you select multiple data sources in a simple UI, then let the platform automatically create field mappings and tables in a cloud data warehouse whose database you can connect to any BI tool. Such tools standardize field formats and build a universal schema, automating the entire data pipeline while preserving the relationships between objects (such as contacts and their accounts).
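
At its core, a field mapping is a lookup table from each source's field names to the shared schema. Here is a minimal Python sketch; the source names and field names are hypothetical, and real tools also handle type conversion, nested objects, and relationships.

```python
from typing import Any, Dict

# Hypothetical mappings: source field name -> universal schema field name.
FIELD_MAPPINGS: Dict[str, Dict[str, str]] = {
    "crm": {"Email": "email", "FirstName": "first_name", "LastName": "last_name"},
    "marketing": {"email_address": "email", "fname": "first_name", "lname": "last_name"},
    "support": {"requester_email": "email", "requester_name": "full_name"},
}

def to_universal(record: Dict[str, Any], source: str) -> Dict[str, Any]:
    """Rename a record's fields to the universal schema, dropping unmapped fields."""
    mapping = FIELD_MAPPINGS[source]
    return {mapping[field]: value for field, value in record.items() if field in mapping}

crm_row = to_universal({"Email": "ana@example.com", "FirstName": "Ana", "Industry": "Retail"}, "crm")
mkt_row = to_universal({"email_address": "ana@example.com", "fname": "Ana"}, "marketing")
```

Both rows come out with identical keys. Because the mapping is data rather than code, changing how a field lines up between systems means editing one entry. A mapping UI can expose exactly that kind of change to analysts. Dropping unmapped fields keeps the sketch short; a real pipeline might keep them in a separate column instead.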

For developers, this is great because they no longer need to account for each API's idiosyncrasies. Warehousing is taken care of, too, and tables are already joined. And if a system isn't already connected, a developer can write their own connector with an SDK.
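
The article doesn't describe a specific SDK. As an illustration only, one common design asks each connector to implement a small, fixed contract, so the pipeline can treat every source the same way and each API's quirks stay inside its connector. In the Python sketch below, `Connector`, `authenticate`, `read`, and the billing connector are all hypothetical names, not the API of any real product.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List

class Connector(ABC):
    """Hypothetical contract every source connector implements."""

    @abstractmethod
    def authenticate(self) -> None:
        """Obtain credentials or tokens for the source API."""

    @abstractmethod
    def read(self, obj: str) -> Iterator[Dict[str, Any]]:
        """Yield raw records of one object type (e.g. 'invoice')."""

class InMemoryBillingConnector(Connector):
    """Connector for a hypothetical billing system, backed by a list for demo purposes."""

    def __init__(self, rows: List[Dict[str, Any]]):
        self._rows = rows
        self._authenticated = False

    def authenticate(self) -> None:
        # A real connector would exchange API keys or OAuth credentials here.
        self._authenticated = True

    def read(self, obj: str) -> Iterator[Dict[str, Any]]:
        if not self._authenticated:
            raise RuntimeError("call authenticate() before read()")
        for row in self._rows:
            if row["object"] == obj:
                yield row

conn = InMemoryBillingConnector([
    {"object": "invoice", "id": "inv-1", "amount": 1200},
    {"object": "customer", "id": "cus-1", "email": "ana@example.com"},
])
conn.authenticate()
invoices = list(conn.read("invoice"))
```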

This automated data pipeline is also a dream come true for analysts. With datasets merged, matching records linked, and conflicts resolved, delivering reports in near real time is now within reach. Mappings between systems can be changed right in the UI, so there's no need to file an IT ticket to update the rules in some siloed SaaS system.

Accelerating Time To Analytics

Data scientists spend, on average, 80% of their time ingesting data and about 20% mapping and modeling trends. Even after analysts have wrangled and formatted the data, there is still no guarantee that queries will be fast or easy to run if the data span multiple warehouses. When executives ask analysts to generate key reports, the analysts must repeat this tedious exercise and be intimately familiar with the nuances of querying each database. Moreover, many SaaS applications limit the number of API calls. As a result, dashboards are often slow to refresh, and the volume of data that can be downloaded is limited. The upshot of these challenges is that analysts spend more time formatting data than analyzing it.
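
When an API caps the number of calls, a full refresh may have to be spread across several quota windows. Here is a minimal Python sketch of paging under a call budget that returns a token to resume from later. `fake_get_page`, its page size, and its token format are hypothetical stand-ins for a real paginated API.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Rows = List[Dict[str, Any]]
GetPage = Callable[[Optional[str]], Tuple[Rows, Optional[str]]]  # token -> (rows, next token)

def fetch_with_budget(get_page: GetPage, max_calls: int,
                      start_token: Optional[str] = None) -> Tuple[Rows, Optional[str]]:
    """Page through an API using at most `max_calls` (>= 1) requests.

    Returns the rows fetched plus the token to resume from, or None when
    every page has been read.
    """
    if max_calls < 1:
        raise ValueError("max_calls must be at least 1")
    rows: Rows = []
    token = start_token
    for _ in range(max_calls):
        page_rows, token = get_page(token)
        rows.extend(page_rows)
        if token is None:  # last page reached
            break
    return rows, token

def fake_get_page(token: Optional[str]) -> Tuple[Rows, Optional[str]]:
    # Hypothetical API: 5 pages of 100 rows, with tokens "p1".."p4".
    page = 0 if token is None else int(token[1:])
    rows = [{"id": page * 100 + i} for i in range(100)]
    next_token = f"p{page + 1}" if page + 1 < 5 else None
    return rows, next_token

first, resume = fetch_with_budget(fake_get_page, max_calls=3)  # quota runs out mid-refresh
rest, done = fetch_with_budget(fake_get_page, max_calls=3, start_token=resume)  # next window
```

With five pages and a three-call budget, the refresh needs two quota windows. This shows how API limits translate directly into stale dashboards.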

With an automated data pipeline across SaaS applications, organizations can get access to analytics much faster than before. Data prep can shrink from 80% of an analytics project to nearly zero.

The advantage of reducing the time to analytics is that businesses can go from analyzing data once a month to analyzing it in real time. And once the data is in the BI tool, queries against a single dataset return results much faster than queries that span multiple databases. Business analysts can now get their jobs done faster.

Automating your data pipeline is great for executives who need a single dashboard of KPIs. Because the data originate from all your SaaS applications, reports become cross-functional, spanning marketing and sales through support and customer success. You can build these dashboards in any BI tool rather than in a siloed application (e.g., Dynamics, Marketo, or HubSpot), whose reports can be difficult to share with other teams. As you move from exclusivity to inclusivity, what were once data guessing games become confident decisions. And you're free of the $100,000 you may be spending on consulting projects, storage, software, and servers to clean your data.

As for the reporting itself, most teams generate a high volume of activity data that a BI tool can make sense of but most CRMs and native applications can't. With a unified dataset, you can report on all your activity data. This means you can tweak campaigns to acquire customers who don't churn. You gain insight into what your optimal customer profile looks like, which channels to use to target profitable customers, and how customer success can help retain these high-value customers if they become at risk as they approach renewal.
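
With marketing and billing data unified, a question like "which channels bring in customers who stay?" becomes a simple join and count. Here is a minimal Python sketch using hypothetical customer IDs and channels. It assumes records from both systems have already been matched on a shared customer ID.

```python
from collections import Counter

# Hypothetical marketing-automation data: acquisition channel per customer.
first_touch = {
    "c1": "webinar", "c2": "webinar", "c3": "webinar",
    "c4": "paid_social", "c5": "paid_social", "c6": "paid_social",
}
# Hypothetical billing/CRM data: customers who did not renew.
churned = {"c3", "c5", "c6"}

def churn_rate_by_channel(first_touch, churned):
    """Share of customers acquired through each channel who later churned."""
    totals = Counter(first_touch.values())
    lost = Counter(ch for cid, ch in first_touch.items() if cid in churned)
    return {ch: lost[ch] / totals[ch] for ch in totals}

rates = churn_rate_by_channel(first_touch, churned)
```

In this toy data, one in three webinar customers churned versus two in three from paid social. Neither system alone holds both halves of that comparison.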

Once you do, the campaign attribution and performance reports you run will close the loop on reporting. An automated data pipeline treats the viral spread of SaaS and, by unifying customer data, improves the health of the entire business.

About the author: Zak Pines is the Moneyball Marketer, a data-driven marketing leader closely aligned across departments around driving revenue growth. Zak is the VP, Marketing for Bedrock Data, and has worked in marketing technology and business analytics for over two decades.


