August 1, 2017

ETL Slowing Real-Time Analytics, Survey Finds


ETL, the extract, transform, and load process used to move data between databases or into data warehouses, is struggling to keep pace with growing demand for real-time data analysis, resulting in operational inefficiencies and, ultimately, lost business opportunities, a new vendor survey warns.
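
For readers unfamiliar with the pattern, here is a minimal sketch of a batch ETL job, with in-memory SQLite databases standing in for the transactional source and the analytics target (the table, columns, and transform rule are hypothetical, invented for illustration):

```python
import sqlite3

# Hypothetical stand-ins for a transactional source and an analytics target.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

# Seed the source so the sketch runs end to end.
source.execute(
    "CREATE TABLE orders (order_id INT, customer_id INT, amount REAL, created_at TEXT)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, 101, 19.99, "2017-07-27"), (2, 102, 5.00, "2017-07-28")],
)

# Extract: pull rows from the transactional database.
rows = source.execute(
    "SELECT order_id, customer_id, amount, created_at FROM orders"
).fetchall()

# Transform: apply a simple reshaping rule (here, normalize amounts to cents).
transformed = [(oid, cid, int(round(amt * 100)), ts) for oid, cid, amt, ts in rows]

# Load: write the reshaped rows into the analytics database.
target.execute(
    "CREATE TABLE orders_fact "
    "(order_id INT, customer_id INT, amount_cents INT, created_at TEXT)"
)
target.executemany("INSERT INTO orders_fact VALUES (?, ?, ?, ?)", transformed)
target.commit()
print(target.execute("SELECT * FROM orders_fact").fetchall())
```

Because jobs like this run on a batch schedule, the data they deliver can already be hours or days old, which is the lag the survey measures.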

Along with ETL, the survey, conducted by IDC on behalf of sponsor InterSystems Corp., the database management vendor based in Cambridge, Mass., notes that Change Data Capture (CDC) technology is also proving to be a laggard as real-time data analysis gains momentum.

The study found that nearly two-thirds of data moved via ETL was at least five days old by the time it reached an analytics database. As for CDC, a real-time data replication technology, the survey revealed that moving 65 percent of CDC data into an analytics database takes at least 10 minutes.
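
CDC tools typically read a database's transaction log and replay each committed change into the target. Below is a rough sketch of that replication loop, with a hypothetical change table standing in for the log and SQLite standing in for both databases:

```python
import sqlite3

# Hypothetical change table standing in for a database transaction log.
# Real CDC tools read the log directly; this just illustrates the loop.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")

src.execute(
    "CREATE TABLE changes (seq INTEGER PRIMARY KEY, op TEXT, order_id INT, amount REAL)"
)
src.executemany(
    "INSERT INTO changes (op, order_id, amount) VALUES (?, ?, ?)",
    [("insert", 1, 19.99), ("insert", 2, 5.00), ("update", 1, 21.99)],
)

dst.execute("CREATE TABLE orders (order_id INT PRIMARY KEY, amount REAL)")

last_applied = 0  # position of the last change replicated to the analytics side

def replicate_once():
    """Apply every change captured since the last replicated position."""
    global last_applied
    for seq, op, order_id, amount in src.execute(
        "SELECT seq, op, order_id, amount FROM changes WHERE seq > ? ORDER BY seq",
        (last_applied,),
    ):
        if op == "insert":
            dst.execute("INSERT INTO orders VALUES (?, ?)", (order_id, amount))
        elif op == "update":
            dst.execute(
                "UPDATE orders SET amount = ? WHERE order_id = ?", (amount, order_id)
            )
        last_applied = seq
    dst.commit()

replicate_once()
print(dst.execute("SELECT * FROM orders").fetchall())  # [(1, 21.99), (2, 5.0)]
```

Every hop in this pipeline adds latency, which is how even a nominally real-time technology ends up minutes behind the source.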

The findings “highlight the importance of concurrent transaction processing and real-time data analytics for improving customer experience, business productivity, operations and more,” the company noted in releasing the study this week. But as things stand now, the survey found a “data disconnect” as ETL fails to keep pace with real-time analytics.

The survey also found that 75 percent of the IT executives polled worry that data lag has hurt their business, while 27 percent said the data disconnect is slowing productivity. Meanwhile, more than half of respondents said slow data is limiting operational efficiency.

IT executives also stressed the growing importance of new data types, particularly unstructured sensor and video data. Those new data sets make disconnects between real-time analytics and ETL along with CDC “more alarming,” InterSystems noted.

Indeed, the more than 500 companies surveyed rated Internet of Things data, relational and streaming data from external sources, sensor data, graphs, video, JSON documents, and geospatial data as "very important."

Meanwhile, IDC reported that more than one-third of respondents said they are evaluating new database technologies this year as they retire older versions and search for ways to speed up data transfers for real-time analysis. Just over 20 percent said they are moving applications to software-as-a-service platforms, while 17.3 percent are considering open source database options.

These moves are aimed primarily at bridging what the market researcher calls "The Great Divide" between transactions and analytics. The analytics requirements include handling data from many transactional databases, which must be optimized for speedier queries, along with analytical frameworks capable of sorting through and drawing conclusions from soaring volumes of data.

The biggest problem faced by many respondents is that transaction data moves too slowly via ETL technologies. Slow data integration is also fueled by the fact that databases geared to transactions are not designed to perform real-time analytics queries. Conversely, analytics databases can't process transactions fast enough to meet application performance requirements.
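
To make the divide concrete, here is a small sketch of the two workload shapes on a single SQLite table: the transactional side issues many small point writes, while the analytical side runs a scan-and-aggregate over the same rows (the schema and values are invented for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INT, amount REAL)"
)

# Transactional workload: many small, independent point writes,
# each touching a single row.
for i in range(1000):
    db.execute("INSERT INTO orders VALUES (?, ?, ?)", (i, i % 50, float(i % 7)))
db.commit()

# Analytical workload: a full scan and aggregation across all rows.
top = db.execute(
    "SELECT customer_id, SUM(amount) AS total FROM orders "
    "GROUP BY customer_id ORDER BY total DESC LIMIT 5"
).fetchall()
print(top)
```

A storage engine tuned for the first shape (fast single-row reads and writes) is rarely tuned for the second (wide scans), and vice versa, which is why the two workloads have traditionally lived in separate databases connected by ETL.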

Hence, companies like InterSystems promote database management systems designed, among other things, to blend transaction and analytics databases. InterSystems launched a product called Caché, a commercialized version of a database called MUMPS. The InterSystems platform incorporates multidimensional array structures for storing hierarchically structured data.
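
The multidimensional-array model descends from MUMPS globals, which store sparse trees addressed by subscript paths. The toy class below mimics that idea in Python; it illustrates the concept only and is not InterSystems' actual API:

```python
# A rough illustration of MUMPS-style globals: sparse multidimensional
# arrays addressed by subscript paths, modeled here with a dict keyed
# by tuples. This is a concept sketch, not InterSystems' API.
class Global:
    def __init__(self):
        self._nodes = {}  # maps subscript tuples to values

    def set(self, *path_and_value):
        """Store a value at a subscript path, e.g. set('123', 'name', 'Smith')."""
        *subscripts, value = path_and_value
        self._nodes[tuple(subscripts)] = value

    def get(self, *subscripts):
        return self._nodes.get(tuple(subscripts))

    def children(self, *prefix):
        """List the immediate child subscripts under a prefix (hierarchical walk)."""
        n = len(prefix)
        return sorted(
            {k[n] for k in self._nodes if len(k) > n and k[:n] == tuple(prefix)}
        )

# Hierarchical data lands naturally in subscript paths:
patients = Global()
patients.set("123", "name", "Smith, Jo")
patients.set("123", "visit", "2017-07-01", "bp", "120/80")
patients.set("124", "name", "Lee, Sam")

print(patients.get("123", "name"))        # Smith, Jo
print(patients.children())                # ['123', '124']
print(patients.children("123", "visit"))  # ['2017-07-01']
```

Because the same subscripted structure can back both record lookups and traversals, it lends itself to the blended transaction-plus-analytics positioning described above.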

Recent items:

The Real-Time Future of ETL

Three NoSQL Databases You’ve Never Heard Of
