September 25, 2019

Paving the Way for DataOps

Dave Mariani


How is your DataOps going? Have you become a bona fide “insights-driven” business yet? Or are you still struggling to implement DataOps effectively across your organization? If you’re like most, it’s probably the latter.

DataOps—an automated, process-oriented methodology for optimizing the rapid collection, integration, analysis, security, and integrity of data—needs to be standard practice at any insights-driven business. However, there are a number of technical hurdles that prevent companies from fully implementing a DataOps culture.

Limited access to data, difficulty aligning data from disparate data sources, slow query performance, data security and governance issues, and business disruption from altering the data architecture of the enterprise data warehouse all stand in the way of robust, effective DataOps. DataOps can also be hindered when companies are mired in cloud migration initiatives and cannot deliver the timely, accurate business insights the organization needs.

In order for DataOps to truly flourish and deliver the competitive advantages it promises, companies need to develop flexible environments that can overcome these technical and organizational obstacles. With a clear path forward, DataOps can fully leverage data in real time across all business units for new products, services, and customer and market insights to drive growth and outpace the competition. Such a flexible environment requires what is known as a “shared data intellect.”

Leveraging a Shared Data Intellect

A shared data intellect arises when all of the data in your organization is accessible and readable to all business users in all departments, and can be integrated in any combination desired. Essentially, it’s like combining the technical capabilities of the CIO organization with the analytical skills of the CDO organization. The result, like joining the left and right halves of the brain, is a synchronized, holistic data environment.

Many companies attempt to achieve this shared data intellect via cloud migration. However, there is a far simpler way to gain a shared data intellect and pave the way for DataOps no matter where you are in your cloud migration process: intelligent data virtualization.

Intelligent data virtualization is a new, source-agnostic approach to data virtualization that can read and communicate with all data in all formats, in legacy systems or in the cloud, without having to move it or transform it in any way. As you undergo cloud migration at your own pace and on your own terms, you can still gain the benefits of a shared data intellect. This means you can enact a DataOps culture now, without having to wait for your cloud migration to be completed—and all the headaches and uncertainty that entails.

Intelligent data virtualization enables the creation of a shared data intellect in four primary ways:

  1. Unencumbered connectivity to data and feeds of data;
  2. BI/OA tool-agnostic access;
  3. Superior analytical and query performance;
  4. Data security at rest and in flight.

Unencumbered Connectivity to Data

When data is difficult to find and access, DataOps runs the risk of producing inaccurate or incomplete insights that may result in disadvantageous decisions. Often in these situations, DataOps teams fall prey to the temptation to gather data in local extracts that go stale, get interpreted inconsistently, or get stolen or leaked.

Unencumbered connectivity means having comprehensive access to datastores and cloud platforms via live native connections and SDKs. By applying intelligent data virtualization to your enterprise data warehouse, you create a centralized capability to discover and work with any data you need, eliminating the temptation to create local data extracts and significantly improving data visibility and report quality.

BI/OA Tool-Agnostic Access

Bringing together disparate datasets can create costly errors in analysis when data with different business definitions are combined without being normalized.

Different Business Intelligence (BI) and Operational Analytics (OA) tools use different query languages, so the same question can return different results depending on the tool. Enterprises need to be able to automatically normalize data with a common enterprise business logic that is legible to multiple BI/OA tools. Intelligent data virtualization applies a standardized business logic to all of your data, so that disparate datasets can be integrated and analyzed with any BI/OA tool you prefer. Query results can then be relied on to be consistent across BI and OA tools.
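To make the idea of a common business logic concrete, here is a minimal Python sketch. It is not how any particular virtualization product works; the source names, column names, and the `net_revenue` metric are all hypothetical. It only illustrates the principle that a metric is defined once and every source is mapped onto that definition, so each tool gets the same answer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, float]


@dataclass(frozen=True)
class Metric:
    """A business metric defined once, in terms of canonical field names."""
    name: str
    formula: Callable[[Row], float]


# Hypothetical mappings from each source's column names to canonical names.
SOURCE_COLUMN_MAP: Dict[str, Dict[str, str]] = {
    "legacy_warehouse": {"GROSS_AMT": "gross", "RETURNS_AMT": "returns"},
    "cloud_store": {"gross_sales": "gross", "refunds": "returns"},
}

# The single, shared definition of the metric (an assumption for illustration).
NET_REVENUE = Metric("net_revenue", lambda r: r["gross"] - r["returns"])


def normalize(source: str, row: Row) -> Row:
    """Rename a source row's columns to the canonical field names."""
    mapping = SOURCE_COLUMN_MAP[source]
    return {mapping[col]: value for col, value in row.items() if col in mapping}


def evaluate(metric: Metric, source: str, rows: List[Row]) -> float:
    """Apply the shared metric definition to rows from any mapped source."""
    return sum(metric.formula(normalize(source, r)) for r in rows)


legacy_rows = [{"GROSS_AMT": 100.0, "RETURNS_AMT": 10.0}]
cloud_rows = [{"gross_sales": 250.0, "refunds": 25.0}]

legacy_total = evaluate(NET_REVENUE, "legacy_warehouse", legacy_rows)
cloud_total = evaluate(NET_REVENUE, "cloud_store", cloud_rows)
print(legacy_total, cloud_total)
```

Because the formula lives in one place, changing the definition of net revenue changes it for every source and every tool at once, rather than in each report separately.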

Superior Analytical and Query Performance

Shortening the cycle time required to query the enterprise data warehouse improves an organization’s ability to make agile decisions and iterate through queries in the insight discovery process. Since queries on databases with billions of records can take hours or days to return, how can the enterprise accelerate query response times?

With intelligent data virtualization, as queries are run against datasets in the enterprise data warehouse, machine learning is applied to determine what data is needed and what data is extraneous. The virtualization software then builds optimized acceleration structures that are substituted for the raw data in queries. The result is massive time savings for DataOps, as extraneous data is bypassed, delivering query performance that is 5 to 40 times faster, depending on the datasets.
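The substitution step can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the `AggregateAccelerator` class is hypothetical, and a simple query counter stands in for the machine learning the article describes. The point is only the mechanism: once a query shape has been seen often enough, a pre-computed aggregate answers it instead of a scan of the raw rows.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, object]


class AggregateAccelerator:
    """Toy query layer that swaps raw scans for pre-built aggregates."""

    def __init__(self, rows: List[Row], build_after: int = 2) -> None:
        self.rows = rows
        self.build_after = build_after  # scans of a query shape before caching it
        self.query_counts: Counter = Counter()
        self.aggregates: Dict[Tuple[str, str], Dict[object, float]] = {}
        self.served_from_aggregate = 0

    def _scan(self, dimension: str, measure: str) -> Dict[object, float]:
        """Full pass over the raw rows (the slow path)."""
        totals: Dict[object, float] = defaultdict(float)
        for row in self.rows:
            totals[row[dimension]] += float(row[measure])
        return dict(totals)

    def sum_by(self, dimension: str, measure: str) -> Dict[object, float]:
        """Answer SUM(measure) GROUP BY dimension, using an aggregate if one exists."""
        key = (dimension, measure)
        if key in self.aggregates:
            self.served_from_aggregate += 1
            return dict(self.aggregates[key])
        self.query_counts[key] += 1
        result = self._scan(dimension, measure)
        if self.query_counts[key] >= self.build_after:
            # Query shape is popular enough: keep the aggregate for next time.
            self.aggregates[key] = result
        return dict(result)


sales = [
    {"region": "EU", "amount": 10.0},
    {"region": "US", "amount": 30.0},
    {"region": "EU", "amount": 5.0},
]
engine = AggregateAccelerator(sales)
first = engine.sum_by("region", "amount")   # raw scan
second = engine.sum_by("region", "amount")  # raw scan, aggregate gets built
third = engine.sum_by("region", "amount")   # served from the aggregate
print(third, engine.served_from_aggregate)
```

This sketch never refreshes its aggregates, so it would return stale totals if `sales` changed; a real system would need to invalidate or rebuild them when the underlying data is updated.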

Data Security at Rest and in Flight

Working with datasets assembled from far-flung databases that may have different security schemas can be a serious risk for enterprises. DataOps seeks to provide valuable insights quickly, but it cannot do so at the expense of privacy and security.

Intelligent data virtualization uses best-of-breed security practices wherever possible, such as end-to-end TLS to protect data in flight; LDAP, Active Directory, IdP, and SAML for authentication; and JWT, CORS, and REST for API access.

When accessing data, security leakage can occur when enterprises utilize connection pools for BI tools or depend on security aggregation systems. Intelligent data virtualization solves these challenges by checking security requirements at the source databases and applying those requirements to query results. User identities are tracked and verified, even when accessing data through a pool, and security policies from all data sources are collected and merged to filter results appropriately. These same security policies are applied to data aggregates, eliminating unintentional exposure of restricted or private data.
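The policy-merging behavior described above can be sketched in Python. This is a simplified illustration under stated assumptions, not a real product's implementation: the users, policies, and the `visible_rows` and `secure_total` helpers are hypothetical. The sketch assumes that a row is visible only if every source policy allows it, and that the end user's identity travels with each query even though the connection itself is shared.

```python
from typing import Callable, Dict, List, Set

Row = Dict[str, object]
Policy = Callable[[str, Row], bool]  # (end_user, row) -> may the user see it?

# Hypothetical entitlements that would normally live in the source systems.
USER_REGIONS: Dict[str, Set[str]] = {"alice": {"EU"}, "bob": {"EU", "US"}}
PII_CLEARED: Set[str] = {"alice"}


def region_policy(user: str, row: Row) -> bool:
    """Policy from one source: users only see rows for their own regions."""
    return row["region"] in USER_REGIONS.get(user, set())


def pii_policy(user: str, row: Row) -> bool:
    """Policy from another source: PII rows require explicit clearance."""
    return (not row["contains_pii"]) or user in PII_CLEARED


def visible_rows(user: str, rows: List[Row], policies: List[Policy]) -> List[Row]:
    """Merge policies by requiring all of them to allow the row."""
    return [r for r in rows if all(policy(user, r) for policy in policies)]


def secure_total(user: str, rows: List[Row], policies: List[Policy]) -> float:
    """Aggregate only over rows the user may see, so totals cannot leak data."""
    return sum(float(r["amount"]) for r in visible_rows(user, rows, policies))


# Rows as they might come back over a pooled connection (shared service account).
pooled_rows: List[Row] = [
    {"region": "EU", "contains_pii": False, "amount": 10.0},
    {"region": "EU", "contains_pii": True, "amount": 20.0},
    {"region": "US", "contains_pii": False, "amount": 30.0},
]
policies: List[Policy] = [region_policy, pii_policy]

alice_total = secure_total("alice", pooled_rows, policies)
bob_total = secure_total("bob", pooled_rows, policies)
guest_total = secure_total("guest", pooled_rows, policies)
print(alice_total, bob_total, guest_total)
```

Requiring every policy to pass is the most restrictive way to merge them, which is a deliberate choice in this sketch: when sources disagree, the user sees less rather than more.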

Why Wait Any Longer for DataOps?

In order to compete with the most disruptive, forward-looking companies today, an insights-driven, DataOps-powered business model is no longer optional. An organization with a shared data intellect is the ideal environment for DataOps to thrive. Gaining this shared data intellect is much less complicated, and less expensive, than many believe.

With intelligent data virtualization, organizations can bring data together without moving it, share common views securely, and drive unprecedented analytics and query performance, leading to new creative discoveries, new services, and new business models and revenue streams. CIOs and CDOs who embrace the spirit of DataOps will lead this new data-powered charge that is propelling enterprises forward in a brave new data-driven economy.

About the author: Dave Mariani is one of the co-founders of AtScale and is its chief strategy officer and vice president of technology. Prior to AtScale, Dave was vice president of engineering at Klout and held the same position at Yahoo!, where he built the world’s largest multi-dimensional cube for BI on Hadoop.
