Microsoft Aims for Data Analytics Unification with ‘Synapse’
Microsoft took a big step toward unifying disparate analytic toolsets and functions this week with the launch of Synapse Analytics, an ambitious new cloud-hosted offering that seeks to bring together the separate worlds of data warehousing and data lakes.
Data warehouses and data lakes are similar in a lot of respects. Both of them serve as storage repositories for landing massive amounts of data from a variety of sources. And both data warehouses and lakes provide access to processing engines that businesses can use to analyze the data in pursuit of competitive advantage.
But there are key differences between lakes and warehouses, too, starting with the types of data. Data warehouses are relational databases that mostly house structured data originating in business applications, whereas data lakes are often object stores or Hadoop Distributed File System (HDFS) clusters that are used to house less-structured data generated from a wide variety of sources.
Similarly, SQL is the analytical engine of choice for descriptive analytics in the data warehouse world, whereas data in the lake is often used to train predictive models with machine learning algorithms. ETL (extract, transform, load) is the favored data cleansing technique in the buttoned-down DW world, whereas data lake practitioners often favor ELT (extract, load, transform), where data is transformed after it’s landed.
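The ETL-versus-ELT distinction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard-library sqlite3 module as a stand-in for a real warehouse or lake, and the sample rows, table names, and the `clean` helper are all hypothetical. It says nothing about how Synapse or any other product implements either approach.

```python
import sqlite3

# Hypothetical raw records (illustrative only): a name and an amount as text.
RAW_ROWS = [(" Alice ", "42"), ("BOB", "17"), ("carol", "n/a")]


def clean(name, amount):
    """Normalize the name and parse the amount; return None for bad rows."""
    if not amount.isdigit():
        return None
    return (name.strip().lower(), int(amount))


def etl(conn):
    """ETL: transform in application code, then load only the clean rows."""
    conn.execute("CREATE TABLE sales_etl (name TEXT, amount INTEGER)")
    cleaned = [row for row in (clean(n, a) for n, a in RAW_ROWS) if row is not None]
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", cleaned)
    return conn.execute(
        "SELECT name, amount FROM sales_etl ORDER BY name"
    ).fetchall()


def elt(conn):
    """ELT: load the raw rows untouched, then transform with SQL in the store."""
    conn.execute("CREATE TABLE sales_raw (name TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", RAW_ROWS)
    # The transformation runs after landing; the raw table is kept as-is.
    conn.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT LOWER(TRIM(name)) AS name, CAST(amount AS INTEGER) AS amount
        FROM sales_raw
        WHERE amount != '' AND amount NOT GLOB '*[^0-9]*'
        """
    )
    return conn.execute(
        "SELECT name, amount FROM sales_elt ORDER BY name"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print("ETL result:", etl(conn))
    print("ELT result:", elt(conn))
```

Both paths end with the same cleaned rows. The practical difference is that the ELT path still holds the untouched raw table, which is part of why it appeals to data lake teams that want to reprocess data later.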
These differences mean that companies often run data lakes and data warehouses on separate systems. Companies that have spent years constructing a good data warehouse that’s delivering solid answers typically do not want to muck around with it.
This division of labor between the data lake and data warehouse teams can lead to problems, according to Rohan Kumar, the corporate vice president of Azure Data.
“Both are critical, yet operate independently of one another, which can lead to uninformed decisions,” Kumar said. “At the same time, businesses need to unlock insights from all their data to stay competitive and fuel innovation with purpose. Can a single cloud analytics service bridge this gap and enable the agility that businesses demand?”
The answer, according to Kumar’s blog post introducing Synapse Analytics, most certainly is “yes.”
Synapse Analytics “gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale,” Kumar wrote. “Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs.”
Technically, Synapse Analytics is the next iteration of Azure SQL Data Warehouse, the company’s cloud-based data warehouse offering. The offering integrates with Power BI for business intelligence and with Azure Machine Learning for predictive modeling, so that one service can handle both SQL analytics and machine learning.
Synapse Analytics also adds the option to harness serverless computing in the Azure cloud to go along with traditional provisioned compute. Support for Microsoft’s object storage system, Azure Data Lake Storage (ADLS), provides access to less-structured data, while integration with Apache Spark provides a means for building sophisticated data pipelines that can work with both SQL and ML engines.
Informatica is looking forward to helping Microsoft unify analytic workloads. “This launch holds substantial impact on those enterprises who require blazing fast analytics on very large amounts of data for the daily operations of their business,” Informatica’s senior vice president and general manager for cloud, big data, and data integration, Ronen Schwartz, told Datanami. “Think airline companies who must process data around ticketing and seat availability, passenger check-in, and scheduling in real-time, to ensure maximum efficiencies and profitability for the company.”
Microsoft made several other major announcements at its Ignite conference in Orlando, Florida, this week, including the general availability of SQL Server 2019, which ships with HDFS and Apache Spark built in, and Project Cortex, which uses machine learning techniques to create a knowledge network for customers.