Databricks SQL Now GA, Bringing Traditional BI to the Lakehouse
Companies that want to run traditional enterprise BI workloads but don’t want to involve a traditional data warehouse may be interested in the new Databricks SQL service that became generally available yesterday.
The Databricks SQL service, which was first unveiled in November 2020, brings the ANSI SQL standard to bear on data that’s stored in data lakes. The offering allows customers to bring their favorite queries, visualizations, and dashboards via established BI tools like Tableau, Power BI, and Looker, and run them atop data stored in data lakes on Amazon Web Services and Microsoft Azure (the company’s support for Google Cloud, which only became available 10 months ago, trails the two larger clouds).
Databricks SQL is a key component in the company’s ambition to construct a data lakehouse architecture that blends the best of data lakes, which are based on object storage systems, and traditional warehouses, including MPP-style, column-oriented relational databases.
By storing the unstructured data that’s typically used for AI projects alongside the more structured and refined data that is traditionally queried with BI tools, Databricks hopes to centralize data management processes and simplify data governance and quality enrichment tasks that so often trip up big data endeavors.
“Historically, data teams had to resort to a bifurcated architecture to run traditional BI and analytics workloads, copying subsets of the data already stored in their data lake to a legacy data warehouse,” Databricks employees wrote in a blog post yesterday on the company’s website. “Unfortunately, this led to the lock-in, high costs and complex governance inherent in proprietary architectures.”
Spark SQL has been a popular open source query engine for BI workloads for many years, and it has certainly been used by Databricks in customer engagements. But Databricks SQL represents a path forward beyond Spark SQL’s roots into the world of industry standard ANSI SQL. Databricks aims to make the migration to the new query engine easy.
“We do this by switching out the default SQL dialect from Spark SQL to Standard SQL, augmenting it to add compatibility with existing data warehouses, and adding quality control for your SQL queries,” company employees wrote in a November 16 blog post announcing ANSI SQL as the default for the (then beta) Databricks SQL offering. “With the SQL standard, there are no surprises in behavior or unfamiliar syntax to look up and learn.”
With the non-standard syntax out of the way, one of the few remaining BI dragons to slay was performance. While users have been running SQL queries on data stored in object storage and S3-compatible blob stores for some time, performance has always been an issue. For the most demanding ad-hoc workloads, the conventional wisdom says, the performance and storage optimizations built into traditional column-oriented MPP databases have always delivered better response times. Even backers of data lake analytics, such as Dremio, have conceded this fact.
With Databricks SQL, the San Francisco company is attempting to smash that conventional wisdom to smithereens. Databricks released a benchmark result last month that saw the Databricks SQL service delivering 2.7x faster performance than Snowflake, with a 12x advantage in price-performance on the 100TB TPC-DS test.
“This result proves beyond any doubt that this is possible and achievable by the lakehouse architecture,” the company crowed. “Databricks has been rapidly developing full blown data warehousing capabilities directly on data lakes, bringing the best of both worlds in one data architecture dubbed the data lakehouse.”
(Snowflake, by the way, did not take that TPC-DS benchmark lying down. In a November 12 blog post titled “Industry Benchmarks and Competing with Integrity,” the company says it has avoided “engaging in benchmarking wars and making competitive performance claims divorced from real-world experiences.” The company also ran its own TPC-DS 100TB benchmark atop AWS infrastructure and–surprise!–found that its system outperformed Databricks by a significant margin. However, the results were not audited.)
Databricks has built a full analytics experience around Databricks SQL. The service includes a Data Explorer that lets users dive into their data, including any changes to the data, which are tracked via Delta tables. It also features integration with ETL tools, such as those from Fivetran.
Every Databricks SQL service features a SQL endpoint, which is where users can submit queries. Users choose from “t-shirt” instance sizes, and workloads scale elastically (a serverless option is also available). Users can construct their SQL queries within the Databricks SQL interface, or work with one of Databricks’ BI partners, such as Tableau, Qlik, or TIBCO Spotfire, and have those BI tools send queries to the Databricks SQL endpoint. Users can create dashboards and visualizations, and even generate alerts based on data values specified in Databricks SQL.
While Databricks SQL has been in beta for a year, the company says it has more than 1,000 companies already using it. Among the current customers cited by Databricks are the Australian software company Atlassian, which is using Databricks SQL to deliver analytics to more than 190,000 external users; restaurant loyalty and engagement platform Punchh, which is sharing visualizations with its users via Tableau; and video game maker SEGA Europe, which migrated its traditional data warehouse to the Databricks Lakehouse.
Now that Databricks SQL is GA, the company says that “you can expect the highest level of stability, support, and enterprise-readiness from Databricks for mission-critical workloads.”