Dremel Builder Gets $7M for SQL-Based Supertool
Big data startup Metanautix emerged from stealth mode today by announcing a $7-million round of venture funding to further development of a SQL-based power tool. Led by the former Google engineer who headed the development of Dremel, the company aims to dissolve product and technology barriers by “re-imagining” SQL at the heart of an emerging big data supply chain.
SQL is enjoying a renaissance as the big data boom continues to reverberate throughout the IT and business sectors. While emerging big data platforms like Hadoop and NoSQL databases have introduced powerful new ways to store and access data, the resurgent momentum behind SQL is forcing many developers of these cutting-edge systems to rethink their omission of SQL and, often, to patch their systems with new SQL access points.
This SQL resurrection is not only leading Hadoop distributors like Cloudera, Hortonworks, and Pivotal to fund big SQL access programs like Impala, Stinger, and HAWQ, but it’s also leading established business intelligence and data warehouse vendors like Oracle, Teradata, Microsoft, and IBM to put SQL front and center in their logical data warehouse strategies. Write one SQL query, the thinking goes, and let the system federate the query out to where the data resides.
Metanautix is taking a similar strategy, just without the legacy baggage of needing to support existing data warehouse customers. It stays as far away from the storage layer as it can, preferring instead to provide a thin, neutral layer of SQL oil that will lubricate the gears of the emerging “data supply chain.”
“What we’re trying to do is make it possible for analysts to reach into whatever data they want to reach into and not really have to care about all the details,” says Metanautix co-founder and CEO Theo Vassilakis, who led a staff of 75 in the development of Dremel, the distributed query engine that powers Google’s BigQuery.
“We call it re-imagining SQL because SQL is very old and standard,” Vassilakis continues. “It’s baked into many corporate applications. We think that one of the impediments to getting at all the data that’s out there is that analysts have to keep switching tools and using a lot of different systems. We wanted to make it so that you could do all the stages of analysis (a little ETL, a little ad hoc analysis, also some serving) and do it with one system with a standard language.”
The company’s eponymous product will use standard SQL to access any type of data wherever it resides, including relational databases like MySQL, NoSQL databases like MongoDB, HDFS, object data stores, Amazon S3, CIFS, NFS, and others. When it becomes generally available later this year (it’s currently in limited release), the software will perform a variety of functions, including ETL, ad hoc queries, serving analytic dashboards, and even running machine learning algorithms, without requiring users to move among different tools.
It’s all about enabling the “data supply chain” to function as effortlessly as possible. “We’re using that phrase to refer to when one person feeds data into another person who feeds data into another person, and how do you aggregate that and provide a good sense of [what’s going on],” Vassilakis tells Datanami. “We want to fit into their existing environments as opposed to forcing them into a certain way.”
The product was inspired in part by Dremel, the insanely big and fast query tool that Google built to analyze petabytes of data and run in a distributed manner across thousands of nodes. Dremel has inspired other software developers, including those behind the Apache Drill project, which is developing an interactive query technology based on Dremel for the Hadoop environment.
Metanautix was also inspired in part by the work that CTO and co-founder Apostolos Lerios did at Facebook, where he led the development of the photo-upload portion of the social media giant’s website. With more than 300 billion images, Facebook’s photo store is currently the largest photo repository in the world. The combination of the two co-founders’ backgrounds led to Metanautix.
Metanautix supports all of the analytical aspects of the ANSI SQL standard; it doesn’t bother with the transaction-oriented data types because transactions aren’t the company’s focus. The software treats all data, structured or unstructured, as standard tables. For unstructured data types, such as JSON data and images, the company built its own functions.
“Our goal is basically, wherever you have your data, we want to be able to go read it,” Vassilakis says. “If it’s NoSQL, we’ll go read it from NoSQL. If it’s on a server or on the Web or in HDFS, we’ll go get it. Whatever we don’t support, we’ll use our extensibility mechanism… so people can plug in their legacy logic, but still make it visible to the user as plain SQL.”
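SQL engines commonly expose this kind of extensibility as user-defined functions. The article doesn’t describe how Metanautix’s mechanism works, so this sketch uses SQLite’s user-defined functions through Python’s standard `sqlite3` module as a stand-in: a hypothetical piece of legacy logic (`legacy_risk_score`, with an invented scoring rule) is registered once with `Connection.create_function` and can then be called from an ordinary `SELECT`.

```python
from __future__ import annotations

import sqlite3


def legacy_risk_score(balance: float, late_payments: int) -> float:
    """Hypothetical legacy business rule; the formula is illustrative only."""
    return round(balance * 0.001 + late_payments * 10.0, 2)


def score_accounts() -> list[tuple[str, float]]:
    conn = sqlite3.connect(":memory:")
    # Expose the Python function to SQL under the name risk_score (2 arguments).
    conn.create_function("risk_score", 2, legacy_risk_score)
    conn.execute("CREATE TABLE accounts (name TEXT, balance REAL, late INTEGER)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?, ?)",
        [("alice", 5000.0, 0), ("bob", 1200.0, 3), ("carol", 300.0, 1)],
    )
    # The analyst writes plain SQL; the legacy logic runs inside the query.
    rows = conn.execute(
        "SELECT name, risk_score(balance, late) AS score "
        "FROM accounts ORDER BY score DESC"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(score_accounts())
```

The caller never sees Python; from the SQL side, `risk_score` looks like any built-in function.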
“Metanautix is a distributed system. You can run one node or you can run 1,000-server configurations behind your Qlik product if that’s what you need,” Vassilakis says. “Part of the beauty of SQL is you don’t need to know the difference. We can say, hey, is there big data? Just point your QlikTech instance to us, treat us as SQL, but we can go out and read a lot more data than whatever data you’ve allocated to Qlik and still have it be performant and integrated.”
In distributed and Hadoop environments, Metanautix software can replace the need for MapReduce coding. “Our view is Hadoop is great. You get lots of high-throughput storage and data formats and all of that. But it’s also challenging because for a lot of things, you have to write custom code or connectors,” Vassilakis says. “One of the areas customers are interested in is joining Hadoop data with non-Hadoop data. They say, ‘Can I join Oracle and MySQL with Hadoop? Can I get next-gen data, such as GPS data or Internet of Things traces, into Hadoop, but then join it with more traditional accounts information database stuff, on the fly without writing new code or MapReduce?’”
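A cross-source join of the kind customers ask about can be imitated in miniature with SQLite, which lets one connection `ATTACH` a second database and join across both in a single statement. This is only an analogy, not Metanautix’s implementation: the main in-memory database stands in for a traditional accounts store, the attached `telemetry` database stands in for GPS traces kept in Hadoop, and all table names and rows are invented for illustration.

```python
import sqlite3


def join_across_sources():
    # Main database: stands in for a traditional accounts store (e.g. MySQL or Oracle).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (device_id TEXT PRIMARY KEY, customer TEXT)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)",
        [("d1", "Acme Freight"), ("d2", "Blue Couriers")],
    )
    # A second, separately attached database: stands in for device traces in Hadoop.
    conn.execute("ATTACH DATABASE ':memory:' AS telemetry")
    conn.execute("CREATE TABLE telemetry.gps (device_id TEXT, lat REAL, lon REAL)")
    conn.executemany(
        "INSERT INTO telemetry.gps VALUES (?, ?, ?)",
        [("d1", 37.42, -122.08), ("d1", 37.43, -122.09), ("d2", 40.71, -74.00)],
    )
    # One SQL statement joins both sources; no custom connector code is written.
    rows = conn.execute(
        """
        SELECT a.customer, COUNT(*) AS pings
        FROM accounts AS a
        JOIN telemetry.gps AS g ON g.device_id = a.device_id
        GROUP BY a.customer
        ORDER BY pings DESC
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(join_across_sources())
```

In a federated engine the two sources would be separate systems rather than two SQLite databases, but the query the analyst writes has the same shape.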
Metanautix will also run machine learning algorithms. The company has demonstrated how to implement the K-Means clustering algorithm in SQL. “People don’t think of SQL that way,” Vassilakis says. “It turns out it’s not that hard. Eight queries we did it in. A lot of things that people felt are in the purview of some super specialized system that just does machine learning or clustering specifically is now at their fingertips. We’re basically trying to make it so just SQL and more queries is the answer. We treat all data like tables and all questions as SQL and try to bring those two things together.”
The company wrote its software using a combination of C++ for performance and Java for extensibility. The software runs in virtualized Linux and Windows environments, and is designed to scale.
Today the company announced that it closed a $7 million Series A financing round, which was led by Sequoia Capital and includes investments from the Stanford University endowment fund and from Shiva Shivakumar, former vice president of engineering and distinguished entrepreneur at Google.