Splicing a Pause Button into Cloud Machines
Splice Machine develops a machine learning-enabled SQL database that is based on a closely engineered collection of distributed components, including HBase, Spark, and Zookeeper, not to mention H2O, TensorFlow, and Jupyter. Customers use it to build complex AI apps that include transactional, analytical, and ML components. The company just announced a Kubernetes operator for customers running in private cloud environments. So what’s CEO Monte Zweben’s favorite new feature?
The pause button.
“How about that pause button?” Zweben said during a demo of Splice Machine’s Kubernetes Ops Center. “When you pause on Splice Machine, it drains Kubernetes nodes and makes them available for other applications to use.”
Support for Kubernetes is not new at Splice Machine. The company relied on Mesos for some time before pivoting to Kubernetes a couple of years ago. Since then, the company has used K8S to manage customer environments as part of its software as a service (SaaS) offering. Now with Kubernetes Ops Center, which was unveiled last week, customers running the platform on their own gear in their own data center (or in a private cloud) can also leverage Kubernetes to maximize their compute resources.
The pause button is placed prominently at the top of the Kubernetes Ops Center screen. When pressed, it instructs the Kubernetes distribution (Rancher and OpenShift are currently supported, with more on the way) to essentially put Splice on ice and prevent it from consuming any more resources.
This is a big deal considering the amount of resources that customers are wasting in the cloud. A report issued last week by Pepperdata, a provider of tuning solutions for big data applications, found that big companies were wasting millions of dollars, and that even smaller companies could save hundreds of thousands of dollars by tuning their applications (in particular, Apache Spark) to make better use of cloud resources.
Hitting the pause button in Splice Machine is one way to achieve savings.
“I think it’s a powerful thing that we’re offering on premises,” Zweben said. “Even on prem, if you’ve got a small set of virtualization going on, if you can pause and give up your resource to another user, that’s pretty powerful.”
The pause button is pressed frequently for the AWS cluster that Splice Machine uses for its demos. Before getting on a call with a prospect or a journalist, Zweben hits the restore button, and the cluster quickly comes back online. “If we’re not demoing this cluster, why pay for the infrastructure?” Zweben said. “I just checked in five to 10 minutes before we talked and I hit the restore button and it comes back, just like it was.”
Zweben couldn’t put a dollar amount on the savings, but said that they are substantial. “It is more than 50% savings when you’re shutting a cluster off overnight,” he said. “We do that on our trials. We have an automatic trial mechanism, where you can come to Splice, and get it for a few weeks for free. If somebody is not active during their trial, we just auto-pause it.”
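To see how a figure like “more than 50%” can arise, here is a minimal back-of-the-envelope sketch. It assumes compute cost scales linearly with the hours a cluster is running and ignores storage, which is still billed while a cluster is paused. The function name and the 14-hour overnight window are hypothetical and are not taken from Splice Machine.

```python
def compute_savings(paused_hours_per_day: float, paused_days_per_week: int = 7) -> float:
    """Return the fraction of weekly compute hours that are not billed.

    Assumes (hypothetically) that compute cost is proportional to running
    hours and that a paused cluster incurs no compute charges at all.
    """
    if not 0 <= paused_hours_per_day <= 24:
        raise ValueError("paused_hours_per_day must be between 0 and 24")
    if not 0 <= paused_days_per_week <= 7:
        raise ValueError("paused_days_per_week must be between 0 and 7")
    total_hours = 24 * 7
    paused_hours = paused_hours_per_day * paused_days_per_week
    return paused_hours / total_hours


# Hypothetical schedule: the cluster is paused 14 hours every night.
print(f"{compute_savings(14):.0%}")  # prints "58%"
```

Under these assumptions, pausing for exactly 12 hours a night saves 50% of compute hours, so any longer overnight window, or pausing over weekends as well, pushes the figure above half.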
With Kubernetes riding herd on compute resources, Splice Machine is free to concentrate on more important things, like ensuring that all the complex distributed components function as a seamless unit.
“All of the Splice Machine clusters have that elasticity where you can turn it off, and it basically doesn’t consume resources,” Zweben said. “The ability to separate storage and compute in that way saves an enormous amount of money.”
The split between on-prem and cloud customers is roughly 50/50 for new accounts, Zweben said. The nature of Splice Machine’s customer base – one of its credit card customers runs its data center in an underground bunker protected by armed guards – keeps the cloud share from being higher.
In addition to enabling elasticity, the Kubernetes Ops Center supports Helm Charts, which allow customers to augment their Splice Machine environment with other capabilities. For example, a customer could package a new machine learning model or a Kafka queue as Helm Charts, and integrate them into Splice Machine via Kubernetes.
“The ability for them to add this componentry extremely quickly and to be managed within the same infrastructure–this is really creating a new level of agility that you didn’t have before,” Zweben said.
Kubernetes is a hot technology at the moment, but it’s just one piece of the puzzle in Splice Machine’s big game. The San Francisco company’s end goal is delivering an AI platform that can do all “three legs of the stool” – transactional, analytical, and machine learning workloads – and thereby enable smaller companies to succeed with AI.
“There’s too many moving parts today for AI to really be brought into the world at scale,” Zweben said. “Right now you still have leaders building AI system, not your traditional companies, in production. Operationalizing it has been too hard. We’re democratizing it. That’s why we put these components together to make it easy to scale for AI.”
Splice Machine was born in the days of Hadoop, and uses some of the same underlying data processing engines that were distributed with that platform. But Splice Machine has surpassed the capabilities of that earlier platform by ensuring tight integration with those engines in support of its customers’ enterprise AI initiatives, not to mention elastic scaling via Kubernetes.
The way that Splice Machine engineered HBase (for storage) and Spark (for analytics), and its enablement of ACID capabilities for SQL transactions, are core differentiating factors that weigh in Splice Machine’s favor as a platform on which to build real-time AI applications, according to Zweben.
“Doing table scans as the basis of an analytical workload is abysmally slow in HBase, and so, in Splice Machine, we engineered at a very low level the access to the HBase storage with a wrapper of transactionality around it, so you’re only seeing what’s been committed in the database based on ACID semantics,” Zweben explained.
“That goes under the cover at a very well-engineered level, looking at the HBase storage and grabbing that into Spark dataframes,” he continued. “We’ve engineered tightly integrated connectivity for performance. I don’t think anybody is going to be able to do that easily without the same level of effort that we put into it, especially being transactionally consistent with ACID compliance, like Splice Machine is.”
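The idea of an analytical scan that sees only committed data can be illustrated with a toy sketch. This is not Splice Machine’s implementation, which the article does not describe in detail. The `RowVersion` type, the `commit_ts` field, and the `visible_rows` function are all hypothetical names, and timestamp-based visibility is simply one common way to get this behavior.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class RowVersion:
    key: str
    value: str
    # Hypothetical field: None means the writing transaction has not committed.
    commit_ts: Optional[int]


def visible_rows(versions: Iterable[RowVersion], read_ts: int) -> Dict[str, str]:
    """Return the newest committed value per key as of read_ts."""
    latest: Dict[str, RowVersion] = {}
    for version in versions:
        # Skip uncommitted writes and writes committed after the reader's timestamp.
        if version.commit_ts is None or version.commit_ts > read_ts:
            continue
        current = latest.get(version.key)
        if current is None or version.commit_ts > current.commit_ts:
            latest[version.key] = version
    return {key: version.value for key, version in latest.items()}


versions = [
    RowVersion("a", "1", 10),
    RowVersion("a", "2", 20),
    RowVersion("a", "3", None),  # in-flight write, never visible to the scan
    RowVersion("b", "x", 30),
]
print(visible_rows(versions, read_ts=25))  # prints {'a': '2'}
```

A scan reading at timestamp 25 sees the value committed at 20 for key `a`, ignores the uncommitted write, and does not yet see key `b`, which was committed later.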
Splice Machine holds patents on the work, which took years to develop, and it’s being well-received by companies in financial services, healthcare, retail, government, and other sectors. The new Kubernetes operator doesn’t necessarily help with the core database development effort, but it definitely helps with managing the whole kit and caboodle in support of AI.
And, of course, Kubernetes enables that pause button, which is a big deal when running this stuff in the real world.