Splicing a Pause Button into Cloud Machines
Splice Machine develops a machine learning-enabled SQL database that is based on a closely engineered collection of distributed components, including HBase, Spark, and Zookeeper, not to mention H2O, TensorFlow, and Jupyter. Customers use it to build complex AI apps that include transactional, analytical, and ML components. The company just announced a Kubernetes operator for customers running in private cloud environments. So what’s CEO Monte Zweben’s favorite new feature?
The pause button.
“How about that pause button?” Zweben said during a demo of Splice Machine’s Kubernetes Ops Center. “When you pause on Splice Machine, it drains Kubernetes nodes and makes them available for other applications to use.”
Support for Kubernetes is not new at Splice Machine. The company relied on Mesos for some time before pivoting to Kubernetes a couple of years ago. Since then, the company has used Kubernetes to manage customer environments as part of its software-as-a-service (SaaS) offering. Now with Kubernetes Ops Center, which was unveiled last week, customers running the platform on their own gear in their own data center (or in a private cloud) can also leverage Kubernetes to maximize their compute resources.
The pause button is placed prominently at the top of the Kubernetes Ops Center screen. When pressed, it instructs the Kubernetes distribution (Rancher and OpenShift are currently supported, with more on the way) to essentially put Splice on ice and prevent it from consuming any more resources.
This is a big deal considering the amount of resources that customers are wasting in the cloud. A report issued last week by Pepperdata, a provider of tuning solutions for big data applications, found that big companies were wasting millions of dollars, and that even smaller companies could save hundreds of thousands of dollars by tuning their applications (in particular, Apache Spark) to make better use of cloud resources.
Hitting the pause button in Splice Machine is one way to achieve savings.
“I think it’s a powerful thing that we’re offering on premises,” Zweben said. “Even on prem, if you’ve got a small set of virtualization going on, if you can pause and give up your resource to another user, that’s pretty powerful.”
The pause button is pressed frequently for the AWS cluster that Splice Machine uses for its demos. Before getting on a call with a prospect or a journalist, Zweben hits the restore button, and the cluster quickly comes back online. “If we’re not demoing this cluster, why pay for the infrastructure?” Zweben said. “I just checked in five to 10 minutes before we talked and I hit the restore button and it comes back, just like it was.”
Zweben couldn’t put a dollar amount on the savings, but said they are substantial. “It is more than 50% savings when you’re shutting a cluster off overnight,” he said. “We do that on our trials. We have an automatic trial mechanism, where you can come to Splice, and get it for a few weeks for free. If somebody is not active during their trial, we just auto-pause it.”
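To see how an overnight shutdown can cut costs by more than half, here is a rough back-of-the-envelope sketch. It assumes compute cost scales linearly with the hours a cluster is running and that a paused cluster costs nothing for compute; the function name, the `paused_cost_ratio` parameter, and the 10-hour working day are illustrative assumptions, not figures from Splice Machine.

```python
def pause_savings_fraction(active_hours_per_day: float,
                           paused_cost_ratio: float = 0.0) -> float:
    """Fraction of daily compute cost saved versus running 24 hours a day.

    Assumes cost is proportional to running hours. paused_cost_ratio is the
    (hypothetical) share of the hourly cost still paid while paused, for
    example for retained storage; 0.0 means a paused hour is free.
    """
    if not 0 <= active_hours_per_day <= 24:
        raise ValueError("active_hours_per_day must be between 0 and 24")
    if not 0 <= paused_cost_ratio <= 1:
        raise ValueError("paused_cost_ratio must be between 0 and 1")
    paused_hours = 24 - active_hours_per_day
    return paused_hours / 24 * (1 - paused_cost_ratio)


# Hypothetical schedule: the cluster runs 10 hours a day and is paused for 14.
savings = pause_savings_fraction(10)
print(f"Estimated compute savings: {savings:.1%}")
```

Under these assumptions, any schedule that keeps the cluster paused for more than 12 hours a day saves more than half of the always-on compute bill.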
With Kubernetes running herd on compute resources, Splice Machine is free to concentrate on more important things, like ensuring that all the complex distributed components function as a seamless unit.
“All of the Splice Machine clusters have that elasticity where you can turn it off, and it basically doesn’t consume resources,” Zweben said. “The ability to separate storage and compute in that way saves an enormous amount of money.”
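The idea of separating storage from compute can be illustrated with a toy model. The sketch below is not Splice Machine's implementation and does not call any Kubernetes API; `ToyCluster` and its fields are hypothetical names used only to show that pausing releases compute nodes while stored data stays put, so a restore brings the cluster back as it was.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical cluster whose data outlives its compute nodes."""
    compute_nodes: int
    stored_rows: list = field(default_factory=list)
    paused: bool = False
    _saved_nodes: int = 0

    def pause(self) -> int:
        """Release every compute node and return how many were freed."""
        if self.paused:
            return 0
        freed = self.compute_nodes
        self._saved_nodes = freed   # remember the size for a later restore
        self.compute_nodes = 0      # compute is given back to other workloads
        self.paused = True
        return freed                # stored_rows is deliberately untouched

    def restore(self) -> None:
        """Reacquire the same number of compute nodes held before the pause."""
        if not self.paused:
            return
        self.compute_nodes = self._saved_nodes
        self.paused = False


cluster = ToyCluster(compute_nodes=4, stored_rows=["order-1", "order-2"])
freed = cluster.pause()
print(freed, cluster.compute_nodes, cluster.stored_rows)
cluster.restore()
print(cluster.compute_nodes, cluster.stored_rows)
```

In a real deployment the freed capacity would be returned to the Kubernetes scheduler; here it is just a counter.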
The split between on-prem and cloud customers is roughly 50/50 for new accounts, Zweben said. The nature of Splice Machine’s customer base – one of its credit card customers runs its data center in an underground bunker protected by armed guards – keeps cloud adoption from being higher.
In addition to enabling elasticity, the Kubernetes Ops Center supports Helm Charts, which allow customers to augment their Splice Machine environment with other capabilities. For example, a customer could package a new machine learning model or a Kafka queue as Helm Charts, and integrate them into Splice Machine via Kubernetes.
“The ability for them to add this componentry extremely quickly and to be managed within the same infrastructure–this is really creating a new level of agility that you didn’t have before,” Zweben said.
Kubernetes is a hot technology at the moment, but it’s just one piece of the puzzle in Splice Machine’s big game. The San Francisco company’s end goal is delivering an AI platform that can do all “three legs of the stool” – transactional, analytical, and machine learning workloads – and thereby enable smaller companies to succeed with AI.
“There’s too many moving parts today for AI to really be brought into the world at scale,” Zweben said. “Right now you still have leaders building AI systems, not your traditional companies, in production. Operationalizing it has been too hard. We’re democratizing it. That’s why we put these components together to make it easy to scale for AI.”
Splice Machine was born in the days of Hadoop, and uses some of the same underlying data processing engines that were distributed in that platform. But Splice Machine has surpassed the capabilities of that earlier platform by ensuring tight integration with those engines in support of its customers’ enterprise AI initiatives, not to mention elastic scaling via Kubernetes.
The way that Splice Machine engineered HBase (for storage) and Spark (for analytics), and its enablement of ACID capabilities for SQL transactions, are core differentiators that weigh in Splice Machine’s favor as a platform on which to build real-time AI applications, according to Zweben.
“Doing table scans as the basis of an analytical workload is abysmally slow in HBase, and so, in Splice Machine, we engineered at a very low level the access to the HBase storage with a wrapper of transactionality around it, so you’re only seeing what’s been committed in the database based on ACID semantics,” Zweben explained.
“That goes under the cover at a very well-engineered level, looking at the HBase storage and grabbing that into Spark dataframes,” he continued. “We’ve engineered tightly integrated connectivity for performance. I don’t think anybody is going to be able to do that easily without the same level of effort that we put into it, especially being transactionally consistent with ACID compliance, like Splice Machine is.”
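The notion of an analytical scan that sees only committed data can be sketched in a few lines. What follows is a generic snapshot-read toy, not Splice Machine's patented mechanism and not an HBase or Spark API; `RowVersion`, `snapshot_scan`, and the integer timestamps are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class RowVersion:
    key: str
    value: str
    commit_ts: Optional[int]  # None means the writing transaction is uncommitted


def snapshot_scan(versions: Iterable[RowVersion], snapshot_ts: int) -> Dict[str, str]:
    """Return the newest committed value per key as of snapshot_ts."""
    visible: Dict[str, RowVersion] = {}
    for version in versions:
        # Skip writes that are uncommitted or that committed after the snapshot.
        if version.commit_ts is None or version.commit_ts > snapshot_ts:
            continue
        current = visible.get(version.key)
        if current is None or version.commit_ts > current.commit_ts:
            visible[version.key] = version
    return {key: version.value for key, version in visible.items()}


versions = [
    RowVersion("acct-1", "balance=100", commit_ts=5),
    RowVersion("acct-1", "balance=80", commit_ts=9),
    RowVersion("acct-2", "balance=40", commit_ts=None),  # still in flight
    RowVersion("acct-1", "balance=60", commit_ts=12),    # after the snapshot
]
print(snapshot_scan(versions, snapshot_ts=10))
```

A scan taken at timestamp 10 ignores both the in-flight write and the later commit, which is the kind of consistency an analytical job needs when it reads a table that transactions are still updating.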
Splice Machine holds patents on the work, which took years to develop, and it’s being well-received by companies in financial services, healthcare, retail, government, and other sectors. The new Kubernetes operator doesn’t necessarily help with the core database development effort, but it definitely helps with managing the whole kit and caboodle in support of AI.
And, of course, Kubernetes enables that pause button, which is a big deal when running this stuff in the real world.