Machine Learning: No Longer the ‘Fine China’ of Analytics, HPE Says
Machine learning has become a core component of companies’ analytic initiatives and is no longer the “fine china” only brought out for special occasions, according to a manager with Hewlett Packard Enterprise, which today announced that its Vertica analytics database now runs popular classes of machine learning algorithms.
While previous versions of Vertica could run R algorithms — as opposed to shipping them off to run on a Hadoop cluster or another adjacent system — Vertica 8.0 will be the first version of the flagship columnar database that formally supports a broad collection of popular machine learning algorithms, according to Jeff Veis, vice president of marketing for Big Data Platforms at HPE (NYSE: HPE).
“It used to be niche, or maybe like fine china for special occasions, to use machine learning, and now it’s showing up as a must-have for almost all our customers,” Veis tells Datanami. “It’s becoming very important to do that form of advanced analytics. We brought that in-database so you can run it across your whole data set.”
Enabling Vertica 8.0 to run three popular machine learning algorithms (K-means, logistic regression, and linear regression) directly on data stored in the columnar database will bring several benefits. Most importantly, Veis says, it will eliminate the need for Vertica customers to sample their source data and then push those ML workloads to other systems.
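For readers unfamiliar with K-means, one of the algorithms named above, here is a minimal sketch of the technique in plain Python. It is illustrative only and is not Vertica's implementation or API; the function name `kmeans`, the sample points, and the choice to seed centroids with the first k points are all assumptions made for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iterations=10):
    """Plain K-means (Lloyd's algorithm) on a list of coordinate tuples."""
    # Assumption: seed centroids with the first k points so runs are repeatable.
    centroids = [tuple(p) for p in points[:k]]

    def nearest(p):
        # Index of the centroid closest to point p.
        return min(range(k), key=lambda i: dist(p, centroids[i]))

    for _ in range(iterations):
        # Assignment step: group every point under its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))

    # Final labels use the centroids after the last update.
    labels = [nearest(p) for p in points]
    return centroids, labels


# Hypothetical data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (8.0, 9.0), (2.0, 1.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

The sketch clusters every point it is given, with no sampling step. That is the property Veis highlights for in-database execution, though a production engine would distribute this work rather than loop in a single process.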
“A lot of data scientists have their models, have their algorithms, sometimes have a lot of investment in it, but they have not been able to run that in [Vertica’s] database,” Veis says. “If your data scientist has their favorite R model of choice, they would be able to bring that in, or if you wanted to use a functionality like K-means, then you could develop a new model and work with it.”
While HPE isn’t backing away from SQL, the company acknowledges that a lot of emerging big data analytic workloads aren’t powered with SQL data engines. “You can use the models you want to use when you have more complex queries that go beyond SQL,” Veis says.
Over time, HPE plans to flesh out its machine learning capability with more algorithms and more capabilities for specific industries. “I think we have the essentials, but certainly our commitment is to build that out,” Veis says. “As you go to different verticals, whether it’s healthcare or retail, you’ll see a preference for more specialized machine learning algorithms, so our intention is to continue to add support and broaden it over time.”
HPE, which is holding its annual Big Data Conference in Boston, Massachusetts this week, envisions machine learning becoming more pervasive across the enterprise. Its new cloud-based offering, dubbed HPE Haven OnDemand (HoD) Combinations, is designed to make it easy for developers to tap into the power of machine learning via simple API calls.
“This is intended for a mainstream software [developer] that’s just developing enterprise or mobile apps and wants to bring in a facial recognition or enterprise search-based context or graph analysis or prediction [into their applications],” Veis says. “They don’t necessarily want to be a data science expert but they want to infuse that capability within their app.”
HoD Combinations is the latest iteration of Haven (Hadoop, Autonomy, Vertica, and Enterprise), a big data solution that was launched three years ago. HPE launched HoD earlier this year with about 50 API calls, and now it offers more than 70 with HoD Combinations.
“All the complex coding is eliminated,” Veis says. “You call this cognitive service with one API call, as opposed to using different APIs from different vendors and having to bounce all around, which increases latency and coding complexity. All that is pretty much eliminated. We’ve done the testing. We understand all the API junctures. We can let you know this is a fully chained API call.”
One early adopter of HoD Combinations is a dating app site that wanted to add facial recognition so that app users can confirm that the person on the other end of a video call is in fact who they claim to be. HoD Combinations will become generally available in about 60 days, but early adopters can request a trial now.
HPE made several other significant announcements at the show, including native support for geospatial data in Vertica 8.0. Veis says that Vertica customers want to process the geotagging and geofencing in the database itself. “We have enabled them to do that, which is incredibly important, especially when you’re supporting mobile apps, where knowing the geo-presence is so important to be able to respond.”
Vertica 8.0 also gains better Hadoop integration, including some “push down” processing of Vertica queries into Hadoop. Vertica already supported the ORC data format, thanks to its work with Hortonworks (NASDAQ: HDP). With version 8.0, Vertica gains support for the Parquet file format that’s favored by Cloudera.
“What that means is if you have ORC or Parquet or Apache HDFS data in your data lakes and also have a Vertica traditional deployment, you can now connect up to those data lakes, run queries on them, or do joins with that data,” Veis says. “You can do that with zero data movement. You don’t have to put Vertica on the nodes, and you don’t have to move that data to Vertica for Vertica to be able to query it…A lot of our customers are pretty excited about it because it really embraces the broader ecosystem.”
HPE also delivered “optimized” support for Apache Spark with this release of Vertica. This will improve the movement of data to and from Apache Spark, which some Vertica customers will use for building their machine learning models and running them on small-scale data, the company says.
Finally, HPE is now enabling customers to run Vertica on the Azure cloud from Microsoft (NASDAQ: MSFT). Previously it supported the AWS cloud from Amazon (NASDAQ: AMZN). Azure support will give customers more flexibility in deployments, Veis says.