Inside Hitachi Vantara’s Very Ambitious Data Agenda
FPGAs and object storage systems. NVMe storage and ML model management. IoT edge computing and converged server infrastructure. It would be a big understatement to say that Hitachi Vantara has a lot going on, but it also might be why the company is so interesting to watch, and why you might want to keep an eye on it, too.
Last week, the wholly owned subsidiary of the Japanese industrial giant swung through San Diego, California, where it hosted its Hitachi NEXT 2018 user conference. The show served as the one-year anniversary of the creation of Hitachi Vantara, which emerged from Hitachi Data Systems in late 2017 (you’ll remember that it bought big data analytics firm Pentaho in 2015).
But with more than 10,000 customers and 7,000 employees, Hitachi Vantara is far from a startup. In fact, the company already counts 85% of the Fortune 100 as customers, which gives it the advantage of being an incumbent, particularly among big industrial firms, where it’s just as likely to compete with the likes of GE and Siemens as IBM or Oracle.
Hitachi Vantara made several announcements at the show, which just barely touch the surface of the company’s expansive offerings. The product news includes:
- The launch of a new member of its Lumada analytics solution to help companies in transportation, manufacturing, and energy better harness data to optimize the care and feeding of big pieces of equipment, such as compressors and turbines (which are manufactured by the parent company), no matter where that data might reside.
- A new release of Hitachi Enterprise Cloud (HEC) Container Platform, which is built on Kubernetes, VMware, and Mesosphere tech to let customers move workloads among clouds and on-premises servers, like its own Hitachi Unified Compute Platform (UCP).
- A new all-flash hyperconverged system, the UCP HC V124N, which can house up to four Intel 3D XPoint-based Optane SSDs and eight capacity-optimized NVMe SSDs. The 72 TB system offers a 3x increase in IOPS and a 4x reduction in latency compared to Hitachi’s previous all-flash UCP HC system, the company says.
- A new release of its Smart Data Center service to help companies run their data centers more effectively through big data analytics. With this release, the company is adding AI smarts to its cooling capabilities, with the end goal of saving money while avoiding server meltdowns.
To be sure, these announcements don’t begin to show everything that Hitachi Vantara has to offer. During his keynote address Thursday, Hitachi Vantara vice president of portfolio marketing John Magee (formerly the CMO for GE Digital’s Predix platform) gave the lowdown on the company’s aggressive plans.
“We think about our mission being to innovate in our portfolio to help you drive more of that innovation with data,” Magee said. “It starts with core product development, hardware and software engineering, all the things we do to collaborate. And increasingly we’re driving more synergies between different domains in our portfolio.”
Magee differentiated between operational technology (OT) and information technology (IT). That OT focus isn’t surprising, considering the parent company’s industrial roots in manufacturing everything from air conditioners and plasma televisions to nuclear power control systems and hydraulic excavators. But that unique operational vantage – combined with the $2.9-billion annual R&D budget of its parent company and a patent collection that exceeds IBM’s on the IT side – could give Hitachi Vantara an edge in transferring data from potential energy into actionable energy.
Speaking of energy, Hitachi Vantara did itself a big favor by hiring Bill Schmarzo away from Dell EMC earlier this year. During his high-octane keynote, the author of “Big Data MBA” stressed the importance of seeing the big picture in big data and not getting distracted by a never-ending sea of technologies.
“We need to re-frame the conversation. You have the assets. You got the sombrero full of peanuts, but yet you’re so busy staring at the tires figuring out where you’re going, you’re not looking down the road to see where you should be going,” said Schmarzo, whose official title at Hitachi Vantara is CTO of IoT and analytics.
Most companies rate very low on the big data business model maturity index, which Schmarzo created with his grad students at the University of San Francisco. “It’s not a measure of how good you are with Hadoop. It’s not a measure of how good you are with TensorFlow or Kafka,” he said. “It’s not a technology problem. It’s an economics opportunity.”
The big question is how Hitachi Vantara is going to drive those economic outcomes, and technology, of course, is going to play a big role. In that regard, the company is doing some really interesting work.
For example, the company is looking to take advantage of its parent company’s expertise in creating LiDAR cameras for its smart cities offerings. LiDAR is often used in autonomous driving applications, but the cameras are quite expensive. By working with its parent company, Hitachi Vantara is looking to embed analytic capabilities inside of a much less expensive LiDAR camera, which could open up new smart city opportunities, the company’s smart city marketing director Justin Bean told Datanami.
Video is the biggest data of all, and it’s also the perfect data type to squirrel away inside of Hitachi Content Platform, which is the object storage system used by 2,600 customers. HCP and its IoT twin, HCP Anywhere, are horizontal technologies that figure to play prominently across Hitachi Vantara’s offerings, particularly as data is collected at the edge to feed real-time decision making.
A year ago, the company unveiled Content Intelligence, which is a collection of solutions that will allow customers to do more with data as it’s flowing into HCP. According to Scott Baker, the company’s senior director of content and data intelligence strategy, Content Intelligence can work as a data quality gateway, triggering data transformations to run when certain types of data arrive, kicking off workflows specific to GDPR compliance, or calling sentiment analysis routines developed with machine learning algorithms.
With so many products in the mix, keeping up with the capabilities can be daunting, acknowledged Baker. In many cases, it comes down to maintaining access to underlying tools and technologies while Hitachi Vantara plays out its goal of developing pre-built IoT solutions, particularly for larger firms with bigger regulatory burdens.
“Striking a balance between where the company is trying to go from an IoT perspective versus where a lot of the emphasis is from an engineering and corporate focus on infrastructure side, I would tell you that we’re going to continue to run both sides of the fence,” Baker said.
Data science is critical to a lot of what Hitachi Vantara is doing, from Content Intelligence in the object store to the Lumada solutions, and everywhere in between. But the company is also selling data science tooling, and that’s where its Pentaho offerings come in, particularly as it pertains to machine learning model management. Mark Hall, one of the original creators of the Weka machine learning framework, heads up Hitachi Vantara’s efforts in getting predictive models into the real world.
“Having been involved in data science for many years, nothing causes more disillusionment to the data scientists than watching the fruits of their model-building labors die a slow and painful death while navigating the bureaucratic, red tape-laden pathway from the development lab to the deployment environment,” Hall said during his keynote address. “Without a seamless and rapid process for taking or transitioning those predictive models from development to production, there are ample opportunities for the data science to be ‘lost in translation.’”
To be sure, Hall is bullish on the prospects of his Java-based Weka framework helping to make data scientists productive. But he’s even more excited about a free new download from Pentaho called Plug-in Machine Intelligence (PMI) that’s designed to allow Pentaho to manage data science models no matter where they’re created, including Weka, Python, R, Spark ML, and other data science products.
“When you combine this with things like automated champion-challenger, model-swapping, and automated building and refreshing of the models,” said Hall, the machine learning architect, “it maximizes the predictive accuracy of the solution while reducing the load on your limited data science resources.”
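For readers unfamiliar with the champion-challenger pattern Hall mentions, the idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Pentaho's PMI API: the `Candidate` class, the `accuracy` and `pick_champion` functions, the 0.01 `margin`, and the toy holdout data are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for illustration only; not part of any Pentaho product.
Row = Tuple[float, int]  # (feature value, true label)


@dataclass
class Candidate:
    name: str
    predict: Callable[[float], int]


def accuracy(model: Candidate, rows: Sequence[Row]) -> float:
    """Fraction of holdout rows the model labels correctly."""
    correct = sum(1 for x, label in rows if model.predict(x) == label)
    return correct / len(rows)


def pick_champion(
    champion: Candidate,
    challengers: List[Candidate],
    holdout: Sequence[Row],
    margin: float = 0.01,  # assumed threshold to avoid swapping on noise
) -> Candidate:
    """Keep the current champion unless a challenger beats the best
    score seen so far by more than `margin` on the holdout data."""
    best, best_score = champion, accuracy(champion, holdout)
    for challenger in challengers:
        score = accuracy(challenger, holdout)
        if score > best_score + margin:
            best, best_score = challenger, score
    return best


# Toy data: the label is 1 when the feature is above 0.5.
holdout = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
champion = Candidate("threshold-0.8", lambda x: int(x > 0.8))
challenger = Candidate("threshold-0.5", lambda x: int(x > 0.5))

winner = pick_champion(champion, [challenger], holdout)
print(winner.name)
```

In a production setting the comparison would presumably rerun on fresh data on a schedule, which is what makes the model swapping "automated."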
Bob Madaio, Hitachi Vantara’s vice president of infrastructure solutions marketing, provided a glimpse of new technologies that are coming down the pike. In particular, the work with field programmable gate arrays (FPGAs) provides the potential to get more value out of data.
“What’s going on right now in the labs is we’re looking at a common FPGA platform,” he said. “We’re working to develop common algorithms that work with the FPGA but are much more transferable to workloads.”
Having a common FPGA platform would “accelerate insights on many different analytic opportunities,” Madaio said. “It also means as we’re seeing data gravity pull compute to the edge, we’re going to vastly accelerate what we can do with edge devices with less physical hardware.”
IoT data is predicted to surpass 40 zettabytes by 2025. In an effort to prevent that from being unmanageable, Hitachi Vantara is researching neural network-based storage techniques to provide a form of intelligent compression.
“What this research basically says is machine learning can track the kinds of data, improve how it can be reduced, and stored more efficiently for different types of data,” Madaio said. “We think this is a significant [technology] when we can take the learning of these algorithms and push them across the portfolio.”
Hitachi Vantara views IoT as an integral component of future applications, and it’s making the research bets and developing the frameworks now to make that a reality.
“It’s about continual learning, not just continual development,” Magee said. “These models have to get smarter and smarter. And that means the data pipeline has to go from this project in the back room with the data scientists to this real-time flow of data. That’s a big challenge, it’s a big opportunity, and we’re very focused on how to make that a reality going forward.”