The Past and Future of HPDA: A Q&A with Steve Conway
Steve Conway, a senior adviser at Hyperion Research, has been watching the big data market evolve and transform for well over 10 years. You will remember that it was Conway, while at IDC, who coined the term “high performance data analysis,” or HPDA, to reflect the confluence of big data with HPC hardware. So how has the HPDA market changed?
Conway, who has been a longtime contributor to Datanami, recently sat down to discuss his views on the market and where it might go from here. This is a lightly edited transcript of that conversation.
Datanami: Steve, thanks for joining us. When you founded the HPDA practice at IDC back in 2012, big data was all the rage. But so much has changed since then. Does the term “big data” still mean anything?
Steve Conway: It means something, but it doesn’t have the punch that it did. I tend to use the umbrella term, data-intensive computing, just because people often associate AI only with the analytics side of things, and forget that, for nearly all of the most important emerging AI use cases, simulation is almost as important as analytics.
So I tend to use data-intensive computing, which can mean that the methodology is simulation or the methodology is one of the analytics, either one of the learning models – machine or deep learning – or graph analytics, or one of the more semantic analysis [techniques].
DN: Analytics has changed quite a bit, but simulation and modeling have been around for a while.
SC: Analytics has been around as long as simulation, since just after World War II. It’s been around for a long time too and was originally the province of the intelligence community and the Department of Defense. And then in the 1980s it was adopted by the commercial market, mainly the financial services industry. And for a long time that was pretty much the only other private sector market that heavily used analytics as opposed to just simulation, until fairly recently.
DN: Data volumes continue to grow, and analytic techniques have evolved a bit. So what has fundamentally changed?
SC: The combination of increasing requirements and the advance of information technology – those two things going together have made it much more feasible to tackle the high data volumes. What you saw in the 1990s in particular was all this talk about the information age and turning big data volumes into information, into usable, actionable information. But that turned out to be, with few exceptions, pretty much just talk because the technology wasn’t there yet to enable it. It just wasn’t.
What you had back then was the classic three-tiered, client-server architecture, where the analytical engine was off to the side. It wasn’t part of the workflow. It was off to the side, and it was only entrusted with sample data. It wasn’t entrusted with live data in any way. It was really more of an experiment.
But advances in technology, particularly led by supercomputing…have brought to the picture [what] was really needed. One thing is 40 years of experience with this thing called parallel processing. And another thing was networking technology, inside the computer in particular, that enables very fast data rates.
And then memory subsystems that are very different than what you find in commercial enterprise data centers, memory subsystems that can hold a lot of data, that can hold a big data problem in memory. But the minute it spills out of memory, the performance falls off a very deep, sharp cliff.
The advances of technology have coincided with lots more data sources. And on the commercial side, requirements for large global companies have, for the first time, pushed up into the HPC competency space, meaning that enterprise server technology and enterprise data centers by themselves can’t handle it anymore. They’re not designed to.
That’s how we originally got into it. Not because we were so smart, but because we got woken up. In 2009, within a two-month period, three [vendors] – SAP, SAS Institute, and Oracle – all called us, out of the blue – we never had dealt with them before – and they essentially said the same thing: ‘Hey, we’ve a big problem. Some of our biggest and most important customers, we’re not going to be able to satisfy them anymore using enterprise server technology. It really isn’t up to the task. This crisis is two-to-three years away, but here it comes.’ And so we need to learn about HPC.
DN: What did they end up doing? Did they just parallelize the analytical workloads they were doing on SMP servers?
SC: These are new workloads they have to do to stay competitive…in enterprise computing that pushed them up to HPC.
One is credit card companies. In a very short period of time, typically 50 to 150 milliseconds, when somebody does a credit card swipe, they have to see whether the person has money to cover it, and they have to look for evidence of fraud.
Another very common type [is reflected in] a company that’s one of the world’s leading managers of rental properties, timeshares. They have an inventory of about 3 million timeshare properties worldwide. What they used to be able to do was consult all their branches around the world and the branch offices would compile data, for each property, on how popular has it been for the last quarter, how much they’re charging, what’s happening with comparable properties’ pricing in the area.
It was a repricing exercise that was able to be carried out once a quarter. Now they have to do it five to six times per day to stay competitive.
The third type of problem…is sales analysis. What used to happen is that companies – and we’re really talking about large global companies for the most part – they would ask their data structure, who were the top 25 salespeople in France last quarter? And now for competitive reasons, what they have to do is say, who are the top 25 now and why? Did it have to do with who was their manager? Did it have to do with their territory and what was in their territory? Did it have to do with their mode of operations?
So they gather all that information for France, and then they spin all that up in the HPC system and it tells them, okay, here’s what you need to do to maximize revenue for the next quarter. You need to make these changes. But this stuff is leapfrogged. As soon as one company starts doing this, their competitors are pushed to have to do it also.
DN: You can’t just add more disk and servers to that problem. You have to fundamentally rethink the architecture.
SC: Yes, and that’s what pushed them up to HPC. They’re very much like how a weather bureau in HPC has been, where they have to put out multiple forecasts per day. Instead of the 50 to 150 millisecond [response time], this is more like a couple of hours that they have to do that to stay competitive.
DN: That’s a different definition of HPC than how many people define it. How do you even define HPC anymore?
SC: It’s a moving target. It classically was defined as the fastest class of computer at any point in time, meaning that something that was HPC 20 years ago wouldn’t be today. It might be your cell phone by now. But now it’s more clearly characterized essentially based on the workflow requirements. Meaning, if an enterprise server can handle it, then it ain’t HPC.
DN: Looking back over the past 10 years, did things pan out like you thought they would? Were there any big surprises?
SC: The big surprises were which companies bought other companies. I look back on what I wrote 10 years or so ago, about big data and what would AI look like in the next 10 years, and so forth. At the risk of sounding self-absorbed, it pretty much was there. It was pretty clear, not because I’m a smarty, but because the lines were pretty clear. And particularly, as we all do, as you do, if you kind of wipe away the hyperbole as much as you can…
DN: How quickly is HPC in the cloud growing?
SC: It is growing rapidly. For about a decade, until about three years ago, it had stayed at 8% to 9%. And then it jumped. And the reason why it jumped wasn’t because the users suddenly woke up to the cloud, it’s because the cloud suddenly woke up to the user.
DN: Was there anything in particular that explains why the clouds woke up to that realization at that point in time?
SC: It was the twofold realization that the global high performance computing market was no longer a rounding error in the global IT market, that it was a number worth pursuing…about a $30 to $40 billion market.
For some of them, it was just as important that they saw that HPC had become indispensable at the forefront of R&D for the most economically important AI use cases, meaning that if they wanted to be leaders in AI, it was very important [to be] in the HPC market in particular, because that in general is where the mainstream market was going to be headed in not too many years.
DN: What specific use cases do you see driving the growth of AI?
SC: The important use cases, like automated driving, precision medicine, smart city, IoT, edge computing. If you’re doing research at the forefront of those fields, you’re using HPC.
DN: Does that mean that it’s becoming standardized?
SC: I’ll start by saying, AI is at a very early stage. There’s a long road ahead until we get anywhere near general AI that passes the Turing test and all of that. AI, even at this early stage, can do lots of useful things having to do with voice and visual recognition mainly. They seem like tough problems, but in the scheme of things, they’re going to turn out, retrospectively, very easy problems. The tough problems are ahead.
So what really needs to evolve are things like inferencing capacity, which are the brains of the outfit in AI. That’s where most of the intelligence is located. There’s just a lot of opportunity, and challenges, to move AI forward. As useful as it is now, it really isn’t very far along, and there are lots of issues surrounding that too, on the ethics and transparency sides.
DN: We have come very far with AI and HPC, but there is so much work to do.
SC: The AI is not contextual yet. It’s specific. It’s very task-oriented. It’s Rain Man, idiot savant stuff. It’s very good at one thing. But it’s not even to the point yet where a computer or device knows what human eyes are…It doesn’t end up being one app. It’s like 10 apps in a box in order to do that task, and things aren’t connected yet in a contextual way.
DN: Your predictions from back in 2010 were prescient. Do you have any predictions for 2030?
SC: I think there’s going to be, and it’s already started, much more focus on the inferencing part of AI. So much focus has been on the training part, so [there will be] more focus on the inferencing part, as raising the IQ of the system. And raising the IQ…for the whole field of AI will be increasingly important and that’s already starting in the last couple of years.
DN: Steve, thanks for taking the time and sharing your thoughts with the Datanami audience.