GPUs Push Big Data’s Need for Speed
As we noted on Monday during our first report from the GPU Technology Conference in San Jose, graphics processor giant NVIDIA is diving into big data in a formal way, with core announcements around new functionality, programming options and, now, use cases for data-intensive enterprise apps.
During his keynote, NVIDIA CEO Jen-Hsun Huang delved into detail about how GPU computing is invading the realm of commercial and web-scale apps.
Huang noted that when it comes to large-scale enterprise and mobile applications, "best effort" approaches to performance (namely, vanilla datacenters) just won't cut it. Users demand services built on highly complex algorithms fed by constant streams of ever-changing data. Further, in many cases, especially for things like social media analysis, it makes little sense to store the data at all: it needs to be ingested and turned around in real time for instant analysis.
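The ingest-and-analyze-without-storing pattern Huang describes can be illustrated with a toy stream analyzer. This is a minimal pure-Python sketch with made-up terms, not anyone's production code: only a small sliding window lives in memory, and results are emitted as each item arrives rather than after the stream is persisted.

```python
from collections import deque

def rolling_top_terms(stream, window=4, k=2):
    """Analyze a stream on the fly: keep only a short sliding window
    in memory and yield the k most frequent terms after each arrival,
    never storing the full stream."""
    recent = deque(maxlen=window)  # old items fall off automatically
    for term in stream:
        recent.append(term)
        counts = {}
        for t in recent:
            counts[t] = counts.get(t, 0) + 1
        # Rank by frequency, break ties alphabetically
        yield sorted(counts, key=lambda t: (-counts[t], t))[:k]

# Hypothetical feed of social-media mentions
mentions = ["gpu", "cuda", "gpu", "keynote", "gpu", "cuda"]
for top in rolling_top_terms(mentions):
    print(top)
```

In a real deployment the per-item work (feature extraction, scoring) is the part that gets offloaded to GPUs; the windowing logic stays the same.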
To highlight how big data is being accelerated by GPUs, one of the company's superstar data-intensive use cases, the audio recognition service Shazam, showed how GPUs let it trim time to results, extend its services and increase efficiency.
Shazam boasts 300 million users and is adding two million more each week. That ever-growing base hits the service with roughly 10 million requests to identify songs, each matching an audio "fingerprint" against a database of more than 25 million songs to find the single correct match. As if that weren't enough, the company wants to return results before people get tired of waiting.
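At its core, the lookup is an inverted-index search with voting. Here is a minimal sketch of that idea; the fingerprint tuples and song names are made up (Shazam's real fingerprints are hashes derived from spectrogram peaks, and its matching is far more involved):

```python
def build_index(catalog):
    """Invert song -> fingerprints into fingerprint -> songs,
    so each sample fingerprint is an O(1) lookup."""
    index = {}
    for song, prints in catalog.items():
        for fp in prints:
            index.setdefault(fp, set()).add(song)
    return index

def identify(index, sample_prints):
    """Each matched fingerprint votes for its song; most votes wins."""
    votes = {}
    for fp in sample_prints:
        for song in index.get(fp, ()):
            votes[song] = votes.get(song, 0) + 1
    return max(votes, key=votes.get) if votes else None

# Tiny fake catalog: fingerprints as (frequency_bin, time_offset) pairs
catalog = {
    "song_a": {(1, 9), (4, 2), (7, 7)},
    "song_b": {(4, 2), (3, 3), (8, 1)},
}
index = build_index(catalog)
print(identify(index, [(4, 2), (7, 7)]))  # prints "song_a": two votes vs. one
```

The voting step is embarrassingly parallel across fingerprints, which is one reason the workload maps well onto GPUs.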
It is tempting to simplify what the folks at Shazam are doing on the big data front, but their operations grow more complex with scale. Still, adding GPUs into the mix cut the company's costs by one-third, says CTO Jason Titus, and processing time keeps shrinking even as new content and users are added.
The idea of using GPUs to boost the performance of massive systems is catching on for some of the biggest of data problems (not to mention computational challenges). The top supercomputer on the planet, Oak Ridge National Lab's Titan, is an 18,688-node GPU powerhouse, each node pairing a 16-core AMD Opteron CPU and 32 GB of memory with an NVIDIA Tesla K20X GPU, pushing the system to almost 300,000 CPU cores.
The large majority of companies running big web-scale operations that power apps or analytics aren't likely to plunk down $80 million for a cluster of that magnitude (to say nothing of the power and cooling bills). Even so, they are looking to GPUs to add performance while improving overall efficiency, hoping the investment pays off in new capabilities and operational savings.
But then again, even with a major investment in hardware, it's not as simple as snapping in a few graphics cards; the porting process can be a bit of a challenge. NVIDIA is working to push a new generation of big data developers toward its CUDA framework, partnering to bring CUDA support to more mainstream languages, including Python.
NVIDIA has big plans for its new architectures. While some are more relevant to the mobile, gaming and general consumer space, Kepler and its successors could find a sweet spot in the big data world, especially once the developer community is engaged.
On that note, take a look at the company’s roadmap for GPGPU computing–and the ecosystem as a whole.
More on the big data and GPU angle coming this week…