March 18, 2013

Python Wraps Around Big, Fast Data

Nicole Hemsoth

This week we’re at the GPU Technology Conference (GTC ’13) in San Jose taking a look at how GPU giant NVIDIA plans to push their vision for accelerated analytics and high performance computing applications.

A major part of NVIDIA’s “coming out” to the big data community is today’s announcement around the popular open source Python language, which is now being opened to CUDA developers who want to eke the last grains of performance out of their apps via GPU acceleration.

Python is finding its way into an ever-expanding set of use cases that fall into both the high performance computing and big data buckets. From powering mission-critical operations at NASA to a range of core functions in industry, retail and elsewhere, this is probably not a bad snake to ride into the next stage of the big data boom for any compiler or software support company.

The GPU giant teamed up with Continuum Analytics to deliver Python support to CUDA via the NumbaPro Python compiler, which is part of Continuum’s larger Anaconda Accelerate offering. If you recall, this high performance Python suite caught mainstream eyes recently when DARPA announced a $3 million investment in the company to help it push more profound capabilities into the NumPy and SciPy libraries for use in big data scientific, research and defense apps.

“Hundreds of thousands of Python programmers will now be able to leverage GPU accelerators to improve performance on their applications,” said Travis Oliphant, co-founder and CEO at Continuum Analytics. “With NumbaPro, programmers have the best of both worlds: they can take advantage of the flexibility and high productivity of Python with the high performance of NVIDIA GPUs.”
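To give a flavor of what that “best of both worlds” looks like in practice, here is a minimal sketch of the programming model NumbaPro exposes: an ordinary scalar Python function decorated so it compiles into an elementwise GPU function. The decorator name, signature string, and `target='gpu'` option follow Continuum’s announcement-era examples and may differ in detail from the shipped API; the sketch falls back to a plain Python loop when NumbaPro and a CUDA GPU are not available, so the example runs anywhere.

```python
# Hedged sketch of the NumbaPro programming model described above.
# The numbapro import and decorator options are assumptions based on
# Continuum's published examples, not a verified API reference.
try:
    from numbapro import vectorize  # NumbaPro's GPU-targeting decorator
    import numpy as np

    @vectorize(['float64(float64, float64)'], target='gpu')
    def axpy(x, y):
        # Scalar body; NumbaPro compiles it into a GPU ufunc
        return 2.0 * x + y

    def run(xs, ys):
        return list(axpy(np.array(xs), np.array(ys)))
except Exception:
    # No NumbaPro (or no CUDA GPU): same elementwise math, CPU loop
    def run(xs, ys):
        return [2.0 * x + y for x, y in zip(xs, ys)]

print(run([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```

The appeal Oliphant describes is exactly this shape: the programmer writes the scalar math once in Python, and the compiler handles the parallel mapping onto GPU threads.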

Wrapping around Python to get a stranglehold on the big data acceleration market isn’t a bad idea for NVIDIA, which is always trying to push developers into CUDA as a high performance parallel framework. They’re entering an already rich ecosystem, though: Python boasts something on the order of 3 million users and is being tapped for several new breeds of big data apps that demand serious complexity without all the knee-deep wading through thick code mud.

According to Roy Kim, NVIDIA’s product manager on their HPC-oriented Tesla side, the value prop for developers, their community, and of course, his own company is clear. The pairing of perceived productivity of Python with the performance enhancements of GPUs will offer incentive for new users to tap into GPU acceleration, he believes.

Kim contends that while this is still a relatively new area for big data developers who so often seem focused on the algorithms alone, adding a sizzle to the hardware could mean big things for applications that need speed at scale.

Accelerating at massive scale is really where GPUs shine, says Kim, pointing to their use in current big data and supercomputing environments, including some of the world’s fastest supercomputers (including Titan at Oak Ridge National Lab). He claims that adding the flexibility and ease of use of Python to this performance pool will boost CUDA and GPU use in scientific and enterprise big data environments.

To step back for a moment, it’s worth noting that NVIDIA kicked off the event with a big data bang, its first, since the company has generally steered clear of tapping into the buzzy term. However, as its booming high performance computing division finds more workloads that fit equally well into the big data realm, the shift is no surprise.

Now that some of the algorithm and application sides of the big data puzzle have been snapped together, the refreshed focus on hardware (specifically, fancy-schmancy accelerators) could trigger a new wave of systems designed to make ordinary big enterprise and web data believe it’s riding in an HPC cluster.

Pull in one of the easier-to-manage languages, one with significant potential to expand further into HPC and big data as enhancements continue, and NVIDIA might have a nice performance package to present to the people already pushing Python to its limits.

Just as a side note, while Python adoption in HPC is still an uphill battle, when it comes to big data, the language is being fed some meaty investments, both from within the open source community and from some of the vendors backing it. While that aforementioned $3 million from DARPA to extend Python’s length is a drop in the bucket compared to the agency’s massive $100 million effort to fund big data technologies, it lends added legitimacy to Python as a language primed for data-intensive app developers.

But anyway—we are hoping to run into some of the Python users running around San Jose at the show today and tomorrow. In the meantime we’ll continue our reporting from the GPU side of the big data fence. If any of you are here, I’m walking about during the show tomorrow and attending a few of the sessions that are a good fit for the big data-interested developer and end user. Feed me your story ideas… 🙂
