Why Every Python Developer Will Love Ray
There are many reasons why Python has emerged as the number one language for data science. It’s easy to get started and relatively forgiving for beginners, yet it’s also powerful and extensible enough for experts to take on complex tasks. But there’s one aspect of Python that has bedeviled developers in the big data age: Getting Python to scale past a single node. Solving that dilemma is the number one goal of Project Ray.
The name “Ray” will ring a bell if you’ve been following the goings-on at RISELab, the advanced computing laboratory formed at UC Berkeley. As the follow-on to AMPLab, which gave us Spark, Tachyon (now Alluxio), and Mesos, there are certain expectations for impactful technologies to flow from RISELab. So far, Ray is the top candidate to achieve escape velocity and thrive in the open source world.
Ray was created by two grad students in the RISELab, Robert Nishihara and Philipp Moritz, as a development and runtime framework for simplifying distributed computing. Under the guidance of UC Berkeley computer science professors Michael Jordan and Ion Stoica, Ray has progressed from an academic research project into an open source software project with about 200 contributors from 50 companies. While Ray isn’t widely known and is still emerging onto the scene, it was deemed solid enough by Ant Financial, an affiliate of Alibaba, to put into production for fraud detection.
In an interview with Datanami, Nishihara describes the goals of the Ray project:
“What we’re trying to do is to make it as easy to program the cloud, to program clusters, as it is to program on your laptop, so you can write your application on your laptop, and run it on any scale,” Nishihara says. “You would have the same code running in the data center and you wouldn’t have to think so much about system infrastructure and distributed systems. That’s what Ray is trying to enable.”
Ray’s General Purpose
Part of the problem with the current bunch of computational frameworks, particularly in the modern era of neural networking, is that the frameworks are specialized and don’t play well with others. For example, to develop the AlphaGo program that turned so many people’s heads several years ago, Google would have had to cobble together many different frameworks, or build a new framework from scratch.
“The challenge there is that you not only have neural networks that you have to scale up, but you also have these simulations that you need to scale up,” Nishihara says. “So the simulations are something that would be a good fit for Spark – very data-parallel stuff. But Spark wouldn’t be good for the machine learning and training part. For the machine learning and training part, TensorFlow would be a great fit, but then TensorFlow is not really designed to scale up these simulations.
“So you have these two specialized systems that are each good for part of the application, but they’re a little too specialized to encompass the whole thing,” he continues. “Ray is trying to be a general purpose system to really make it easy to scale up arbitrary Python code. That includes the machine learning part, where we could use TensorFlow, and it also includes the simulation part, where we can just scale up arbitrary Python functions.”
Ray can run on a cluster manager like Kubernetes or it can be installed on bare metal. In either case, it’s Ray’s job to ensure that applications can be run in a distributed manner, with all the inter-node communication, data transfer, and fault tolerance that distributed computing requires.
To read more about how that works, check out the 2018 research paper authored by Nishihara, Moritz, and their colleagues, titled “Ray: A Distributed Framework for Emerging AI Applications.”
So far, Nishihara has tested Ray on a cluster with 10,000 cores. But that shouldn’t be taken as the ceiling for Ray. “In principle, it should go larger than that,” he says.
In addition to core Ray, the plan calls for creating a set of Ray libraries designed to accelerate development of certain types of applications. Currently the Ray project offers two “very high quality” libraries, one for hyperparameter tuning and one for reinforcement learning, Nishihara says. Libraries for distributed training, model serving, and data processing are still in the pipeline.
There’s also a stream processing library being developed atop Ray. This work is being spearheaded by Ant Financial, a subsidiary of Alibaba. Nishihara says the Ray team has validated that it works.
Ray’s Ready for Use
While some of the more advanced use cases are still in their nascent stages, the core element of Ray is ready to go. The Ray code is available from GitHub under an Apache 2 open source license. And although the framework is really targeting developers who code data science and machine learning applications in Python, Ray will also support Java, Nishihara says.
With Moore’s Law not delivering the computational speedups that it did in the past, Ray could provide a power boost for developers looking to scale, but without the need for advanced skills in distributed computing.
“It’s not like they have to relearn everything they know about computer science and software,” he says. “If they have some Python that they have already written and now they want to scale it up, they should be able to download Ray and with a few small modifications to their applications – in a lot of cases, not really having to reorganize things – be able to just scale it up.”
With just a few minutes of time, Ray can take a Python application from a laptop and run it at scale on a distributed cluster. “You can tell your engineers to just build their applications the way they would, without even thinking about scale, just build the application and get it working on laptops,” he says, “and then afterwards they can spend half an hour to an hour to scale it up with Ray.”
The future is bright for Ray. In the coming months, Nishihara and his colleagues will be launching a company around the framework, called Anyscale. Stay tuned to Datanami for news about that.