From Amazon to Uber, Companies Are Adopting Ray
What if you could use your favorite machine learning library to code an AI program on your laptop, and have it automatically transferred to run in a massive, distributed fashion on the cloud? That’s the general idea behind Ray, the open source framework that is attracting a growing number of top-tier companies, including Amazon and Uber, both of which will be sharing their stories at this week’s Ray Summit.
Uber is a well-known user of big data, real-time analytics, and AI technology, and Uber engineers have also contributed their share of big data creations to the open source community. With that in mind, it’s interesting to note that Uber is betting heavily on Ray to be the distributed computing substrate for its next-generation machine learning environment.
“Uber has a new machine learning platform project called Canvas, and they decided to choose Ray to build this platform on,” says Ion Stoica, a UC Berkeley computer science professor and co-founder of Anyscale, the commercial outfit behind Ray and Ray Summit 2021, which is free and is taking place online June 22-24.
“They’ve also done quite a bit of work in porting some of the frameworks on top of Ray,” Stoica tells Datanami regarding Uber. “One is Horovod, the popular distributed training framework. XGBoost, probably the most popular non-deep learning library [is another one that Uber engineers have ported]….And they use Dask, which also run on top of Ray.”
The work to get Dask running on Ray was spearheaded by an engineer at Descartes Lab, the geospatial intelligence firm that spun out of Los Alamos National Lab in 2014. The company was having trouble getting Dask to scale and was looking for a solution.
“One of their engineers decided, hey, let me try running it on Ray to see if that can help it scale,” says Robert Nishihara, the CEO and co-founder of Anyscale and the co-creator of Ray. “And he wrote 200 lines of code just to get rid of the scaling bottlenecks.”
Once the work has been done to integrate a computing framework with Ray, then the benefits accrue to all subsequent users of that framework. The biggest benefit is scalability, as Ray automatically handles the nitty-gritty details of parallelizing applications that weren’t originally designed to run in a distributed manner. The Ray runtime also simplifies management of clusters.
“Once your library is working on top of Ray, now you can use all the other libraries,” Stoica says. “It’s as simple as another function call. It’s that simple. So that’s the real power.”
Another way to think about Ray is that it enables an “infinite laptop,” Nishihara says.
“Everything is moving to the cloud, and distributed computing is becoming more and more important,” Nishihara says. “[But] there’s a huge gap between what it takes to write a program on your laptop and what it takes to write a scalable program that runs across hundreds of machines. The latter takes a huge amount of expertise.
“We are trying to make it so if you know how to program on your laptop, then that’s enough,” he continues. “Then you can take advantage of all the cloud resources, with the familiar developer experience of programming on your latop. You don’t have to be an expert.”
There are two general buckets that Ray integrations fall into: shallow and deep integration. Shallow integration, which accounts for about two-thirds of the existing community libraries that have been integrated with Ray, provides the fastest return on investment, as it requires only about a day’s worth of code changes (not counting QA testing).
“Shallow maybe sounds bad, but it’s actually a good thing because it’s simpler,” Nishihara says. “You get all of the benefits from a very simple integration. It’s easy to maintain and it’s lightweight.”
Deeper integration is required to squeeze all of the benefits of running atop Ray, particularly when it comes to performance and stability, Stoica says. The Dask integration ultimately was completed at the deep level.
Ray Summit attendees will hear about a Dask on Ray deployment at Amazon. The popular online retailer had a Dask job cost $1 million to run, Nishihara says. It benchmarked that Dask job running on Ray, and found it highly beneficial from a cost point of view (although maybe it was not as beneficial to its subsidiary AWS’ revenues).
“Just swapping it out and running on top of Ray, it cut the costs to one third, saving over half a million per job,” Nishihara says. “It made it more scalable and everything just by swapping in Ray.”
Currently, about 20 machine learning and deep learning frameworks have been integrated with Ray; you can see the full list here. Work continues on more libraries behind the scenes, including for TensorFlow.
Ray also comes with several built-in machine learning libraries, including Tune, for hyperparameter tuning; RLIb, a reinforcement learning library; RaySGD, a distributed training wrapper; and Ray Serve, for scalable and programmable serving.
Ray is a general compute framework, but it’s being mainly used in machine learning and AI workloads, which have high demands for scalability. There are no plans to make a big push to get database or transactional systems running atop Ray, Stoica says. However, with that said, there is a need to support business logic and micro services within Ray.
“Our main targets are people who are building these tools, who are building these platforms on top of us, and people who are building end-to-end applications,” says Stoica who was a 2017 Datanami Person to Watch. “You will see many examples at the Ray Summit. That’s why the production aspect is very important, because at the end of the day, that’s what the application is all about.”
Registration for the Ray Summit 2021, which is free, is running at about twice the level of last year’s event, which Stoica attributes to community growth. The project now has about 500 contributors, and it’s the fastest-growing distribute computing frameworks in terms of GitHub Stars, he says.
To learn more about Ray Summit or to register, go to raysummit.anyscale.com.