LinkedIn is releasing to the open source community its machine-learning tool used to train the ranking algorithm for its newsfeed, advertising and customer recommendations.
The world’s largest professional network (NYSE: LNKD) said Tuesday (June 7) that its Apache Spark-based machine learning library, dubbed Photon ML, would give data scientists a more accurate picture of underlying datasets as they train algorithms to parse the backgrounds of individual users.
Citing recent efforts by Facebook (NASDAQ: FB) to improve techniques for populating social media users’ feeds, LinkedIn said it built the machine learning library on Spark to improve the quality of models that can scale quickly to larger datasets. Last week Facebook unveiled an AI initiative called DeepText, designed to deliver a “deep-learning based text understanding engine.”
Photon ML is also designed to help research engineers select the best algorithms for recommendation systems such as LinkedIn’s “People You May Know” feature.
The company said it uses Photon ML in a workflow that begins with data preparation, which often involves “extract, transform and load” (ETL) of data from websites. The data are then labeled and features are “joined in.” Next, machine learning algorithms are applied to determine the scoring functions for its recommendation and search systems. Finally, the best models are tested to determine their impact on customers.
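The workflow described above can be sketched in plain Python, assuming a toy job-recommendation setting. Everything in the sketch is hypothetical: the event log, the feature names `skill_match` and `same_region`, the helper functions, and the choice of L2-regularized logistic regression as the scoring function. It is not Photon ML's API, which runs on Spark. It only walks through the steps in order: labeling events, joining in features, fitting candidate scoring functions, and keeping the candidate with the lowest loss on held-out data.

```python
import math

# Hypothetical interaction log after ETL: (member_id, job_id, action).
RAW_EVENTS = [
    ("m1", "j1", "apply"), ("m1", "j2", "view"),
    ("m2", "j1", "view"), ("m2", "j3", "apply"),
    ("m3", "j2", "apply"), ("m3", "j3", "view"),
    ("m4", "j1", "apply"), ("m4", "j3", "view"),
]

# Hypothetical feature table keyed by (member_id, job_id):
# [skill_match, same_region], made up for illustration.
PAIR_FEATURES = {
    ("m1", "j1"): [0.9, 1.0], ("m1", "j2"): [0.2, 0.0],
    ("m2", "j1"): [0.1, 1.0], ("m2", "j3"): [0.8, 0.0],
    ("m3", "j2"): [0.7, 1.0], ("m3", "j3"): [0.3, 1.0],
    ("m4", "j1"): [0.6, 0.0], ("m4", "j3"): [0.2, 1.0],
}


def label(events):
    """Label each event: 1 if the member applied, 0 otherwise."""
    return [(m, j, 1 if action == "apply" else 0) for m, j, action in events]


def join_features(labeled, features):
    """Join in features by key and prepend a constant 1.0 as a bias term."""
    return [([1.0] + features[(m, j)], y) for m, j, y in labeled if (m, j) in features]


def predict(w, x):
    """Logistic scoring function: sigmoid of the dot product w . x."""
    return 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))


def train_logistic(rows, lr=0.5, l2=0.0, epochs=500):
    """Batch gradient descent on mean log loss plus (l2 / 2) * ||w||^2.

    For simplicity the penalty also applies to the bias weight.
    """
    w = [0.0] * len(rows[0][0])
    for _ in range(epochs):
        grad = [l2 * wi for wi in w]
        for x, y in rows:
            err = predict(w, x) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi / len(rows)
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def log_loss(w, rows, eps=1e-12):
    """Mean negative log-likelihood of the labels under the model."""
    total = 0.0
    for x, y in rows:
        p = predict(w, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
    return total / len(rows)


rows = join_features(label(RAW_EVENTS), PAIR_FEATURES)
train, holdout = rows[:6], rows[6:]

# Offline model selection: keep the candidate with the lowest held-out loss.
candidates = [0.0, 0.01, 0.1]
best_l2 = min(candidates, key=lambda l2: log_loss(train_logistic(train, l2=l2), holdout))
best_w = train_logistic(train, l2=best_l2)
```

Held-out loss only narrows the field offline. The company's final step, testing the best models for their impact on customers, comes after a selection like this one.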
As the core of LinkedIn’s model training approach, Photon ML also “has served as a drop-in replacement to other machine learning libraries, such as the previously open-sourced ADMM implementation in ml-ease,” the company noted in a blog post.
LinkedIn’s Paul Ogilvie said the professional network runs Photon ML on Apache Spark and YARN, hosted on the same cluster as its other Hadoop MapReduce applications. “Switching our workflows from Hadoop MapReduce to Spark on Yarn has generated a 10 to 30x increase in the speed of model training,” Ogilvie added.
As LinkedIn, Facebook and others promote new ways of building and applying machine learning to train models and improve search, they are increasingly sharing code with machine learning developers.
“While there are many open source machine learning libraries currently available, we feel that Photon ML is an important addition because of the direction we intend to take the library toward: generalized additive mixed effect models,” or GAME, Ogilvie said.
GAME models are designed to train algorithms on a more accurate picture of the underlying dataset, one that reflects users’ actual experiences on LinkedIn. Ogilvie said releasing these techniques to the open source community would encourage broader use and, in turn, lead to better algorithms for recommendation systems.
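In a mixed-effect model, a global (fixed-effect) model shared by everyone is combined with smaller per-entity (random-effect) models. For example, there might be one such model per member and one per job, so that scores can reflect an individual's own behavior.

The sketch below assumes a logistic link and the same features for every component. It shows only how such a score might be assembled:

logit P(apply) = x·β_global + x·γ_member + x·α_job

The coefficients, feature names and function names are hypothetical. Training is not shown, and this is not Photon ML's GAME API.

```python
import math

# Hypothetical fixed-effect (global) coefficients shared by every member.
GLOBAL_W = {"bias": -1.0, "skill_match": 2.0, "same_region": 0.5}

# Hypothetical per-member random-effect coefficients.
MEMBER_W = {"m1": {"skill_match": 0.8}, "m2": {"same_region": -0.4}}

# Hypothetical per-job random-effect coefficients.
JOB_W = {"j1": {"bias": 0.3}}


def dot(w, x):
    """Sparse dot product; features missing from w contribute zero."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def game_score(member_id, job_id, x):
    """Sum fixed and random effects in logit space, then apply the sigmoid.

    A member or job without random-effect coefficients falls back to the
    global model alone.
    """
    logit = (
        dot(GLOBAL_W, x)
        + dot(MEMBER_W.get(member_id, {}), x)
        + dot(JOB_W.get(job_id, {}), x)
    )
    return 1.0 / (1.0 + math.exp(-logit))
```

Because missing entities fall back to the global model, this structure can personalize scores for members with plenty of history without needing per-member data for everyone.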
Internal tests of models trained with a subset of the GAME implementation in Photon ML showed improved recommendations for job applications and other LinkedIn services. “While these tests are still in their early stages, these results indicate that Photon can significantly improve recommendations for members,” Ogilvie noted.