LinkedIn today announced that its open source feature store, dubbed Feathr, is joining LF AI & Data, the Linux Foundation’s umbrella foundation for big data and AI projects.
Feathr was originally developed at LinkedIn to help manage and serve features used in its machine learning applications. Instead of having engineers work with features manually as part of each individual data pipeline, Feathr automates and standardizes how applications interact with features, which are used in both the training and inference stages of machine learning.
The impetus for creating Feathr was to provide greater consistency, accuracy, and performance in LinkedIn’s machine learning programs. By defining the data features used in ML programs once in a common feature namespace, users can pull them up “by name” from within ML workflows. This allows the same features to be used across multiple ML programs, improving productivity and accuracy. Feathr also provides a more repeatable method for transforming source data into features (something not found in all feature stores), and it boosts the performance of ML serving at the inference stage by centralizing the storage and serving of features.
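To make the "define once, pull by name" idea concrete, here is a minimal Python sketch of a shared feature namespace. It is not Feathr's API: the `FeatureRegistry` class, its `define` and `get` methods, the feature names, and the sample record are all hypothetical and exist only to illustrate the concept described above.

```python
from typing import Callable, Dict, List


class FeatureRegistry:
    """Hypothetical in-memory feature namespace (illustrative, not Feathr's API)."""

    def __init__(self) -> None:
        self._features: Dict[str, Callable[[dict], float]] = {}

    def define(self, name: str, transform: Callable[[dict], float]) -> None:
        # Each feature is defined exactly once under a unique name.
        if name in self._features:
            raise ValueError(f"feature {name!r} is already defined")
        self._features[name] = transform

    def get(self, names: List[str], record: dict) -> Dict[str, float]:
        # Any workflow can request features by name; the same transform
        # runs whether the caller is a training job or a serving path.
        return {name: self._features[name](record) for name in names}


registry = FeatureRegistry()
registry.define("clicks_per_view", lambda r: r["clicks"] / max(r["views"], 1))
registry.define("is_new_member", lambda r: float(r["tenure_days"] < 30))

# Made-up source record, used for both "training" and "serving" below.
record = {"clicks": 3, "views": 12, "tenure_days": 10}

training_row = registry.get(["clicks_per_view", "is_new_member"], record)
serving_row = registry.get(["clicks_per_view"], record)
```

Because both callers resolve `clicks_per_view` through the same definition, the value computed for training matches the value computed at serving time, which is the consistency benefit the article describes. A production feature store would add persistent storage, point-in-time correctness, and low-latency serving on top of this idea.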
Feathr takes a traditional feature pipeline for ML applications….(Image courtesy LinkedIn)
Since launching Feathr internally in 2017, use of the software has grown. According to LinkedIn, the feature store is now being used to track thousands of features used by the social media giant.
“It has reduced the engineering time required for adding and experimenting with new features from weeks to days,” LinkedIn data infrastructure engineer Hangfei Lin writes in a blog post today. “It’s also performed up to 50% faster than the custom feature processing pipelines that it replaced.”
LinkedIn released the code behind Feathr under an Apache 2.0 license this April, allowing the general public to use the open source feature store for the first time. Since then, the project “has achieved substantial popularity among the machine learning operations (MLOps) community” and is being adopted by companies across multiple industries, Lin writes.
…and standardizes the workflow with a centralized repository for features (Image courtesy LinkedIn)
By donating Feathr to The Linux Foundation’s LF AI & Data group, LinkedIn is putting additional governance in place around the open source project, which should help attract more users and more contributors to the project.
“We’re excited to welcome Feathr to LF AI & Data and for it to be part of our technical project portfolio (41 projects and growing) with over 17K developers,” Dr. Ibrahim Haddad, the general manager of LF AI & Data, said in a press statement. “We aim to support Feathr to expand its user base, grow its community of developers, become a leader within its own category, and enable collaboration and integration opportunities with other projects. We look forward to the project’s continued growth and success as part of LF AI & Data.”
Microsoft is also part of the Feathr story (LinkedIn is owned by Microsoft). According to Lin, LinkedIn engineers have worked with their Microsoft Azure colleagues to ensure Feathr runs well on Azure and is integrated with other Azure products and projects.
According to an Azure blog post, Feathr now works with Apache Spark, Jupyter, Azure Blob Storage, HDFS, Snowflake, Databricks Delta Tables, and SQL Server. Microsoft was also involved with the open-sourcing of Feathr back in April.
Feathr is now a sandbox project at LF AI & Data. For more information, check out its GitHub page.