November 11, 2020

LinkedIn Open Sources Dagli to Simplify ML Pipeline Building

LinkedIn yesterday announced that it has open sourced Dagli, a Java-based framework for building and deploying machine learning pipelines.

While the number and quality of tools for developing machine learning models have continued to increase, bringing everything together to deploy an ML model continues to be problematic, explains LinkedIn research scientist Jeff Pasternack in a blog post yesterday.

“Duplicated or extraneous work is often required to accommodate both training and inference, engendering brittle ‘glue’ code that complicates future evolution and maintenance of the model and creating long-term technical debt,” he writes.

That’s what spurred the creation of Dagli, which LinkedIn developed to help engineers and data scientists create “bug-resistant, readable, modifiable, maintainable, and trivially deployable model pipelines without incurring technical debt,” Pasternack writes.

Dagli represents a machine learning pipeline as a single directed acyclic graph (DAG) that is used for both training and inference. The root nodes of the DAG represent the inputs to the pipeline, which can be either “placeholders” or “generators.” “Transformers are the child nodes of the graph, accepting one or more input values for each example and producing an output value,” Pasternack writes.
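To make the graph structure concrete, here is a minimal Java sketch of the idea. It does not use Dagli's actual API: the names DagSketch, Node, placeholder, constant, transformer, and buildPipeline are all hypothetical, and the sketch covers only per-example evaluation of the graph, not training. It assumes Java 9 or later for Map.of.

```java
import java.util.Map;
import java.util.function.BiFunction;

// Illustrative only: these types are hypothetical and are NOT Dagli's API.
class DagSketch {
    /** Any node in the graph yields one value per example. */
    interface Node<T> {
        T valueFor(Map<String, Object> example);
    }

    /** Root node: a named slot whose value is supplied by each example. */
    static <T> Node<T> placeholder(String name) {
        return example -> {
            @SuppressWarnings("unchecked")
            T value = (T) example.get(name);
            return value;
        };
    }

    /** Root node: produces a value without reading the example (here, a constant). */
    static <T> Node<T> constant(T value) {
        return example -> value;
    }

    /** Child node: combines the outputs of two parent nodes into one output. */
    static <A, B, R> Node<R> transformer(Node<A> a, Node<B> b, BiFunction<A, B, R> fn) {
        return example -> fn.apply(a.valueFor(example), b.valueFor(example));
    }

    /** Builds a two-root, one-child graph that truncates text to a fixed length. */
    static Node<String> buildPipeline() {
        Node<String> text = placeholder("text");   // root: fed from each example
        Node<Integer> maxLength = constant(5);     // root: generated, same for every example
        return transformer(text, maxLength,
            (s, n) -> s.length() <= n ? s : s.substring(0, n));
    }

    public static void main(String[] args) {
        Node<String> pipeline = buildPipeline();
        // The same graph object is evaluated for every example it is given.
        System.out.println(pipeline.valueFor(Map.of("text", "pipeline")));
        System.out.println(pipeline.valueFor(Map.of("text", "DAG")));
    }
}
```

Because the whole pipeline is one graph object, the code that builds it is written once. That is the property the article attributes to Dagli, although a real framework would also have to handle training the transformers, which this sketch leaves out.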

Dagli can be used by both experienced machine learning engineers and engineers with less experience. It works with a Java-based stack, and runs on standalone servers, Hadoop, and even IDEs. It comes “out of the box” with a range of examples for neural networks, logistic regression, and gradient boosted decision trees, as well as FastText, cross-validation, cross-training, feature selection, data readers, evaluation, and feature transformations, LinkedIn says.

For more information, see Pasternack’s blog post on LinkedIn. You can also download Dagli from GitHub.

Related Items:

LinkedIn Unveils Open-Source Toolkit for Detecting AI Bias

LinkedIn Open Sources Kube2Hadoop

LinkedIn Unleashes ‘Nearline’ Data Streaming