October 5, 2017

Solving the ‘Last Mile’ Problem in Data Science

Alex Woodie

There’s a ton of innovation occurring within the realm of data science at the moment thanks to a blossoming of machine learning technologies and techniques. But much of that innovation isn’t getting into production because of an impedance mismatch between data science and IT. Now a company called Open Data Group is aiming to close that gap with a Docker-based model deployment framework.

Open Data Group’s CTO Stu Bailey describes the company’s FastScore framework as an abstraction layer that makes it easier for enterprise IT professionals to deploy data science models into production environments.

“We’re exclusively focused on bridging analytic professionals, data scientists, quants, model builders, analytic engineers – whatever you want to call them – and IT for deploying analytic models,” he says. “Our solution focuses on getting models deployed as durable, cloud portable assets that will have a very long lifetime, but can be easily changed, easily migrated.”

FastScore doesn’t care what language or environment the analytic model is developed in. It supports models developed in Python, R, SAS, and H2O.ai, as well as in the Jupyter and Apache Zeppelin data science notebooks, among others. Once the model is created and logged in its Avro schema, FastScore can transform it into a Docker microservice that can be called through a REST API, an interface IT professionals are already familiar with.
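As a rough illustration of that handoff, the sketch below shows what calling a containerized model over REST might look like from the IT side. The schema, the `/score` endpoint, the host name, and the payload layout are all hypothetical and chosen for illustration; they are not FastScore’s actual API. The record check is a simplified stand-in for real Avro validation.

```python
import json
import urllib.request

# Hypothetical Avro-style record schema describing the model's input.
INPUT_SCHEMA = {
    "type": "record",
    "name": "SensorReading",
    "fields": [
        {"name": "temperature", "type": "double"},
        {"name": "pressure", "type": "double"},
        {"name": "machine_id", "type": "string"},
    ],
}

# Map a few Avro primitive type names to Python types (simplified).
AVRO_TO_PYTHON = {"double": float, "string": str, "int": int, "boolean": bool}


def conforms(record, schema):
    """Return True if record has exactly the schema's fields with matching types."""
    fields = {f["name"]: f["type"] for f in schema["fields"]}
    if set(record) != set(fields):
        return False
    return all(
        isinstance(record[name], AVRO_TO_PYTHON[type_name])
        for name, type_name in fields.items()
    )


def build_score_request(base_url, record):
    """Build (but do not send) a POST request to a hypothetical /score endpoint."""
    if not conforms(record, INPUT_SCHEMA):
        raise ValueError("record does not match the model's input schema")
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/score",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The host name below is a placeholder, not a real service.
request = build_score_request(
    "http://model-service.example:8080",
    {"temperature": 71.5, "pressure": 1.02, "machine_id": "press-07"},
)
```

Sending the request with `urllib.request.urlopen(request)` would only work against a running service, so the sketch stops at building it. The point is that the caller needs to know only the schema and the endpoint, not the language the model was written in.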

“We’re exclusively focused on deployment,” Bailey says. “We have no agenda in how model building is done. We spend quite a bit of time integrating with model building tools, but our focus is building a very clear abstraction for handing off models from the data lab or the data science process into pre-production and production environments.”

The company is just as neutral about production environments as it is about data science development environments. Users can use whatever scheduling system they want, including Kubernetes, DC/OS, or Cloud Foundry, and FastScore supports data stores like S3, HDFS, and relational databases.

Creating an abstraction layer between data science and IT lets the data science team move as quickly as it wants, while protecting the systems administrators, network administrators, systems analysts, and ultimately the CIO from getting too involved with the day-to-day management of machine learning models.

Open Data Group aims to streamline production deployment of analytic models with its FastScore framework

Open Data Group recently dealt with a large manufacturer that was struggling to get machine learning models into production. “They had 40 awesome models, but they hadn’t been deployed,” said Bailey. “Why not? Because of this impedance mismatch between the newness of machine learning, and the general risk profile of IT versus data science.”

In addition to bundling machine learning models as consumable Docker microservices, the company also tends to the daily care and feeding of the models, which it refers to as AnalyticsOps. It offers hooks into code repositories like GitHub, model management functionality, and A/B testing capabilities for comparing the effectiveness of models.
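To make the A/B testing idea concrete, here is a minimal sketch of how two deployed models could be compared on live traffic. It is a generic illustration, not FastScore’s implementation: the function names, the hash-based routing, the toy threshold models, and the synthetic records are all assumptions.

```python
import hashlib


def assign_variant(record_id, share_b=0.5):
    """Deterministically route a record to model 'A' or 'B' by hashing its ID."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return "B" if bucket < share_b else "A"


def compare_models(records, model_a, model_b, share_b=0.5):
    """Score each record with its assigned model and return accuracy per variant."""
    hits = {"A": 0, "B": 0}
    seen = {"A": 0, "B": 0}
    for record_id, features, actual in records:
        variant = assign_variant(record_id, share_b)
        model = model_a if variant == "A" else model_b
        seen[variant] += 1
        hits[variant] += int(model(features) == actual)
    # Accuracy is None for a variant that received no traffic.
    return {v: (hits[v] / seen[v] if seen[v] else None) for v in seen}


# Two toy "models": threshold rules over a single numeric feature.
def model_a(x):
    return x > 0.5


def model_b(x):
    return x > 0.3


# Synthetic labeled records: (id, feature, actual outcome).
# The outcome here follows the rule x > 0.3, so model_b matches it exactly.
records = [(f"rec-{i}", i / 100, i / 100 > 0.3) for i in range(100)]
accuracy = compare_models(records, model_a, model_b)
```

Hashing the record ID keeps the assignment stable, so the same record always reaches the same model, which makes results reproducible across runs.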

“We have a simple set of abstractions, a very consumable technology stack that really makes the data science much more productive, but it’s built like IT would expect it to be built, in a modular way that’s future proof for their own journey,” Bailey says. “If I want to move a prediction function for a GLM that I built in scikit-learn and deploy it to AWS and plumb it into a Kafka stream, that should be easy. And then if I want to move it to Google, that should be easy too. And I should have a very high degree of confidence that all the math is absolutely going to be the same.”
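One way to build that kind of confidence is to record reference predictions where the model was built and replay them after every migration. The sketch below does this for a logistic-regression GLM written in plain Python. The coefficients, feature names, and tolerance are made up for illustration, and the approach is a generic conformance check rather than anything FastScore is documented to do.

```python
import math

# Hypothetical coefficients for a logistic-regression GLM, as they might be
# exported from a trained model. The values are made up for illustration.
INTERCEPT = -1.5
COEFFICIENTS = {"temperature": 0.04, "vibration": 1.2}


def predict(features):
    """GLM prediction: inverse logit of the linear predictor."""
    linear = INTERCEPT + sum(
        COEFFICIENTS[name] * features[name] for name in COEFFICIENTS
    )
    return 1.0 / (1.0 + math.exp(-linear))


def matches_reference(predict_fn, reference_cases, tolerance=1e-12):
    """Return True if predict_fn reproduces every recorded reference output."""
    return all(
        math.isclose(predict_fn(inputs), expected, rel_tol=0.0, abs_tol=tolerance)
        for inputs, expected in reference_cases
    )


# Record reference outputs once, in the environment where the model was built...
inputs = [
    {"temperature": 20.0, "vibration": 0.1},
    {"temperature": 65.0, "vibration": 0.9},
    {"temperature": 90.0, "vibration": 2.5},
]
reference_cases = [(x, predict(x)) for x in inputs]

# ...then replay them after each migration to confirm the math has not drifted.
portable = matches_reference(predict, reference_cases)
```

In practice the reference cases would be stored alongside the model, and `predict_fn` would wrap a call to the redeployed service instead of the local function.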

While it’s new to the big data software scene, Open Data Group has been in the machine learning and applied statistics business for 18 years, and has quite a bit of experience in helping customers get real benefits out of their data.

The Chicago-based company was founded by Robert Grossman, the well-respected inventor of the Predictive Model Markup Language (PMML) and a professor at the University of Chicago, and recently switched from a services-first model to developing software. Now it’s taking that experience and bundling it up in a way that lets customers be creative with data science and use the latest advances, yet protects the IT department from being caught up in the explosion of innovation.

Bailey, who taught at the University of Chicago with Grossman before co-founding Infoblox with Pete Foley, now Open Data Group’s CEO, says the timing is right for companies to get more serious about data science and how they can use it to improve their business.

“I would have loved to have started a data science company when I started Infoblox but it was too early, in my estimation,” he says. “We’re seeing a transformation of large portions of the economy and industrial sector with data science in a somewhat analogous way to how computer science really started to impact very large portions of the economy in the 80s and 90s… It’s just the beginning of the journey.”


