May 9, 2018

The 10 Step Guide to Mastering Machine Learning

Colin Priest

(Phonlamai Photo/Shutterstock)

Artificial intelligence (AI) and machine learning are transforming the global economy, and companies that are quick to adopt these technologies will take $1.2 trillion from those who don’t. Businesses that fail to take advantage of predictive analytics, or that lack the time or resources to do so, such as highly trained (and expensive) data scientists, will fall behind organizations that embrace AI and machine learning to extract business value from their data.

Enter automated machine learning, a new class of solutions for accelerating and optimizing the predictive analytics process. Incorporating the experience and expertise of top data scientists, automated machine learning automates many of the complex and repetitive tasks required in traditional data science, while providing guardrails to ensure critical steps are not missed. The bottom line: data scientists become more productive, and business analysts and other domain experts are transformed into “citizen data scientists” who can create AI solutions.

As more so-called “automated machine learning” tools are brought to market, often with limited feature sets, there is a need to define the requirements for a true automated machine learning platform. This article highlights the 10 capabilities that must be addressed to be considered a complete automated machine learning solution.

1. Preprocessing of Data

Each machine learning algorithm works differently and has different data requirements. For example, some algorithms need numeric features to be normalized, and some require text processing that splits the text into words and phrases, which can be very complicated for languages like Japanese. Users should expect their automated machine learning platform to know how best to prepare data for every algorithm and to follow best practices for data partitioning.
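To make the two preprocessing ideas above concrete, here is a minimal sketch in plain Python. It assumes a single numeric feature and a simple random holdout; the function names and the income figures are illustrative, not taken from any particular platform. The point it shows is that scaling statistics are learned from the training partition only and then reused on the holdout.

```python
import random
from statistics import mean, pstdev

def train_holdout_split(rows, holdout_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and split them into training and holdout partitions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]

def fit_standardizer(values):
    """Learn the mean and standard deviation from the training values only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # fall back to 1.0 for a constant feature
    return mu, sigma

def standardize(values, mu, sigma):
    """Rescale values to zero mean and unit variance using the given statistics."""
    return [(v - mu) / sigma for v in values]

# Illustrative data: annual incomes with one large outlier.
incomes = [32_000, 45_000, 51_000, 58_000, 64_000,
           72_000, 88_000, 95_000, 110_000, 250_000]

train, holdout = train_holdout_split(incomes)
mu, sigma = fit_standardizer(train)
train_scaled = standardize(train, mu, sigma)
holdout_scaled = standardize(holdout, mu, sigma)  # reuse training statistics
```

Fitting the scaler on the training partition alone keeps information about the holdout rows from leaking into the model, which is one of the partitioning practices an automated system would be expected to enforce.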

2. Feature Engineering


Feature engineering is the process of altering the data to help machine learning algorithms work better. Done by hand, it is often time-consuming and can be expensive. While some feature engineering requires domain knowledge of the data and business rules, most feature engineering is generic. A true automated machine learning platform will engineer new features from existing numeric, categorical, and text features. The system should understand which algorithms benefit from extra feature engineering and which don’t, and only generate features that make sense given the data characteristics.
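As a rough illustration of the generic kind of feature engineering described above, the sketch below derives new columns from one numeric, one categorical, and one text field. The field names (`amount`, `items`, `channel`, `note`) and the sample rows are hypothetical and chosen only for this example.

```python
def one_hot(value, categories):
    """Encode a categorical value as one 0/1 indicator per known category."""
    return {f"is_{c}": int(value == c) for c in categories}

def text_features(text):
    """Derive simple numeric summaries from free text."""
    words = text.lower().split()
    return {
        "word_count": len(words),
        "unique_words": len(set(words)),
        "char_count": len(text),
    }

def engineer(row, categories):
    """Build a flat feature dictionary from one raw record."""
    features = {"amount": row["amount"]}
    # Ratio of two numeric fields, guarded against division by zero.
    features["amount_per_item"] = row["amount"] / row["items"] if row["items"] else 0.0
    features.update(one_hot(row["channel"], categories))
    features.update(text_features(row["note"]))
    return features

# Hypothetical raw records.
rows = [
    {"amount": 120.0, "items": 4, "channel": "web", "note": "Repeat customer, paid by card"},
    {"amount": 35.5, "items": 1, "channel": "store", "note": "gift"},
]

categories = sorted({r["channel"] for r in rows})
engineered = [engineer(r, categories) for r in rows]
```

A production system would go further, for instance by skipping one-hot encoding for algorithms that handle categories natively, but the pattern of turning each raw field into several model-ready columns is the same.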

3. Diverse Algorithms

Every dataset contains unique information that reflects the individual events and characteristics of a business. Due to the variety of situations and conditions represented in the data, no single algorithm can successfully solve every possible business problem or dataset. Automated machine learning platforms need access to a diverse repository of algorithms to test against the data in order to find the right algorithm to solve the challenge at hand. The platform should also be updated continually with the most promising new machine learning algorithms, including those from the open source community.

4. Algorithm Selection

Having access to hundreds of algorithms is great, but many organizations don’t have the time to try every algorithm on their data. Some algorithms aren’t suited to the type or size of the data, while others are extremely unlikely to work well on it at all. An automated machine learning platform should know which algorithms are right for a business’s data and run only the appropriate algorithms on that data to achieve results faster.
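One simple way to picture this kind of screening is a rule-based shortlist. The sketch below is only an assumption about how such a filter could look: the candidate names, the row limits, and the `handles_text` flag are all hypothetical placeholders, not measured limits of any real algorithm or product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str
    max_rows: int        # hypothetical row count beyond which we skip this algorithm
    handles_text: bool   # whether the candidate's pipeline can consume text columns

@dataclass(frozen=True)
class DatasetProfile:
    rows: int
    has_text: bool

def shortlist(candidates, profile):
    """Keep only candidates whose stated limits fit the dataset profile."""
    return [
        c.name
        for c in candidates
        if profile.rows <= c.max_rows and (c.handles_text or not profile.has_text)
    ]

# Illustrative registry; the limits are made up for the example.
CANDIDATES = [
    Candidate("k_nearest_neighbors", max_rows=50_000, handles_text=False),
    Candidate("gradient_boosted_trees", max_rows=10_000_000, handles_text=False),
    Candidate("regularized_linear_with_tfidf", max_rows=10_000_000, handles_text=True),
]

large_numeric = shortlist(CANDIDATES, DatasetProfile(rows=1_000_000, has_text=False))
small_with_text = shortlist(CANDIDATES, DatasetProfile(rows=1_000, has_text=True))
```

Real platforms presumably use richer signals than two fields, but even this small filter shows how profiling the data first avoids spending compute on algorithms that have little chance of succeeding.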

5. Training and Tuning

It’s standard for machine learning software to train an algorithm on the data, but often there is still some hyperparameter tuning required to optimize the algorithm’s performance. In addition, it’s important to understand which features to leave in or out, and which feature selections work best for different models. An effective automated machine learning platform employs smart hyperparameter tuning for each individual model, as well as automatic feature selection, to improve both the speed and accuracy of a model.
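Hyperparameter tuning can be sketched with a plain grid search. The example below assumes a toy k-nearest-neighbors regressor written from scratch, synthetic data, and a single hyperparameter `k`; it stands in for the "smart" search strategies a real platform might use, which this article does not specify.

```python
import random

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def validation_mse(train, validation, k):
    """Mean squared error on the validation set for a given k."""
    errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in validation]
    return sum(errors) / len(errors)

def grid_search(train, validation, k_grid):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: validation_mse(train, validation, k) for k in k_grid}
    best_k = min(scores, key=scores.get)
    return best_k, scores

# Synthetic data: a noisy quadratic curve (illustrative only).
rng = random.Random(42)
data = [(x / 10, (x / 10) ** 2 + rng.gauss(0, 0.5)) for x in range(60)]
rng.shuffle(data)
train, validation = data[:45], data[45:]

best_k, scores = grid_search(train, validation, k_grid=[1, 3, 5, 9, 15])
```

The same loop generalizes to several hyperparameters by iterating over combinations, although the number of models to fit grows quickly, which is why automated tools tend to search more selectively.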


6. Ensembling

Teams of algorithms are called “ensembles” or “blenders,” with each algorithm’s strengths balancing out the weaknesses of another. Ensemble models typically outperform individual algorithms because of their diversity. An automated machine learning platform should find the optimal algorithms to blend, include a diverse range of algorithms, and tune the weighting of the algorithms within each blender.
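The weighting step can be illustrated with the simplest possible blender: a weighted average of two models' predictions, with the weight chosen to minimize error on validation data. The prediction values below are invented for the example, and the brute-force weight search is an assumption made for clarity rather than a description of any vendor's method.

```python
def mse(predictions, actuals):
    """Mean squared error between predictions and actual values."""
    return sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(actuals)

def blend(pred_a, pred_b, weight_a):
    """Weighted average of two prediction lists; weight_a applies to model A."""
    return [weight_a * a + (1 - weight_a) * b for a, b in zip(pred_a, pred_b)]

def tune_blend_weight(pred_a, pred_b, actuals, steps=20):
    """Try evenly spaced weights in [0, 1] and keep the one with the lowest error."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: mse(blend(pred_a, pred_b, w), actuals))

# Illustrative validation-set predictions from two hypothetical models.
actuals = [10.0, 12.0, 15.0, 11.0, 14.0]
pred_a = [11.0, 13.5, 15.5, 12.0, 15.5]  # tends to over-predict
pred_b = [9.0, 11.0, 14.0, 10.5, 12.5]   # tends to under-predict

best_weight = tune_blend_weight(pred_a, pred_b, actuals)
blended = blend(pred_a, pred_b, best_weight)
```

Because the candidate weights include 0 and 1, the tuned blend can never score worse on this validation data than the better of the two models alone. In practice the weights should be tuned on predictions for rows the models did not train on, so the blend is not rewarded for overfitting.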

7. Head-to-Head Model Competitions

It’s difficult to know ahead of time which algorithm will perform best in a particular modeling challenge, so it’s necessary to compare the accuracy and speed of different algorithms on the data, regardless of the programming language or machine learning library the algorithms come from. A true automated machine learning platform must build and train dozens of algorithms, comparing the accuracy, speed, and individual predictions of each algorithm and then ranking the algorithms based on the needs of the business.
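A model competition of this kind boils down to a leaderboard. The sketch below fits three deliberately simple models on the same training data, times each fit, and ranks them by validation error with fit time as the tie-breaker. The models, the data, and the ranking rule are assumptions chosen to keep the example short.

```python
import time
from statistics import mean, median

def fit_mean(xs, ys):
    """Baseline that always predicts the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_median(xs, ys):
    """Baseline that always predicts the training median."""
    m = median(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Ordinary least squares for a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

def compete(candidates, train, validation):
    """Fit every candidate, then rank by validation error and fit time."""
    xs, ys = zip(*train)
    board = []
    for name, fit in candidates.items():
        start = time.perf_counter()
        model = fit(xs, ys)
        fit_seconds = time.perf_counter() - start
        mae = mean(abs(model(x) - y) for x, y in validation)
        board.append({"model": name, "mae": mae, "fit_seconds": fit_seconds})
    return sorted(board, key=lambda row: (row["mae"], row["fit_seconds"]))

# Illustrative data with a roughly linear trend.
train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1), (6, 12.2)]
validation = [(7, 14.0), (8, 15.9)]

leaderboard = compete(
    {"mean": fit_mean, "median": fit_median, "linear": fit_linear},
    train,
    validation,
)
```

Changing the sort key is how the ranking would be adapted to "the needs of the business", for example by putting prediction speed ahead of accuracy for a latency-sensitive application.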

8. Human-Friendly Insights

Machine learning and AI have made massive strides in predictive power, but often at the price of complexity and interpretability. It’s not enough for a model to score well on accuracy and speed – users must trust the answers. And in some industries, and even some geographies (see the EU’s GDPR), models must comply with regulations and be validated by a compliance team. Automated machine learning should describe model performance in a human-interpretable manner and provide easy-to-understand reasons for individual predictions to help an organization achieve compliance.
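For a linear model, reasons for an individual prediction can be computed directly: each feature's contribution is its coefficient multiplied by how far the row's value sits from a baseline such as the average. The sketch below assumes a hypothetical credit-risk model with made-up coefficients and baselines. More complex models need other explanation techniques, which are outside the scope of this example.

```python
def reason_codes(coefficients, baseline, row, top_n=2):
    """Rank features by the size of their contribution to one prediction.

    Contribution = coefficient * (row value - baseline value), which is exact
    for a linear model and only an approximation for anything else.
    """
    contributions = {
        name: coef * (row[name] - baseline[name])
        for name, coef in coefficients.items()
    }
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_n]

# Hypothetical model: a higher score means higher risk.
coefficients = {"late_payments": 0.8, "income_thousands": -0.02, "account_age_years": -0.1}
baseline = {"late_payments": 1.0, "income_thousands": 60.0, "account_age_years": 5.0}
applicant = {"late_payments": 4.0, "income_thousands": 45.0, "account_age_years": 1.0}

top_reasons = reason_codes(coefficients, baseline, applicant)
for name, contribution in top_reasons:
    direction = "raises" if contribution > 0 else "lowers"
    print(f"{name} {direction} the score by {abs(contribution):.2f}")
```

Output in this form, a short ranked list of features with the direction and size of their effect, is the sort of explanation a compliance reviewer or customer-facing team can act on.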

9. Easy Deployment

An analytics team can build an impressive predictive model, but it is of little use if the model is too complex for the IT team to reproduce, or if the business lacks the infrastructure to deploy the model to production. Easy, flexible deployment options are a hallmark of a workable automated machine learning solution, including APIs, exportable scoring code, and on-demand predictions that don’t require the intervention of the IT team.
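The idea behind exportable scoring code is that a trained model can be reduced to a small, dependency-free artifact. As a minimal sketch, assuming a linear model and a JSON export format invented for this example, the scoring side needs nothing beyond the standard library:

```python
import json

def export_model(coefficients, intercept):
    """Serialize a linear model to a JSON string (hypothetical format)."""
    return json.dumps({"intercept": intercept, "coefficients": coefficients}, sort_keys=True)

def score(model_json, row):
    """Score one record using only the exported JSON, with no training code."""
    model = json.loads(model_json)
    return model["intercept"] + sum(
        coef * row[name] for name, coef in model["coefficients"].items()
    )

# Illustrative model and record.
exported = export_model({"x1": 0.5, "x2": -2.0}, intercept=1.0)
prediction = score(exported, {"x1": 2.0, "x2": 1.0})
```

The same `score` function could sit behind an API endpoint or run inside a batch job, which is why a self-contained export makes a model easier to hand over to the IT team.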

10. Model Monitoring and Management

Even the best models can go “stale” over time as conditions change or new sources of data become available. An ideal automated machine learning solution makes it easy to run a new model competition on the latest data, helping to determine if that model is still the best, or if there is a need to update the model. And as models change, the system should also be able to quickly update the documentation on the model to comply with regulatory requirements.
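One common way to detect that conditions have changed is to compare the distribution of a feature in recent data against the training data. The sketch below uses the population stability index (PSI), a drift measure widely used in credit scoring; the choice of PSI, the bin edges, and the sample ages are assumptions for illustration and are not prescribed by this article.

```python
import math

def bin_shares(values, edges, floor=1e-4):
    """Share of values falling in each bin defined by ascending edges.

    Empty bins are floored at a small positive value so the logarithm
    in the PSI formula stays defined.
    """
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[sum(v > e for e in edges)] += 1
    return [max(c / len(values), floor) for c in counts]

def population_stability_index(expected, actual, edges):
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    e_shares = bin_shares(expected, edges)
    a_shares = bin_shares(actual, edges)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_shares, a_shares))

# Illustrative customer ages at training time and in two recent samples.
training_ages = [22, 25, 31, 34, 38, 41, 45, 52, 58, 63]
recent_unchanged = list(training_ages)
recent_shifted = [48, 51, 55, 57, 59, 61, 64, 66, 70, 73]
edges = [30, 40, 50, 60]

psi_unchanged = population_stability_index(training_ages, recent_unchanged, edges)
psi_shifted = population_stability_index(training_ages, recent_shifted, edges)
```

A frequently cited rule of thumb treats a PSI above roughly 0.25 as a significant shift. A monitoring job could use a threshold like that to trigger a fresh model competition on the latest data.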

Businesses that turn to automated machine learning platforms encompassing these capabilities will save time, increase accuracy, and reduce compliance risk when building out their machine learning models – helping them become truly AI-driven enterprises.

About the Author: Colin Priest is the Director of Product Marketing for DataRobot, where he advises businesses on how to build business cases and successfully manage data science projects. Colin has held a number of CEO and general management roles, where he has championed data science initiatives in financial services, healthcare, security, oil and gas, government and marketing. Colin is a firm believer in data-based decision making and applying automation to improve customer experience. He is passionate about the science of healthcare and does pro-bono work to support cancer research.
