October 24, 2022

Dataiku 11.1 Update Boosts Data Science and MLOps

Dataiku has unveiled the latest update to its data science and machine learning platform, Dataiku 11.1. This update includes improvements to existing capabilities as well as new features for data scientists, ML engineers, and analysts.

Dataiku 11 introduced a guided task within its VisualML framework to simplify developing and deploying time series forecasting models. The 11.1 update now enables users to optimize hyperparameters for their forecasting models. This optimization uses a k-fold cross validation strategy that respects time ordering and ensures validation folds are both consecutive to training sets and non-overlapping, according to Dataiku.
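As a rough illustration of what a time-respecting k-fold split can look like, the following Python sketch builds folds in which each validation block starts right after its training set and no two validation blocks overlap. It is not Dataiku's implementation: the function name `time_ordered_folds`, the expanding training window, and the equal-sized validation blocks are all assumptions made for this example.

```python
from typing import Iterator, List, Tuple


def time_ordered_folds(
    n_samples: int, n_folds: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for rows sorted by time.

    Hypothetical helper: each validation block immediately follows its
    training set, and validation blocks never overlap one another.
    """
    if n_folds < 1 or n_samples < n_folds + 1:
        raise ValueError("need at least n_folds + 1 samples")
    # Reserve n_folds equal-sized validation blocks at the end of the series.
    fold_size = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = n_samples - (n_folds - k + 1) * fold_size
        val_end = train_end + fold_size
        # Training uses everything before the validation block (expanding window).
        yield list(range(train_end)), list(range(train_end, val_end))


ts_folds = list(time_ordered_folds(n_samples=10, n_folds=3))
for train_idx, val_idx in ts_folds:
    print("train:", train_idx, "validate:", val_idx)
```

A hyperparameter search would then score each candidate setting on every fold and average the results, so that no candidate is ever evaluated on data that precedes its training window.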

When k-fold cross-validation is enabled for binary or multi-class classification tasks, a new stratified option splits the samples so that each fold keeps the same class proportions as the whole population, which can be used to eliminate sampling bias during cross-test validation. The company says this approach allows users to more accurately model situations seen at prediction time, or when users are modeling on past data to make predictions about future data. There are also new model comparison capabilities for time series models, which let data scientists compare models on performance metrics, time series resampling settings, feature handling, and training details.
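The idea behind stratified splitting can be sketched in a few lines of Python. This is a minimal illustration, not the Dataiku option itself: the helper `stratified_folds` is hypothetical, it deals each class's samples round-robin across folds, and it skips the shuffling a production implementation would normally apply first.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def stratified_folds(labels: Sequence[Hashable], n_folds: int) -> List[List[int]]:
    """Assign sample indices to folds so each fold mirrors the class mix.

    Hypothetical helper for illustration only.
    """
    # Group sample indices by class label.
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal each class's samples across the folds in turn.
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return [sorted(fold) for fold in folds]


# Imbalanced toy data: 8 negatives and 4 positives (a 2:1 ratio).
labels = ["neg"] * 8 + ["pos"] * 4
strat_folds = stratified_folds(labels, n_folds=4)
for fold in strat_folds:
    print(fold, [labels[i] for i in fold])
```

With this toy data, every fold ends up with two negatives and one positive, matching the 2:1 ratio of the full set.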

Optimizing hyperparameters for forecasting models using a k-fold cross-validation strategy is now available. Source: Dataiku

Explaining model behavior and troubleshooting unexpected or incorrect predictions both help clarify a model’s output for stakeholders. The Dataiku platform supports explainability through its VisualML interpretability functionalities, and for computer vision users, this has now been enhanced for image classification modeling in Dataiku 11.1. The “What If?” tab now contains a visual heat map representation that highlights which areas had the most influence on the model’s prediction. When a user hovers over an image for a predicted class, the heat map is overlaid on the scored image, showing exactly which pixels the model focused on for its prediction.

The platform’s explainability features are also now available for externally sourced models brought into Dataiku through the MLflow integration: “Data scientists can now compute partial dependence to see how the model is influenced by values across each variable, subpopulation analysis to track any potential bias on subsets of data, and individual explanations to deep dive on extreme probabilities,” Dataiku said in a company blog post.
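For readers unfamiliar with partial dependence, the following Python sketch shows the general technique: fix one feature at a given value for every row, average the model's predictions, and repeat across a grid of values. The function `partial_dependence`, the `toy_model`, and the sample rows are all made up for illustration and do not reflect Dataiku's or MLflow's APIs.

```python
from typing import Callable, List, Sequence

Row = List[float]


def partial_dependence(
    predict: Callable[[Row], float],
    rows: Sequence[Row],
    feature_index: int,
    grid: Sequence[float],
) -> List[float]:
    """Average prediction over all rows with one feature forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)  # copy so the original data is untouched
            modified[feature_index] = value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(row: Row) -> float:
    """Stand-in for an imported model: a simple linear scorer."""
    return 2.0 * row[0] + row[1]


rows = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
pd_curve = partial_dependence(toy_model, rows, feature_index=0, grid=[0.0, 1.0, 2.0])
print(pd_curve)  # [3.0, 5.0, 7.0]
```

Because the toy model is linear, the curve rises by 2.0 for each unit increase in the first feature, which is exactly that feature's coefficient.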

A new heat map overlay shows which pixels a computer vision model focused on when making a prediction. Source: Dataiku

For those who have been using Dataiku’s MLflow integration to import models, the reverse is now possible: models developed in Dataiku 11.1 can be exported in the open source MLflow format for ML engineers wishing to deploy models outside of Dataiku. Users can also export Dataiku models directly as Python code for use in any Python program outside of Dataiku.

Dataiku 11.1 also has two new chart types for data visualization. There is now a treemap chart for visualizing relationships and ratios between elements in categorical and hierarchical data. A second addition is a KPI chart that displays individual aggregated features as single numbers with conditional formatting to gauge KPI progress.

Other platform enhancements include support for additional data connection types and table descriptions; enhanced data exploration, cleansing, and export; and new coding capabilities. Visit the release notes or a blog post from Dataiku’s Christina Hsiao to read more about 11.1.
