February 23, 2017

Platform Incorporates Spark to Boost Collaboration

In-memory tools such as Apache Spark continue to make inroads on predictive analytics platforms designed to allow data scientists and analysts to apply machine learning to large and diverse data sets. Among the goals is limiting unnecessary data movement.

Seeking to improve development and management of enterprise data science projects, the latest release of Dataiku’s platform adds new functionality designed to boost collaboration across large teams along with new segmentation and scoring tools.

Paris- and New York-based Dataiku (as in “data haiku”) said this week the 4.0 version of its data science platform includes Apache Spark pipelines and interactive hierarchical clustering, along with the compliance and regulatory tracking needed to meet new data governance rules. The latter capability is specifically targeted at regulated industries such as aerospace and defense, financial services, healthcare, insurance and pharmaceuticals.

The upgraded platform targets data science team members ranging from spreadsheet users to machine learning experts with the goal of improving data analytics products.

Dataiku is betting that more enterprises are struggling with the tedious task of targeting, scoring and segmenting customers. To reduce the IT workload and limit unnecessary data movement, the updated platform integrates Spark's in-memory capabilities, increasing efficiency by letting data teams rebuild large data flows with a minimal number of concurrent Spark jobs.

It also touts streamlined collaboration across data teams via new dashboards as well as the integration of third-party collaboration software, including GitHub, HipChat and Slack. The latest version also incorporates visual machine learning libraries that can eventually be used to include machine-learning experts on expanding data teams.

“The secret is to streamline the movement of information within an organization while reducing the movement of data itself,” Dataiku CEO Florian Douetteau noted in a statement announcing the latest platform release.

Analytics applications for the Dataiku platform include predictive data flows to detect fraud, reduce churn, improve company logistics or predict future maintenance issues.

The startup announced last fall it was collaborating with healthcare analytics specialist Intermedix Corp. to build an analytics tool for predicting which patients are most likely to miss scheduled appointments. The partners said the tool is now being used in more than 50 U.S. private clinics.

Founded in 2013, Dataiku raised $14 million in a Series A funding round led by FirstMark Capital in October 2016, bringing its total funding thus far to $17.7 million.

Recent items:

Predictor Looks to Reduce Patient No-Shows

Data Startup Targets Machine Learning For Healthcare