August 14, 2023

GenAI Is Making Data Science More Accessible, Dataiku Says


Large language models and generative AI are being adopted for all kinds of new and interesting use cases, which we explore daily in these pages. One of the less visible use cases is widening the pool of users who can tap into advanced data science capabilities, thereby lowering the technical barrier that once separated the data haves from the have-nots, a Dataiku executive says.

The rapid pace of development for LLMs and GenAI is enabling average tech workers to do things that data scientists couldn’t even do six months ago, says Jed Dougherty, Dataiku’s vice president of platform strategy.

“Not to say data science is dead or data scientists are dead. There’s still a ton of data out there that’s not text,” Dougherty says. “It’s not that data scientists aren’t needed anymore. There’s just problems they’ve never been able to solve that now anyone can solve, and that’s pretty cool.”

We’re fast reaching the point where just about anybody can tap into the sort of advanced AI capabilities that previously were accessible only to the largest FANG companies, Dougherty says, referring to the acronym for Facebook, Amazon, Netflix, and Google (but now used to represent all advanced tech giants).

“For me it’s a great time to be in this space,” he says. “It’s the biggest thing that’s happened, from an algorithmic perspective, easily since Google Search, since PageRank, as far as changing the way people interact with the world. To be working in the space at this time is terrific, invigorating.”

Dataiku is integrating with ChatGPT and other LLMs (MD.SHAHRIYA_HASAN/Shutterstock)

Dataiku is developing its platform to make it easier for non-AI experts to leverage LLMs and GenAI, such as ChatGPT, without exposing them to the nitty-gritty technical details. It’s the same approach it used for simplifying how users work with “classical” machine learning models, such as classification and regression algorithms, as well as for deep learning frameworks like PyTorch and TensorFlow.

The company is working on two tools to bolster the GenAI and LLM capabilities of its platform: Prompt Studio and AI Prepare. Both are in preview at the moment, with general availability expected soon.

Prompt Studio will allow users to develop new “recipes” in Dataiku that let them tap into LLM capabilities atop their existing data. For example, it will allow a marketing manager to tell an AI model (ChatGPT, Bard, etc.) to automatically write and send emails to a list of users.

“Essentially, you take in all your Salesforce data about every customer that you have, connect it to ChatGPT, and say ‘Write a cold call e-mail for every one of these customers,’” Dougherty says. “Hit one button in Dataiku and all of a sudden you have 500 cold call emails, which then you can click one more button in Dataiku and send out those emails to everybody.”
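The pattern Dougherty describes, one prompt per customer record, can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not how Prompt Studio works internally: the field names, the template wording, and the `fake_complete` function standing in for a real model call are all hypothetical.

```python
from string import Template

# Hypothetical prompt template; the field names are illustrative,
# not an actual Salesforce schema.
PROMPT = Template(
    "Write a short cold call email to $name at $company, "
    "who last purchased $last_product."
)


def build_prompts(customers):
    """Return one filled-in prompt per customer record."""
    return [PROMPT.substitute(row) for row in customers]


def generate_emails(customers, complete):
    """Pair each customer with the text `complete` returns for their prompt.

    `complete` is any callable that takes a prompt string and returns a
    string; in practice it would wrap a call to an LLM provider.
    """
    prompts = build_prompts(customers)
    return [
        {"to": row["email"], "body": complete(prompt)}
        for row, prompt in zip(customers, prompts)
    ]


def fake_complete(prompt):
    # Stand-in for a real model call so the sketch runs offline.
    return "[draft based on: " + prompt + "]"


customers = [
    {"name": "Ana", "company": "Acme", "last_product": "widgets",
     "email": "ana@example.com"},
    {"name": "Raj", "company": "Globex", "last_product": "gears",
     "email": "raj@example.com"},
]

emails = generate_emails(customers, fake_complete)
for message in emails:
    print(message["to"], "->", message["body"])
```

Passing the model call in as a plain function keeps the templating step testable without network access, and leaves room for a review step between generating the drafts and sending them.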

Dataiku provides a platform for working with LLMs as well as traditional ML models

The other new tool, AI Prepare, will leverage GenAI models to automate data transformation tasks within Dataiku. Instead of requiring the user to manually write SQL to define the joins, filters, etc. to execute on the data, AI Prepare will generate the SQL for the user based on a few English language prompts and then execute the job.

Users will be able to inspect and change the data flow created by AI Prepare just as they can with everything Dataiku does, Dougherty says. Oversight is necessary to detect mistakes, malfunctions, and hallucinations introduced by GenAI, he says.
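The kind of oversight described here can be illustrated with a small guard that inspects generated SQL before running it. The sketch below is an assumption-laden example, not Dataiku’s implementation: the `generated_sql` string stands in for model output, the `orders` table is made up, and the check (a single statement that starts with SELECT) is deliberately conservative.

```python
import sqlite3


def is_single_select(sql):
    """Conservative check: exactly one statement, and it starts with SELECT."""
    statements = [s.strip() for s in sql.strip().split(";") if s.strip()]
    return len(statements) == 1 and statements[0].upper().startswith("SELECT")


def run_if_approved(conn, sql):
    """Execute generated SQL only if it passes the read-only check."""
    if not is_single_select(sql):
        raise ValueError("Generated SQL rejected for review: " + sql)
    return conn.execute(sql).fetchall()


# Made-up table so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Acme", 120.0), ("Acme", 80.0), ("Globex", 50.0)],
)

# Stand-in for model output given a prompt such as
# "total order amount per customer".
generated_sql = (
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
)

rows = run_if_approved(conn, generated_sql)
print(rows)
```

Splitting on semicolons is naive and will reject some valid queries, such as one containing a semicolon inside a string literal. For a review gate that errs toward caution, rejecting too much is the safer failure mode.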

“We want to be a stable environment for enterprise organizations to work in an enterprise way with all these GenAI capabilities,” he tells Datanami. “When I talk about a stable environment, I’m talking about a responsibility structure, preventing folks from going off the rails, either from spending too much money, accessing improper data that they shouldn’t be seeing, or rolling out models or working with models that they shouldn’t be working with.

“But at the same time making it so that the largest amount of people in your organization can leverage these things in a way that they can understand, and not just through chats,” Dougherty continues. “It’s not always just going to be a one person talking to a chatbot kind of interface. We really want people to be able to apply this stuff to the massive data sets they’ve been working with for the last 10 years.”

LLM providers that Dataiku supports out of the box

The French-American company (its headquarters are in New York City but the CEO and CTO work out of Paris) has recently rolled out its RAFT framework to ensure GenAI use cases stay within certain bounds. RAFT, which stands for Reliable, Accountable, Fair, and Transparent, is based on other emerging frameworks for the ethical use of AI.

Dataiku functions as a full data platform in that it includes tools for utilizing ML and AI as well as data prep and analytics tools. The company hasn’t yet used GenAI to create new visualizations and reports, but that will likely be coming in the future, according to Dougherty.

Dataiku has worked to lower the barrier of entry to its products to the point where, if you’re a good Excel user, you should be able to use Dataiku. That’s all part of the company’s strategy for the democratization of data and AI.

“It’s very much expanding the persona,” Dougherty says. “Certainly, data scientists are going to use this consistently for the most challenging part of the work that they’re doing. But there’s no reason why a business person can’t do this at this point. I wrote zero lines of code to [generate summaries of all Congressional bills] and it took me 15 minutes. Obviously I use Dataiku a lot. But this is not a high barrier to entry anymore, which is really, really cool.”
