February 20, 2024

KNIME Works to Lower Barriers to Big Data Analytics

(Khakimullin Aleksandr/Shutterstock)

Big data analytics can be complicated–there’s just no way around that simple fact. But it shouldn’t be more complicated than it needs to be. With recent updates to its data analytics offerings–including a revamped user interface delivered last year and an automated runtime unveiled last week–KNIME is showing that it’s serious about lowering barriers to working with its software.

KNIME got its start in 2004 when a computer science professor named Michael Berthold led the development of a new data processing system that was open, modular, and scalable. That Java-based effort, which started at the University of Konstanz, turned into the Konstanz Information Miner, or KNIME.

Many of the early features that endeared KNIME to the pharmaceutical industry–such as a drag-and-drop GUI that allowed users to build data processing pipelines using building blocks called “nodes”–are still present in the software. KNIME Analytics Platform, which is distributed under a GPL license, today boasts more than 4,000 nodes that implement some function, such as connecting to a database, calling an NLP function, or scoring the performance of an ML algorithm.
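For readers new to the node concept, the idea of chaining small building blocks into a pipeline can be sketched in a few lines of Python. This is only an illustration of the pattern, not KNIME's actual API: the function names, the sample rows, and the `run_workflow` helper are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical types for this sketch: a table is a list of rows,
# and a "node" is any function that takes a table and returns a table.
Row = Dict[str, object]
Node = Callable[[List[Row]], List[Row]]


def read_source(_: List[Row]) -> List[Row]:
    # Stand-in for a "connect to a database" node; returns hard-coded rows.
    return [
        {"compound": "A", "score": 0.91},
        {"compound": "B", "score": 0.42},
        {"compound": "C", "score": 0.77},
    ]


def filter_high_scores(rows: List[Row]) -> List[Row]:
    # Keep only rows whose score is at least 0.5.
    return [r for r in rows if r["score"] >= 0.5]


def add_rank(rows: List[Row]) -> List[Row]:
    # Sort by score, highest first, and attach a 1-based rank.
    ordered = sorted(rows, key=lambda r: r["score"], reverse=True)
    return [{**r, "rank": i} for i, r in enumerate(ordered, start=1)]


def run_workflow(nodes: List[Node]) -> List[Row]:
    # Pass the output of each node to the next one, in order.
    data: List[Row] = []
    for node in nodes:
        data = node(data)
    return data


result = run_workflow([read_source, filter_high_scores, add_rank])
```

Each function plays the role of one node, and the list passed to `run_workflow` plays the role of the connections a user would otherwise draw in the GUI.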

There are two main flavors of KNIME: the KNIME Analytics Platform, which is free and open source, and KNIME Hub, which has both free and paid components. The KNIME Community Hub is mostly free and allows users to collaborate on the development of KNIME analytic workflows, while the KNIME Business Hub is proprietary, fully supported software that enables users to run KNIME workloads on the servers of their choice.

Users build their analytics workloads in a drag-and-drop manner using the KNIME Analytics Platform

Last week, the Zurich, Switzerland-based company announced a big update to KNIME Community Hub. In addition to letting teams collaborate privately on visual workflows they built with KNIME Analytics Platform, the hub now lets them automate those workflows and run them in the cloud in a software-as-a-service (SaaS) manner, the company says.

KNIME users pay only when they choose to run their workflows on the KNIME Community Hub cloud, the company says. Everything else is free, including the ability to develop workflows collaboratively, share ideas, and even track versions and roll back changes.

The SaaS capability takes KNIME to the next level, says Berthold, the company’s longtime CEO.

“So far, KNIME Community Hub has been an important part of our open ecosystem, as an easily accessible repository to find and share solutions and collaborate on data science workflows,” Berthold said in a press release. “With the new SaaS features, we now allow the community to collaborate in small teams and easily execute their workflows in the cloud.”

KNIME has also improved the ease-of-use of the KNIME Analytics Platform. According to Rosaria Silipo, the company’s principal data scientist, version 5.1, which shipped last July, represents a big improvement in the usability department.

“We changed the UI,” Silipo tells Datanami in a recent interview. “We made it more beautiful and easier to use. And that means that we reorganized it a bit.”

Having upwards of 4,000 pre-built nodes at your beck and call can be a bit intimidating to the uninitiated. So to simplify things, KNIME version 5.1 shows a reduced number of the most important nodes when users first sign in.

“We have 3,000, 4,000 nodes available, so it’s a lot of nodes. Newcomers might feel a bit overwhelmed,” Silipo says. “The most commonly used nodes are available the first time, so it becomes easier for people to find the things they need. I think it’s easier to use, especially for newbies.”

KNIME Analytics Platform 5.1 also brings an AI chatbot to the screen to help users find features and navigate through the product. KNIME developed the chatbot, dubbed KNIME AI Assistant, or K-AI, using a large language model (LLM) trained on the company’s knowledge base, Silipo says.

Users can even ask K-AI to assemble the nodes in a KNIME workflow. She says that works “most of the time.” K-AI can also write Python code (the product allows users to code in popular languages like Python, R, Java, and JavaScript, and even to use Weka). “This one works very well,” she says.

Rosaria Silipo is the principal data scientist at KNIME

As a low-code, no-code platform, KNIME helps to lower the barrier to data science and analytics. However, that doesn’t mean that anybody can automatically be successful sitting behind KNIME Analytics Platform.

“We crack two barriers when you work with data. The first barrier is the coding and the second barrier is the math behind all the data algorithms,” she says. “We remove the coding barrier, so then even people, you know, who are used to Excel can come and build their pipeline of nodes.”

But that doesn’t mean that KNIME pipelines cannot be complicated, or that the platform cannot handle sophisticated workflows. And of course, users still need to have a solid background in math. “You need to know what you’re doing, absolutely,” Silipo says.

While KNIME Analytics Platform incorporates some generative AI functions, the product itself is mostly focused on traditional machine learning, Silipo says. Some members of the 300,000-strong KNIME community have built plug-ins that enable KNIME workflows to call out to LLMs. The company is currently working to determine how to incorporate more GenAI and LLMs into the product, a topic it will discuss at the upcoming KNIME Spring Summit, scheduled to take place April 15-17 in Austin, Texas.
