October 1, 2015

Cisco, Paxata Join Forces on Data Prep

Data preparation specialist Paxata announced a partnership this week with Cisco Systems designed to advance the networking giant’s data preparation capabilities on its emerging big data platform.

The partnership, in which Paxata’s technology will underpin the Cisco (CSCO) data prep offering, was unveiled this week at Strata + Hadoop World. The deal underscores the growing momentum of the data prep market as data sets grow in size and complexity. The Cisco platform will use Paxata’s machine intelligence algorithms along with an Excel-like interface.

The partners added that the Cisco data prep tool would enable integration with Cisco’s data virtualization capabilities as a way of leveraging earlier IT investments. The Cisco data prep tool’s Hadoop- and Spark-based architecture is designed to prepare and analyze ever-larger datasets.

Cisco added that the data preparation capability would run on its Unified Computing System infrastructure for big data, which is designed to integrate compute, storage and networking. The company is embracing data prep technology as a way “to address the significant data integration challenges [companies] face when preparing analytic data sets,” Kevin Ott, vice president of Cisco’s Data Virtualization and Analytics Business Unit, noted in a blog post.

He added that Cisco’s data prep strategy focuses on balancing “self-service needs with governance constraints, while optimizing infrastructure.”

Cisco’s take on data prep and analysis.

“Paxata will expand Cisco’s reach by bringing data preparation to a much larger business-facing audience,” Paxata CEO Prakash Nanduri added in a statement. Paxata’s Adaptive Data Preparation platform is built on Apache Spark and optimized to run in Hadoop environments.

Paxata, Redwood City, Calif., announced an $18 million Series C funding round in September after it reported 400 percent year-on-year revenue growth. It claims 45 paying customers so far. The latest funding round was led by EDBI, the corporate investment arm of the Singapore Economic Development Board, Paxata said.

The boom in data prep technology is being driven largely by the reality that most data sets are relatively “dirty,” especially with the influx of unstructured data. Hence, they must be cleansed prior to analysis, proponents of the technology stress.

Paxata’s data prep tools use a combination of machine learning algorithms and data visualization techniques to help analysts identify and fix anomalies in their data. The startup recently announced it is running on the latest release of Apache Spark.

Paxata and other pure-play data prep startups are gaining traction as data sets grow and system companies like Cisco forge partnerships. “The key ingredient of data preparation platforms is their ability to provide self-service capabilities that allow knowledgeable users, who are not IT experts, to combine, transform and cleanse relevant data prior to analysis,” noted Philip Howard, research director at Bloor Research, in a recent survey of the data prep market.

Datanami