IBM ‘DataWorks’ Leverages Watson, Spark
Project DataWorks, a new initiative launched this week by IBM to advance its analytics push, seeks to forge a cloud-based analytics platform that combines different data types with its Watson cognitive computing technology.
The DataWorks initiative also reflects IBM’s (NYSE: IBM) embrace last year of the Apache Spark in-memory computing framework. The company initially moved 15 analytics applications onto Spark clusters while launching an Apache Spark cloud service.
IBM said this week the data initiative would help automate deployment of data using Watson-based machine learning and Apache Spark. It also promises to speed data ingestion from “50 to hundreds of Gbps” for workloads ranging from enterprise databases and Internet of Things applications to social media and weather prediction.
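To put those ingestion rates in perspective, the short Python sketch below estimates how long it would take to move a dataset at a given line rate. It is only back-of-the-envelope arithmetic, not anything IBM published: the 1 TB dataset size, the 200 Gbps figure standing in for “hundreds,” and the function name `transfer_seconds` are all assumptions chosen for illustration, and the calculation ignores protocol overhead and storage bottlenecks.

```python
def transfer_seconds(size_bytes: float, rate_gbps: float) -> float:
    """Return the seconds needed to move size_bytes at rate_gbps.

    Uses decimal units (1 Gbps = 1e9 bits per second) and assumes the
    link is fully utilized, which real pipelines rarely achieve.
    """
    bits = size_bytes * 8
    return bits / (rate_gbps * 1e9)


# Hypothetical 1 TB dataset (decimal terabyte); not a figure from IBM.
ONE_TB = 1e12

# 50 Gbps is the low end quoted in the article; 200 Gbps is an assumed
# stand-in for "hundreds of Gbps".
for rate in (50, 200):
    seconds = transfer_seconds(ONE_TB, rate)
    print(f"{rate} Gbps: about {seconds:.0f} seconds per terabyte")
```

Under these assumptions, quadrupling the ingestion rate cuts the transfer time to a quarter, which is the kind of gain the quoted range implies for large data sources.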
Among other tasks, DataWorks is being positioned as a way to alleviate onerous data preparation. “We know that clients spend up to 80 percent of their time on data preparation, no matter the task, even when they are preparing to take advantage of today’s advanced AI and machine learning approaches,” Bob Picciano, senior vice president of IBM Analytics, noted in a statement.
Project DataWorks seeks to leverage cognitive capabilities “to integrate all data sources on one common platform, enabling individuals to get the data ready for insight and action, faster than ever before,” Picciano added.
Meanwhile, the initiative looks to deliver a single cloud environment where data can be wrangled and common datasets as well as models can be shared while adhering to rigorous data governance rules covering what data needs to be retained and how. Data governance can then be run on both Apache Spark and the Watson cognitive computing platform running on IBM’s Bluemix cloud service.
DataWorks uses Watson Analytics and natural language processing to analyze and create visualizations with a single line of code, the company claimed. Seeking to make greater use of “dark,” or unstructured, data, the platform then presents a snapshot of data assets along with an audit trail showing who is using data for what purpose.
The initiative also addresses what some market analysts refer to as a “big data fabric” that helps manage large data volumes, preps that data and delivers it to decision makers in real time. DataWorks and other efforts also seek to close the gap by providing a ready-made cloud service that can be fine-tuned for different analytics applications while saving the cost of developing a data framework from the ground up.
Hence, IBM is expanding its backing of Apache Spark with its Watson DataWorks initiative, which adds hybrid cloud services as a way to deliver a cognitive-based analytics framework. The goal, the company said, is delivering “self-service [analytics] for all user types.”
IBM said it has so far enlisted more than 20 partners and technologies to support Project DataWorks.
In connection with the DataWorks launch, IBM this week also introduced a related “methodology” called “DataFirst,” designed to help assess the skills and technology roadmap required to leverage its cognitive analytics tools. The methodology essentially offers customers access to proven practices for developing new processes for data prepping and analysis.