IBM Embraces Hadoop in ‘BigInsights’ Push
IBM jumped onto the Hadoop bandwagon this week, introducing its BigInsights for Apache Hadoop offering, which adds machine learning, support for R statistical computing and other features designed to handle data analysis at massive scale.
The introduction coincides with the launch of an industry initiative by IBM and others to promote Apache Hadoop and big data technologies in enterprises.
IBM BigInsights for Apache Hadoop comes with a broad data science toolset to query and visualize data and to apply machine learning techniques at scale, IBM said Feb. 17. The enterprise data analysis platform includes three new modules:
- BigInsights Analyst, which includes IBM’s SQL engine along with spreadsheets and visualizations. The tool is designed to improve the efficiency of millions of annual SQL queries. Those queries also can be run unchanged against Hive, HBase and relational databases, IBM said.
- BigInsights Data Scientist, designed as a platform for a new machine-learning engine capable of tuning its performance over large datasets to spot patterns. It also comes with a dozen “industry-specific” algorithms such as Clustering, Decision Tree and PageRank while natively supporting open source R statistical computing.
- BigInsights Enterprise Management, which includes new management tools designed to help allocate resources and optimize workflows. Deployments can scale to large numbers of users or clusters, IBM said.
Along with its BigInsights platform, IBM also announced its Open Platform for Apache Hadoop, designed to provide data access controls and authentication for enterprises. IBM said it is also adding support for Apache Spark, allowing interactive analytics applications to use the computing engine.
The analytics platform also seeks to integrate Hadoop as part of a warehousing and data architecture. For example, IBM said the predictive analytics capabilities of SPSS could be used to build predictive models or “exercise” machine learning or R in Hadoop.
IBM’s embrace of Apache Hadoop comes as more enterprises turn to analytics tools to collect and store extremely large sets of highly variable data from a growing list of sources. IBM is betting that an automated data scientist feature will appeal to enterprises seeking to make sense of more data using statistical modeling, while making the platform easier for IT managers to deploy across their organizations.
Among IBM’s goals in releasing its Apache Hadoop offering is improving access for a “broader community of analysts,” Beth Smith, GM of IBM Analytics Platform, noted in a statement.
IBM is among the founding members of the Open Data Platform (ODP) Initiative, an industry association formed this week to help drive collaboration and standardization across Hadoop and big data technologies. Other founding members include GE, Hortonworks, Pivotal and SAS.
Along with promoting big data solutions via a standard platform, the initiative will attempt to define, test and certify a standard “ODP core” of big data open source projects.