SAS Developing In-Memory Statistics for Hadoop
SANTA CLARA, Calif., Feb. 11 — SAS is developing an interactive analytics programming environment for the open source Hadoop framework based on SAS in-memory technology. By enabling enterprises to draw deep insights from big data assets faster and with greater precision, the new software will boost the bottom line, reduce risk, improve customer understanding and increase opportunities for success.
SAS In-Memory Statistics for Hadoop will enable multiple users to simultaneously and interactively manage, explore and analyze data, build and compare models, and score massive amounts of data in Hadoop. The Hadoop open source framework is widely considered the future of big data. Expected in the first half of 2014, SAS’ software will greatly boost productivity for data scientists.
“SAS In-Memory Statistics for Hadoop loads Hadoop data once and keeps it in memory for multiple analyses within a session – across multiple users,” said Oliver Schabenberger, SAS Senior Director, Analytic Server Research and Development. “Compare that to approaches that require writing data to disk. All that data shuffling is extremely inefficient with big data.”
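The load-once, analyze-many idea Schabenberger describes can be sketched conceptually. This is not SAS code; the `AnalyticsSession` class and its method names are illustrative inventions showing why holding data in memory across a session avoids repeated disk reads.

```python
# Conceptual sketch (not SAS code): a session object loads data into
# memory once, then serves any number of analyses from that one copy.
# In a disk-based approach, each analysis would reread the data.

class AnalyticsSession:
    """Holds a dataset in memory so multiple analyses reuse one load."""

    def __init__(self, rows):
        # In a real system this single load would come from Hadoop/HDFS.
        self._data = list(rows)

    def mean(self, column):
        """One analysis: average of a numeric column."""
        values = [row[column] for row in self._data]
        return sum(values) / len(values)

    def count_where(self, predicate):
        """Another analysis against the same in-memory data."""
        return sum(1 for row in self._data if predicate(row))

# One load, many analyses – no data shuffling between calls.
session = AnalyticsSession([
    {"age": 34, "spend": 120.0},
    {"age": 41, "spend": 80.0},
    {"age": 29, "spend": 150.0},
])
avg_age = session.mean("age")
big_spenders = session.count_where(lambda r: r["spend"] > 100)  # 2
```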
The SAS in-memory architecture offers unprecedented speed – an absolute requirement for finding value in massive amounts of data. The same in-memory analytics technology that powers the popular SAS Visual Analytics also underpins SAS In-Memory Statistics for Hadoop.
“Data scientists, modelers and statisticians no longer need a patchwork of tools because we’re eliminating the need for different analytic programming languages. SAS In-Memory Statistics for Hadoop supports the entire range of analytics, providing a fast, powerful and comprehensive means for collaborative analysis,” said Schabenberger.
Among the numerous supported statistical and machine learning modeling techniques in SAS In-Memory Statistics for Hadoop are: clustering, regression, generalized linear models, analysis of variance, decision trees, random decision forests, text analytics and recommendation systems.
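To ground one of the listed techniques, here is a minimal k-means clustering sketch in plain Python. It is a generic textbook illustration, not the SAS implementation or its API; the deterministic centroid seeding is a simplifying assumption.

```python
# Minimal k-means clustering (stdlib only) – a generic illustration of
# one technique from the list above, not SAS's implementation.

import math

def kmeans(points, k, iterations=20):
    """Cluster 2-D points into k groups; returns (centroids, labels)."""
    # Deterministic init: seed centroids with evenly spaced input points.
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for i, p in enumerate(points) if labels[i] == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members)
                                     for xs in zip(*members))
    return centroids, labels

points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15),   # cluster near origin
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]     # cluster near (5, 5)
centroids, labels = kmeans(points, k=2)  # labels separate the two groups
```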
Industry analyst firm IDC expects the Hadoop market to reach $812.8 million in sales in 2016 – a compound annual growth rate of 60.2 percent. SAS anticipates customers will similarly continue deploying big data architecture to glean big insights.

“Hadoop represents significant benefit to enterprises whose accumulated data holds tremendous value. SAS is committed to providing the industry’s best analytics to those deploying this promising big data architecture,” said Wayne Thompson, SAS Chief Data Scientist. “SAS supported big data customers before big data was the buzz. As the technology evolves we’re meeting changing needs, as our customers have come to expect.”
Hadoop spreads data over large clusters of commodity servers and processes it in parallel. It also detects and handles failures, which is critical for distributed processing. In addition to low-cost commodity hardware and the safety net of data redundancy, Hadoop’s notable advantages include:
- Parallel processing – Hadoop’s distributed computing model can process huge volumes of data.
- Scalability – Hadoop systems can be grown easily by adding more nodes.
- Storage flexibility – Unlike traditional relational databases, data does not need to be preprocessed for storage, and Hadoop easily stores unstructured data.
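The parallel-processing model described above can be illustrated with the classic MapReduce word count. This single-process Python sketch only mirrors the data flow; in Hadoop itself, the map and reduce tasks run in parallel across cluster nodes, each working on a locally stored split.

```python
# Toy MapReduce word count: map tasks emit (key, value) pairs, a
# shuffle groups them by key, and reduce tasks aggregate each group.
# Hadoop distributes these phases across nodes; this sketch runs them
# in one process to show the data flow.

from collections import defaultdict

def map_phase(chunk):
    """Map task: emit (word, 1) for every word in one input split."""
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    """Group emitted pairs by key, as Hadoop's shuffle/sort step does."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce task: aggregate the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}

# Each string stands in for a data split stored on a different node.
splits = ["big data big insights", "big data analytics"]
pairs = [pair for chunk in splits for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(pairs))  # {'big': 3, 'data': 2, ...}
```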