What You May Have Missed at Strata + Hadoop World 2014
Talk about information overload. If you were one of the lucky 5,000 to attend the Strata + Hadoop World conference last week, then you were subject to a marathon session of big data keynotes delivered continually for the better part of two days. It’s understandable that you missed out on some of the big data news announced at the show, including Cray’s new Hadoop appliance, or the latest tools from Revolution Analytics and Tableau. Don’t worry: We’ll get you back up to speed.
Supercomputer maker Cray used Strata + Hadoop World to unveil the Urika-XA, a new appliance designed to run big data analytics workloads in production settings. The new appliance combines HPC-like hardware, such as SSDs, parallel file systems, and fast interconnects, with the analytical power of the Apache Hadoop and Apache Spark frameworks.
The hardware specifics of the Urika-XA appliance are impressive. The first iteration of the box sports a 48-node architecture powered by Intel Ivy Bridge processors, 36 TB of memory, 38 TB of SSD (up to 200 TB of total storage if regular HDDs are installed), InfiniBand interconnects, and HDFS and Lustre file systems. (A 1,000-plus-node version will debut in 2015.) On the software side, the appliance features Cloudera’s Distribution of Hadoop and Spark, backed by the Cray Adaptive Runtime for Hadoop and the Urika-XA management system.
The new system was beta tested at Oak Ridge National Laboratory (ORNL), which used the device to run simulations in climate science, materials science, and healthcare. But Cray president and CEO Peter Ungaro sees the Urika-XA helping organizations harness big data opportunities in academic as well as enterprise settings. “The convergence of big data and high performance computing is creating a demand for an open analytics system built on a supercomputing architecture–a solution that allows customers to realize the benefits of advanced analytics techniques, better deal with data complexities, lower TCO and realize faster time-to-value results,” Ungaro says. For more on the Urika-XA, see this story in HPCwire.
Many big data projects get their start within the visualization and discovery tools provided by Tableau Software, and last week at Strata + Hadoop World, the Seattle, Washington-based company expanded its big data connectivity in several ways. For starters, it’s now offering a new “direct connection” capability into IBM’s Hadoop distribution, InfoSphere BigInsights, complementing the connectors Tableau already offers for Hortonworks, Cloudera, MapR Technologies, and Pivotal. In addition, the company announced beta versions of new connectors for Amazon’s Elastic MapReduce service and for the Apache Spark project’s Spark SQL engine. Tableau is also participating in the Databricks Spark certification program.
If you’re doing advanced analytics in R today, you’re probably familiar with Revolution Analytics and its collection of software and services that help customers get more from R. At Strata, the Mountain View, California-based company added two offerings to its lineup. The first is Revolution R Open, a free, open source R distribution designed to boost the use and performance of R. It is built with the Intel Math Kernel Library and is ideal for data scientists and statisticians, the company says. In addition to the core R capabilities, it allows users to create data visualizations. The company’s other announcement is the availability of Revolution R Plus, a subscription-based technical support service for open source R. If you want to run your R code in parallel on Hadoop clusters, the top-end Revolution R Enterprise offering remains your best bet.
With so many consumer and business products, it’s easy to forget that Microsoft is also a Hadoop provider. Last week, the software giant announced an update to its hosted Hadoop distribution, Azure HDInsight, which adds a technical preview of Apache Storm. Microsoft says Storm support will let customers process millions of Hadoop data items in real time as part of their Internet of Things applications. HDInsight, of course, is based on the Hortonworks Data Platform (HDP), and as part of HDP 2.2, Hortonworks added support for hybrid Hadoop clusters that are split between on-premises hardware and the Azure cloud.
The Redmond, Washington company also used Strata to talk up Azure Machine Learning, the environment it announced in June, which is now in technical preview. The company is eager to help organizations use its hosted Hadoop to uncover insights, and then use the Azure Marketplace to productize those insights and put them into production in a hosted cloud. You can read more about the possible use cases on Microsoft’s Machine Learning Blog.
Data analytics and ETL tool provider Pentaho used Strata + Hadoop World to unveil new Hadoop integration capabilities. The Orlando, Florida-based company is pushing new data modeling and publishing features with the aim of helping customers use Hadoop to create what it calls “streamlined data refineries.” The new features help move data from Hadoop into large-scale analytical databases, such as Hewlett-Packard’s Vertica. It’s all about giving users the data blending and refining capabilities they need while satisfying IT’s requirements for governed orchestration processes, says Pentaho product chief Chris Dziekan.
If you’re looking for a single shrink-wrapped Hadoop bundle, you might want to check out the new Data Lake Hadoop Bundle 2.0 unveiled at Strata last week by Pivotal and EMC, the storage giant that spun Pivotal out as a separate company a couple of years ago. The bundle includes EMC’s Data Computing Appliance (DCA) compute nodes, Isilon network-attached storage (NAS) nodes, the Pivotal Hadoop Distribution (Pivotal HD), and HAWQ, a SQL query engine.
Ensuring the security of data in Hadoop has been one of the platform’s bugaboos for the past couple of years. One vendor looking to address the concerns is Voltage Security, which last week unveiled the Voltage SecureData Suite for Hadoop. The offering combines format-preserving encryption and tokenization software with implementation services. A starter edition is available for $40,000, while a license for the production-ready enterprise edition runs a cool $140,000.
No big data conference would be complete without some cloud thrown in, and in that category we have Rackspace, which used Strata + Hadoop World as the launch pad for its OnMetal Cloud Big Data Platform. The new offering gives customers bare-metal access to Hadoop clusters loaded with Spark in just three mouse clicks, the vendor says. Spark is still in beta on the Rackspace cloud, but it includes the full complement of Spark components: Spark SQL, Spark Streaming, MLlib, and GraphX. Preliminary benchmarks show the OnMetal Cloud Big Data Platform running 50 percent to 100 percent faster than the original flavor of Cloud Big Data Platform, the company says.