SC12 Special: This Week’s Big Data Top Ten
This week the annual Supercomputing Conference (SC12) closed its doors, bidding over ten thousand attendees adieu and leaving us with a plethora of news items to pass along.
While we usually stop at a top seven, this week we expanded to a top ten to make room for extra data-intensive news from the largest system makers (and users) in the world.
Let’s get started with the week’s top news items, beginning with a research project that has continued to make waves as the need for big data-ready networks continues to grow:
Internet2 Fuels Big Data Science Research at SC12
Internet2 community members including the National Aeronautics and Space Administration (NASA), the National Oceanic and Atmospheric Administration (NOAA), and many university-based researchers demonstrated this week at the 24th annual SC Conference (SC12) how the Internet2 Network supports high-energy physics research, such as the search for the Higgs boson particle, and climate and atmospheric science for improved predictions of extreme weather, like Hurricane Sandy.
To support the scientists at SC12, Internet2 deployed three 100 Gigabit Ethernet links to its new Software Defined Networking (SDN)-based Advanced Layer 2 Service, as well as two 10 GE links to its Research and Education IP backbone for a capacity of 320 Gigabits from a single provider. In addition, Internet2 staff co-chaired the effort to measure and characterize all 772 Gigabits of capacity into SC12.
New technologies within the recently deployed Internet2 Network upgrade combine more than 15,000 miles of transcontinental, 8.8 Tbps network with the first open, national-scale production network utilizing SDN and OpenFlow standards. The 100 GE Internet2 connections also support components of next-generation networking technology for collaborative Big Data research in many disciplines.
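OpenFlow's core abstraction is a flow table of match-action rules that a controller programs into switches. As a rough sketch of that model (the field names and actions below are simplified assumptions for illustration, not the actual OpenFlow wire protocol):

```python
# Toy illustration of OpenFlow-style match-action forwarding.
# Field names and actions are simplified assumptions, not the real protocol.

def lookup(flow_table, packet):
    """Return the action of the first flow entry whose match fields all
    agree with the packet; default-drop when nothing matches."""
    for match, action in flow_table:
        if all(packet.get(k) == v for k, v in match.items()):
            return action
    return "drop"

flow_table = [
    ({"eth_type": 0x0800, "ip_dst": "10.0.0.2"}, "output:port2"),  # IPv4 to host
    ({"eth_type": 0x0806}, "flood"),                               # ARP
]

print(lookup(flow_table, {"eth_type": 0x0800, "ip_dst": "10.0.0.2"}))  # output:port2
print(lookup(flow_table, {"eth_type": 0x0800, "ip_dst": "10.9.9.9"}))  # drop
```

The point of the model is that forwarding behavior lives in a table a remote controller can rewrite, rather than in fixed switch firmware.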
Below are two (among many) of the Internet2 research initiatives presented at the conference.
NOAA Science On a Sphere (SOS) Research and Weather Briefings: Animated images on a room-sized, global display system that uses computers and video projectors to show planetary atmospheric storms, climate change data, and ocean temperatures, along with an overview of real-time weather patterns for Salt Lake City, the United States, and the world.
Efficient Large Hadron Collider (LHC) Data Distribution across 100 Gbps: The Internet2 Network and Advanced Layer 2 Service support the data-movement tools used to analyze the same types of data behind the recent candidate Higgs boson discoveries at the LHC. The LHC produces data flows of more than 100 petabytes per year and increasingly relies on the efficient movement of data sets between globally distributed computing sites. This demonstration interconnected three major LHC Tier-2 computing sites and the SC12 show floor using 100 Gbps technology (and all of Internet2’s Advanced Layer 2 Service links to SC12) through a collaboration between Caltech, the University of Victoria, and the University of Michigan, with support from industry partners.
Hitachi Data Systems Announces Scale-Out Platform for SAP HANA
Hitachi Data Systems Corporation, a wholly owned subsidiary of Hitachi, Ltd., announced that the new scale-out capabilities of Hitachi Unified Compute Platform (UCP) Select for the SAP HANA platform have been certified by SAP. The new scale-out architecture combines Hitachi Compute Blade with Hitachi enterprise-class data storage, and supports up to 16 nodes today with further expansion in the future.
The platform delivers SAP’s next-generation, in-memory computing technology on an integrated, optimized hardware platform, which combines Hitachi blade server technology and enterprise-class storage systems with industry-standard network components.
According to John Mansfield, executive vice president, Global Solutions Strategy and Development, Hitachi Data Systems, the key is that it “combines highly reliable and scalable compute clusters for in-memory computing with an external, enterprise-class persistent storage tier to further ensure 24/7 operation in the most demanding and data-intensive environments.”
The solution uses Hitachi blade server technologies to expand capacity for SAP HANA. Hitachi Data Systems will begin general availability of the new scale-out platform for SAP HANA next quarter.
LucidWorks Big Data Now Available
LucidWorks announced the general availability of LucidWorks Big Data, an application development platform that integrates search capabilities into the foundational layer of Big Data implementations. Built on a foundation of Apache open source projects, LucidWorks Big Data enables organizations to uncover, access and evaluate large volumes of previously dark data.
The release of LucidWorks Big Data follows a comprehensive and highly collaborative beta program through which the product’s integrations, scalability, usability and APIs were rigorously tested.
Designed to be ready out of the box, LucidWorks Big Data combines multiple Apache open source projects, including Hadoop, Mahout, Hive and Lucene/Solr, to provide search, machine learning, recommendation engines and analytics for structured and unstructured content in one platform, available in the cloud, on premise or as a hybrid solution.
The LucidWorks Big Data platform includes the necessary open source components, pre-integrated and certified. With LucidWorks, organizations can avoid the staggering overhead costs and long lead times associated with infrastructure and application development lifecycles while assessing product fit.
LucidWorks Big Data includes a unified platform for developing Big Data applications; a certified, tightly integrated open source stack (Hadoop, Lucene/Solr, Mahout, NLP, Hive); a single uniform REST API; out-of-the-box provisioning, in the cloud or on premise; and software pre-tuned by open source industry experts.
Platfora Announces $20M Investment
Platfora closed a $20M Series B round of funding led by Battery Ventures with participation from existing investors Andreessen Horowitz and Sutter Hill Ventures. In-Q-Tel is also an investor in Platfora. Roger Lee, general partner at Battery Ventures, joins the Platfora Board of Directors. Platfora will use the investment to respond to interest in its product by expanding sales and marketing organizations in the US and globally, and continuing to build out its product engineering and design team in San Mateo, CA.
Platfora transforms raw data in Hadoop into scale-out, in-memory BI without the need for a traditional data warehouse. Platfora uses HTML5 canvas technology, enabling collaborative data analysis across any device or platform.
“In order to address the interest in Platfora, we need to accelerate our go-to-market strategy and quickly grow our team,” said Ben Werther, CEO and founder, Platfora. “We’re so honored to have the investors we do and we are focused on getting Platfora to wide availability in Q1 2013.”
Platfora works by distilling data from Hadoop into a high performance data processing engine, making access to the data fast. Platfora leverages the power of Hadoop to perform the heavy lifting, processing massive data sets into efficient, scale-out in-memory ‘lenses,’ which can span dozens or hundreds of servers to utilize their collective memory and processing.
Platfora automatically and immediately refines the in-memory data based on the questions being asked by end users, instead of requiring 6-12 month IT cycles and manual engineering.
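Platfora's "lens" mechanics are proprietary, but the general technique the description points at, pre-aggregating raw records into an in-memory structure keyed by the dimensions users actually query, can be sketched in a few lines. Everything below (the field names, the single-machine scope) is a simplifying assumption:

```python
from collections import defaultdict

# Hypothetical sketch of building an in-memory "lens": raw event records
# are rolled up by query dimensions so later lookups are O(1).
# Platfora's real engine is distributed across many servers; this is
# a single-machine illustration only.

def build_lens(records, dims, measure):
    lens = defaultdict(float)
    for rec in records:
        key = tuple(rec[d] for d in dims)   # e.g. ("US", "A")
        lens[key] += rec[measure]           # aggregate the measure
    return dict(lens)

raw = [
    {"region": "US", "product": "A", "revenue": 100.0},
    {"region": "US", "product": "B", "revenue": 50.0},
    {"region": "EU", "product": "A", "revenue": 75.0},
    {"region": "US", "product": "A", "revenue": 25.0},
]

lens = build_lens(raw, dims=("region", "product"), measure="revenue")
print(lens[("US", "A")])  # 125.0
```

The heavy lifting (the scan over raw records) happens once at lens-build time, which is why interactive queries against the result can be fast.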
“With the emergence of Hadoop, legacy ETL, data warehouses, and BI solutions no longer make sense,” said Mike Dauber, principal at Battery Ventures. “The early demand for Platfora speaks to the company’s vision and understanding of how businesses need and want to access and analyze data.”
Opera Solutions, LexisNexis Partner through HPCC Systems
LexisNexis Risk Solutions announced that Opera Solutions, LLC has chosen HPCC Systems, an open-source platform for Big Data analysis and processing, as one of the processing components in its Vektor Big Data platform. Opera Solutions, whose global customers are in data-rich industries, will leverage the HPCC Systems platform to process and analyze large internal and external data sets.
“Our mission is to very rapidly extract predictive Signals from Big Data flows and turn them into directed actions that help drive frontline productivity and bottom-line growth,” said Arnab Gupta, CEO, Opera Solutions. “HPCC Systems’ platform supports this mission through cutting-edge, proven technology that we can use to rapidly integrate and find Signals in large data flows.”
HPCC Systems grew out of the need for LexisNexis Risk Solutions to manage, sort, link, integrate, and analyze billions of records within seconds. Opera Solutions will leverage HPCC Systems’ architecture, high-level data-centric programming language, and two processing platforms: the Thor Data Refinery Cluster and the Roxie Rapid Data Delivery Cluster. The HPCC Systems Big Data Platform will integrate with Opera Solutions’ Vektor Big Data analytics platform, which supports the deployment of Opera Solutions’ hosted analytics solutions.
“HPCC Systems continues to gain momentum and is poised to help Opera Solutions rapidly deliver value to its customers,” said Flavio Villanustre, Vice President, Infrastructure, HPCC Systems.
Hadapt and MapR Announce Partnership
MapR Technologies, Inc. announced a partnership with Hadapt. The partnership enables customers to leverage the MapR Distribution for Hadoop in conjunction with Hadapt’s Interactive Query capabilities to analyze all types of data (structured, semi-structured and unstructured) in a single enterprise platform.
Hadapt’s Adaptive Analytical Platform and MapR’s Distribution for Hadoop enable business analysts to harness the Hadoop ecosystem via SQL and conduct investigative analytics on a unified platform with no connectors, complexities or rigid structure. The combined solution lowers the barriers to entry in the traditional enterprise environment with features like no single point of failure and disaster recovery.
“MapR is pleased to bring our Distribution to the Hadapt Analytical Platform,” said Alan Geary, senior director, business development, MapR Technologies. “This partnership lets customers deploy our highly differentiated distribution in combination with Hadapt’s platform and gives customers a compelling, enterprise solution that is flexible, powerful and easy to use and deploy.”
Convey Touts Graph 500 Performance
Convey Computer Corporation announced results for the November 2012 Graph 500 benchmark. Convey systems dominated the single-node category, taking the five fastest entries (ranked by GTEPS) while leading on power/performance.
Convey’s new MX system earned the number one position for single-node systems with a result of 14.6 GTEPS (billion traversed edges per second) on a problem of scale 29 (a graph with 2^29 vertices). Convey credits the performance to the company’s new graph instruction set architecture (ISA). The ISA, implemented on the hybrid-core systems, enables massive levels of parallelism, with over 32,000 threads of execution.
Convey HC-2ex systems captured the next four single-node entries, clocking in at 11.4 GTEPS on a problem of scale 27. This kind of performance is important for efficiently executing high-performance analytics applications in areas such as genomics, graph analytics, social network analysis, fraud detection, and security.
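Graph 500 measures breadth-first search throughput in traversed edges per second (TEPS) on a graph of 2^scale vertices. A minimal, single-threaded sketch of the metric (not the official benchmark kernel, which runs BFS over enormous Kronecker graphs across many threads):

```python
import time
from collections import deque

# Minimal sketch of the Graph 500 metric: run a BFS from a root,
# count the edges examined, and divide by elapsed time to get TEPS.

def bfs_teps(adj, root):
    visited = {root}
    frontier = deque([root])
    edges_traversed = 0
    start = time.perf_counter()
    while frontier:
        v = frontier.popleft()
        for w in adj[v]:
            edges_traversed += 1
            if w not in visited:
                visited.add(w)
                frontier.append(w)
    elapsed = time.perf_counter() - start
    return edges_traversed, edges_traversed / elapsed

# A toy "scale 4" graph: 2**4 = 16 vertices in a ring with chords.
n = 2 ** 4
adj = {v: [(v + 1) % n, (v - 1) % n, (v + n // 2) % n] for v in range(n)}
edges, teps = bfs_teps(adj, root=0)
print(edges)  # 48: every vertex is visited and each of its 3 edges examined
```

At scale 29 the benchmark graph has about 537 million vertices, which is why the irregular, pointer-chasing memory access pattern of BFS, rather than floating-point throughput, dominates performance.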
“Hybrid-core platforms deliver the best performance per watt on this benchmark, an important benefit because power consumption continues to be a challenge for large data centers,” explained Bruce Toal, CEO of Convey. “These results demonstrate that both the MX Series and the HC Series excel at executing extremely large, hard to partition problems.”
The MX Series introduces architectural features critical to data-intensive computing. To achieve high degrees of parallelism, the system features highly threaded reconfigurable compute elements, hardware support for efficient thread scheduling, and in-memory arithmetic operations. The development tools are based on OpenMP, the most pervasive parallel programming model in computing. As with all Convey systems, the MX Series supports an industry-standard Linux ecosystem, including compilers and support for OpenMP.
“The Graph 500 benchmark is all about finding relationships between data,” explains Dan Olds of the Gabriel Consulting Group (GCG). “Convey’s ranking shows that their heterogeneous approach better manages the unstructured, irregular data found in many high-performance analytics problems.”
Convey’s hybrid-core computing architecture tightly integrates advanced computer architecture and compiler technology with commercial, off-the-shelf hardware — namely Intel Xeon processors and Xilinx Field Programmable Gate Arrays (FPGAs).
Continuuity Raises $10 Million for Hadoop-Based Big Data Application Development
Continuuity announced a $10 million Series A funding round. Battery Ventures and Ignition Partners led the round, and were joined by returning and new investors Andreessen Horowitz, Data Collective and Amplify Partners. The company will use the funds to accelerate product development and drive its go-to-market strategy. Continuuity’s board of directors will include Roger Lee from Battery Ventures, Cameron Myhrvold from Ignition Partners, along with company co-founders Todd Papaioannou and Jonathan Gray. The Continuuity leadership team and board of directors have helped build multiple companies, including Greenplum, Corio and Dynamical Systems, as well as having served in critical strategic roles at Teradata, Yahoo!, Facebook and Microsoft.
Continuuity recently introduced a Big Data application hosting platform, the Continuuity AppFabric, at O’Reilly Strata + Hadoop World in October 2012.
“New kinds of Big Data applications that are able to pull meaning from data at scale will enable them to do this,” said Cameron Myhrvold, Founding Partner at Ignition Partners. “Continuuity is making it possible for the existing legions of application developers to invent and deploy such apps.”
With a fully integrated developer experience, Continuuity supports application development from prototype to production. The Continuuity Developer Suite includes an SDK, development tools and a fully featured, single-node version of the Continuuity AppFabric to create and iterate on applications quickly. When ready, developers deploy applications with the click of a button — directly to a remote instance of the Continuuity AppFabric, which is available today on a per-customer basis in an on-premise or managed private cloud edition, with a public cloud edition to follow.
“Built by developers for developers, we believe the Continuuity AppFabric is going to ignite the new wave of Big Data application development and experimentation,” said Todd Papaioannou, co-founder and CEO at Continuuity. “Our team members have been instrumental in shaping Big Data infrastructures and applications at companies that pioneered the technology. Now we’re taking that deep expertise to put a next generation platform and tools in the hands of developers, entrepreneurs and enterprises.”
Numascale Launches Cluster-Priced Scalable Shared Memory
Numascale’s hardware-controlled ccNUMA environment allows applications to have fast access to the aggregate memory of all servers in the system. The NumaChip is fabricated by IBM and was shown operating with IBM x3755 servers at SC12 in Salt Lake City, Utah.
NumaConnect enables very large shared memory systems built from commodity servers at the price point of high-end clusters. Systems built with NumaConnect run standard operating systems and all x86 applications.
NumaConnect provides a main memory capacity of up to 256 TBytes in a single commodity-based system. “Customers with Big Data problems are excited by the ability to directly address any record anywhere in their entire data set in a microsecond or less,” says Einar Rustad, co-founder and VP of Numascale.
“This is orders of magnitude faster than clusters or systems based on any form of existing mass-storage devices and will enable data analysis and decision support applications to be applied in new and innovative ways,” says Kåre Løchsen, Numascale founder and CEO.
The technology is being tested for oil exploration applications by Statoil and for general HPC datacenter use by the University of Oslo. Some of the main applications planned in Oslo include bioscience and computational chemistry. The EU project PRACE has financed a 72-node cluster of IBM x3755s for proving the technology.
NumaConnect works with AMD Opteron-based servers, connecting through HyperTransport; directly addressable shared memory can reach 256 TBytes.
NumaConnect provides system-wide cache coherency logic with a directory-based protocol that scales to 4,096 nodes, where each node can have up to three multicore processors. The cache coherency logic is implemented in an ASIC together with interconnect fabric circuitry with routing tables for multi-dimensional torus topologies.
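The appeal of a directory-based protocol, which NumaConnect implements in silicon, is that the directory records exactly which nodes cache each memory block, so a write invalidates only actual sharers instead of broadcasting to all 4,096 nodes. The toy model below illustrates that general technique only; it is not Numascale's protocol:

```python
from collections import defaultdict

# Toy directory-based cache coherence. Illustrative sketch of the
# general technique, not Numascale's actual hardware protocol.

class Directory:
    def __init__(self):
        self.sharers = defaultdict(set)  # memory block -> nodes caching it

    def read(self, node, block):
        """A node fetches a clean copy; the directory records it as a sharer."""
        self.sharers[block].add(node)

    def write(self, node, block):
        """A write invalidates only the recorded sharers, not every node."""
        invalidated = self.sharers[block] - {node}
        # (a real system would send an invalidation message to each one)
        self.sharers[block] = {node}  # writer becomes sole owner
        return invalidated

d = Directory()
d.read(1, "0x1000")
d.read(2, "0x1000")
print(sorted(d.write(3, "0x1000")))  # [1, 2]: only the two sharers are invalidated
```

Keeping invalidation traffic proportional to the sharer count, rather than the node count, is what lets directory schemes scale where snooping broadcast protocols cannot.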
NumaConnect starts shipping in December for use with servers from IBM, Supermicro, and AIC.
SAP Launches Real-Time, SAP HANA-Powered Platforms
SAP AG announced a new wave of solutions that leverage the power of the SAP HANA platform to help businesses transform their industries and derive new actionable value. These new solutions aim to enable real-time planning, reporting and analysis on large-scale volumes of data, new sense-and-respond scenarios and personalized interactions with consumers. The announcements were made at SAPPHIRE NOW + SAP TechEd, held as a co-located event in Madrid from November 13-16.
Solutions now in restricted or unrestricted availability include: the SAP Liquidity Risk Management application, SAP Accelerated Trade Promotion Planning application, SAP POS Data Management application, and SAP Customer Usage Analytics analytic application.
Planned solutions include: SAP Demand Signal Management application and SAP Operational Process Intelligence software.