November 9, 2012

This Week’s Big Data Big Seven

Datanami Staff

We’re fresh off the chase for stories from the recent Strata HadoopWorld event, and now we’re preparing to roll into the upcoming Supercomputing ’12 conference in Salt Lake City, where there will be plenty of big data talk to go around for both scientific and enterprise computing folks. We’ll be reporting live on site throughout the coming week.

The week’s emphasis on pre-SC ’12 activities has brought a few HPC vendors out of the woodwork with their big data offerings, most notably Cray and its subsidiary, YarcData. We try to balance the top news items each week, but the Seattle supercomputer maker stole the show this time with some interesting use cases and some major acquisition action.

Aside from Cray and YarcData, others, including Revolution Analytics and Google with its BigQuery, round out our top seven for this busy early November week.

ORNL Looks to YarcData for Fraud Detection

YarcData announced it has been awarded a contract to deliver a uRiKA graph-analytics appliance to Oak Ridge National Laboratory (ORNL).

Analysts at ORNL will use the uRiKA system as they conduct research in healthcare fraud and analytics for a leading healthcare player. In addition to the healthcare fraud detection program, researchers and scientists at ORNL will also apply the capabilities of the uRiKA graph-analytics appliance to other areas of research where data discovery is vital. These potential use cases include healthcare treatment efficacy and outcome analysis, analyzing drugs and side effects, and the analysis of proteins and gene pathways.

“YarcData’s uRiKA appliance is uniquely suited to take on these challenges, and we are excited to see the results that will come from the strategic analysis of some very large and complex data sets,” said Jeff Nichols, Associate Laboratory Director for Computing and Computational Sciences at Oak Ridge National Laboratory. 

“Finding patterns of fraud and abuse in the healthcare industry is a significant challenge, but it is also a problem that the uRiKA graph-analytics appliance is designed to address because of its unique ability to discover unknown or hidden relationships and patterns in large volumes of data,” said Arvind Parthasarathi, president of YarcData.



Cray Acquires Appro International

Global supercomputer company Cray Inc. announced it has signed a definitive agreement to acquire Appro International, Inc., a privately held developer of advanced scalable supercomputing solutions, for approximately $25 million in cash, assuming at least a $3.5 million net working capital balance at closing with no debt. Currently the #3 provider on the TOP500 supercomputer list, Appro builds some of the world’s most advanced high performance computing (HPC) cluster systems.

“Cray has always been a company with a singular focus on the high performance computing market, and with this acquisition, we have strengthened that commitment and will now be positioned to expand our portfolio of highly innovative supercomputing solutions,” said Peter Ungaro, president and CEO of Cray. “Appro is one of the market leaders in HPC cluster solutions, and this acquisition is another step forward as we continue to transform Cray into a company that provides world-class offerings to customers across all segments of the supercomputing market, including big data.”

Highlights of the transaction include:

· Upon closing, Appro will become Cray’s newly formed Cluster Solutions business, led by Daniel Kim, current CEO of Appro;

· After the completion of the transaction, Cray expects to add approximately 90 Appro employees;

· Cray will sell Appro’s HPC cluster products under the Cray brand;

· The transaction is expected to close shortly, subject to customary closing conditions.

“We are excited to be joining a company like Cray that is an industry leader in designing and building the most advanced supercomputing systems in the world,” said Daniel Kim, CEO of Appro. “Cray has the strongest brand in the supercomputing market, and this deal allows Appro to bring our cluster solutions to customers around the world with enhanced service, storage and Big Data capabilities.”



Univa Reduces Hadoop Cost for Archimedes Prediction

Companies have long used Big Data to predict the behavior of consumers, but now there is software available to crunch medical Big Data to predict when you will die.

San Francisco-based healthcare company Archimedes Model uses Big Data to predict the progression of a disease through the human body. This goes well beyond simply testing your cholesterol, body fat or weight: the model draws on data gathered over several years to predict how a disease will affect a population and to help keep disease from spreading among individuals and populations.

To do the Big Data crunching it needed, Archimedes turned to Univa, whose software allowed Archimedes to run Hadoop on existing idle hardware without investing in an extra cluster, saving up to 50 percent on its Hadoop investment.

Univa’s offering is not a supercomputer but software that harnesses many existing computers to crunch Big Data and make it meaningful. Univa has been helping all sorts of clients, from researchers mapping the human genome to aerospace providers using Big Data to design planes and cars.

  By deploying Univa Grid Engine, Archimedes was able to:

· Save $62,500 to $125,000 in capital expenditure, equating to a saving of up to 50 percent;

· Save $125,000 in operating expenditure, including storage, office space, IT support and electricity;

· Negate the need to invest in a separate cluster to run Hadoop;

· Optimize the use of existing servers and computers;

· Easily scale with no additional costs;

· Allow scheduling policies to apply to all computing jobs, ensuring jobs were handled in order of priority.



Revolution Analytics Brings Hadoop Support to Predictive Analytics

Revolution Analytics unveiled the latest version of Revolution R Enterprise, its commercial-grade analytics software built on the powerful open source R statistics language for enterprise-class data analytics.

Revolution R Enterprise 6.1 introduces several advances in high-performance predictive analytics. It gives users the ability to create big data decision trees and to easily extract and perform predictive analytics on data stored in the Hadoop Distributed File System (HDFS).

“Our new release delivers several new features to help organizations with complex and fast-growing data sets make sense of big data,” said David Rich, Revolution Analytics CEO.

Revolution R Enterprise 6.1 includes the following new capabilities:

  • Big data decision trees. The new “rxDTree” function is a tool for fitting classification and regression trees, which are among the most frequently used algorithms for data analysis and data mining. The implementation provided in Revolution Analytics’ RevoScaleR package is parallelized, scalable, distributable and designed with big data in mind.
  • New ability to analyze data from the Hadoop Distributed File System (HDFS). With more and more data stored in Hadoop, this new option lets data scientists read data from HDFS and apply big-data statistical models from Revolution R Enterprise (a rough analogue is sketched after this list).
  • Improved performance for ‘Big Data’ files. With new compression technology the size of XDF files can be reduced, allowing for higher-performance analytics throughput and faster transfers into clusters or cloud processing systems.
  • Improved Linux installer. The installation process on Linux servers has been streamlined to meet stringent IT requirements, especially for non-root installs.
  • SiteMinder single sign-on for applications. Authorized users of applications built on Revolution R Enterprise and deployed via the RevoDeployR Web Services API may authenticate using CA SiteMinder.
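For readers who want a concrete picture of that HDFS workflow, here is a rough Python analogue, not the RevoScaleR rxDTree API the release actually ships: it streams a file out of HDFS with pyarrow and fits an in-memory decision tree with scikit-learn, whereas rxDTree does the same job out of core and in parallel on much larger data. The host, path and column names below are hypothetical.

```python
# Rough Python analogue of "read from HDFS, fit a decision tree".
# NOT the RevoScaleR rxDTree API; shown only to illustrate the workflow.
import pandas as pd
from pyarrow import fs
from sklearn.tree import DecisionTreeClassifier

# Connect to an HDFS namenode (requires a local libhdfs/Hadoop client install).
hdfs = fs.HadoopFileSystem(host="namenode.example.org", port=8020)

# Stream a CSV of numeric features plus a binary "outcome" label into pandas.
with hdfs.open_input_stream("/data/patient_records.csv") as stream:
    df = pd.read_csv(stream)

# Fit a small classification tree; rxDTree performs the equivalent fit
# out of core and in parallel across much larger data sets.
X = df.drop(columns=["outcome"])
y = df["outcome"]
tree = DecisionTreeClassifier(max_depth=5).fit(X, y)
print(tree.feature_importances_)
```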



Actian Vectorwise & FlyingBinary Unlock Big Data Analytics for U.K. National & Local Government

Actian Corporation and its reseller partner, FlyingBinary, announced the availability of the record-breaking analytic database Actian Vectorwise in the UK Government’s G-Cloud initiative, known as the CloudStore. FlyingBinary will integrate Actian Vectorwise with Tableau’s data visualization capabilities in a business analytics solution that will enable national and local government to take advantage of an analytics platform without going through a lengthy procurement process.

The G-Cloud initiative has been adopted by the UK government to streamline public sector procurement of IT products and services. The aim is to provide access to systems that are flexible and responsive to demand, deliver faster business benefits and reduce cost. FlyingBinary is one of the small and medium enterprise (SME) providers awarded a framework agreement for the extension of existing G-Cloud services.

“The provision of a powerful analytic solution combining the fast Vectorwise analytic database along with Tableau’s data visualization capabilities in the CloudStore will enable government departments all across the country to take advantage of cloud-based, cost-effective and high performance business intelligence, analytics and reporting,” commented Jacqui Taylor, CEO of FlyingBinary.

“Actian Vectorwise’s availability in the CloudStore eliminates onerous procurement procedures; it ensures that public sector organizations have easy access to quick business analytics, leaving the government free to get on with its core job of keeping the nation ticking,” said Steve Shine, CEO of Actian.



Tableau Software Furthers Big Data Analytics with Native Google BigQuery Connector

Tableau Software announced its integration with Google BigQuery, a fully managed cloud-based service that enables businesses to interactively analyze enormous amounts of data in the cloud.

Tableau made the announcement at the 2012 Tableau Customer Conference in San Diego and is releasing a preview of the BigQuery connector to its current customers this month. The native BigQuery connection will be generally available in Tableau 8.0, expected to be released in the first quarter of 2013.

BigQuery is designed for running interactive queries against very large datasets, even up to billions of rows and terabytes of data. Fast performance combined with the ability to scale automatically, as well as simplicity, sharing capabilities, security and multiple access methods make BigQuery a preferred Big Data analysis tool for businesses with growing data analysis needs.
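As a minimal sketch of the kind of interactive SQL query BigQuery serves, the snippet below uses Google’s Python client library rather than the Tableau connector described here; the project and table names are hypothetical.

```python
# Minimal sketch of an interactive BigQuery SQL query via Google's Python
# client library (not the Tableau connector). Names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="my-analytics-project")

sql = """
    SELECT country, COUNT(*) AS pageviews
    FROM `my-analytics-project.web.hits`
    GROUP BY country
    ORDER BY pageviews DESC
    LIMIT 10
"""

# client.query() submits the job; result() blocks until the rows stream back.
for row in client.query(sql).result():
    print(row.country, row.pageviews)
```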

Together, Tableau and BigQuery deliver analytics on Big Data in Google’s cloud with the self-service simplicity that is characteristic of both companies.

“Google BigQuery’s scalable platform and Tableau’s interactive analysis will make it possible to discover insights from massive amounts of data by the business analyst,” said Dan Jewett, Vice President of Product Management at Tableau Software.

In addition to the BigQuery connector, Tableau has also joined Google’s Cloud Technology Partner program and is announcing a native connection to Google Analytics, Google’s website statistics service, also available in Tableau 8.0.



Pittsburgh Supercomputing Center, YarcData Build Sherlock Big Data Appliance

The Pittsburgh Supercomputing Center and YarcData, a Cray company, announced the deployment of “Sherlock,” a uRiKA graph-analytics appliance from YarcData for discovering unknown relationships or patterns “hidden” in extremely large and complex bodies of information.

Funded through the Strategic Technologies for Cyberinfrastructure (STCI) program of the National Science Foundation, Sherlock features innovative hardware and software, as well as PSC-specific enhancements, designed to extend the range of applicability to scales not otherwise feasible.

Sherlock will focus on extending the domain of applicability of graph-analytics techniques to a wide range of scientific research projects.

“Sherlock,” says Nick Nystrom, PSC director of strategic applications, “provides a unique capability for discovering new patterns and relationships in data. It will help to discover how genes work, probe the dynamics of social networks, and detect the sources of breaches in Internet security.” Those diverse challenges, along with many others, he adds, have two important features in common: Their data are naturally expressed as interconnected webs of information called graphs, and data sizes for problems of real-world interest become extremely large.

“Until now, graph analytics has largely been impractical for big data,” says Nystrom. This is because, he explains, processing of graph structures requires irregular and unpredictable access to data. On ordinary computers and clusters, nearly all the time is spent waiting for that data to move from memory to processors. Even more challenging, graphs of interest typically cannot be partitioned; their high connectivity prevents dividing them into subgraphs that can be mapped independently onto distributed-memory computers. These factors have precluded large-scale graph analytics, especially for the interactive response times that analysts need to explore data.

 Sherlock enables large-scale, rapid graph analytics through massive multithreading, a shared address space, sophisticated memory optimizations, a productive user environment, and support for heterogeneous applications – all packaged as an enterprise-ready appliance.

“Many current approaches to big data have been about ‘search’ – the ability to efficiently find something that you know is there in your data,” said Arvind Parthasarathi, President of YarcData. “uRiKA was purposely built to solve the problem of ‘discovery’ in big data – to discover things, relationships or patterns that you don’t know exist.”

The project complements ongoing leadership in data-intensive computing at Carnegie Mellon University (CMU). Randal E. Bryant, Dean of the School of Computer Science at CMU, notes, “We’re very pleased that the PSC will have this new capability for analyzing large-scale, unstructured graphs. Such data structures pervade many of the big data applications being investigated by researchers in such diverse areas as biology (e.g., the connectivity between molecules in a protein), networks (e.g., the structure of the world-wide web), and artificial intelligence (e.g., the relationships between different concepts).”

YarcData’s uRiKA is a Big Data appliance for graph analytics that enables enterprises to discover unknown relationships in Big Data. The purpose-built appliance features graph-optimized hardware providing up to 512 terabytes of global shared memory, massively multithreaded graph processors supporting 128 threads per processor, and an RDF/SPARQL database optimized for the underlying hardware, which lets applications interact with the appliance through industry standard interfaces.
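To make that RDF/SPARQL interface concrete, here is a small sketch of the kind of pattern query used for relationship discovery, run with the open source rdflib package against a local RDF file rather than a uRiKA appliance. The vocabulary, file name and the specific “fraud” pattern are hypothetical.

```python
# Toy SPARQL pattern query for relationship discovery, run locally with
# rdflib rather than against a uRiKA appliance. Data and vocabulary are
# hypothetical.
import rdflib

g = rdflib.Graph()
g.parse("claims.ttl", format="turtle")  # hypothetical RDF data set of claims

# Find provider pairs that repeatedly bill for the same patients -- a simple
# stand-in for the hidden-relationship patterns a graph appliance hunts for.
query = """
PREFIX ex: <http://example.org/claims#>
SELECT ?providerA ?providerB (COUNT(?patient) AS ?shared)
WHERE {
    ?c1 ex:provider ?providerA ; ex:patient ?patient .
    ?c2 ex:provider ?providerB ; ex:patient ?patient .
    FILTER (?providerA != ?providerB)
}
GROUP BY ?providerA ?providerB
HAVING (COUNT(?patient) > 10)
"""
for row in g.query(query):
    print(row.providerA, row.providerB, row.shared)
```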

PSC customized Sherlock with additional nodes having standard x86 processors to add valuable support for heterogeneous applications that use YarcData’s Threadstorm nodes as graph accelerators. This heterogeneous capability will enable an even broader class of applications, such as genomics, astrophysics, and structural analyses of complex networks.

Prototype projects, led by researchers from across the country, will use Sherlock for research including understanding the natural language of the Web, learning about human social networks involving different types of online and telephone interactions, cluster finding in astrophysics, and genome sequence assembly.
