Follow Datanami:
July 13, 2012

This Week’s Big Data Big Five

Datanami Staff

This week we saw a number of stories unfolding from companies like Dell, which is pushing its data warehousing appliance for mid-market customers; Convey, which gave big data genomics a GPU boost for Iowa State University; more genomics news from Intel; as well as some key announcements from smaller companies seeking to alleviate data-intensive computing burdens for those in financial services and government.

Without further delay, let’s dive in with Dell to kick off this week’s top five:

Dell Sharpens Focus on Mid-Market Data Warehousing

This week Dell announced availability of its new Dell Quickstart Data Warehouse Appliance 1000, which the company says makes it easier for mid-sized organizations to quickly and cost-effectively derive actionable business insights from existing data.

Traditionally, data warehouse platforms have been designed for large organizations with sizable staffs. The Dell Quickstart Data Warehouse Appliance is designed for easy installation and fast time to value, allowing organizations, even those with limited resources, to leverage information to make improved, more informed business decisions.

Powered by new Dell PowerEdge 12th-generation servers and the new Microsoft SQL Server 2012 Data Warehouse Appliance Edition, the appliance, Dell says, also includes Dell Boomi for easier integration of data from any source, including the cloud, along with start-up and training services and quarterly health checks.

Dell says the platform can support up to 5 TB of user data with a balanced configuration that offers optimal performance in a minimal footprint. The latest server and SQL Server database platform technologies from Dell and Microsoft are part of the package, as is support from Dell.

Next — Convey Hardware Backs University Big, Fast Genomics >>


Convey Hardware Backs University Big, Fast Genomics

When it comes to big data, there are few areas that require tough software and hardware solutions like those in genomics research.

This week students at Iowa State University won first place in the 2012 MemoCODE Conference design contest with their fast exact-match short-read aligner. Using a Convey HC-1, the students’ solution achieved the highest overall performance — more than 24 times faster than the second place finisher on some incredibly large datasets.

Experts from all segments of the commercial and academic world embarked upon the month-long challenge, using a variety of design tools, hardware and software. The contest addressed a common challenge of DNA sequence alignment: efficiently map millions of 100 base pair sequences to a reference human genome of 3.1 billion base pairs, improving the performance of the reference implementation. In genomics it is necessary to map short DNA sequences (reads) generated by next generation sequencers to a reference genome in order to detect genetic variations.
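To make the contest task concrete, here is a minimal sketch of exact-match short-read alignment. This is an illustrative hash-index approach of our own, not the students' Shepard design: index every k-length substring of the reference once, then look each read up in constant expected time.

```python
# Illustrative exact-match short-read aligner (hash-index approach).
# Not the contest-winning Shepard design; a toy sketch of the task itself.

def build_index(reference, k):
    """Map each k-mer in the reference to the positions where it occurs."""
    index = {}
    for pos in range(len(reference) - k + 1):
        index.setdefault(reference[pos:pos + k], []).append(pos)
    return index

def align_reads(reads, reference):
    """Return {read: [positions]} for reads matching the reference exactly."""
    if not reads:
        return {}
    k = len(reads[0])                 # contest reads are fixed-length (100 bp)
    index = build_index(reference, k)
    return {read: index.get(read, []) for read in reads}

reference = "ACGTACGTGGCA"            # toy reference; the real one is 3.1 Gbp
reads = ["ACGT", "GGCA", "TTTT"]
print(align_reads(reads, reference))  # ACGT occurs at 0 and 4; TTTT is absent
```

At genome scale the index itself is tens of gigabytes, which is why the random-access memory behavior of the hardware, not arithmetic, decides performance.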

Advised by Iowa State professors Phillip Jones and Joseph Zambreno, several graduate students in the Department of Electrical and Computer Engineering divided into two teams — each using a different coprocessor technology. Jones’ group selected the Convey HC-1 with a field-programmable gate array (FPGA) architecture, while Zambreno’s team used graphics processing units (GPUs).

Although the GPU-based team got off to a fast start, it soon became clear that the Convey FPGA solution was far superior. “This particular challenge had a big memory bandwidth issue — and having local memory was vital,” explained Zambreno. “By their nature, GPUs are fairly limited to how much on-chip, easily accessible memory is available.”

After a month of design planning and long hours of implementation, the students on the Convey team debuted a solution they called Shepard. “The meat of the application — doing the actual alignment — took about one second,” explained Kevin Townsend, one of the grad students on the Convey team. “We won because we were able to get 80 gigabytes per second of memory bandwidth on Convey’s coprocessor.”
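A back-of-envelope calculation shows why that bandwidth figure decides the outcome. The read count, probes per read, and bytes per probe below are our own illustrative assumptions, not contest data; only the bandwidth figures echo the article.

```python
# Back-of-envelope: why this workload is memory-bandwidth bound.
# All workload numbers here are assumptions for illustration.
reads = 16_000_000          # "millions of 100 base pair sequences" (assumed count)
probes_per_read = 4         # a few random index probes per read (assumption)
bytes_per_probe = 64        # one cache line fetched per probe (assumption)
traffic = reads * probes_per_read * bytes_per_probe  # total random-access bytes

for gb_per_s in (10, 80):   # modest vs. Convey-coprocessor-class bandwidth
    seconds = traffic / (gb_per_s * 1e9)
    print(f"{gb_per_s:>3} GB/s -> {seconds:.2f} s of pure memory traffic")
```

Even under these generous assumptions the run time scales inversely with sustained bandwidth, which matches Zambreno's point that the GPUs' limited easily accessible memory was the bottleneck.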

The team credits their success to the Convey server. “The Convey system makes it easy to develop algorithms because of its design and toolset,” said Jones. “The development infrastructure is amazing compared to other FPGA solutions I have seen. And from a user’s point of view, the Convey system simplifies how to get access to memory.” Jones concluded, “Overall, the development infrastructure makes the HC-1 an ideal architecture for this solution.”

Next — NextBio, Intel Set Sights on Hadoop for Genomics >>


NextBio, Intel Set Sights on Hadoop for Genomics

In another genomics-related story this week, NextBio and Intel announced a collaboration aimed at optimizing and stabilizing the Hadoop stack and advancing the use of big data technologies in genomics.

As a part of this collaboration, the NextBio and Intel engineering teams will apply experience they have gained from NextBio’s use of Big Data technologies to the improvement of HDFS, Hadoop, and HBase. Any enhancements that NextBio engineers make to the Hadoop stack will be contributed to the open-source community. Intel will also showcase NextBio’s use of Big Data.

“NextBio is positioned at the intersection of Genomics and Big Data. Every day we deal with the three V’s (volume, variety, and velocity) associated with Big Data – We, our collaborators, and our users are adding large volumes of a variety of molecular data to NextBio at an increasing velocity,” said Dr. Satnam Alag, chief technology officer and vice president of engineering at NextBio. “Without the implementation of our algorithms in the MapReduce framework, operational expertise in HDFS, Hadoop, and HBase, and investments in building our secure cloud-based infrastructure, it would have been impossible for us to scale cost-effectively to handle this large-scale data.”

“Intel is firmly committed to the wide adoption and use of Big Data technologies such as HDFS, Hadoop, and HBase across all industries that need to analyze large amounts of data,” said Girish Juneja, CTO and General Manager, Big Data Software and Services, Intel. “Complex data requiring compute-intensive analysis needs not only big data open source, but a combination of hardware and software management optimizations to help deliver needed scale with a high return on investment. Intel is working closely with NextBio to deliver this showcase reference to the Big Data community and life science industry.”

“The use of Big Data technologies at NextBio enables researchers and clinicians to mine billions of data points in real-time to discover new biomarkers, clinically assess targets and drug profiles, optimally design clinical trials, and interpret patient molecular data,” Dr. Alag continued. “NextBio has invested significantly in the use of Big Data technologies to handle the tsunami of genomic data being generated and its expected exponential growth. As we further scale our infrastructure to handle this growing data resource, we are excited to work with Intel to make the Hadoop stack better and give back to the open-source community.”

Next — Composite, Armanta Focus on Financial Services >>


Composite, Armanta Focus on Financial Services

Data virtualization performance company Composite Software and Armanta, Inc., which provides an integrated business intelligence and analytics platform, announced a partnership this week to deliver end-to-end enterprise risk management solutions to the financial services industry.

Armanta’s Integrated Business Intelligence Platform lets business users view and analyze enterprise data in real-time. The Armanta platform is an integrated technology suite comprised of a UI for reporting, analytics and visualization, a grid-based massively parallel in-memory engine, and a data virtualization and integration layer. The Composite Data Virtualization Platform provides a data integration approach that they say overcomes data complexity and disparate silos to provide the complete, high quality and actionable information that agile businesses require.

The Armanta/Composite Risk Management Solution claims to let risk managers improve risk analysis by providing access to up-to-the-minute information from disparate systems, so they can react more quickly and intelligently to unexpected market changes.

The companies are coming together to show off their new finserv wares at this week’s Toronto Financial Information Summit. In addition, Armanta CEO Peter Chirlian is participating in a panel session on big data analytics and information delivery, while Dan Yu, the company’s managing director and head of global sales, joins a second panel on preparing businesses for future alternative scenarios. Armanta and Composite are currently teaming at several financial services firms in New York and Toronto.

Next — Partnership Targets Government Predictive Analytics >>

Partnership Targets Government Predictive Analytics

Fast Enterprises, LLC (FAST) and KXEN, a provider of predictive analytics for business users, have established a partnership to provide government organizations with a solution for developing and deploying predictive models that can be used to help identify fraud, estimate revenue, improve customer service, and enhance overall government program administration.

“Many agencies have successfully used our Data Warehouse and Discovery modules to collect hundreds of millions of dollars in new revenue”

The partnership combines the two companies’ software products, FAST’s GenTax government program administration software and KXEN’s InfiniteInsight predictive analytics solution, to put the power of data mining and predictive analytics into the hands of those who know government business best—an agency’s business analysts, subject matter experts, and other program administration professionals.

Three U.S. state tax agencies have already incorporated the solution into their GenTax-based systems, providing agency personnel with a powerful toolset for developing predictive models and adding a new level of insight into the agencies’ tax administration business processes.

KXEN’s InfiniteInsight™ delivers an automated approach to modeling. To build predictive models and perform data mining analysis, users only require an understanding of the problems that need to be solved and access to the data that is used to build or train a model. This makes the solution perfectly suited to government personnel who are highly knowledgeable about their agency’s business issues and have quick access to data through GenTax®.

Agency personnel can quickly and efficiently develop, refine, and apply models that deliver measurable results. This “self-service” approach can be conducted by personnel who do not have formal training in statistics, allowing agencies to adopt a progressive and ongoing strategy for integrating data analytics and predictive modeling into their business practices. This is typically more efficient and cost effective than hiring consultants to develop custom predictive models that become less relevant over time and can ultimately become obsolete.
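The "self-service" workflow described above can be sketched in a few lines: an analyst supplies labeled historical cases, an automated routine builds a model, and new cases are scored against it. Everything below is hypothetical illustration; the field names and data are invented, and a simple per-category fraud-rate model stands in for KXEN's proprietary learner.

```python
# Hedged sketch of a self-service predictive workflow. The data, field names,
# and per-category rate "model" are illustrative stand-ins, not KXEN's method.
from collections import defaultdict

def train(history):
    """Learn the fraud rate per filing category from labeled past audits."""
    seen, fraud = defaultdict(int), defaultdict(int)
    for category, was_fraud in history:
        seen[category] += 1
        fraud[category] += was_fraud
    return {c: fraud[c] / seen[c] for c in seen}

def score(model, category, default=0.0):
    """Risk score for a new return; unseen categories get the default."""
    return model.get(category, default)

# Hypothetical audit history: (filing category, fraud found in audit?)
history = [("retail", 1), ("retail", 0), ("services", 0), ("services", 0)]
model = train(history)
print(score(model, "retail"))   # half of past retail audits found fraud
```

The point of the article's argument is captured here: the analyst needs only the labeled data and the question, not statistical training, and the model can be retrained as new audit outcomes arrive rather than aging into obsolescence.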

Datanami