September 7, 2012

This Week’s Big Data Big Five

Datanami Staff

Our top five big data story picks this week range from national lab cancer centers being fitted with data-intensive gear to more funding being directed at solving complex data storage and management issues. We also touch on news from companies seeking to identify best-in-class data analysts, not to mention our first story from DDN, which is taking advantage of the Vertica platform.

Let’s jump right in… If you want to backtrack first, here is last week’s top five.

DDN Accelerates Complex Big Data Queries

DataDirect Networks (DDN), a leader in massively scalable storage, today announced that the award-winning DDN SFA Big Data storage platform paired with the Vertica Analytics Platform from HP was able to process complex queries against a one-trillion row database in less than 20 seconds during US public sector laboratory tests.

The results also showed that query speed scales linearly, indicating the configuration could be scaled to produce sub-second queries or to maintain the same response times against significantly larger databases.

“In data-intensive processing and analytic environments, DDN technology is the de facto standard for organizations operating at massive scale, enabling customers around the world to extract rapid insights from extremely large data sets,” said Jeff Denworth, Vice President of Marketing, DDN. “This benchmark is a remarkable validation of the million lines of software we’ve perfected for data ingest, processing and delivery – reinforcing DDN leadership in accelerating analytic runtimes of the most demanding Big Data applications.”

The DDN SFA10K-X, configured to optimize the customer application with SSD and hard disk drives, was recognized for its ability to provide significantly faster time to actionable information from large datasets compiled from hundreds or thousands of sensors, social media channels and other data collection sources. The DDN solution also exceeded requirements for data ingest speed, solution scalability, query speed, system flexibility, and ease of management.
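For a concrete sense of what issuing an analytic query against Vertica looks like in practice, the sketch below runs a timed aggregate query over Vertica's standard JDBC interface. The host, credentials and sensor_readings table are hypothetical placeholders, not details of the DDN test configuration, and the query shown is far simpler than the trillion-row workloads described above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

// Illustrative only: times one aggregate query against a Vertica database over JDBC.
// The host, credentials and sensor_readings table are hypothetical placeholders,
// not the benchmark configuration described in the article.
public class VerticaQueryTimer {
    public static void main(String[] args) throws Exception {
        Class.forName("com.vertica.jdbc.Driver"); // register the Vertica JDBC driver

        Properties props = new Properties();
        props.put("user", "dbadmin");    // placeholder credentials
        props.put("password", "secret");
        String url = "jdbc:vertica://vertica-host:5433/analytics"; // placeholder host/database

        try (Connection conn = DriverManager.getConnection(url, props);
             Statement stmt = conn.createStatement()) {

            long start = System.nanoTime();
            ResultSet rs = stmt.executeQuery(
                "SELECT sensor_id, COUNT(*) AS readings, AVG(value) AS avg_value " +
                "FROM sensor_readings " +
                "WHERE reading_time > NOW() - INTERVAL '1 day' " +
                "GROUP BY sensor_id ORDER BY readings DESC LIMIT 10");
            while (rs.next()) {
                System.out.printf("%s: %d readings, avg %.3f%n",
                        rs.getString("sensor_id"), rs.getLong("readings"), rs.getDouble("avg_value"));
            }
            System.out.printf("Query completed in %.2f s%n", (System.nanoTime() - start) / 1e9);
        }
    }
}
```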


Accunet Touts Role in Data-Intensive National Cancer Center Build

Accunet Solutions, Inc. has completed the build-out of a fully integrated and scalable computing infrastructure for data-intensive operations at the Frederick National Laboratory for Cancer Research (FNL).

“The system, housed in our new, state-of-the-art R&D facility in Frederick, Maryland, will enable us to keep pace with the escalating amounts of biomedical data that our scientists work with every day,” said Greg Warth, Director of IT Operations at SAIC-Frederick, Inc., the prime operations and technical support contractor for FNL.

Fully optimized across all tiers, the efficient, cost-effective and scalable infrastructure includes:
  • Fabric technology and cloud-capable UCS platform servers from Cisco Systems, Inc.
  • Server and data center virtualization technologies from VMware, Inc.
  • SAN and NAS storage for Tiers 1-3 from EMC Corporation and EMC Isilon
  • Network data management from CommVault Systems, Inc.

The next-generation platform ensures high-performance “big data” protection, availability and management for a distributed, worldwide network of biomedical researchers.

“This project is a significant source of pride for all of us at Accunet Solutions,” commented Alan Dumas, President and CEO. “Informed by our deep experience with the unique IT concerns of bioinformatics organizations, we were able to work side-by-side with the visionary National Cancer Institute and SAIC-Frederick team to architect an Advanced Technology Research Facility solution that is capable of supporting their vital work — both now and in the future.”

The project was funded by the National Cancer Institute through a $15 million subcontract with SAIC-Frederick.


Pentaho Triples its Reach, Tells How

Pentaho announced that it has identified five top trends driving big data analytics following a record 340% increase in big data sales for Q2 over Q1 2012. 

After completing a detailed analysis of Q2 big data sales, Pentaho found that more than 70% of its new customers in the quarter were deploying on Hadoop, with the remaining new customers split evenly between NoSQL databases (MongoDB and Cassandra) and analytic databases (primarily Greenplum and Vertica).

Top big data analytics drivers include:

Developer community driving adoption: Big data is entering organizations because of key business drivers, but it is technical developers who design, build and deploy these applications. In recognition of these developers, Pentaho contributed key big data technologies to open source under the Apache License. The purpose was to help foster adoption of big data analytics, speed up innovation and development of these technologies, and work more closely with the communities behind existing open source big data technologies such as Hadoop and the NoSQL databases. With tighter integration between the core big data open source projects, more and more IT organizations, business analysts, data scientists and software developers have turned to Pentaho; downloads of its big data technology have totaled more than 65,000 over the last few months.

New big data use cases: Big Data analytics is being applied across a growing spectrum of large and small organizations spanning the digital media, mobile apps, gaming, healthcare, security, finance and government sectors. Compelling use cases from Q2 include customer behavior analytics, lead conversion analytics, security threat pattern analytics, social media marketing analytics, supply chain optimization and more.

New tools are making big data more accessible: Although the complexity of big data technology was initially an obstacle for many companies, the growth of tools that simplify complex scripting, programming and integration jobs is making it accessible to a wider audience of non-developers. For example, Pentaho’s visual design studio creates MapReduce jobs without coding and provides a high-performance engine to execute them in-Hadoop across the cluster, leading to observed customer performance gains of as much as 15x for development and 17x for job execution compared to traditional manual coding (a hand-coded MapReduce job is sketched after this list for comparison).

Device mobility and cloud apps driving demand: With more and more business applications now available as software-as-a-service and accessible on a range of mobile devices, people who work in the field, such as salespeople and health care professionals, are starting to expect these platforms to host advanced analytics applications.

Expanding Hadoop ecosystem: More software, hardware and services are emerging on the big data scene every day. According to the recent IDC report “Worldwide Hadoop-MapReduce Ecosystem Software 2012–2016 Forecast” (doc #234294, May 2012), the Hadoop ecosystem generated revenues of around $77 million in 2011, and IDC projects a compound annual growth rate (CAGR) of 60 percent to reach revenues of $812 million by 2016. IDC further estimates, in its “Worldwide Big Data Technology and Services 2012–2015 Forecast” (doc #233485, March 2012), that total industry revenue for big data will reach nearly $17 billion by 2015, growing about seven times faster than the overall IT market.
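To give a sense of the “traditional manual coding” that visual tools like Pentaho’s aim to replace, here is a minimal hand-coded Hadoop MapReduce job, the canonical word count, written against the standard org.apache.hadoop.mapreduce API. This sketch is for context only and is not taken from Pentaho’s tooling; even the simplest job requires this much mapper, reducer and driver boilerplate before any real business logic appears.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// The canonical hand-coded MapReduce job: count word occurrences in text files.
public class WordCount {

  // Mapper: emit (word, 1) for every token in every input line.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sum the counts emitted for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: wire the classes together and submit the job to the cluster.
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```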


Intel Capital Kicks Terascala Funding for “Fast Data”

Terascala, the Fast Data company, today announced that Intel Capital has made a financial investment in the company. The investment is part of Terascala’s $14 million Series B funding round, previously announced April 24, 2012, which is being used to fund growth in research and development, customer support, marketing, and sales.

The burgeoning growth of unstructured data has ushered in an era known as Big Data computing, which challenges IT organizations to store, manage and more quickly deliver petabyte-range data to large-scale, compute-intensive applications. Parallel data delivery can make these applications run 5x to 10x faster, leading to more responsive business results. Purpose-built on Intel servers with Xeon E5-2600 processors, and coupled with Lustre, one of the industry’s most widely deployed open source parallel file systems, Terascala offers a high-performance storage appliance that enables enterprises to utilize Big Data now while leveraging their existing investments.
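The idea behind the parallel data delivery claim is easiest to see with a toy example. The sketch below is purely conceptual and says nothing about Terascala’s or Lustre’s internals: it reads a large file in fixed-size chunks through a thread pool, so chunk reads overlap in time instead of proceeding one after another. The file path, chunk size and thread count are arbitrary placeholders.

```java
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Conceptual sketch only: reads a large file in fixed-size chunks with a pool of
// worker threads, illustrating why parallel data delivery shortens I/O wall time.
// The file name, chunk size and thread count are placeholders.
public class ParallelReadSketch {
    static final int CHUNK = 8 * 1024 * 1024; // 8 MB per read

    public static void main(String[] args) throws Exception {
        String path = args.length > 0 ? args[0] : "bigdata.bin"; // placeholder input file
        long size;
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            size = f.length();
        }
        int chunks = (int) ((size + CHUNK - 1) / CHUNK);

        ExecutorService pool = Executors.newFixedThreadPool(8); // 8 concurrent readers
        List<Future<Long>> results = new ArrayList<>();
        for (int i = 0; i < chunks; i++) {
            final long offset = (long) i * CHUNK;
            results.add(pool.submit(() -> readChunk(path, offset)));
        }
        long total = 0;
        for (Future<Long> r : results) {
            total += r.get();
        }
        pool.shutdown();
        System.out.println("Read " + total + " bytes in overlapping chunks");
    }

    // Each task opens its own channel and reads one chunk at a given offset,
    // so the reads proceed independently and overlap in time.
    static long readChunk(String path, long offset) throws Exception {
        try (FileChannel ch = new RandomAccessFile(path, "r").getChannel()) {
            ByteBuffer buf = ByteBuffer.allocate(CHUNK);
            long read = ch.read(buf, offset);
            return Math.max(read, 0);
        }
    }
}
```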

“With Big Data and compute capacity both growing at exponential rates, the important problem to solve is how to get the two together more rapidly,” said John Mascarenas, Investment Director, Intel Capital. “Conventional data delivery technology is not fast enough, but this problem can be solved with a parallel file system delivering data to and from applications running on Intel’s latest family of Xeon processors. Together with our recent acquisition of Whamcloud, we are committed to making Big Data actionable for enterprises.”

“This investment will help us gain better market traction with major system vendors, including Dell, EMC, and NetApp,” said Steve Butler, CEO of Terascala. “Our goal is to make high performance storage solutions enterprise-ready—optimized for performance and capacity, easy to set up and use, and with toolsets to manage data workflow within an existing IT infrastructure.”

The Terascala LustreStack appliance suite offers tools for system monitoring, health check and break/fix; event logging and alerts; application tuning; and workflow.


Something for Business Analytics Buffs to “Stew” Over

SAS will again recognize outstanding data stewards through the “Stewie” awards as part of the second annual Data Stewards Day. Scheduled for Oct. 11, the event honors champions of data and everything it represents.

Established in collaboration with the Data Roundtable, the event highlights the integral role of data stewardship in enterprise data management and governance. This year, nominations are open for Data Stewards Day Awards, also known as the “Stewies.”

Data Stewards Day honors data management leaders who, despite having a major impact on their companies, receive little recognition. Some actually bear the title of “data steward.” Others may simply be known as the person who bridges business and IT through data. Regardless, a data steward is anyone who makes a difference by managing the information that drives an organization forward.

“Bring us the names of those within your enterprise who toil tirelessly to preserve, maintain and value data. The most worthy of them will be appropriately honored by the committee,” said Mark Troester, SAS CIO and IT strategist. “And those that don’t win the Stewie can take satisfaction in the exemplary good they do their organizations. We salute them all.”

Who deserves a Stewie? A data steward who works hard and loves the job; who knows everything there is to know about the organization’s data; who has saved the day by enabling accurate data-driven decisions – maybe more than once.

The Data Steward of the Year will be judged by a panel from the Data Roundtable, a group of data governance, master data management (MDM), data integration and data quality experts, including Jill Dyché, David Loshin, Jim Harris, Phil Simon, Karen Lopez, Joyce Norris-Montanari and last year’s Data Steward of the Year, Barbara Deemer of Sallie Mae.

Judges encourage submitters to create a YouTube video that details why the data steward should be appointed to the Data Stewards Hall of Fame. Entries will be judged on creativity, ingenuity and sincerity. The winners will be announced on Data Stewards Day, Oct. 11, 2012.

For more information on Data Stewards Day, or to enter, go to datastewardsday.com and submit a nomination by Sept. 24. Nominators should provide a detailed explanation of no more than 300 words on why their nominee should win a Stewie.
