October 5, 2012

This Week’s Big Data Big Five

Datanami Staff

This week’s big data top five arrives as the news cycle leading into HadoopWorld and other big data-driven events begins to ramp up. Today, we zero in on key announcements from software giant CSC, which went shopping in the government big data space; from GE and its key data-driven initiatives; and from Virginia Tech, which is putting its HokieSpeed supercomputer to work, among others.

Without further delay, let’s dive right in:

CSC Snaps Up 42Six

CSC announced that it has acquired 42Six Solutions, LLC (42Six). Based in Columbia, Md., 42Six is a software development company that specializes in big data processing and analytics and advanced applications support for the U.S. Government Intelligence Community (IC) and the Department of Defense (DoD). Terms of the acquisition were not disclosed.

 “Data services and analytics capabilities are rapidly becoming essential elements of commercial and government operations, and these product offerings hold unmistakable benefits to the intelligence community,” said Sashi Reddi, vice president, Big Data and Analytics, CSC. “This acquisition aligns directly with CSC’s strategy of offering customers greater value through big data expertise and intellectual property. 42Six brings some of the most highly capable developers in the intelligence field to our team and we are excited at the opportunity to infuse their entrepreneurial spirit into CSC’s global operations.”

GE Wants to Industrialize Big Data

GE Intelligent Platforms announced Proficy Historian 5.0 and Proficy Historian Analysis, which aim to provide a reliable, secure, high-speed, scalable platform for aggregating Industrial Big Data.

Robust operations data management is both a capability and a requirement across industries, where IT and technology systems that have evolved over time often cannot support strategic growth initiatives. The new Proficy Historian 5.0, combined with Proficy Historian Analysis, provides features that facilitate a conversation between IT and senior management: together they form an enterprise operations intelligence platform that supplies complete context for decision making.

In addition, Historian 5.0 allows for multiple data stores so companies can separate regulatory data from process data for easy reporting for differing requirements.
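As an illustration only (the article does not describe Proficy's actual interfaces), the separation of regulatory and process data can be pictured as a routing step at write time; every name in this Python sketch is hypothetical:

```python
# Hypothetical sketch of the "multiple data stores" idea: incoming samples
# are routed to separate stores based on tag classification, so regulated
# data can be retained and reported on independently of process data.
# The store names, tags and classify() rule are illustrative, not Proficy's API.

from collections import defaultdict

REGULATORY_TAGS = {"batch_id", "sterilization_temp"}  # assumed example tags

stores = defaultdict(list)  # store name -> list of (tag, timestamp, value)

def classify(tag: str) -> str:
    """Route regulated tags to their own store for separate reporting."""
    return "regulatory" if tag in REGULATORY_TAGS else "process"

def write_sample(tag: str, timestamp: float, value: float) -> None:
    stores[classify(tag)].append((tag, timestamp, value))

write_sample("sterilization_temp", 0.0, 121.3)
write_sample("line_speed", 0.0, 88.5)
print({name: len(samples) for name, samples in stores.items()})
```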

The Proficy software platform provides highly scalable data management. Its lower total cost of ownership and quicker time to value allow companies to make sense of data quickly. The platform features built-in data collection, high data compression, and independence from control and business systems, all of which translate to reduced customization.

“We are using our Historian as a foundation,” said Jim Walsh, Software Vice President for GE Intelligent Platforms, “for our own solutions in huge business situations that monitor thousands of assets around the world in critical infrastructure facilities.”

At GE Energy’s Monitoring & Diagnostics Centers, Proficy Historian has reduced the data footprint of monitoring more than 1500 turbines in 60 countries from 50 terabytes to 10. Year-over-year fleet analysis requiring access to massive time series data sets, which previously took days to complete, can now be completed in minutes. In addition, the company has saved millions of dollars by doing predictive analysis on the turbines facilitating planned downtime for repairs and maintenance.
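The article does not say how Proficy Historian achieves that reduction, but one common technique in process historians is deadband compression, where a sample is archived only when it leaves a tolerance band around the last archived value. A minimal Python sketch of the idea, under that assumption:

```python
# A minimal sketch of deadband compression, one common way process
# historians shrink time-series footprints (not necessarily Proficy's
# exact algorithm). A sample is archived only when it moves outside a
# tolerance band around the last archived value.

def deadband_compress(samples, tolerance):
    """samples: iterable of (timestamp, value); returns the archived subset."""
    archived = []
    last_value = None
    for timestamp, value in samples:
        if last_value is None or abs(value - last_value) > tolerance:
            archived.append((timestamp, value))
            last_value = value
    return archived

raw = [(t, 100.0 + 0.01 * t) for t in range(1000)]  # a slowly drifting signal
kept = deadband_compress(raw, tolerance=0.5)
print(f"kept {len(kept)} of {len(raw)} samples")  # large reduction
```

On slowly changing signals like the turbine telemetry described above, a filter of this kind can discard the large majority of raw samples while preserving the trend.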

Adding the new Proficy Historian Analysis to the platform provides web-based visualization of trend data, reporting, search and collaboration enabled by the Industrial Internet. The tool gives process engineers a clear view into Historian data: they can drag and drop Historian tags into trend analyses to create reports on the fly from any Internet connection, and can pull data and graphics into a Microsoft Word-like environment to create professional reports.

Proficy Historian 5.0 supports more than 15 million tags on a single server with upwards of 3,000 client/collector connections, and can archive 256GB with microsecond sampling. Its advanced data management capabilities include tiered data management strategies, using server-to-server capability, designed to help companies comply with the Food & Drug Administration’s 21 CFR Part 11 and other regulatory requirements. And, as part of GE’s Proficy software platform, it provides HMI presentation and workflow capabilities as well as analytics.

Virginia Tech to Tackle ‘Big Data’ with HokieSpeed

The National Science Foundation (NSF) and the National Institutes of Health (NIH) today announced nearly $15 million in new big data fundamental research projects. These awards aim to develop new tools and methods to extract and use knowledge from collections of large data sets to accelerate progress in science and engineering research.

Among the awards is a $2 million grant to Iowa State, Virginia Tech, and Stanford University to develop high-performance computing techniques on massively parallel heterogeneous computing resources for large-scale data analytics.

Such heterogeneous computing resources include the NSF Major Research Instrumentation (MRI) funded HokieSpeed supercomputing instrument with in-situ visualization. HokieSpeed was the highest-ranked commodity supercomputer in the U.S. on the Green500 when it debuted in November 2011.

Specifically, the three-university team intends to develop techniques that would enable researchers to innovatively leverage high-performance computing to analyze the data deluge of high-throughput DNA sequencing, also known as next generation sequencing (NGS).

The research will be conducted in the context of grand challenge problems in human genetics and metagenomics, the study of metagenomes: genetic material recovered directly from environmental samples.

Working together on this grant are Srinivas Aluru, a chaired professor of computer engineering at Iowa State University and principal investigator; Patrick S. Schnable, a chaired professor of agronomy, also at Iowa State; Oyekunle A. Olukotun, a professor of electrical engineering and computer science at Stanford University; and Wu Feng (http://www.cs.vt.edu/user/feng), who holds the Turner Fellowship and is an associate professor of computer science at Virginia Tech. Olukotun and Feng are co-principal investigators.

In previous research, Aluru has advanced the assembly of plant genomes, comparative genomics, deep-sequencing data analysis, and parallel bioinformatics methods and tools. Aluru and Schnable previously worked together on generating a reference genome for the complex corn genome, work that will help speed efforts to develop better crop varieties.

Feng’s relevant prior work lies at the synergistic intersection of life sciences and high-performance computing, particularly in the context of big data. For example, in 2007, Feng and his colleagues created an ad-hoc environment called ParaMEDIC, short for Parallel Metadata Environment for Distributed I/O and Computing, to conduct a massive sequence search over a distributed ephemeral supercomputer that enabled bioinformaticists to “identify missing genes in genomes.”

Feng said, “With apologies to the movie Willy Wonka and the Chocolate Factory, one can view ParaMEDIC as WonkaVision for Scientific Data – a way to intelligently teleport data using semantic-based cues.”

Feng also recently studied how heterogeneous computing resources at both the small and large scale, e.g., HokieSpeed, could be used for short-read mapping and alignment of genetic sequences, in support of a philanthropic award he received from the NVIDIA Foundation as part of its “Compute the Cure” program.

With this award, Feng, the principal investigator, and his colleagues created a framework for faster genome analysis to make it easier for genomics researchers to identify mutations that are relevant to cancer.
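For readers unfamiliar with short-read mapping, the toy Python sketch below shows the k-mer indexing idea at its simplest; it is purely illustrative and is not Feng's framework, which uses far more sophisticated, GPU-accelerated methods. The key property is that each read can be mapped independently, which is why the workload parallelizes so well on heterogeneous hardware:

```python
# Toy illustration of short-read mapping via k-mer indexing (real mappers
# use seed-and-extend or FM-index methods). Each read is located by exact
# lookup of its leading k-mer in a reference index, then verified; the
# per-read loop is embarrassingly parallel.

K = 5

def build_index(reference: str) -> dict:
    """Map every k-mer in the reference to its start positions."""
    index = {}
    for i in range(len(reference) - K + 1):
        index.setdefault(reference[i:i + K], []).append(i)
    return index

def map_read(read: str, reference: str, index: dict) -> list:
    """Return reference positions where the read matches exactly."""
    return [pos for pos in index.get(read[:K], ())
            if reference[pos:pos + len(read)] == read]

ref = "ACGTACGTTAGGCTAACGT"
idx = build_index(ref)
print(map_read("TAGGCTA", ref, idx))  # -> [8]
```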

In all, NSF and NIH announced a total of eight new projects in response to a call for proposals on “Core Techniques and Technologies for Advancing Big Data Science & Engineering,” or “Big Data,” in March of 2012. They run the gamut of scientific approaches with possible future applications in scientific disciplines, such as physics, psychology, economics, and medicine.

“I am delighted to provide such a positive progress report just six months after fellow federal agency heads joined the White House in launching the Big Data Initiative,” said NSF Director Subra Suresh. “By funding new types of collaborations (multi-disciplinary teams and communities enabled by new data access policies) and with the start of an exciting competition, we are realizing plans to advance the complex science and engineering grand challenges of today and to fortify U.S. competitiveness for decades to come.”

“To get the most value from the massive biological data sets we are now able to collect, we need better ways of managing and analyzing the information they contain,” said NIH Director Francis S. Collins. “The new awards that NIH is funding will help address these technological challenges—and ultimately help accelerate research to improve health—by developing methods for extracting important, biomedically relevant information from large amounts of complex data.”

That competition will run on the TopCoder Open Innovation platform, whose process allows U.S. government agencies to conduct high-risk/high-reward challenges in an open and transparent environment with predictable cost, measurable outcomes-based results and the potential to move quickly into unanticipated directions and new areas of software technology.

Big data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage and process within a tolerable elapsed time. Big data sizes are a constantly moving target, currently ranging from a few dozen terabytes to many petabytes in a single data set.

The Pervasive Opera Harmony

Pervasive Software Inc. today announced that Opera Solutions, LLC is leveraging Pervasive DataRush in its technology stack.

Opera Solutions is using the high-performance parallelism of Pervasive DataRush as an enabling component in its Signal Hub technologies. Signal Hubs convert Big Data into productivity and profit gains for customers in many sectors, including government, healthcare, finance and capital markets. The performance of Pervasive DataRush helps ensure that Signal Hubs deliver insights faster and more accurately. Pervasive DataRush’s capabilities support a growing range of Opera Solutions’ Signal Hub technologies including the latest Mobiuss Portfolio and Front Office solutions.

The Pervasive DataRush analytic engine provides automatic scaling of solutions, from single servers to large Hadoop clusters, by fully utilizing every available core. It comes with a range of built-in data manipulation and predictive analytics operators and, most important for Opera Solutions, the ability to add its own advanced data science functions. As the scale of data sets explodes, Pervasive DataRush ensures Opera Solutions can expand the scope and speed of its Big Data analysis.
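Pervasive DataRush itself is a Java dataflow library with its own operator API, which the article does not detail; the Python sketch below only illustrates the general pattern it describes, built-in operators composed with a custom analytic function and fanned out across all available cores:

```python
# Conceptual sketch of the dataflow pattern described above (this is NOT
# the DataRush API). Records stream through composable operators, and a
# worker pool spreads the per-record work across every available core --
# the same idea DataRush applies to scale from one server to a cluster.

from multiprocessing import Pool
import os

def clean(record):
    """Built-in-style operator: basic data manipulation."""
    return record.strip().lower()

def score(record):
    """Slot for a custom 'data science function' (placeholder analytic)."""
    return (record, len(record))

def pipeline(record):
    return score(clean(record))

if __name__ == "__main__":
    records = ["  Alpha ", "BETA", " gamma  "] * 1000
    with Pool(os.cpu_count()) as pool:
        results = pool.map(pipeline, records)  # parallel across all cores
    print(results[:3])
```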

“Pervasive DataRush’s efficiency and ability to automatically scale,” said Armando Escalante, Chief Operating Officer of Opera Solutions, “whether on a single server or a Hadoop cluster, supports our vision for consistent, reusable, scalable Big Data analytics.”

“Our Pervasive Big Data and Analytics team is committed to helping data scientists rapidly extract actionable intelligence for business executives,” said Mike Hoskins, Pervasive CTO and general manager of Pervasive Big Data and Analytics.

Investors Grant NGDATA $2.5 Million

NGDATA announced that it has closed $2.5 million in funding from ING, Sniper Investments, Plug and Play Ventures and U.S. angel investors. With the funds, NGDATA will further invest in product development and innovation as well as drive new business opportunities worldwide. As part of this global expansion, the company has established offices in San Francisco and just outside New York City to serve its roster of U.S.-based customers.

NGDATA enables consumer-oriented enterprises to make better recommendations and product offers to their customers as a result of advanced one-to-one marketing. The company’s flagship software solution, lily, combines internal and external structured and unstructured data, from point of sale (POS) systems, enterprise applications, social media sites and more, into one platform. lily uses machine learning to generate a precise snapshot of consumer preferences and behaviors, providing businesses with a personalized way to target consumers and increase sales and customer loyalty.

“Billions of data points are being generated by consumers every second,” said ING Investment Director, Tom Bousmans, a lead investor, “creating a massive opportunity for enterprises to better understand their customers and prospects. NGDATA makes it easier than ever for enterprises to quickly use the data to initiate more personalized and dynamic interactions with consumers.”

Luc Burgelman, CEO of NGDATA, said, “With our solution lily, more enterprises are able to easily classify consumers to create a very personalized interaction – which ultimately results in increased sales, customer loyalty and a significant competitive advantage.”

NGDATA’s lily is a consumer intelligence solution that combines an easy-to-use, interactive big data management platform with a consumer intelligence application. It enables consumer-centric organizations to store, index and analyze massive data sets, and provides a smart application that creates a 360-degree view of the consumer. The platform is already in use by financial services, news/media, and retail companies worldwide. Additionally, the platform learns from end users’ profiles and their data interactions through machine learning, giving business users the ability to do real-time segmentation.

lily provides consumer-oriented enterprises with actionable insights gleaned from billions of unstructured and structured items stored in internal databases, operations systems, POS systems, social media sites, mobile traffic and web video sites. lily uses this data to help consumer-centric companies create timely and highly personalized campaigns. lily’s consumer intelligence platform sits on top of existing IT infrastructure and integrates Apache Hadoop and Apache HBase, while simplifying platform complexity with high-level data modeling, easy-to-access APIs, and tools for easy installation and deployment.
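As a hypothetical illustration of the real-time segmentation lily is said to perform (NGDATA does not document its internals here), one simple design learns segment centroids offline from historical profiles and then scores each live interaction with a fast nearest-centroid lookup:

```python
# Hypothetical nearest-centroid segmentation sketch; the segment names,
# features and centroid values are invented for illustration and are not
# lily's actual model. Centroids are assumed to be learned offline;
# scoring a live profile is then cheap enough to run per interaction.

import math

# assumed features: (recency_days, purchases_per_month, avg_basket_eur)
CENTROIDS = {
    "loyal_frequent": (3.0, 8.0, 45.0),
    "lapsing_big_spender": (40.0, 1.0, 120.0),
    "new_browser": (1.0, 0.5, 15.0),
}

def segment(profile):
    """Assign a consumer profile to the nearest learned segment."""
    return min(CENTROIDS,
               key=lambda name: math.dist(profile, CENTROIDS[name]))

print(segment((2.5, 7.0, 50.0)))  # -> loyal_frequent
```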
