January 12, 2012

The Evolving Art (and Business) of Data Curation


Over the course of 2012, one of the key phrases we can expect to start hearing a lot more often is data curation. While the term has been around for quite some time, particularly in academia and library science, it is finding a new home in enterprise contexts—and is attracting new attention in academia from funding agencies like the National Science Foundation.

For instance, the Johns Hopkins University Sheridan Libraries was awarded a $20 million NSF grant to build a “data research infrastructure” for managing the growing volumes of digitized research information. The five-year grant was one of two awarded by the NSF for what is being called “data curation.”

According to Sayeed Choudhury, who directs the Digital Research and Curation Center and serves as associate dean at the university’s libraries, the impetus for the Data Conservancy was the need for large-scale digital data management in the science community. This is especially important, he says, because the applications extend beyond these research communities into the humanities, the social sciences and other fields.

“Data curation is not an end but a means,” he says. “Science and engineering research and education are increasingly data-intensive, which means that new management structures and technologies will be critical to accommodate the diversity, size and complexity of current and future data sets and streams.”

According to the University of Illinois’ Graduate School of Library and Information Science, “Data curation is the active and on-going management of data through its lifecycle of interest and usefulness to scholarship, science and education.” The university further defines data curation, noting that it should “enable discovery and retrieval, maintain its quality, add value and provide for re-use over time.” According to its definition, this includes authentication, archiving, management, preservation, retrieval and representation.

The importance of data curation, especially over the coming years as organizations acquire and store increasingly diverse, mounting data sets, is being felt in the enterprise as well.

As Edward Curry and his team from the Digital Research Institute in Ireland noted, “With increased utilization of data within their operational and strategic processes, enterprises need to ensure data quality and accuracy…traditional approaches to curation are struggling with increased data volumes and near real-time demands for curated data.” Curry says that in response to these pressures, businesses have an ever-mounting set of motivations for exploring data curation options.

Curry’s team points to Gartner’s figures showing that many organizations swimming in massive amounts of data have few policies in place to ensure the clarity, consistency and reliability of all of their data, in part due to lackluster data curation policies. In this era of big data and poor management of it, why then does it seem, on initial search, that there are no companies to be found offering pure-play data curation services on a consultancy basis?

It turns out there are plenty of companies with data curation capabilities; it’s just called something far different. Think in terms of data integration, data quality and data management: all phrases that are tied into the major enterprise analytics platforms, just not tied directly to the more academic-sounding concept of data curation.

Most of the major enterprise analytics and data management platforms offer the key characteristics of data curation as cited by the University of Illinois above. Information discovery, retrieval, quality checking and usability over time are all parts of various all-in-one data management tools, but for some reason, no vendors are tying their messaging to this important keyword phrase.

Consider companies like Informatica, Pentaho, Pervasive, IBM, SAS and others, which offer platforms for all of the aspects of sound data curation cited above. In academia, projects like CAPS and the new curation effort at JHU are making headway by associating new solutions with the data curation movement.

In a vendor landscape that hinges on the next big buzzword, the concept of data curation as the catchall term to replace the wider concept of “data management” seems fitting, given the growing need for more than just management of data. It requires a careful, even meticulous approach to defining the very environment and policies that management will work within.
