August 28, 2012

Do Botanists Dream of Electric Sheep?


Botany is a tricky subject when it comes to big data. The study of plants, like the study of most biological subjects, has expanded into the genome-mapping realm.

Storing a complete genome is data-intensive, especially when it comes to exotic plant life. Further, while physics and chemistry have for the most part settled on universally accepted terms that fall into neat categories for querying, botanical terminology is much more diverse, making it difficult to retrieve the necessary information when conducting a study.

To remedy this situation, Ramona Walls of the New York Botanical Garden and several colleagues across the world of botany have developed an ontological guide for “accessing and analyzing the rapidly growing pool of plant genomic and phenomic data.” Essentially, Walls et al. were trying to accomplish two things: providing standard definitions, and organizing and cross-referencing those definitions so that computers can easily find and analyze them.

“By providing standardized definitions for the terms used by scientists to represent these classes, and by defining the logical relationships among these terms, ontologies make information about content explicit for computers, allowing them to discover common meaning in diverse data sets.”

A big difference between humans and computers is the ability to understand nuance in language. It is a skill we develop as we learn language for the first time, making it more or less natural and therefore more difficult to teach or, in a computer’s case, program. In this specific case, a trained botanist would know that the words petiole, midrib, and frond are related to the word leaf in that frond is a type of leaf and petiole and midrib are parts of a leaf. A simple computer search engine would not.
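The leaf example can be made concrete with a small sketch. It encodes only the three relationships named above (frond is a type of leaf; petiole and midrib are parts of a leaf) and uses them to expand a search term. The dictionary layout and the function names `related_terms` and `expand_query` are assumptions made for illustration; they are not the Plant Ontology's actual data model or API.

```python
# Toy ontology: each term maps to a list of (relation, parent term) edges.
# Only the relationships mentioned in the article are encoded; the layout
# is a hypothetical simplification, not the Plant Ontology's format.
ONTOLOGY = {
    "frond": [("is_a", "leaf")],
    "petiole": [("part_of", "leaf")],
    "midrib": [("part_of", "leaf")],
    "leaf": [],
}


def related_terms(target, ontology=ONTOLOGY):
    """Return every term that reaches `target` through is_a/part_of edges."""
    found = set()
    frontier = [target]
    while frontier:
        current = frontier.pop()
        for term, edges in ontology.items():
            if term in found:
                continue
            if any(parent == current for _, parent in edges):
                found.add(term)
                frontier.append(term)
    return found


def expand_query(term, ontology=ONTOLOGY):
    """Return the search term plus all terms the ontology links to it."""
    return sorted({term} | related_terms(term, ontology))


if __name__ == "__main__":
    # A search for "leaf" now also covers frond, midrib, and petiole.
    print(expand_query("leaf"))
```

A plain keyword search for "leaf" would match only that word. With the relationships written down explicitly, the same query can also retrieve records that mention only fronds, petioles, or midribs.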

This would not be a problem if researchers could sift through the literature themselves to complete a study. But with so many papers in the botanical world, and so much data backing up those papers, it becomes necessary to enlist the computer’s help. For example, as of this writing, there are 25 species of plant whose genomes have been completely mapped.

“Data overload is an issue for nearly every branch of plant science. Complete genomes exist for 25 plant species, with more in progress (Joint Genome Institute, 2012), and new high throughput gene expression, proteomics, and phenomics data sets are being generated continuously.”

Walls is interested not just in being able to search easily, but also in being able to do analysis. One of the four key areas the paper identified as major future uses of ontologies was comparative genetics, genomics, phenomics, and development. Big data analytics has been very helpful to medical researchers studying human genomes to develop personalized medicine. Walls hopes such analytics can be similarly useful to botany.

So why has botany been relatively slow to adopt analytics? According to Walls, it has a lot to do with the exotic nature of the study subjects. “It is common for the same biological entity to have different names in different taxa. For example, vascular leaf may be called ‘frond’ in cycads, ferns, or palms and ‘needle’ in some conifers. In another example, ‘BBCH principal growth stage 6’ is used in a very specialized way by the Z. mays community for flowering stage.”
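The naming problem in that quote can be sketched as a lookup that keeps each community's own name and attaches a shared term alongside it. The table below contains only the vascular leaf examples from the quote. The names `CANONICAL`, `normalize`, and `annotate` are hypothetical, and treating "conifers" as a single group is a simplification, since the quote says only "some conifers".

```python
# Maps (community term, taxon) to a shared term. Entries come from the
# quoted example only; the structure itself is an illustrative assumption.
CANONICAL = {
    ("frond", "cycads"): "vascular leaf",
    ("frond", "ferns"): "vascular leaf",
    ("frond", "palms"): "vascular leaf",
    ("needle", "conifers"): "vascular leaf",  # simplification: "some conifers"
}


def normalize(term, taxon):
    """Return the shared term for a community term, or the term unchanged."""
    return CANONICAL.get((term.lower(), taxon.lower()), term)


def annotate(records):
    """Copy each record, adding a 'canonical' field next to the original term."""
    return [dict(r, canonical=normalize(r["term"], r["taxon"])) for r in records]


if __name__ == "__main__":
    data = [
        {"term": "frond", "taxon": "ferns"},
        {"term": "needle", "taxon": "conifers"},
    ]
    for row in annotate(data):
        print(row)
```

Because the original term is kept and the shared term is only added, records from fern and conifer studies become comparable without either community giving up its own vocabulary.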

It should be noted that Walls’s paper is not itself an ontology, but rather a discussion of the most significant ontologies and their importance to botany. The paper especially lauded the Plant Ontology (PO) for providing definitions that are both rigorous and flexible and that fit well into a computer model. “The ontology approach embraced by the PO does not, however, seek to impose a single, inflexible vocabulary across the whole of plant science. Rather, its strategy of using ontology terms to enhance existing data through annotations is compatible with an approach that involves the use of multiple terminologies by different communities of scientists.”

As a result of its complexity and diversity, botany was slow to enter the analytics arena, according to Walls, and the need to incorporate big data systems into it has been increasing with each new study. However, Walls sees the PO as the tool that will make botanical data accessible to computers, allowing her subject to advance with the rest of science.
