February 23, 2012

Supercomputing Center Set to Become Big Data Hub

Nicole Hemsoth

This week the Texas Advanced Computing Center (TACC) at the University of Texas at Austin announced a $10 million commitment from the O’Donnell Foundation to enhance its data-intensive science capabilities.

TACC says the funding will pay for new data infrastructure that will let the center take on a broader range of big data problems in science. Researchers in diverse fields, including bioinformatics, neuroscience, structural biology and astrophysics, will be able to advance beyond current constraints, hopefully yielding new discoveries along the way.

According to TACC officials, the data infrastructure plans include:

- a high-performance, petascale data storage system accessible to all of TACC’s computing and visualization systems, and easily expandable to hundreds of petabytes in the coming years;
- a computational system with embedded high-speed storage that is optimized for data-intensive computing, including massive data processing and analysis; and
- new servers and storage to host innovative Web-based and cloud computing services, including science portals and gateways that enable researchers around the world to use the university’s research applications.

To learn more, we talked with the center’s director, Dr. Jay Boisseau, about the funding and what it means for the future of data-intensive science at TACC. He shed light on some of the specifics of the upcoming technology purchases and exploration areas, and offered insight into how the center is already tackling big data problems in science on its existing clusters.

Datanami: What elements is TACC seeking to fulfill the need for new high-end data-intensive capabilities? Does this mean a new cluster entirely–and if so, can you say who or what type of system you’re considering?

Boisseau: We will deploy a new high-speed parallel filesystem (estimated start size 20 PB) that is accessible from all TACC resources, and that can be scaled up to 100+ petabytes.

We will also deploy a cluster optimized for MapReduce/Hadoop-style calculations–lots of node-level disk for persistent storage of data collections for which this programming model is optimal.

We will also provide a new high-throughput computing capability and larger shared memory capabilities. We already provide these capabilities on Ranger and Lonestar, but we will provide them at greater scale/emphasis in the new systems. They may be part of the same cluster that provides the MR/Hadoop capability.

We will also provide a new hosting environment for science portals and gateways that host front-end applications to workflows that leverage the new data resources (as well as our current and future HPC and vis resources).

We will also evaluate opportunities for using SSD and other technologies for new data applications.

Datanami: How is TACC benchmarking or evaluating data-intensive computing solutions? Is this different than making HPC/supercomputing decisions, in that speed might not be the defining factor? And how do rankings like the Graph500 or other HPC/data-intensive benchmarks fit into your decision-making process?

Boisseau: It is different, and we’re still working through some of this. We are going to host a workshop on May 22-24 (announcement coming next week) at which we expect to discuss these and other relevant questions. We think there is a great need even for clearer definitions of terms and requirements for ‘data intensive computing,’ ‘data driven science,’ etc., and understanding the science and technology requirements of classes of applications will help us develop a methodology for carefully designing the configurations for new data intensive computing systems.

Datanami: TACC is already home to high-end HPC systems; where will a new data-intensive system fit into your existing technology “portfolio” of supers, and what applications will be specific to any new machine that might not have been acceptable to run on other TACC clusters?

Boisseau: We’re home to high-end HPC (Ranger, Lonestar) *and* scientific visualization systems (Longhorn, Stallion), and we just upgraded our existing data management systems: our data collections hosting system (Corral) and our archival system (Ranch).

We have added new ‘data intensive computing’ capabilities to some systems: software to bundle jobs and enable HTC on Ranger and Lonestar; large shared memory nodes (1TB) on Lonestar; and a Hadoop-style subsystem on Longhorn. The major new systems we will deploy with this new funding are: a separate cluster designed and optimized for more data driven science applications by offering larger MR/Hadoop style capability, larger shared memory, better HTC capabilities, etc.; and also a large high-speed filesystem that all HPC, visualization, and data clusters can access.

Thus, our HPC, visualization, and data intensive computing cluster systems will all have access to a high-speed parallel file system, a data collections management system, and a data archival system. A gateway hosting environment will host portals and other applications that can leverage all of the back-end systems.

In addition to TACC’s upcoming capabilities for departmental projects, the center says the new resources will also augment TACC’s ability to support research at related university institutions, including biomedical research at UT Southwestern Medical Center. As the statement noted, “Novel data-driven projects such as consumer energy usage behaviors being studied at Austin’s Pecan Street Inc. will also benefit, as will major national projects in which the university is a key partner such as the iPlant project, a $50 million National Science Foundation-funded project to help with plant research, including improving food yields and producing more effective biofuels.”

The O’Donnell Foundation has already contributed $6 million of the commitment to The University of Texas at Austin and will provide $2 million more in each of the next two years. The university will also provide an additional $2 million over five years to hire new technology professionals at TACC, who will support and accelerate new research in the Institute for Computational Engineering and Sciences (ICES) and other university programs that leverage these data resources.

