November 24, 2015

DeepSQL Kicks Evolutionary Genetic Research Into High Gear

Alex Woodie

The Ruđer Bošković Institute (RBI) in Croatia blasted through a bottleneck in its evolutionary genetic research by implementing a parallelized storage engine for MySQL from Deep Information Sciences that serves genomic data to its analytic computer cluster.

RBI is Croatia’s leading scientific institute, with 850 scientists conducting research into various natural and biomedical fields. In particular, its Department of Molecular Biology has created novel ways of looking into how human genes evolved by comparing each human gene to all the genes in the world’s organisms.

This research requires good stewardship of data. Before implementing the DeepSQL storage engine from DeepIS, RBI relied on a MySQL database from Oracle equipped with the InnoDB storage engine from Percona to ingest, prep, and serve terabytes' worth of fresh genomic data to a 100-node cluster that does the heavy analytic lifting. The MySQL database lived on a single node equipped with 200GB of RAM and 2TB of storage.

This setup worked fine when the genomic data set was relatively small. But as the research project ramped up, the researchers found that the database was quickly becoming a bottleneck.

“The database is not growing slowly; it’s growing faster and faster because the cost of sequencing goes down,” explains Dr. Martin Sebastijan Šestak, a post-doctoral researcher in the Laboratory of Evolutionary Genetics at RBI’s Department of Molecular Biology. “That’s actually a big problem for us. The number of genomes is constantly increasing, and we want to keep up to date with that information, which means we need to constantly update our database.”

Eventually, it took Šestak’s team three to four days to load fresh genomic data from various public sources into the single-core MySQL database, which now holds hundreds of millions of rows. Running the queries on the high-performance computing cluster then took another one to two days. While the data set wasn’t particularly huge, the need to continually join 50GB tables was becoming a real bottleneck.

“As our database grew to 250GB with joins, larger than our 200GB RAM server, InnoDB got slower and slower,” he says. “Everything slowed down to a crawl. It was impossible to get anything done on schedule.”

Goosing MySQL

Šestak looked at different database technologies, including Percona’s TokuDB, an open source storage engine that plugs into MySQL and essentially replaces InnoDB. While performance improved somewhat with TokuDB thanks to its fractal tree indexing, it was still not as fast as Šestak wanted.

While attending a Percona conference in Santa Clara this spring, Šestak heard of another MySQL storage engine called DeepSQL. “I saw that there were some new storage engines that I didn’t hear about before,” he tells Datanami, “so I decided to download it and benchmark it against other solutions.”

DeepSQL, if you’re not familiar with it, replaces the B-Tree indexing used in most database storage engines with something that DeepIS calls Continuously Adaptive Sequential Summarization of Information, or CASSI. Instead of continually writing data to disk, CASSI uses machine learning algorithms to better predict the optimal moment to write data to disk, based on the particular configuration and capability of a computer. It also implements parallelism to boost performance.

These approaches can erase bottlenecks in an analytics pipeline (or at least push them elsewhere). When DeepIS launched the technology at the Percona conference earlier this year, it claimed that a MySQL database equipped with the DeepSQL storage engine could run up to 64 times faster than a highly tuned instance of InnoDB. The company also claims that this “hyper-indexing” capability makes a DeepSQL database perform as if it were running on SSDs, even when it sits on plain old HDDs.

What’s more, DeepSQL delivers all this performance boost without requiring underlying changes to the database or new APIs for the application, since it plugs into the MySQL architecture.
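The reason an engine can be swapped without touching the application is that MySQL separates the server layer, which applications talk to, from the storage engine that actually stores and indexes rows. The toy Python sketch below illustrates only that separation of concerns. The `StorageEngine`, `HashEngine`, `SortedEngine` and `GeneTable` names are hypothetical; they do not reflect MySQL's real C++ handler API, InnoDB, TokuDB, or DeepSQL internals.

```python
from abc import ABC, abstractmethod
import bisect


class StorageEngine(ABC):
    """Hypothetical minimal engine interface; the table layer only talks to this."""

    @abstractmethod
    def insert(self, key, row): ...

    @abstractmethod
    def range_scan(self, lo, hi): ...


class HashEngine(StorageEngine):
    """Keeps rows in a dict; a range scan has to sort all keys first."""

    def __init__(self):
        self._rows = {}

    def insert(self, key, row):
        self._rows[key] = row

    def range_scan(self, lo, hi):
        return [self._rows[k] for k in sorted(self._rows) if lo <= k < hi]


class SortedEngine(StorageEngine):
    """Maintains keys in sorted order, a stand-in for an ordered index."""

    def __init__(self):
        self._keys = []
        self._rows = {}

    def insert(self, key, row):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = row

    def range_scan(self, lo, hi):
        # Binary-search the bounds instead of scanning every key.
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_left(self._keys, hi)
        return [self._rows[k] for k in self._keys[start:end]]


class GeneTable:
    """Application-facing table; its methods stay the same whichever engine is used."""

    def __init__(self, engine: StorageEngine):
        self._engine = engine

    def add_gene(self, gene_id, organism):
        self._engine.insert(gene_id, {"gene_id": gene_id, "organism": organism})

    def genes_between(self, lo, hi):
        return [r["gene_id"] for r in self._engine.range_scan(lo, hi)]


def load_and_query(engine):
    """Identical 'application' code run against whichever engine is plugged in."""
    table = GeneTable(engine)
    for gid, org in [(30, "human"), (10, "yeast"), (20, "mouse"), (40, "fly")]:
        table.add_gene(gid, org)
    return table.genes_between(15, 35)


if __name__ == "__main__":
    print(load_and_query(HashEngine()))    # [20, 30]
    print(load_and_query(SortedEngine()))  # [20, 30]
```

Both engines return the same answer to the same application code; only how rows are organized underneath differs, which is the property that lets a faster engine be dropped in.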

Genomic Boost

Earlier this year, RBI switched to DeepSQL Community Edition (free for organizations with less than $1 million in annual revenues) to power the genomic database. The impact on performance was dramatic and instantaneous.

According to RBI, the periodic uploads of fresh data take just one day, instead of the previous four. Data load times are three times faster than under TokuDB, while queries run five times faster. DeepSQL also shrank RBI’s storage footprint by delivering 40 percent compression.

This has freed Šestak and his colleagues to concentrate on their research (they have developed a geology-like method of gradually uncovering genetic “layers” in the genomic codebase to track evolution) rather than fiddling with computers. “When the size grows even bigger, I’ll still be able to analyze that without moving it to Hadoop or some other technology that I need to learn and administer,” he says.

Speed and scalability are critical when you’re dealing with the type of data that RBI is, says Deep Information Sciences’ Chief Strategy Officer Chad Jones. “Tuning the application and tweaking my MySQL configurations only get you so far; they consume a lot of time for not a lot of reward,” he says. “We’re thrilled that RBI is using DeepSQL to supercharge its research into biological evolution and that, even without a DBA, they’re able to achieve orders of magnitude better performance and scale from their MySQL environment.”

The parallelization of the DeepSQL storage engine gives RBI lots of headroom to grow their analytic pipelines. “They [Oracle, the owners of MySQL] really should implement parallel processing,” Šestak says. “But it’s not really a priority for them. But it is for us, and that’s why we fit DeepSQL into our pipeline.”
