September 18, 2020

WANdisco Drops In On the Hadoop Cloud Migration Party

Alex Woodie

Organizations that want to move their on-prem Hadoop cluster to the cloud may be interested in a new solution unveiled yesterday by WANdisco. Called LiveData Migrator, the software allows customers to move their Hadoop data to any public cloud without taking the cluster offline, while guaranteeing that the data is up-to-date and accurate in both locations until the migration is complete.

Moving small amounts of data, such as a few terabytes, to the cloud is not that difficult. You can pipe the data over the Internet or ship it on a disk drive. But when you have large amounts of transactional data – that’s where it gets interesting, says David Richards, the CEO of WANdisco.

“When you have millions of transactions per second, the data is changing all the time, and petabyte scale data – how do I move that data from on-premises to the cloud?” Richards says. “The answer is, quite frankly, you don’t, because it’s a massive service gig, or you have to undergo an elongated outage that’s just not realistic. So companies just don’t do it.”

That’s the situation that confronted GoDaddy, the website domain registration company, which operated an extremely active Apache Hadoop cluster that processed millions of transactions per day. Turning off its 800-node Hadoop cluster while it migrated data to AWS would have caused a prolonged outage for the company.

But with LiveData Migrator, GoDaddy was able to migrate 70 TB of HDFS data from its on-prem cluster to AWS S3 in just five days, according to Wayne Peacock, chief data and analytics officer at GoDaddy.

GoDaddy is using LiveData Migrator to move 500 TB of data from an on-prem Apache Hadoop cluster to AWS and to keep the two clusters in synch in a hybrid set-up (Michael Vi/Shutterstock)

“We found WANdisco’s LiveData Migrator to be the optimal approach to deliver the best time to value, rather than running a more time-consuming and costly manual migration project internally,” Peacock states in a press release.

WANdisco has been an active player in the Hadoop scene for years, and has developed some of the most sophisticated data replication technology to provide high availability for Hadoop clusters. That technology formed the basis for LiveData Migrator, says Richards.

“We’ve got patented technology that surrounds our distributed coordination engine that understands the sequence of transactions at massive scale and is able to maintain that order of transactions in light of all sorts of [disruptions] to the wide area network,” Richards says. “So in essence, what we’ve done is leveraged that technology to the specific use case. It’s taken us two years to build it from a mathematical design. But we’ve done it.”

Richards says that math allows LiveData Migrator to do something unique in the industry: scan HDFS only once, and then maintain an up-to-date copy of the files, no matter the volume or velocity of changes to them. Depending on transactional volume, the software can move 1PB of data to the cloud in 30 to 60 days, Richards says.
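To put the quoted migration times in network terms, the arithmetic can be sketched as below. This is only a back-of-the-envelope illustration, not a WANdisco tool: the function name is hypothetical, decimal units are assumed (1 TB = 10^12 bytes, 1 PB = 10^15 bytes), and the result is an average rate that ignores protocol overhead and bursts.

```python
from typing import Tuple

TB = 1e12  # bytes, decimal units (an assumption; the article does not say)
PB = 1e15  # bytes, decimal units


def average_throughput(total_bytes: float, days: float) -> Tuple[float, float]:
    """Return the sustained average rate as (megabytes/s, gigabits/s)."""
    seconds = days * 24 * 60 * 60
    bytes_per_second = total_bytes / seconds
    return bytes_per_second / 1e6, bytes_per_second * 8 / 1e9


# Figures quoted in the article: 1 PB in 30 to 60 days, 70 TB in five days.
for label, size, days in [
    ("1 PB in 30 days", PB, 30),
    ("1 PB in 60 days", PB, 60),
    ("70 TB in 5 days", 70 * TB, 5),
]:
    mb_s, gbit_s = average_throughput(size, days)
    print(f"{label}: {mb_s:.1f} MB/s, {gbit_s:.2f} Gbit/s")
```

Under those assumptions, 1PB in 30 to 60 days works out to a sustained average of roughly 386 down to 193 MB/s (about 3.1 to 1.5 Gbit/s), and GoDaddy's 70 TB in five days to roughly 162 MB/s (about 1.3 Gbit/s).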

“We realized we needed to come up with a screaming fast Formula 1 engine to move data on premise to cloud,” he says. “We guarantee one scan of the data. There’s not multiple scans of the data. Everybody else is going to have to recursively scan data until they get it down to an invisible point where they say, OK unplug it. We just do one scan. So we’re exponentially faster than anybody else just for that one reason. That requires some pretty complex math.”
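The difference Richards describes between one scan and repeated scans can be illustrated with a toy model. The sketch below is not WANdisco's algorithm, which the article does not detail; the function names, the in-memory dictionaries standing in for HDFS and cloud storage, and the ordered list of client writes are all hypothetical. It only shows why replaying changes in order removes the need to re-list the source.

```python
from typing import Dict, List, Tuple

Write = Tuple[str, str]  # (path, new content) for one client write


def single_scan_migrate(source: Dict[str, str],
                        writes: List[Write]) -> Tuple[Dict[str, str], int]:
    """Copy each file once while replaying client writes in arrival order."""
    target: Dict[str, str] = {}
    pending = list(writes)
    files_scanned = 0
    for path in sorted(source):  # the only listing of the source
        target[path] = source[path]
        files_scanned += 1
        if pending:  # a client write lands while the scan is running
            changed, content = pending.pop(0)
            source[changed] = content
            target[changed] = content  # replayed from the ordered change feed
    for changed, content in pending:  # writes arriving after the scan
        source[changed] = content
        target[changed] = content
    return target, files_scanned


def rescan_migrate(source: Dict[str, str], writes: List[Write],
                   writes_per_pass: int) -> Tuple[Dict[str, str], int, int]:
    """Re-list the whole source until a pass finishes with no new writes."""
    target: Dict[str, str] = {}
    pending = list(writes)
    files_scanned = 0
    passes = 0
    while True:
        passes += 1
        for path in sorted(source):
            files_scanned += 1
            if target.get(path) != source[path]:
                target[path] = source[path]
        if not pending:
            break
        # Writes that landed while the pass was running force another pass.
        batch, pending = pending[:writes_per_pass], pending[writes_per_pass:]
        for changed, content in batch:
            source[changed] = content
    return target, files_scanned, passes


files = {"/a": "v0", "/b": "v0", "/c": "v0", "/d": "v0"}
writes = [("/a", "v1"), ("/c", "v1"), ("/a", "v2"),
          ("/d", "v1"), ("/b", "v1"), ("/c", "v2")]

src_single = dict(files)
copy_single, scanned_single = single_scan_migrate(src_single, writes)

src_rescan = dict(files)
copy_rescan, scanned_rescan, passes = rescan_migrate(src_rescan, writes, 2)

print(scanned_single, scanned_rescan, passes)
```

In this toy run both approaches end with a target identical to the source, but the single-scan version lists each of the four files once, while the rescanning version needs four passes and sixteen file checks before it sees a pass with no new writes.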

David Richards is the CEO and co-founder of WANdisco

While the math behind LiveData Migrator is complex, using the product is relatively simple. According to Richards, customers simply install the software as a client application on the Hadoop cluster. The software, which most customers install on a repurposed data node, monitors the cluster’s name node.

The software does not require a beefy machine. Typically, a machine with 50% to 70% of the RAM of the name node will do the trick. “We just key off of the name node,” Richards says. “The number of transactions we’re going to hold is bound by the size of the name node anyway… You can overspec the server if you want. It won’t make much difference. The cleverness is in the technology, how [to] do a single scan very fast. We can multiplex the number of connections so I can saturate the bandwidth if I need to. But the scan is not CPU intensive.”

Customers can migrate the data over the network, or they can write the HDFS data to a dedicated storage device that is then mailed or shipped to the cloud provider. Some customers do not want to use the network for data migration, Richards says, and LiveData Migrator gives them that option.

Clients don’t need any special authorities to run this, either with the on-prem Hadoop cluster or the big Hadoop cluster in the sky that they’re moving to (all three major cloud vendors offer their own versions of Hadoop). The software supports AWS, Microsoft Azure, Google Cloud, IBM Cloud, and Alibaba Cloud, according to WANdisco’s website. Richards says the company will support destinations like Databricks, Snowflake, and “a whole raft of them in the future.”

“In the case of AWS, this is pretty much a turnkey solution,” Richards says. “You can be up and running and doing a migration now in about 15 seconds. Our old product required a deep understanding of Hadoop. We were in the write path and so on. You don’t even have to be in the write path of Hadoop anymore. We run like a client application on Hadoop. I don’t need any administration access. I don’t need any special skills in Hadoop. I just plug it in, turn it on, connect it to the cloud, put my cloud credential in, and away you go.”

WANdisco charges for the product based on the transactional volume of the host cluster. It provides the first 5TB free. Moving 1PB would cost around $150,000, Richards says.

Once customers have successfully migrated their on-prem Hadoop cluster to the cloud, Richards hopes they maintain an active license to LiveData Migrator.

“Nobody is going to choose a single cloud vendor,” he says. “As much as every single cloud vendor thinks that’s what’s going to happen. That’s just not going to happen. We’re seeing increasing demand for active-active, multi-cloud. In other words, I need to arbitrarily run applications either in Azure or AWS or Google Cloud. I’m going to choose on any given day, minute, second, where I’m going to run my applications. That’s also something we’re providing.”


Applications: Enterprise Analytics

Technologies: Cloud, Middleware

Sectors: Financial Services

Vendors: AWS, Google Cloud, Microsoft Azure, WANdisco

Tags: AWS, cloud, cloud migration, David Richards, Hadoop, LiveData Migrator, migration
