Why the Cloud and Big Data? Why Now?
The potential benefits of putting big data in the cloud have been known for years. The simplicity and flexibility that come with having infrastructure as software appeared to be an attractive solution to the complex management that big data projects require. And yet, enterprises were reluctant to move their big data projects to the cloud. Five years ago, the enterprises that were experimenting with big data did so primarily in the form of one-off projects. The infrastructure demands for a big data project, while complex, were manageable. On top of that, concerns about security, privacy and cost largely outweighed the benefits that would come with the flexibility and resource efficiency of the cloud.
But now, enterprises have moved beyond big data projects and are striving to operationalize and leverage big data across their entire organization. Big data can improve every part of a business, from providing the insights for new analytic applications to augmenting traditional on-premises systems. And as big data aspirations grow, managing complex big data infrastructure has gone from being a painful annoyance to a full-blown IT infrastructure bottleneck that threatens to stifle the value of data.
The cloud has emerged as an increasingly popular means of scaling big data usage. Further signaling this sea change, enterprise software giants Oracle and IBM both announced big data analytics cloud platforms late this year, joining a push already underway by AWS, Google and Microsoft.
But why the sudden change, and why now?
Cloud Becomes Enterprise Ready
In the beginning, cloud providers were very focused on mid-market companies where ease of use and flexibility were at a premium, but large enterprises come with a laundry list of requirements. Over the last five years, cloud providers have placed great focus on security and compliance while proving that reliability and scale could translate to the enterprise, increasing large enterprises’ trust in the cloud.
Some of the best examples of the trust large enterprises now have in the cloud come from highly regulated industries, like finance and healthcare. Capital One is one of the nation’s largest banks and now leverages AWS to develop, test, build, and run its most critical workloads, including its new flagship mobile-banking application. Philips is a Dutch company that focuses on the areas of healthcare, consumer lifestyle, lighting products and services. The Philips HealthSuite digital platform analyzes and stores 15 PB of sensitive patient data gathered from 390 million imaging studies, medical records and patient inputs to provide healthcare providers with actionable data. Philips uses AWS to help protect patient data as its global digital platform grows at the rate of one petabyte per month.
These use cases demonstrate just how much work leading cloud providers, like AWS, have put into meeting the requirements of large enterprises. Five years ago, these highly regulated industries shied away from the cloud; now the same industries trust the cloud with some of their most sensitive, regulated data.
Early Adopters Paved the Way
While many enterprises were reluctant to move their big data projects to the cloud because of concerns about security and compliance, some of the most influential and innovative companies today were being born in the cloud. These companies, like Airbnb, Lyft and Netflix, have set the standard for best practices and gained a competitive edge over incumbents by moving faster with the cloud.
Airbnb has disrupted the travel industry, leveraging the convenience and flexibility of the cloud to redefine travel. Lyft has grown spectacularly while maintaining a small engineering team in large part because of the ease of use and automation the company has gained from the cloud. Netflix is synonymous with watching movies at home, but it wasn’t all that long ago that we’d have to drive to a rental store if we wanted a night in with popcorn and a movie.
Enterprises Are Feeling the Pain
As big data projects grow, the complexities multiply exponentially. Even as the cost of storage has fallen, the cost of managing big data infrastructure grows exponentially as big data scales. That includes the cost of hiring rare, expensive talent like the modern data engineer.
Data engineers are the rare individuals who understand infrastructure and architecture, but can also think about how to process data and how the data will be used. They understand open source and how to fit the latest project into the organization and build processes around it. The problem is that as infrastructure and processes grow more complex with scale and the demands of analysts mount, data engineers simply can’t scale in tandem. Too much of their time is eaten up by mounting manual labor, like capacity planning and software updates.
On top of that, depending on the processing tools used – MapReduce, Spark, Hive, etc. – you likely need additional specialists for each one. That means that scaling your data infrastructure team gets very difficult and very expensive, fast. Before long, enterprises find it necessary to maintain small armies of administrators and engineers just to hold up their big data projects, let alone innovate.
The cloud removes much of the infrastructure and software management burden, and with the right tools in place, one administrator can serve hundreds of users with little to no administrative handholding. The thinking on cloud and on-premises has done almost a 180-degree flip. Now, the cost, slowness and risk of running big data initiatives on-premises are making enterprises pause at the prospect of running their own data centers.
Finally, cloud providers are innovating at a pace that in-house teams simply can’t keep up with, ensuring that an enterprise’s infrastructure is consistently cutting edge, and allowing internal teams to focus on the real challenges – creating best practices and encouraging a company culture that uses big data effectively to drive the business forward.
About the author: Ashish Thusoo is the CEO and co-founder of Qubole, a cloud-based provider of Hadoop services. Before co-founding Qubole, Ashish ran Facebook’s Data Infrastructure team; under his leadership the team built one of the largest data processing and analytics platforms in the world. Ashish helped create Apache Hive while at Facebook.