January 9, 2019

Build on the AWS Cloud with Your Eyes Wide Open


Building data applications on public clouds like Amazon Web Services is a no-brainer for many organizations these days. The tools for ingesting, storing, and processing data in the cloud are rapidly maturing, and best of all, they’re largely pre-integrated, which saves data scientists and engineers time and money. So what’s the catch?

Amazon Web Services has become the 800-pound gorilla of the public cloud business. The $27-billion company currently commands 40% of the market for IaaS and PaaS services, which is more than all of its major competitors combined. Revenue growth for AWS is approaching 50% annually as the enterprise warms to the idea of letting somebody else worry about the infrastructure stack. Best of all, as AWS’s cloud gets bigger, storage and processing costs go down for existing customers, which is the opposite of what IT buyers have grown to expect.

On the data front, AWS offers a suite of integrated tools that is unrivaled in the industry. The company just launched a pre-built data lake service called Lake Formation that runs on the S3 object store and rivals on-premise Hadoop clusters in data management capabilities. AWS offers Glue for ETL jobs, Lambda for building data pipelines, and a host of pre-built processing services (EMR, Kinesis, Elasticsearch Service, QuickSight) and databases (Athena, Redshift, Neptune, Timestream) that can crunch and transform data in just about any manner.

Data scientists can get their hands dirty with low-level machine learning tooling like SageMaker, which AWS unveiled in late 2017 and which has already been adopted by customers like Cox Automotive, Expedia, Major League Baseball, and FICO. AWS moved up the stack in late 2018 when it unveiled higher-level applications with pre-built AI capabilities, such as the new Personalize and Forecast offerings, which are largely built on SageMaker, not to mention a slew of new SageMaker offerings, like data labeling, reinforcement learning, and a third-party marketplace.

Ten years ago, AWS appealed to companies that no longer wanted to run their own machines to power standard IT fare, like Web servers, file servers, and databases. Today the company is doing the same thing for big data and AI.

So the question is: Why would a company not use AWS for big data and AI?

Cloudy Forecast

According to Paul Miller, a senior analyst at Forrester who covers the IoT and cloud, it’s perfectly logical for most organizations to avail themselves of what AWS and its cloud competitors Microsoft Azure and Google Cloud have built for many types of applications.

“It may not make sense to worry about spinning up and managing a Spark or Hadoop cluster in your own data center unless it’s the core of your business,” Miller tells Datanami. “For most organizations, the overhead of creating and managing and operating those clusters simply doesn’t make sense. It makes more sense to move those workloads to some of the large cloud providers. We’ve seen Google, Amazon, and Microsoft all see some success there with their managed offerings in that space.”

However, running machine learning workloads on the cloud is not a black-and-white issue, Miller says. Despite the hullabaloo over cloud, the majority of enterprise workloads continue to run on premise, and cloud providers should recognize that not every application is a good fit for the cloud. In particular, the cloud-versus-on-premise question gets murkier when one brings edge computing into the conversation.


“If you’re working on an ecommerce engine or something in the heart of a city with a lot of bandwidth, then yes you may rely on the cloud to do the training and inference,” the Forrester analyst says. “But if you’re inferring the performance of a wind farm in the highlands of Scotland over 3G or 4G network connection, you don’t want to wait for the cloud to tell you what to do, so you need the hybrid model there.”

Miller applauds AWS’s move into the hybrid model via its partnership with Dell EMC, which will bring many AWS capabilities to on-premise VMware images. “That makes a lot of sense and it’s going to be very interesting to see how companies adopt them, particularly as they look to put things like machine learning and IoT with Greengrass onto those devices at the edge.”

Don’t Get Boxed In

There’s little doubt that the big data and AI offerings of AWS are compelling, and the same goes for Microsoft and Google, which have created their own niches in the industry. But there’s another variable at play here – namely that organizations worry they’ll be unable to move away from the cloud providers once they grow dependent on them.

Forrester’s Miller says the question of lock-in comes up quite often. “We actually talk to clients all the time who will come and say ‘We want to move to the cloud, but we want to avoid lock in,'” he says.

There are a couple of ways to think about lock-in, Miller says. For starters, it’s true that AWS wants to make it hard for customers to leave. However, that’s true for all companies. The question, then, comes down to what AWS (or any other vendor) does to prevent customers from leaving. If it’s by providing a quality service at a decent price, there’s not much to complain about. But if it’s by making it burdensome to reclaim one’s data, then that’s another story.

“They just need to do this with their eyes open,” Miller says. “It’s not really a question of being locked in or not. It’s more a question of the friction that each of these choices bring…They want their services to be stickier, sure they do. But they also want them to be more valuable to their users, and those two go hand in hand.”

Organizations can lower the risk of lock-in situations by sticking with the most basic services possible. That means avoiding higher-order services like Redshift and Kinesis, and installing and running Spark or Kafka directly on EC2. Organizations will need DataOps skills to pull this off, not to mention being well-versed in things like Kubernetes, but the reward will be the capability to pick up and move to Azure or Google Cloud whenever they want.

While sticking with basic AWS services can lower the risk of lock-in, it also brings its share of downsides, according to Miller.

“When you talk to a client, [you ask] why are you moving to the cloud at all? What [are] you doing this for?” he says. “And they’ll begin to say things like faster time to market, greater agility, greater access to new services, more innovation. And that’s totally opposed to the approach they just said they wanted to take around building it all themselves.”

In other words, there is no free lunch. If you want to avail yourself of all the data goodness that AWS is building into its platform, you must accept some degree of lock-in, or stickiness, as the tradeoff. If you want to avoid lock-in at all costs, then you can’t really benefit from the ecosystem of integrated tools that AWS is building. The same goes for Microsoft Azure, Google Cloud, and every other cloud, according to Miller.

“Yes, it might be a little painful to get out of any of them,” he says. “But the value gained from using them properly far outweighs the cost in a year or two years’ time of reversing course, so long as you’re doing it with your eyes open and reviewing your options all the time.”
