June 1, 2022

EMR Serverless Now Available from AWS

Amazon EMR, which ostensibly is the world’s most popular hosted Hadoop environment, is now generally available as a serverless offering, AWS announced today.

Amazon EMR Serverless will save customers time and money in several different ways, according to AWS. For starters, the new service automatically provisions and manages the underlying compute and memory needed based on the specific frameworks the customer is using, such as Apache Spark, Apache Hive, Presto, Flink, or good old MapReduce.

EMR Serverless also scales the underlying cluster up and down as dictated by changing data volumes and processing demands, the company says. That will help customers prevent over-provisioning a cluster to meet peak demand, only to have it sit mostly idle for long periods of time.

“With EMR Serverless, you can run analytics workloads at any scale with automatic scaling that resizes resources in seconds to meet changing data volumes and processing requirements,” AWS Principal Developer Advocate Channy Yun says in a blog post. “EMR Serverless automatically scales resources up and down to provide just the right amount of capacity for your application, and you only pay for what you use.”

To start an EMR Serverless job, customers select the open source framework they want to use, and then trigger their application to run through APIs, CLIs, the AWS Management Console, or EMR Studio, AWS says.
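As a rough illustration of the API route, the sketch below uses the AWS SDK for Python (boto3) and its `emr-serverless` client. It is not taken from AWS's announcement: the application name, release label, role ARN, and S3 path are hypothetical placeholders, and the helper function names are invented for this example. The request builders are plain functions so the payloads can be inspected without AWS credentials.

```python
from typing import Any, Dict, List, Optional


def build_application_request(name: str, framework: str = "SPARK",
                              release_label: str = "emr-6.6.0") -> Dict[str, Any]:
    """Build the payload for creating an application for one framework.

    The release label default is an assumption; use one valid for your account.
    """
    return {"name": name, "type": framework, "releaseLabel": release_label}


def build_job_run_request(application_id: str, execution_role_arn: str,
                          entry_point: str,
                          args: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build the payload for starting a Spark job on an existing application."""
    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "jobDriver": {
            "sparkSubmit": {
                "entryPoint": entry_point,
                "entryPointArguments": list(args or []),
            }
        },
    }


def submit_spark_job(app_name: str, execution_role_arn: str, entry_point: str,
                     region: str = "us-east-1") -> str:
    """Create an application, then start a job run; returns the job run ID.

    Requires boto3 and AWS credentials, so it is not called at import time.
    """
    import boto3  # imported here so the builders above work without boto3

    client = boto3.client("emr-serverless", region_name=region)
    app = client.create_application(**build_application_request(app_name))
    run = client.start_job_run(
        **build_job_run_request(app["applicationId"], execution_role_arn, entry_point)
    )
    return run["jobRunId"]


if __name__ == "__main__":
    # Hypothetical values, shown only to illustrate the payload shape.
    print(build_job_run_request(
        "app-id-placeholder",
        "arn:aws:iam::123456789012:role/example-job-role",
        "s3://example-bucket/scripts/job.py",
        ["--date", "2022-06-01"],
    ))
```

Keeping the payload construction separate from the network call is a design choice for this sketch only; it makes the request easy to review or log before anything is submitted.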

EMR Serverless eliminates the need for customers to configure and tune the cluster to optimize performance and cost for specific open source frameworks like Spark, Flink, Presto, and Hive, AWS says (Image source: AWS)

Pricing for EMR Serverless is based on the number of workers that the service scales up at each stage of a customer’s job. Customers are charged for the aggregate vCPU, memory, and storage resources used from the time a worker starts running until it stops, rounded up to the nearest second with a 1-minute minimum, the company says.
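The billing rule described above can be worked through with a small calculation. The sketch below is an illustration only: the per-hour rates are hypothetical placeholders rather than AWS's published prices, the `Worker` type and function names are invented for this example, and it assumes the 1-minute minimum applies to each worker individually.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Worker:
    vcpus: int
    memory_gb: float
    storage_gb: float
    runtime_seconds: float  # wall-clock time from worker start to stop


def billed_seconds(runtime_seconds: float) -> int:
    """Round up to the next whole second, with a 60-second minimum."""
    return max(60, math.ceil(runtime_seconds))


def aggregate_usage(workers: Iterable[Worker]) -> Dict[str, float]:
    """Sum resource-seconds across every worker the job used."""
    usage = {"vcpu_seconds": 0.0, "memory_gb_seconds": 0.0, "storage_gb_seconds": 0.0}
    for w in workers:
        secs = billed_seconds(w.runtime_seconds)
        usage["vcpu_seconds"] += w.vcpus * secs
        usage["memory_gb_seconds"] += w.memory_gb * secs
        usage["storage_gb_seconds"] += w.storage_gb * secs
    return usage


def job_cost(workers: Iterable[Worker], vcpu_hour_rate: float,
             memory_gb_hour_rate: float, storage_gb_hour_rate: float) -> float:
    """Convert resource-seconds to hours and apply the caller-supplied rates."""
    u = aggregate_usage(workers)
    return (u["vcpu_seconds"] / 3600 * vcpu_hour_rate
            + u["memory_gb_seconds"] / 3600 * memory_gb_hour_rate
            + u["storage_gb_seconds"] / 3600 * storage_gb_hour_rate)


if __name__ == "__main__":
    # A short-lived worker (billed for the 60-second minimum) and a longer one.
    workers = [Worker(4, 16, 20, 45.2), Worker(4, 16, 20, 600.4)]
    print(aggregate_usage(workers))
    # Hypothetical rates, NOT actual AWS pricing.
    print(job_cost(workers, vcpu_hour_rate=0.05,
                   memory_gb_hour_rate=0.005, storage_gb_hour_rate=0.0001))
```

In this example the first worker ran for 45.2 seconds but is billed for 60, while the second ran for 600.4 seconds and is billed for 601.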

EMR Serverless was first unveiled at the re:Invent 2021 conference, where AWS also unveiled Amazon Redshift Serverless and Amazon MSK (Kafka) Serverless. Before the conference, AWS offered only one serverless analytics service: Athena, its hosted Presto environment.

AWS plans to eventually offer serverless versions of all of its analytics services, according to Rahul Pathak, AWS’s vice president of analytics. “It requires customer to do less work and that’s always a win from a customer point of view,” Pathak told Datanami in an interview last year. “Anything we can offload from them when it comes to the muck and the undifferentiated pieces of running infrastructure, the better.”

For more information on EMR Serverless, see aws.amazon.com/emr/serverless/.

