January 23, 2014

Keeping Tabs on Amazon EMR Performance

The Hadoop architecture may appear simple at first, but it’s deceptively complex and may lull you into a false sense of security as you try to scale your cluster. If you run Hadoop on Amazon Elastic MapReduce, there are a number of potentially wallet-draining performance issues to keep an eye on. This week, Compuware launched an Amazon-resident solution to help prevent EMR jobs from running amok.

By all accounts, Amazon’s EMR is a tremendously popular service that is helping to bend the cost curve for would-be Hadoop users. It’s a great place for playing around with new Hadoop applications or testing code. The trouble is, Amazon doesn’t provide much in the way of technical handholding or troubleshooting with EMR. You’re pretty much on your own. If your Hadoop jobs run slowly or not at all, it’s up to you to figure out why.

The folks at Compuware know a thing or two about troubleshooting performance issues in enterprise applications. The Detroit, Michigan-based company has a well-respected application performance management (APM) business that has traditionally focused on conventional database, ERP, Web, and mobile application workloads.

But now Compuware’s APM division is getting into the big data space with the launch of a Hadoop APM offering that lives in the Amazon cloud. The new offering, called Compuware APM for Elastic MapReduce, is built on the company’s existing PurePath technology, and provides the same sort of performance gathering and troubleshooting capabilities as the APM solutions for traditional apps.

Compuware’s APM shows how Hadoop is running

The offering can pinpoint the root cause of failed jobs or performance “hotspots” in EMR workloads, Compuware says. The software works by reading the exceptions, stack traces, and logs generated by EMR, running them through its PurePath technology, and then generating results.

“By profiling Hadoop jobs in production, operations teams can quickly identify the issues, whether they are misconfigured or unbalanced clusters, poorly-coded workloads, or unhealthy hosts,” the company says. This information can then be used to make changes, either to the code, the configuration, or to the underlying Amazon infrastructure (such as by changing instance types).

Compuware delivers a graphical dashboard with its APM for Elastic MapReduce, which also can generate reports depicting adherence to service level agreements (SLAs) and potential charge-backs. The software is geared toward smaller Hadoop environments on EMR; it also supports HBase.

APM for Elastic MapReduce is available on the AWS Marketplace now, with pricing starting at $0.72 per hour on a Standard Large (m1.large) EC2 instance (enough for 12 Hadoop JVMs) and ranging up to $3.13 per hour on an M3 XL (m3.xlarge) instance (enough for 100 Hadoop JVMs). A free 30-day trial is also available.
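
To put those two price points in perspective, here is a rough back-of-the-envelope sketch that works out the cost per monitored Hadoop JVM and the cost of a month of continuous use. The hourly prices and JVM counts come from the figures quoted above; the 30-day month of round-the-clock usage, and the function and variable names, are assumptions made for illustration. The article does not say whether the quoted rates include the underlying EC2 instance charge, so treat the results as indicative only.

```python
# Hourly prices and JVM capacities as quoted in the article.
TIERS = {
    "m1.large": {"usd_per_hour": 0.72, "hadoop_jvms": 12},
    "m3.xlarge": {"usd_per_hour": 3.13, "hadoop_jvms": 100},
}

# Assumption: a 30-day month of continuous, round-the-clock use.
HOURS_PER_MONTH = 24 * 30


def cost_per_jvm_hour(tier):
    """Hourly price divided by the number of Hadoop JVMs the tier covers."""
    return tier["usd_per_hour"] / tier["hadoop_jvms"]


def monthly_cost(tier, hours=HOURS_PER_MONTH):
    """Total price for running the tier for the given number of hours."""
    return tier["usd_per_hour"] * hours


def summarize(tiers=TIERS):
    """Return rounded per-JVM and monthly figures for each tier."""
    return {
        name: {
            "usd_per_jvm_hour": round(cost_per_jvm_hour(tier), 4),
            "usd_per_month": round(monthly_cost(tier), 2),
        }
        for name, tier in tiers.items()
    }


if __name__ == "__main__":
    for name, figures in summarize().items():
        print(name, figures)
```

Under these assumptions, the smaller tier works out to about 6 cents per JVM-hour and the larger tier to about 3 cents, so the per-JVM price roughly halves at the top end of the range.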

Related Items:

Hitching a Big Data Ride on Amazon’s Cloud

Highlighting Business Signals on the Noisy Web

Amazon Tames Big Fast Data with Kinesis Pipe
