November 15, 2013

Amazon Tames Big Fast Data with Kinesis Pipe


Amazon Web Services added another engine to its big data powerhouse this week when it unveiled Kinesis for real-time streaming data. Kinesis allows users to create new apps that analyze high-throughput data streams, such as log files, financial transactions, and click-stream data, at rates of more than 100 TB per hour.

Organizations would like to tap all types of fast-moving data streams for actionable insight. But getting a handle on data streams (such as stock tickers, social media feeds, geospatial data, results from massively multiplayer games, inventory levels, and any machine data from the Internet of Things) is easier said than done.

Enter Kinesis, which sits upon AWS' EC2 cloud and allows users to quickly start analyzing their streaming data with just a few clicks of the mouse. Instead of sending these data streams off to some server where they may never be analyzed, Kinesis is designed to make it easy to pipe in big data streams to AWS servers, analyze the data, and then discard the digital waste byproduct, or recycle it into brand new streams.

Amazon envisions all sorts of uses for Kinesis. In an e-commerce setting, Kinesis can be used to generate product recommendations based on Web clickstreams generated by mobile users. Financial services companies can use it to ingest stock ticker information, which can be used to refactor financial models on a near-continuous basis. A manufacturer could use it to monitor inventory data, and generate alerts when inventories get too low. Kinesis can be used to mine millions of Tweets to identify patterns or trends, or used with Facebook's social graph for purposes of consumer sentiment analysis.

Amazon's rendition of what "big data pipes" might look like.

There are all sorts of potential uses for a streaming data analysis machine such as Kinesis, particularly in the area of machine-generated data and machine learning. This real-time processing need is driving big interest in products such as Splunk and Apache Storm, which is often run alongside Hadoop. What Amazon brings to the table is the capability to spin up a real-time stream processing system without the need to actually build or deploy any infrastructure. It's a powerful concept, and a great real-time complement to Amazon's batch-oriented Hadoop offering, Elastic MapReduce.

Users can get started with Kinesis by provisioning a new data stream from their AWS web management console. Alternatively, a data stream can be provisioned by using the Kinesis API or SDK. Amazon provides client libraries to allow developers to integrate Kinesis data processing into their Java applications.

Amazon CTO Werner Vogels introducing Kinesis during a keynote speech at the AWS re:Invent conference yesterday

From a user standpoint, Kinesis operates on data streams in terms of shards. According to Amazon, each shard ingests data, using Kinesis' HTTP-based PutRecord function, in blocks of 1,000 write transactions, at rates up to 1 MB per second. Conversely, each shard egresses data, using the GetNextRecords function, in blocks of 20 read transactions at rates up to 2 MB per second. Users can scale the number of shards up or down on the fly, without restarting the stream or impacting the data sources pushing data into Kinesis, Amazon says.
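To make the shard arithmetic concrete, here is a minimal Python sketch that estimates how many shards a stream would need. It uses only the per-shard figures quoted above. The function name `shards_needed` is hypothetical, and the sketch assumes the 1,000-write-transaction figure applies per second, which the article does not state outright.

```python
import math

# Per-shard limits as quoted in the article (preview-era figures).
SHARD_WRITE_MB_PER_SEC = 1.0   # ingest rate per shard
SHARD_READ_MB_PER_SEC = 2.0    # egress rate per shard
# Assumption: the "1,000 write transactions" figure is a per-second limit.
SHARD_WRITES_PER_SEC = 1000


def shards_needed(write_mb_per_sec, records_per_sec, read_mb_per_sec):
    """Return the smallest shard count that satisfies all three limits."""
    by_write_bytes = math.ceil(write_mb_per_sec / SHARD_WRITE_MB_PER_SEC)
    by_write_count = math.ceil(records_per_sec / SHARD_WRITES_PER_SEC)
    by_read_bytes = math.ceil(read_mb_per_sec / SHARD_READ_MB_PER_SEC)
    # A stream always has at least one shard.
    return max(1, by_write_bytes, by_write_count, by_read_bytes)


if __name__ == "__main__":
    # Hypothetical workload: 5 MB/s in, 12,000 records/s, 10 MB/s out.
    print(shards_needed(5, 12000, 10))
```

In this hypothetical workload, the record count rather than the byte rate sets the shard count, which is typical of streams made up of many small records.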

Kinesis limits a user to analyzing data streams from the past 24 hours. This trailing 24-hour window should give the user enough time to extract the useful bits of information. Users who still want to hang onto the data after the 24-hour window can move it to another Kinesis stream, or move it to other AWS offerings, including S3, DynamoDB, or Redshift, each of which is pre-integrated with Kinesis.

Amazon charges for Kinesis based on the number of PUTs and for each shard of throughput capacity. The company charges $0.028 for each 1 million PUT transactions, and $0.015 per shard per hour for the sharding capacity. Since Kinesis runs inside of AWS EC2, users must pay for their EC2 capacity as well.
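As a rough illustration of how those two charges combine, the following Python sketch estimates a Kinesis bill from the prices quoted above. The function name `estimate_kinesis_cost` and the sample workload are hypothetical, and the sketch deliberately leaves out the separate EC2 charges, which depend on the instances a user chooses.

```python
# Prices as quoted in the article (USD); they may not reflect current pricing.
PRICE_PER_MILLION_PUTS = 0.028
PRICE_PER_SHARD_HOUR = 0.015


def estimate_kinesis_cost(shards, hours, put_count):
    """Estimate Kinesis charges in USD, excluding any EC2 capacity."""
    shard_cost = shards * hours * PRICE_PER_SHARD_HOUR
    put_cost = (put_count / 1_000_000) * PRICE_PER_MILLION_PUTS
    return shard_cost + put_cost


if __name__ == "__main__":
    # Hypothetical workload: 2 shards for 24 hours, 50 million PUTs.
    cost = estimate_kinesis_cost(shards=2, hours=24, put_count=50_000_000)
    print(f"${cost:.2f}")
```

For this sample workload, the shard-hours come to $0.72 and the PUTs to $1.40, so the PUT charge dominates once a stream carries tens of millions of records a day.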

Kinesis is currently in limited preview. The new offering was unveiled yesterday during Amazon CTO Werner Vogels' keynote address at the AWS re:Invent conference in Las Vegas. "This is an amazing new service where we can build tremendously innovative real time applications," Vogels said during his keynote.


Related Items:

Splunk Pumps Up Big Data with Hunk

LinkedIn Open Sources Samza Stream Processor

Apache Takes Storm Into Incubation
