Can AWS Crack the Code for Data Exchanges?
Data has become a tangible commodity with the potential to be traded like any other good. Like petroleum, it’s the raw feedstock used to create numerous downstream products. However, unlike petroleum, we lack a central market or clearinghouse where parties can easily buy and sell data. That’s the challenge that Amazon Web Services is hoping to solve with its newly announced AWS Data Exchange.
AWS yesterday launched its Data Exchange, a digital market that brings data sellers together with data buyers. The market allows AWS customers to browse and purchase a variety of data sets offered by data sellers to use with their own analytics and machine learning projects.
“On one hand, it’s hard for data subscribers to securely find, access, and analyze relevant data sets,” AWS says in a video on the Data Exchange webpage. “On the other hand, it’s hard for providers with useful data sets to share them. AWS is here to remove the friction, build a bridge between these two groups, and unlock the value of data.”
Currently, nearly 1,300 data sets from 90 companies are for sale on the AWS Data Exchange, according to the listing on the AWS Marketplace. The sellers include some familiar names, such as Dun & Bradstreet, Pitney Bowes, Reuters, Acxiom, Foursquare, and Deloitte.
Customers can access data from the Data Exchange automatically using an API, or they can do it manually from a GUI console. AWS makes it easy to move the purchased data into a data lake running on S3 storage. From there, AWS customers can bring various analytics and machine learning applications to bear on the data.
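For readers who want a sense of what the API route might look like, here is a minimal sketch in Python using the boto3 SDK's `dataexchange` client. It assumes the customer is already subscribed to a data set and knows its data set ID and revision ID. The bucket name, the key layout, and the two function names are hypothetical and chosen only for illustration. The sketch reads only the first page of assets and does not wait for the export job to finish.

```python
from typing import Dict, List


def build_export_job_details(data_set_id: str, revision_id: str,
                             asset_ids: List[str], bucket: str,
                             prefix: str = "data-exchange") -> Dict:
    """Build the Details payload for an EXPORT_ASSETS_TO_S3 job.

    The S3 key layout (prefix/data set/revision/asset) is an assumption
    made for this example, not something AWS prescribes.
    """
    destinations = [
        {
            "AssetId": asset_id,
            "Bucket": bucket,
            "Key": f"{prefix}/{data_set_id}/{revision_id}/{asset_id}",
        }
        for asset_id in asset_ids
    ]
    return {
        "ExportAssetsToS3": {
            "DataSetId": data_set_id,
            "RevisionId": revision_id,
            "AssetDestinations": destinations,
        }
    }


def export_revision_to_s3(data_set_id: str, revision_id: str, bucket: str) -> str:
    """Start an export of one revision's assets into an S3 bucket.

    Requires boto3 and AWS credentials; returns the ID of the started job.
    """
    import boto3  # imported lazily so the helper above works without the SDK

    client = boto3.client("dataexchange")
    # First page of assets only; a real script would paginate.
    assets = client.list_revision_assets(
        DataSetId=data_set_id, RevisionId=revision_id
    )["Assets"]
    details = build_export_job_details(
        data_set_id, revision_id, [asset["Id"] for asset in assets], bucket
    )
    job = client.create_job(Type="EXPORT_ASSETS_TO_S3", Details=details)
    client.start_job(JobId=job["Id"])
    return job["Id"]
```

Calling `export_revision_to_s3` with a subscribed data set's IDs and a bucket the account owns would queue the copy into S3. The payload builder is kept separate so it can be inspected without touching AWS.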
When a data provider updates their data, AWS automatically notifies the subscribers to that data with an alert, called a CloudWatch Event. AWS says this allows data subscribers to keep their data lakes, data warehouses, and machine learning models up to date with the latest data.
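To make the notification flow concrete, the following Python sketch shows how a subscriber might filter for these events and pull out the new revision IDs. The field values used here (`aws.dataexchange` as the source, `Revision Published To Data Set` as the detail type, the data set ID under `resources`, and `RevisionIds` under `detail`) reflect our understanding of the event format and should be checked against the AWS documentation. Both function names are hypothetical.

```python
from typing import Dict, List, Tuple


def revision_published_pattern(data_set_ids: List[str]) -> Dict:
    """Build an event pattern matching new revisions of the given data sets.

    Field values are assumptions about the event format; verify them
    against the AWS Data Exchange documentation before relying on them.
    """
    return {
        "source": ["aws.dataexchange"],
        "detail-type": ["Revision Published To Data Set"],
        "resources": list(data_set_ids),
    }


def revisions_from_event(event: Dict) -> List[Tuple[str, str]]:
    """Return (data_set_id, revision_id) pairs carried by one event.

    Events from any other source yield an empty list.
    """
    if event.get("source") != "aws.dataexchange":
        return []
    data_set_ids = event.get("resources", [])
    revision_ids = event.get("detail", {}).get("RevisionIds", [])
    return [(ds, rev) for ds in data_set_ids for rev in revision_ids]
```

The pattern, serialized as JSON, could be attached to a CloudWatch Events rule whose target is a Lambda function. That function would call something like `revisions_from_event` and then refresh the data lake with each new revision.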
AWS says its approach benefits customers by centralizing data acquisition tasks. Instead of managing credentials, fiddling with FTP sites, or (gasp!) mounting physical media shipped via mail, customers get instant access to a host of different data types, all of it accessible via a single API. What’s more, the invoice for the purchased data is consolidated into the customer’s AWS bill. It’s all just that simple.
“Customers have asked us for an easier way to find, subscribe to, and integrate diverse data sets into the applications, analytics, and machine-learning models they’re running on AWS. Unfortunately, the way customers exchange data hasn’t evolved much in the last 20 years,” said Stephen Orban, general manager of AWS Data Exchange, in a press release. “AWS Data Exchange gives our customers the ability to quickly integrate third-party data in the workloads they’re migrating to the cloud, while giving qualified data providers a modern and secure way to package, deliver, and reach the millions of AWS customers worldwide.”
There are some interesting data sets among the 1,300 or so that are listed for sale in the Data Exchange. For example, for $5,000, you can get a month of access to Spire Aviation’s October archive of Automatic Dependent Surveillance-Broadcast (ADS-B) aviation surveillance data. And for $30,000, you can get a 12-month subscription to Weather Trends International’s weekly predicted demand for fishing rods, sliced by US region. Epsilon, meanwhile, is giving away a free month of access to its Consumer Data Insights file aggregated at the ZIP Code level.
Customers will be able to incorporate these data sets into various Amazon services, such as its Hadoop product, Elastic MapReduce (EMR); its SQL data warehousing product, Redshift; its serverless interactive SQL query service, Athena; its data integration and ETL environment, AWS Glue; or its pre-built data lake, AWS Lake Formation. AWS is also making room for partners, like Databricks, which offers its Apache Spark-based offerings on AWS.
“By integrating the AWS Data Exchange API into Databricks, our customers can seamlessly combine third-party data with their existing data lakes to perform advanced data science and analytics at scale on AWS,” Pankaj Dugar, Databricks’ vice president of technology and data provider partnerships, said in a press release.
With AWS re:Invent less than a month away, AWS Data Exchange provides a teaser of sorts for the types of services that the cloud giant has up its sleeve. But the big question for AWS is whether the Data Exchange will catch fire with customers.
While third-party data is growing in popularity and use, it’s subject to a core limitation, in that its value decreases as more people get access to it. When everybody is making decisions that are fueled, in part, by data originating with D&B, then it ceases to be a competitive advantage.
Perhaps by greasing the skids on the availability and integration of third-party data into customers’ big data analytics projects, AWS can help pave the way to the next level: the buying and selling of more valuable and exclusive second-party data and alternative data on a one-to-one basis. This is a much more difficult task, since it can’t be centralized. But it’s potentially more valuable to buyers and sellers alike.
Bloomberg is one data aggregator that has been doing work with alternative data. The media company, which is not currently participating in the AWS Data Exchange, is working with over a dozen providers of alternative data in various realms, from general ones like satellite data to obscure ones like the “shadow” market for metals.
AWS Data Exchange will likely lower the barrier to entry for the use of third-party data in analytics, which will benefit those companies that haven’t yet incorporated this potentially useful data into their analytics efforts. Considering how early many companies are in their data journeys, this is a good thing.
But in making third-party data more ubiquitous, the AWS Data Exchange will likely also lower the value of this data in the long run. It could also encourage those firms pushing the data envelope to seek better, more exclusive, and more differentiating data than what AWS is offering to the world on its Data Exchange.