January 29, 2015

9 Places to Get Big Data Now

Alex Woodie

Discussions of big data often revolve around what new technologies and processes people are using to analyze data. Hadoop, in-memory databases, and machine learning algorithms are getting lots of attention in this regard. But sometimes we tend to overlook the most important ingredient in big data analytics: the data itself.

Much of the big data that organizations want to analyze exists within their own four walls. Relational databases make great repositories for structured data like account records, orders, and customer lists. Depending on the industry, organizations usually have access to a good bit of semi-structured data too, in the form of JSON files, XML files, and emails.

This is all well and good, and grist for the big data mill. Today’s emerging big data analytics tools can definitely help you squeeze more actionable information out of these sources than earlier generations of business intelligence tools. But the real promise of big data analytics is not just about crunching existing sources of data in new and improved ways. It’s about fundamentally transforming our approach to data, and, most importantly, incorporating new data sources into our analyses.

Prospecting for external sources of big data sounds like a daunting task. But in fact, the data is everywhere. Not all of the data is free, but much of it is, and it is there for the taking. Here are nine sources of external data that you can start incorporating into your analysis:

  • Data.gov – The Federal Government has made a concerted effort to share the vast amounts of data collected by its various agencies. Everything from historical weather data and crime statistics to consumer complaints and hospital charges is available, free of charge, via thousands of downloadable datasets maintained by the government. Other countries have followed suit; for example, the UK offers its own version of free public data at data.gov.uk.
  • AWS Public Data Sets – Amazon maintains large repositories of public data that its AWS users are free to incorporate into their applications. If you’re looking for a corpus of 5 billion web pages or genome data for humans and other species, the Seattle, Washington company will make it available to you.
  • New York Times – The Gray Lady has been collecting, analyzing, and distributing news about the world for more than 150 years, and now the newspaper is making its complete archive of articles available to you, for free, via a handy API.
  • Social Media – Sure, you may have locked down the privacy settings for your own Facebook page. But plenty of other Facebook users haven’t, and the social media giant makes it easy to get gobs of summary data about these folks through its Facebook Graph API. Similarly, you can get access to the Twitter fire hose via Twitter’s Gnip service. It’s not free, and 99 percent of the Tweets are probably useless, but there’s the chance of uncovering a nugget of useful information, particularly around customer sentiment.
  • Google Trends – Google’s Internet search is a portal into the thoughts and feelings of people around the world. You can tap into that collective consciousness via Google Trends, which gives you various tools to analyze what people are searching for, at Google and YouTube, by topic, geography, and time.
  • Gapminder – Did you ever want to know the traffic death rate per 100,000 people, sliced by age, sex, and type of road, for various locations around the world? This and other interesting datasets collected by non-governmental organizations (NGOs) are available for download from Gapminder World.
  • Mobile phone companies – The big mobile carriers (AT&T, Sprint, Verizon) are sitting on a treasure trove of data about their smartphone users, such as what apps they’re using, what websites they’re visiting, and where and when they’re doing this. Marketers and advertising companies use this information to push targeted ads to their customers.
  • Hoover’s and Nielsen – Procured business data from companies like Hoover’s (a subsidiary of Dun & Bradstreet) and Nielsen has been around for ages, but it’s still relevant in today’s big data age. Increasingly, these firms can sell you data about what people are buying from local stores, which can be an invaluable asset for retailers.
  • Geographical data – For some types of analysis, it’s critical that you place people or events on the map. Lucky for you, there’s a host of free geographic data made available by government cartographers and geo-spatial experts. You can see a list of geographical information systems (GIS) data sources at Free GIS Data.

Today’s data-driven companies are finding ways to mix and match various sources of data like these. In the old days, data analysts would feel at the top of their game if they were blending three or four data sources, but today’s companies are finding interesting and potentially profitable patterns by mixing a dozen or more sources together. With a bit of data science acumen, hard work, and luck, you too may find that a mixture of big data sources is more than the sum of its parts.
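As a rough illustration of what blending an internal source with an external one might look like, here is a minimal Python sketch. Everything in it is assumed for the sake of the example: the column names, the ZIP codes, the income figures, and the blend helper are hypothetical, and the inline CSV text simply stands in for files you might export from your own database and download from a public data portal.

```python
import csv
import io

# Hypothetical internal data: orders keyed by customer ZIP code.
INTERNAL_CSV = """order_id,zip,amount
1001,10001,25.00
1002,94105,40.00
1003,10001,15.50
1004,60601,9.99
"""

# Hypothetical external data, standing in for a CSV downloaded
# from a public data portal. The figures are made up.
EXTERNAL_CSV = """zip,median_income
10001,85000
94105,120000
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def blend(internal_rows, external_rows, key):
    """Left-join internal rows to external rows on a shared key column.

    Internal rows with no external match are kept unchanged.
    """
    lookup = {row[key]: row for row in external_rows}
    blended = []
    for row in internal_rows:
        merged = dict(row)
        for col, val in lookup.get(row[key], {}).items():
            if col != key:
                merged[col] = val
        blended.append(merged)
    return blended


blended = blend(read_rows(INTERNAL_CSV), read_rows(EXTERNAL_CSV), "zip")
for row in blended:
    print(row)
```

The join key does most of the work here. In practice, finding a field that two unrelated sources share (a ZIP code, a date, a product code) and cleaning it so the values line up is often the hardest part of mixing a dozen sources together.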

