9 Places to Get Big Data Now
Discussions of big data often revolve around what new technologies and processes people are using to analyze data. Hadoop, in-memory databases, and machine learning algorithms are getting lots of attention in this regard. But sometimes we tend to overlook the most important ingredient in big data analytics: the data itself.
Much of the big data that organizations want to analyze exists within their own four walls. Relational databases make great repositories for structured data like account records, orders, and customer lists. Depending on the industry, organizations usually have access to a good bit of semi-structured data too, in the form of JSON files, XML files, and emails.
This is all well and good, and grist for the big data mill. Today’s emerging big data analytics tools can definitely help you squeeze more actionable information out of these sources than earlier generations of business intelligence tools. But the real promise of big data analytics is not just about crunching existing sources of data in new and improved ways. It’s about fundamentally transforming our approach to data and, most importantly, incorporating new data sources into our analyses.
Prospecting for external sources of big data sounds like a daunting task. But in fact, the data is everywhere. Not all of the data is free, but much of it is, and it is there for the taking. Here are nine sources of external data that you can start incorporating into your analysis:
- Data.gov – The Federal Government has made a concerted effort to share the vast amounts of data collected by its various agencies. Everything from historical weather data and crime statistics to consumer complaints and hospital charges is available, free of charge, via thousands of downloadable datasets maintained by the government. Other countries have followed suit; for example, the UK offers its own version of free public data at data.gov.uk.
- AWS Public Data Sets – Amazon maintains large repositories of public data that its AWS users are free to incorporate into their applications. If you’re looking for a corpus of 5 billion web pages or genome data for humans and other species, the Seattle, Washington company will make it available to you.
- New York Times – The Gray Lady has been collecting, analyzing, and distributing news about the world for more than 150 years, and now the newspaper is making its complete archive of articles available to you, for free, via a handy API.
- Social Media – Sure, you may have locked down the privacy settings for your own Facebook page. But plenty of other Facebook users haven’t, and the social media giant makes it easy to get gobs of summary data about these folks through its Facebook Graph API. Similarly, you can get access to the Twitter fire hose via Twitter’s Gnip service. It’s not free, and 99 percent of the Tweets are probably useless, but there’s the chance of uncovering a nugget of useful information, particularly around customer sentiment.
- Google Trends – Google’s Internet search is a portal into the thoughts and feelings of people around the world. You can tap into that collective consciousness via Google Trends, which gives you various tools to analyze what people are searching for, at Google and YouTube, by topic, geography, and time.
- Gapminder – Did you ever want to know the traffic death rate per 100,000 people, sliced by age, sex, and type of road, for various locations around the world? This and other interesting datasets collected by non-governmental organizations (NGOs) are available for download from Gapminder World.
- Mobile phone companies – The big mobile carriers (AT&T, Sprint, Verizon) are sitting on a treasure trove of data about their smartphone users, such as what apps they’re using, what websites they’re visiting, and where and when they’re doing this. Marketers and advertising companies use this information to push targeted ads to their customers.
- Hoover’s and Nielsen – Procured business data from companies like Hoover’s (a subsidiary of Dun & Bradstreet) and Nielsen has been around for ages, but it’s still relevant in today’s big data age. Increasingly, these firms can sell you data about what people are buying from local stores, which can be an invaluable asset for retailers.
- Geographical data – For some types of analysis, it’s critical that you place people or events on the map. Lucky for you, there’s a host of free geographic data made available by government cartographers and geo-spatial experts. You can see a list of geographical information systems (GIS) data sources at Free GIS Data.
Today’s data-driven companies are finding ways to mix and match various sources of data like these. In the old days, data analysts would feel at the top of their game if they were blending three or four data sources, but today’s companies are finding interesting and potentially profitable patterns by mixing a dozen or more sources together. With a bit of data science acumen, hard work, and luck, you too may find that a mixture of big data sources is more than the sum of its parts.
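As a rough illustration of what mixing sources can look like in practice, here is a minimal sketch in Python that joins a small internal order table to an external dataset on a shared key. Everything in it is an assumption made for the example: the column names, the ZIP codes, the income figures, and the `read_rows` and `blend` helpers are all hypothetical, and a real download from any of the sources above would need its own cleaning and key matching.

```python
import csv
import io

# Hypothetical internal data: a few orders keyed by ZIP code.
INTERNAL_CSV = """order_id,zip,amount
1001,10001,25.00
1002,94105,40.50
1003,60601,12.75
"""

# Hypothetical external data, standing in for a CSV downloaded
# from a public data portal. The values are made up.
EXTERNAL_CSV = """zip,median_income
10001,81000
94105,120000
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def blend(internal_rows, external_rows, key):
    """Left-join external attributes onto internal rows using a shared key.

    Internal rows with no external match keep their own fields and get
    None for every external field.
    """
    lookup = {row[key]: row for row in external_rows}
    extra_fields = [name for name in external_rows[0] if name != key] if external_rows else []
    blended = []
    for row in internal_rows:
        match = lookup.get(row[key])
        merged = dict(row)
        for name in extra_fields:
            merged[name] = match[name] if match else None
        blended.append(merged)
    return blended


blended = blend(read_rows(INTERNAL_CSV), read_rows(EXTERNAL_CSV), "zip")
for row in blended:
    print(row)
```

The same pattern repeats for each additional source: pick a key the datasets share (a ZIP code, a date, a product ID) and attach the external attributes to the records you already have. At larger volumes a database join or a dataframe library would be the more natural tool, but the idea is the same.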