9 Places to Get Big Data Now
Discussions of big data often revolve around what new technologies and processes people are using to analyze data. Hadoop, in-memory databases, and machine learning algorithms are getting lots of attention in this regard. But sometimes we tend to overlook the most important ingredient in big data analytics: the data itself.
Much of the big data that organizations want to analyze exists within their own four walls. Relational databases make great repositories for structured data like account records, orders, and customer lists. Depending on the industry, organizations usually have access to a good bit of semi-structured data too, in the form of JSON files, XML files, and emails.
This is all well and good, and grist for the big data mill. Today’s emerging big data analytics tools can definitely help you squeeze more actionable information out of these sources than earlier generations of business intelligence tools. But the real promise of big data analytics is not just about crunching existing sources of data in new and improved ways. It’s about fundamentally transforming our approach to data, and, most importantly, incorporating new data sources into our analyses.
Prospecting for external sources of big data sounds like a daunting task. But in fact, the data is everywhere. Not all of the data is free, but much of it is, and it is there for the taking. Here are nine sources of external data that you can start incorporating into your analysis:
- Data.gov – The Federal Government has made a concerted effort to share the vast amounts of data collected by its various agencies. Everything from historical weather data and crime statistics to consumer complaints and hospital charges is available, free of charge, via thousands of downloadable datasets maintained by the government. Other countries have followed suit; for example, the UK offers its own version of free public data at data.gov.uk.
- AWS Public Data Sets – Amazon maintains large repositories of public data that its AWS users are free to incorporate into their applications. If you’re looking for a corpus of 5 billion web pages or genome data for humans and other species, the Seattle, Washington company will make it available to you.
- New York Times – The Gray Lady has been collecting, analyzing, and distributing news about the world for more than 150 years, and now the newspaper is making its complete archive of articles available to you, for free, via a handy API.
- Social Media – Sure, you may have locked down the privacy settings for your own Facebook page. But plenty of other Facebook users haven’t, and the social media giant makes it easy to get gobs of summary data about these folks through its Facebook Graph API. Similarly, you can get access to the Twitter fire hose via Twitter’s Gnip service. It’s not free, and 99 percent of the Tweets are probably useless, but there’s the chance of uncovering a nugget of useful information, particularly around customer sentiment.
- Google Trends – Google’s Internet search is a portal into the thoughts and feelings of people around the world. You can tap into that collective consciousness via Google Trends, which gives you various tools to analyze what people are searching for, at Google and YouTube, by topic, geography, and time.
- Gapminder – Did you ever want to know the traffic death rate per 100,000 people, sliced by age, sex, and type of road, for various locations around the world? This and other interesting datasets collected by non-governmental organizations (NGOs) are available for download from Gapminder World.
- Mobile phone companies – The big mobile carriers (AT&T, Sprint, Verizon) are sitting on a treasure trove of data about their smartphone users, such as what apps they’re using, what websites they’re visiting, and where and when they’re doing this. Marketers and advertising companies use this information to push targeted ads to their customers.
- Hoover’s and Nielsen – Business data procured from companies like Hoover’s (a subsidiary of Dun & Bradstreet) and Nielsen has been around for ages, but it’s still relevant in today’s big data age. Increasingly, these firms can sell you data about what people are buying from local stores, which can be an invaluable asset for retailers.
- Geographical data – For some types of analysis, it’s critical that you place people or events on the map. Lucky for you, there’s a host of free geographic data made available by government cartographers and geo-spatial experts. You can see a list of geographical information systems (GIS) data sources at Free GIS Data.
Today’s data-driven companies are finding ways to mix and match various sources of data like these. In the old days, data analysts would feel at the top of their game if they were blending three or four data sources, but today’s companies are finding interesting and potentially profitable patterns by mixing a dozen or more sources together. With a bit of data science acumen, hard work, and luck, you too may find that a mixture of big data sources is more than the sum of its parts.
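To make the idea of mixing sources a little more concrete, here is a minimal sketch in Python of blending one internal source with one external source. Everything in it is an assumption made for illustration: the column names, the sample rows, and the `read_rows` and `blend` helpers are hypothetical, and the two small CSV snippets merely stand in for an export from your own order database and a weather dataset of the kind you might download from a public portal. It is not a description of how any particular company or tool does this.

```python
import csv
import io

# Hypothetical internal data: daily order counts exported from your own database.
INTERNAL_ORDERS_CSV = """date,store,orders
2014-01-06,Boston,120
2014-01-07,Boston,95
2014-01-08,Boston,143
"""

# Hypothetical external data: daily snowfall, standing in for a downloaded public dataset.
EXTERNAL_WEATHER_CSV = """date,city,snowfall_in
2014-01-06,Boston,0.0
2014-01-07,Boston,6.5
2014-01-08,Boston,0.2
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def blend(orders, weather):
    """Left-join order rows to weather rows on (date, city).

    Orders with no matching weather row are kept, with snowfall set to None.
    """
    weather_by_key = {(w["date"], w["city"]): w for w in weather}
    blended = []
    for order in orders:
        match = weather_by_key.get((order["date"], order["store"]))
        blended.append({
            "date": order["date"],
            "store": order["store"],
            "orders": int(order["orders"]),
            "snowfall_in": float(match["snowfall_in"]) if match else None,
        })
    return blended


blended = blend(read_rows(INTERNAL_ORDERS_CSV), read_rows(EXTERNAL_WEATHER_CSV))
for row in blended:
    print(row)
```

The join key is the important design choice here. Two sources can only be blended if they share something to match on, such as a date, a place, or a customer identifier, and a left join keeps every internal record even when the external source has a gap.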