January 27, 2020

Google Advances Data Set Search Tool


A new, granular search engine released by Google Research indexes nearly 25 million web-based data sets, allowing users to filter searches based on data type, location and terms of usage.

Google (NASDAQ: GOOGL) said its Dataset Search tool, just released from beta testing, allows users to filter queries to track down text, images or tables. Other features indicate whether published data is free or for sale and provide specifics on where the data resides. The tool also allows mobile access, and the company said it has upgraded data set descriptions to improve the quality of search results.

The search tool uses Schema.org, a vocabulary for structuring data on the Internet and web pages. The open-source framework was launched by Google, Microsoft (NASDAQ: MSFT), Yahoo and Yandex (NASDAQ: YNDX). The framework is designed to help data scientists and web developers describe data set properties. Most government agencies around the world describe and publish data using Schema.org, Google said.
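As a rough illustration of what describing a data set with Schema.org can look like, the Python sketch below builds a JSON-LD description using the vocabulary's Dataset type. The function name, the sample values and the example.org URLs are hypothetical, and the choice of properties (license, isAccessibleForFree, distribution) is an assumption about which fields would correspond to the filters mentioned above, not a statement of how Google indexes pages.

```python
import json


def build_dataset_markup(name, description, license_url,
                         download_url, encoding_format, is_free):
    """Return a Schema.org Dataset description as a JSON-LD string.

    Hypothetical helper for illustration only; the property selection
    is an assumption, not Google's documented requirements.
    """
    markup = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Terms of usage for the data set
        "license": license_url,
        # Whether the data is free or for sale
        "isAccessibleForFree": is_free,
        # Where the data resides and in which format
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": encoding_format,
                "contentUrl": download_url,
            }
        ],
    }
    return json.dumps(markup, indent=2)


if __name__ == "__main__":
    # Sample values are made up for this sketch.
    print(build_dataset_markup(
        name="Example daily weather observations",
        description="Hypothetical table of daily temperature readings.",
        license_url="https://example.org/license",
        download_url="https://example.org/data/weather.csv",
        encoding_format="text/csv",
        is_free=True,
    ))
```

A publisher would typically embed the resulting JSON-LD in the web page that hosts the data set so that crawlers can read the description alongside the page itself.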

Dataset Search provides “a snapshot of the data out there on the Web,” Natasha Noy, a Google Research scientist, noted in a blog post announcing the release.

The ability to scan tens of millions of indexed data sets has greatly increased access to open government data. For example, the U.S. government leads the world with more than 2 million free data sets. Tables are the most popular data format, with about 6 million available on Dataset Search.

Given the huge amounts of data collected by government agencies, Google said the broadest topics include geosciences information such as weather data, along with biology and agriculture. For challenges like climate change, a growing business is combining historic weather patterns with crop data to predict future yields as average global temperatures rise.

Google said the most common queries on Dataset Search during beta testing included “education,” “weather,” “cancer,” and “dogs”(!).


Datanami