August 23, 2013

Census Bureau Ponders Role of Outside Data Sources

Alex Woodie

For more than 200 years, the US Census Bureau has delivered official statistics about the state of American life. Over this time, the agency has been at the forefront of technology, including the first widespread use of Herman Hollerith’s punch card in 1890, the creation of core geographic information system (GIS) components for the 1970 census, and more. Now, the Census Bureau is weighing the pros and cons of implementing outside data sources and using big data collection techniques.

Every citizen knows (or should know) that the Census Bureau is tasked with counting the number and location of Americans every 10 years for the purpose of re-allocating seats in the US House of Representatives. But even well-informed citizens may be surprised at the breadth and depth of the agency’s other information-gathering projects and information delivery products, which keep it plenty busy in the years that don’t end in zero.

In addition to the actual census, the Census Bureau is involved in gathering and disseminating information related to: unemployment, income, government program participation and eligibility, purchases of specific goods and services, time use, education characteristics, illnesses, disability, health, types and incidence of crime, science/engineering work force, housing, poverty, and health insurance coverage. The group also performs regular economic sample surveys that measure retail sales, inventory, wholesale trade activities, and a whole lot more.

The official statistics generated by the Census Bureau, often in concert with other federal agencies, are critical to the decision-making processes of governments. The data is considered to be very high quality, as the Census Bureau has always strived to maintain the integrity of its data and to be transparent about how it’s gathered.

In an article posted this month, officials with the Census Bureau and the Center for Statistical Research and Methodology debate the risks and benefits of mixing the bureau’s official statistics with “big data” sourced from the open market, and of using “big data” gathering techniques. “While Big Data are generally not official, we believe there are opportunities where they can enhance official statistics,” the officials write.

The officials say they’re looking at several specific uses. For example, the Census Bureau is considering using additional sources of phone numbers to keep the costs of telephone surveying down. It’s also considering using anonymous GPS tracking data from smartphones to help gauge transportation levels. To improve its economic data products, the agency is exploring whether it makes sense to tap commercial e-transaction data, “perhaps to provide lower-level geography estimates and housing foreclosure data.”

“The Census Bureau is continuing to experiment internally with web scraping,” the officials write in the article. “There may be useful Internet data available for residential housing permits and sales, crime incidence, state and local government sales, and property taxes. Corporate finance data also might be available. State and local government web application program interfaces (APIs) exist to download available state and local data.”

The most promising use of novel big data sources, especially considering the questionable quality of externally sourced data, may not be in delivering finished official statistics, but in bolstering and checking the models and assumptions that the Census Bureau uses to create its official products and estimates, perhaps to deliver more small-area estimates (at the county level, for instance) or more timely reports.

“Can Big Data reliably supply the social, demographic, health behavior, and business activity information required for a 21st-century society? Our current answer to this question is, ‘Not yet,’” write the authors. “Given the growing concern over privacy and confidentiality related to Big Data, our nation may not ever want or trust Big Data to serve as a source for official statistics.

“However, statistical agency infrastructures are in place to critique and address the accuracy, consistency, and interpretability of the results produced from Big Data. With this infrastructure, the Census Bureau is in position to incorporate relevant Big Data sources while ensuring the consistency of official statistics, providing interpretation of them, and improving their relevance and timeliness.”
