April 25, 2024

US Dept. of Commerce Asks for Help to Make Data GenAI-Ready

(Aaban/Shutterstock)

Data is at the heart of AI. Without good data, the odds of developing useful AI models are somewhere between slim and none. With that in mind, the Department of Commerce last week issued a public request for advice on how it can better prepare its many public data sets for building generative AI models.

The Commerce Department issued a request for information (RFI) on April 17 for assistance from “industry experts, researchers, civil society organizations, and other members of the public” on ways that it can develop “AI-ready open data sets” for the public to use. You can read the RFI as it was recorded in the Federal Register here.

Commerce, which refers to itself as “America’s Data Agency,” collects, stores, and analyzes all sorts of data about the country, including data about the economy, its people, and the environment. A quick search of the Commerce Data Hub reveals more than 122,000 publicly accessible datasets on topics ranging from climate and weather to patents to census information.

As technology has changed and improved over the years, the department has repeatedly turned to private industry and public institutions for assistance in keeping its data-curation and data-sharing activities up to current standards. Machine-readable formats, Web services, and APIs are all examples of Commerce adapting its data services to the times.

Now, with the advent of the GenAI revolution, the department is looking to position its data for use in building AI models.

“Today, Commerce is facing a new technological change with the emergence of AI technologies that provide improved information and data access to users,” Oliver Wise, the Commerce Department’s chief data officer, writes in its RFI. “Commerce is specifically interested in generative AI [GenAI] applications, which digest disparate sources of text, images, audio, video, and other types of information to produce new content. GenAI and other AI technologies present both opportunities and challenges for both data providers such as Commerce and data users including other government entities, industry, academia, and the American people.”

Wise says Commerce’s biggest challenge is to give AI developers access to its data “without losing the integrity,” including the quality of the data. The “interpretation and use” of data “is no longer solely executed by human experts,” Wise writes. The loss of the “shared disciplinary knowledge” that goes into data curation and use is the big concern, he says.

“Recent AI systems are trained on tremendous amounts of digital content and generate responses based on the contextual properties of that content,” Wise writes in the RFI. “However, these systems do not truly ‘understand’ the texts in a meaningful way.”

Oliver Wise is the Chief Data Officer of the Department of Commerce

Future AI systems must have access to data that is not only machine readable but “machine understandable,” Wise writes. “Today’s AI systems are fundamentally limited by their reliance on extensive, unstructured data stores, which depend on the underlying data rather than an ability to reason and make judgments based on comprehension.”

Commerce is looking for assistance in how it can share data in a way that takes these fundamental GenAI limitations into account. It’s looking for input on the creation of new data dissemination standards for human-readable and machine-understandable data, including licensing standards. On the data accessibility and retrieval front, Commerce wants advice on how it can make its data more accessible, such as through APIs or “web crawlability.”

It’s specifically asking for help in how it can use knowledge graphs that utilize metadata to better link human terms to data. It also wants direction on the adoption of standard ontologies, such as Schema.org or NIEM, as well as how knowledge graphs can help to “harmonize and link” ontologies and vocabularies.
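To give a sense of what ontology-backed metadata can look like in practice, here is a minimal Python sketch that describes a dataset using the Schema.org `Dataset` and `DataDownload` types and serializes it as JSON-LD. It is an illustration only, not something taken from the RFI: the helper name `dataset_jsonld`, the dataset name, the keywords, and the `example.gov` URL are all hypothetical, and Commerce has not said it will publish metadata in this form.

```python
import json


def dataset_jsonld(name, description, keywords, license_url,
                   download_url, encoding_format):
    """Build a Schema.org Dataset description as a JSON-LD dictionary.

    Hypothetical helper for illustration; the property names used here
    (name, description, keywords, license, distribution, encodingFormat,
    contentUrl) are standard Schema.org vocabulary.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "keywords": keywords,
        "license": license_url,
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": encoding_format,
                "contentUrl": download_url,
            }
        ],
    }


# All values below are made up for the example.
record = dataset_jsonld(
    name="Example county population estimates",
    description="Annual population estimates by county (illustrative).",
    keywords=["population", "census", "county"],
    license_url="https://creativecommons.org/publicdomain/zero/1.0/",
    download_url="https://example.gov/data/population.csv",
    encoding_format="text/csv",
)

# Serialize to JSON-LD text that a crawler or knowledge graph could ingest.
print(json.dumps(record, indent=2))
```

A record like this ties human terms such as the keywords to a specific file and license, which is the kind of link between vocabulary and data that a knowledge graph builds on.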

The department wants input from the community on how it can move forward on these data standardization efforts, while maintaining the highest standards when it comes to data integrity, quality, security, and ethics.

Wise asks interested parties to send their suggestions to Victoria Houed via email at [email protected], with “AI-Ready Open Data Assets RFI” in the subject line. The department would like to receive input or feedback on these topics by July 16.


 

Datanami