October 4, 2023

Ethical Web Data Collection Initiative Launches Certification Program

WASHINGTON, Oct. 4, 2023 — The Ethical Web Data Collection Initiative (EWDCI) is an industry-led consortium of web data collectors focused on strengthening public trust, promoting ethical guidelines, and helping businesses and their customers make informed data extraction choices. The association aims to raise the bar for ethics in the process widely known as “data scraping” with the goal of enhancing trust—a key component of a free, fair, and open Internet. This international, industry-led, and member-driven consortium is announcing an accreditation program developed to bring greater accountability and build consumer confidence in the data collection industry.

Over the past several months, the EWDCI has collaborated on a set of core web scraping principles that revolve around legality, ethics, ecosystem engagement, and social responsibility, inviting everyone from across the globe to participate in the development of these principles. The EWDCI launched a public comment period to gather insights that zero in on the most important concerns of companies and individuals about how data is gathered and used.

We are proud to announce the launch of the EWDCI accreditation program, through which eligible companies can receive an EWDCI Certified designation. Companies that receive the designation show that they adhere to these agreed-upon principles and to the highest degree of ethics when collecting public web data, while also advancing the industry’s best practices and accountability.

Starting today, companies may apply to become EWDCI Certified. We encourage companies that collect and manage web data to join the consortium and, most importantly, to join the conversation to further develop these principles. The inaugural group of web data aggregators that have earned EWDCI accreditation includes Coresignal, Oxylabs, ProxyEmpire, Rayobyte, Smartproxy, and Zyte.

The EWDCI Certified designation is less the end result of our work than the culmination of the first stage of a longer process. The web data collection industry is still young, but it’s growing very quickly. As more data-hungry AI tools fall into corporate and private hands, there is a limited window in which to shape how data-collection practices are developed and perceived. This is why the EWDCI is dedicated to defining positive and beneficial uses of data collection and aggregation at scale.

The EWDCI is now focused on furthering the consortium’s mission and scope of practice by gathering public comment on a range of topics, including:

  • How scraped data can be used to ethically train large language models (LLMs) and generative AI models.
  • Government access to data and due process.
  • Balance between scrapers and target websites.
  • Privacy compliance when scraping personal data.
  • Preventing tactics that undermine consent and consumer choice.
  • Anti-stalkerware efforts.

“The EWDCI seal is a crucial stamp of approval, but it’s also a way to build industry-led influence with a clear goal of making the free and open Internet a better and safer place,” said Christian Dawson, Executive Director of the i2Coalition.

Companies working with web data collection can earn the EWDCI Certified designation by contacting Hilary Osborne at [email protected].

About the Ethical Web Data Collection Initiative

The Ethical Web Data Collection Initiative (EWDCI) seeks to foster cooperation in the web data collection and aggregation industry and leverage collective first-hand knowledge and insights to advocate for beneficial technical standards and business best practices regarding the extraction of web data. The EWDCI is dedicated to serving as the voice of the industry, collaboratively strengthening public trust in the practice of data scraping, promoting ethical guidelines, and helping businesses make informed data extraction choices. Learn more about the EWDCI:

About i2Coalition

The Internet Infrastructure Coalition (i2Coalition, i2C) is the leading voice for web hosting companies, data centers, domain registrars and registries, cloud infrastructure providers, managed services providers, and related tech. The i2C works with Internet infrastructure providers to advocate for sensible policies, design and reinforce best practices, help create industry standards, and build awareness of how the Internet works. The i2Coalition also spearheaded the creation of the VPN Trust Initiative to establish and promote best practices for that vital industry. Learn more about the i2Coalition:

Source: EWDCI