Unstructured Data Vs. Environment Reflected in Earth Day Commitments
Today is Earth Day, which means millions of people are taking time to think about how their actions are impacting the planet. That reflection is especially important for those in the IT business, where every byte of data processed expands our collective carbon footprint. However, unstructured data seems to place a heavier burden on the earth, especially when it’s used to power AI initiatives.
Data centers in the United States consume about 1.8% of all the electricity generated in the country, according to a May 2021 paper published in Environmental Research Letters. What’s more, these American data centers, which account for 20% of all the data centers in the world, also consume a large amount of fresh water, the researchers found.
However, not all data centers are equal. AWS, which has committed to using 100% renewable energy by 2025 and reaching net-zero carbon emissions by 2040, claims that its massive public data centers have a smaller footprint than the private data centers operated by individual companies.
“When companies move to the AWS Cloud from on-premises infrastructure, they typically reduce carbon emissions by 88% because our data centers can offer environmental economies of scale,” the company says. “Organizations generally use 77% fewer servers, 84% less power, and tap into a 28% cleaner mix of solar and wind power in the AWS Cloud versus their own data centers,” it says.
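The quoted percentages appear to fit together arithmetically. The short sketch below shows one way they could combine, assuming (this is an assumption, not something the quote spells out) that emissions scale with power drawn multiplied by the carbon intensity of that power. The function name is hypothetical and used only for illustration.

```python
def combined_emissions_reduction(power_reduction: float, cleaner_mix: float) -> float:
    """Estimate the overall emissions reduction under a simple multiplicative model.

    Assumption: emissions = power used x carbon intensity of the power mix.
    """
    remaining_power = 1.0 - power_reduction    # fraction of power still used
    remaining_intensity = 1.0 - cleaner_mix    # relative carbon intensity of that power
    remaining_emissions = remaining_power * remaining_intensity
    return 1.0 - remaining_emissions


# Figures taken from the AWS quote: 84% less power, 28% cleaner power mix.
reduction = combined_emissions_reduction(0.84, 0.28)
print(f"{reduction:.1%}")  # prints 88.5%
```

Under that assumption, 16% of the power at 72% of the carbon intensity leaves about 11.5% of the original emissions, which is consistent with the roughly 88% reduction the company cites.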
Last month, AWS launched a new carbon footprint tool that allows customers to determine the impact their IT operations are having on emissions. That brings it up to speed with its competitor, Google Cloud, which launched a similar tool last October. Google Cloud, by the way, is committed to using 100% renewable energy in its data centers by 2030.
Just as different data centers have different environmental impacts, so do different types of data. Unstructured data, such as video, audio, images, and text, consumes a disproportionate amount of storage and processing power compared with more structured data, such as the tabular data that you would find in a traditional relational database. Semi-structured data, such as the JSON data you would find in a NoSQL database, falls somewhere in between.
Data automation and intelligence firm Aparavi is asking companies to reconsider not only how much data they store and process, but also how much of it is unstructured, since storing and processing unstructured data releases more carbon into the atmosphere.
“The more unstructured data a company has, the bigger the data footprint,” says Adrian Knapp, the CEO and founder of Aparavi. “Companies can go green by identifying redundant, outdated, and trivial data, also known as ROT data. Unaccounted for data is detrimental to the environment as it takes up space on servers and slows down processings.”
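As a rough illustration of what identifying ROT data could involve, the sketch below walks a directory and flags files as redundant (identical content to another file), outdated (not modified within a cutoff), or trivial (smaller than a size threshold). The function names, thresholds, and these working definitions are all hypothetical assumptions for the example; this is not Aparavi's method or product.

```python
import hashlib
import os
import time
from collections import defaultdict


def sha256_of(path):
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_rot(root, max_age_days=3 * 365, min_size_bytes=1, now=None):
    """Hypothetical ROT scan; returns paths grouped by category."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # files last modified before this are "outdated"
    by_hash = defaultdict(list)
    report = {"redundant": [], "outdated": [], "trivial": []}

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            if info.st_size < min_size_bytes:
                report["trivial"].append(path)
                continue  # skip hashing files considered trivial
            if info.st_mtime < cutoff:
                report["outdated"].append(path)
            by_hash[sha256_of(path)].append(path)

    # Keep the first copy of each distinct content; flag the rest as redundant.
    for paths in by_hash.values():
        report["redundant"].extend(sorted(paths)[1:])
    return report
```

Calling `find_rot("/some/share")` (a placeholder path) returns a dictionary of candidate paths per category. A real assessment would need policy input on retention periods and on which copy of a duplicate is the authoritative one.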
A similar theme is coming out of Datadobi, a provider of unstructured data management solutions. Unstructured data is a big factor in achieving environmental, social, and governance (ESG) initiatives at Datadobi, says the company’s co-founder and CRO, Michael Jack.
“In honor of Earth Day, I want to remind enterprises that unstructured data plays an important role in the ESG conversation,” Jack says. “A holistic approach to ESG involves enterprises being encouraged and enabled to move away from legacy models where data is stored in a digital ‘landfill’ and is taking up space, money, and precious resources and giving very little in return.”
Unstructured data typically is the feedstock for today’s massive AI initiatives. Breakthroughs in neural network architectures are enabling companies to automate actions based on imagery processed through a computer vision model or text processed through a natural language processing (NLP) model. As this popular form of AI grows, so too does the environmental impact.
“AI usage has been growing year after year in many sectors; this is particularly true for insurance and financial services, where the rate of AI adoption is the highest among all industries, up more than 37% in one year according to a report from KPMG.
“CPU and GPU processors are now designed specifically for more and more demanding machine learning tasks,” says Linh C. Ho, the CMO at Zelros, which develops AI solutions for insurance companies. “The environmental impact of this is significant, and is not getting lower anytime soon.”
The global carbon footprint shrank by 6% in 2020, thanks to the COVID-19 pandemic, which shuttered factories, closed schools, and eliminated flights. But now that the global economy is growing again, the world’s carbon footprint is growing, too.
If there’s a bright spot to all this, it’s the potential to use advanced technologies not only to monitor carbon emissions on a more granular level, but also to start managing them better. Forrester analyst Abhijit Sunil is hopeful that these technologies will have a meaningful impact.
“In the coming year, we will see more emerging technologies, such as blockchain, AI/ML, automation, digital twins, semiconductor advancements, edge and IoT, being applied in sustainability use cases,” Sunil says. “They can help organizations reduce carbon emissions, streamline measurement and accounting, and monitor waste reduction, among other use cases.”
However, there’s a danger inherent in relying on these emerging technologies, as they are themselves computationally expensive, and therefore energy-intensive, Sunil says.
“They also have the risk of increasing carbon emissions at the edge and increasing e-waste,” he says. “As services and solutions providers rapidly look to integrate such technologies with sustainability, it is important to reconcile the risks associated with these technologies as they mature, especially where less carbon intensive alternatives exist. Identifying the right use case, scale and scope is key to making technologies more beneficial than risky for climate action.”