Let It Go: The Financial Benefits of Data Deletion
Conventional wisdom holds that the key to winning in big data is scaling one’s data collection, storage, and analysis capabilities faster than one’s competitor. The more data you have, the more trends you can uncover, and the better your ML models will be, the thinking goes. But in some situations, holding onto data is a liability that can cost you millions. That’s why the smarter companies know when to let the data go.
While some data clearly has value and should be kept, other pieces of data can actually be deleterious to a company’s financial and legal health and should be disposed of as soon as legally possible, says Bill Tolson, vice president of global compliance for Archive360.
“Part of records management, information management, is not just storing data. It’s getting rid of stuff too,” Tolson tells Datanami. “That’s what a lot of people don’t really acknowledge. A good records management capability is getting rid of stuff. And that’s the art of the whole thing: When does something become valueless to the company?”
To answer that question (when does a piece of data become valueless?), Tolson points to a series of reports published in the last decade by the Compliance, Governance and Oversight Council (CGOC), a forum of legal, IT, privacy, security, records and information management professionals that was backed by IBM.
As data ages, it loses value. In fact, according to a 2019 report dubbed “The Information Governance Process Maturity Model,” the CGOC (which disbanded earlier this year) calculated the “cost to value gap” and the “risk to value gap” for various types of data, including regular office documents, product research, sales, customer data, HR, financial, and messaging. After about seven years, there isn’t much value to any of this (see figure).
In another report, the CGOC concluded that 1% of corporate data is susceptible to litigation holds and e-discovery and must be retained, while 5% is regulated data that also needs to be retained for some period of time, depending on the specific regulation, Tolson says. The CGOC found that another 25% of data has some business value, he says.
“The bottom line was they said 65% to 69% of all corporate data is junk and should be gotten rid of,” Tolson says. “Get rid of that stuff that’s junk, because you’re paying for the spinning disk. But even more so, you’re paying for the time wasted in going through it.”
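The arithmetic behind that figure can be sketched in a few lines. The snippet below is only an illustration built from the percentages Tolson cites; the variable names are invented for this example, and the result lands at the upper end of his 65% to 69% range.

```python
# Shares of corporate data the CGOC said must or should be kept,
# as quoted by Tolson (percent of all corporate data).
keep_shares = {
    "litigation_hold_and_ediscovery": 1,
    "regulated": 5,
    "business_value": 25,
}

kept_percent = sum(keep_shares.values())
junk_percent = 100 - kept_percent

print(f"Worth keeping: {kept_percent}%")
print(f"Candidate for disposal: {junk_percent}%")
```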
The word “e-discovery” should send shivers down your CFO’s spine, if they’ve been around for any length of time. If a company is involved in litigation (which most companies will experience at some point in their lives), then it could be required to provide documents to the plaintiff. Since most companies store documents electronically these days, discovery is done electronically.
The problem with e-discovery is that we haven’t fully digitized junior lawyers and their $300-per-hour rates. According to Tolson, 70% of the costs involved in e-discovery can be attributed to the need to pay lawyers to manually pore over every document the e-discovery process highlighted as a potential piece of evidence.
“The tradeoff with keeping too much data is the cost of keeping it and managing it and the cost of securing it,” Tolson says. “But also the cost associated with having to go through it for e-discovery.”
A recent study by the chemical company DuPont shows just how costly it can be to hold onto documents past their expiration date. According to Tolson, DuPont analyzed the e-discovery costs in nine or 10 lawsuits and discovered it had spent $25 million to $30 million reviewing 24 million documents to determine whether they were privileged.
“They later found out that, of those 24 million documents they reviewed, 11 million of them or thereabouts were actually expired and shouldn’t have existed,” Tolson says. “They should have been disposed of. But because of that, they spent an additional $12 million reviewing documents that should not have been there anymore.”
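A back-of-the-envelope check shows how those numbers hang together. The sketch below assumes, purely for illustration, that review costs were spread evenly across all 24 million documents; that even-spread assumption is not something DuPont or Tolson states.

```python
# Figures as reported by Tolson; the total review cost is a range.
total_docs = 24_000_000
expired_docs = 11_000_000
review_cost_low = 25_000_000
review_cost_high = 30_000_000

# Assumption: every document cost the same amount to review.
cost_per_doc_low = review_cost_low / total_docs
cost_per_doc_high = review_cost_high / total_docs

# Share of the review bill attributable to expired documents.
wasted_low = expired_docs * cost_per_doc_low
wasted_high = expired_docs * cost_per_doc_high

print(f"Cost per document: ${cost_per_doc_low:.2f} to ${cost_per_doc_high:.2f}")
print(f"Spent on expired documents: ${wasted_low / 1e6:.1f}M to ${wasted_high / 1e6:.1f}M")
```

Under that assumption, each document cost a little over a dollar to review, and the expired documents account for somewhere between roughly $11 million and $14 million, which is in line with the $12 million Tolson cites.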
Every company needs to find the happy medium between minimizing legal exposure on the one hand and, on the other, holding onto data sets that could be useful to the company, including for data analytics projects and training machine learning algorithms.
In the recent legal battle between Samsung and Apple, Samsung at one point declared that it was holding email for only two weeks, unless it was specifically tagged as legally protected, Tolson says.
“Two weeks is a little weird,” he says. “With emails, you can get into some pretty weird things. Did somebody open it? Did they read it? If they did open it, did they try to delete it? These are all kind of things in a court of law that are interesting.”
AI has improved, and in fact, Archive360 is one of several software and services companies that are using the latest natural language processing (NLP) capabilities to replicate what human information professionals (and perhaps even lawyers) are very good at.
But the average e-discovery project still costs several million dollars, Tolson says, so it’s best to avoid the business altogether. “If those documents don’t exist legally, then you don’t have to review them,” he says.
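What a rule for legally disposing of a document might look like can be sketched in code. The example below is a hypothetical illustration, not Archive360’s method: the `Record` class, the `is_disposable` function, and the retention periods are all invented here, and real retention schedules depend on the regulations and policies that apply to each company.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical retention periods in years, for illustration only.
RETENTION_YEARS = {"email": 2, "financial": 7, "hr": 7}


@dataclass
class Record:
    category: str
    created: date
    legal_hold: bool = False


def is_disposable(record: Record, today: date) -> bool:
    """Return True if the record is past retention and not on legal hold."""
    if record.legal_hold:
        # Anything under a litigation hold must be kept regardless of age.
        return False
    years = RETENTION_YEARS.get(record.category)
    if years is None:
        # Unknown category: keep it until a person classifies it.
        return False
    try:
        expires = record.created.replace(year=record.created.year + years)
    except ValueError:
        # Created on Feb. 29 and the expiry year is not a leap year.
        expires = record.created.replace(year=record.created.year + years, day=28)
    return today >= expires


today = date(2024, 1, 1)
old_email = Record("email", date(2020, 5, 1))
held_email = Record("email", date(2020, 5, 1), legal_hold=True)
recent_ledger = Record("financial", date(2022, 3, 1))

print(is_disposable(old_email, today))
print(is_disposable(held_email, today))
print(is_disposable(recent_ledger, today))
```

The legal-hold check comes first on purpose: a document tied to litigation stays put no matter how old it is, and anything the rule cannot categorize is kept for a human to review.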
One of the ways that Archive360 is using AI is to classify data and determine how it fits into a customer’s information retention policy. Humans are still relied upon to make 90% to 95% of the determinations of whether a given piece of data could be subject to e-discovery requests and should therefore be moved to a protected folder, Tolson says.
“If you can automate that with a 98 or 99% accuracy, then records people, information people, are going to go wild, because that’s their biggest problem,” he says.
Archive360 also has an AI that functions as a “supervisor” application inside a customer’s communication programs, overseeing what employees are saying and whether they’re breaking laws, such as stockbrokers guaranteeing financial returns, which is illegal. So AI is playing an increasingly large role in records retention and compliance initiatives.
But at the end of the day, humans are driving the boat, and need to provide the guidance on what data should be kept and what data should be sent to the e-shredder. Companies’ livelihoods and profitability depend on it.
“Defensible disposition is one of those things that is becoming much bigger to companies now because they’re spending gigantic sums, whether it’s in the cloud or on prem, keeping stuff they shouldn’t keep,” Tolson says. “Adding that additional functionality and capability to say you should get rid of this stuff is lowering their overall TCO.”