August 23, 2024

Coming To Grips with Unstructured Legal Data

(Ilya Lukichev/Shutterstock)

The growth of unstructured data poses real challenges. Many organizations struggle to manage unstructured data like text, images, videos, and PDFs due to the sheer size of the data and its growth rate. For the folks at the law firm Katten Muchin Rosenman LLP, better known as Katten Law, regulations and security introduced another layer of concern.

It’s tough to get one’s mind around the sheer magnitude of unstructured data. As part of its Global Datasphere study a few years ago, the analyst firm IDC predicted that by 2025, the planet will generate over 175 zettabytes of data over a 12-month period (it has since lowered the estimate to 163 ZB).

Just storing 163 ZB of raw data would take more than 160 billion 1TB drives, which obviously isn’t going to happen, as the world only has about 13 ZB of installed storage capacity across all media (HDDs, flash, tape, even phones), IDC said. For the record, only about 7.5 ZB of data is actually written to a storage medium, according to IDC, meaning most data is never written down, and storage is actually overprovisioned.

Katten Law is familiar with large growth rates. The law firm, which employs 700 attorneys around the world, must store hundreds of millions of documents from thousands of its clients’ cases going back decades. All told, the firm stores about 240 TB of data, and the figure is growing by 20% to 25% every year, according to Alexander Diaz, the firm’s director of infrastructure and datacenter operations.
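To put that growth rate in perspective, here is a rough projection. It assumes the 20% to 25% annual growth compounds from today’s 240 TB and that nothing is archived or deleted along the way. The function name and the five-year horizon are illustrative choices, not figures from the firm.

```python
def project_capacity(start_tb: float, annual_growth: float, years: int) -> list:
    """Return projected capacity in TB for year 0 through `years`, compounding annually."""
    return [start_tb * (1 + annual_growth) ** y for y in range(years + 1)]


# 240 TB today, growing 20% (low case) or 25% (high case) per year.
low = project_capacity(240, 0.20, 5)
high = project_capacity(240, 0.25, 5)

for year, (low_tb, high_tb) in enumerate(zip(low, high)):
    print(f"year {year}: {low_tb:,.0f} to {high_tb:,.0f} TB")
```

Under those assumptions, the footprint roughly doubles every three to four years and lands somewhere between about 600 TB and 730 TB after five.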

Source: IDC

Until recently, the law firm operated its own unstructured data archival system, which took data from the primary Windows file systems and moved it to archival storage servers installed in the firm’s data center co-los.

However, Katten Law ran into several operational issues around the archives that drove it to seek an alternative, Diaz told Datanami in a recent interview. The firm brought in Komprise, a provider of unstructured data management software, to do a proof of concept.

“During the POC, we identified that about 70% of the files that we were storing on our file servers were stale and hadn’t been accessed in over three years, or the case had been closed,” Diaz said. “The other reason that I proposed doing a large-scale archiving project was to limit our exposure if we ever did encounter a ransomware event, because now those files couldn’t be impacted.”

As Katten Law explored the software, they found other benefits. For instance, many archiving solutions implement a stub in the production file system to represent the data that’s been archived. If the data needs to be retrieved, the user presents that stub to the archiving solution, which fetches the data. However, if something happens to the stub, then it can be very difficult to regain access to the archived data, Diaz said.

“Komprise has a different approach,” he said. “They use a symbolic link…basically like a shortcut. So on your Windows desktop, you have a shortcut that references the path to the actual file or to the program on the operating system. And even if that shortcut or symbolic link were to break or disappear, you still can go and find the original file and/or program.”
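The general property Diaz describes, that losing a link does not lose the file it points to, can be shown with ordinary operating system symbolic links. The sketch below is not Komprise’s implementation. The directory and file names are hypothetical, and everything happens in a throwaway temporary folder.

```python
import os
import tempfile


def demo_symlink_independence():
    """Create a file, link to it, delete the link, and report what survives."""
    with tempfile.TemporaryDirectory() as root:
        archive_dir = os.path.join(root, "archive")  # hypothetical archive location
        share_dir = os.path.join(root, "share")      # hypothetical production file share
        os.makedirs(archive_dir)
        os.makedirs(share_dir)

        # The "archived" file lives in the archive directory.
        target = os.path.join(archive_dir, "brief.txt")
        with open(target, "w", encoding="utf-8") as f:
            f.write("archived contents")

        # The file share holds only a symbolic link to it.
        # Note: on Windows, creating symlinks may need elevated rights or Developer Mode.
        link = os.path.join(share_dir, "brief.txt")
        os.symlink(target, link)

        # Reading through the link returns the target's contents.
        with open(link, encoding="utf-8") as f:
            via_link = f.read()

        # Deleting the link removes only the link, never the target.
        os.remove(link)
        return via_link, os.path.exists(link), os.path.exists(target)


contents, link_exists, target_exists = demo_symlink_independence()
print(contents, link_exists, target_exists)
```

After the link is deleted, the target file is still present at its original path, so it can be located directly.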

Archiving on more than elapsed time alone is another benefit of using the Komprise software, Diaz said. With many traditional archive packages, files are archived based on a set period of time. So if the documents associated with a case haven’t been accessed in three years, for instance, they will automatically be archived.

That doesn’t work so well in the law business, Diaz said.

“A lot of times within legal, especially litigation cases, they may become dormant for a while and they may get picked up,” he said. “Let’s say we were representing someone. There’s a verdict, and then there’s time between that original case and maybe an appeal. So just basing it on time doesn’t always work.”

Komprise gave Katten Law the capability to archive the files associated with a case based on when the case is actually closed, not after some arbitrary number of years without being touched. After the documents are archived, a user who needs a read-only copy of the data can get it by simply clicking a shortcut on the desktop, which pulls the data from the Komprise archive to a local storage appliance, where the user can retrieve it, Diaz said.

The firm is in the middle of transitioning its primary storage platforms from traditional spinning disks to flash storage. Moving more of the data to the Komprise-based archive running on Microsoft Azure Blob Storage helps to keep costs down while also giving the users the benefits of faster primary storage, Diaz said.

(Tatiana Shepeleva/Shutterstock)

“Komprise has been very, very consistent for us,” he said. “We started with either closed cases or data being not accessed for over three years. About six months ago, we lowered the threshold to two years of no access or the case is closed, and we ended up moving another 40TB up to Azure.”
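The rule Diaz describes boils down to a simple either/or test: archive a file if its case is closed, or if nobody has accessed it within the threshold. A minimal sketch of that logic follows. The class, field, and function names are hypothetical and do not reflect Komprise’s actual policy engine or API, and a year is approximated as 365 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class CaseFile:
    path: str
    last_accessed: date
    case_closed: bool


def should_archive(f: CaseFile, today: date, max_idle_years: int = 2) -> bool:
    """Archive when the case is closed OR the file has been idle past the threshold."""
    idle_cutoff = today - timedelta(days=365 * max_idle_years)
    return f.case_closed or f.last_accessed < idle_cutoff


today = date(2024, 8, 23)
files = [
    CaseFile("closed_case/verdict.pdf", date(2024, 7, 1), case_closed=True),
    CaseFile("dormant_case/notes.docx", date(2021, 3, 15), case_closed=False),
    CaseFile("active_case/motion.docx", date(2024, 8, 1), case_closed=False),
]
decisions = {f.path: should_archive(f, today) for f in files}
print(decisions)
```

With the two-year threshold, the closed case and the long-idle file are selected for archiving, while the recently touched file on an open case stays on primary storage. Passing `max_idle_years=3` reproduces the firm’s original, more conservative setting.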

Reducing file storage for the Windows file shares will also help to save the law firm money, particularly as it transitions to a new platform later this year. “I won’t have to buy as much storage, so it’ll save us on this future purchase,” Diaz said.

The benefit from improving the security of Katten Law’s data is harder to measure. But with ransomware on the uptick once again this year, it’s clear that it brings real value to the law firm.

“I can’t emphasize enough that it also reduced our exposure because any of the files that are archived would never be impacted by any type of hacker or ransomware event,” Diaz said. “They wouldn’t have access to those files. They wouldn’t be impacted by any type of security event.”
