February 15, 2024

Mythbust Your Way to Modern Data Management

JB Baker

Artificial intelligence (AI), arguably the most data-hungry technology to exist, is roaring into the mainstream. Its appetite for data is voracious: AI must crunch enormous volumes to produce content or respond to queries. Calling the search for the right data a needle in a haystack is an understatement; it’s more like a needle in an ocean.

As I/O leaders struggle to optimize and manage the volume of data to process, a spotlight lands on how to make storage more efficient. It will take all the tools and tricks that data science has up its sleeve to balance the trade-offs made across compute, storage, and networking infrastructure resources, including that old familiar favorite…data compression.

Data compression is usually one of the first data management tools people discover, not long after running out of capacity on their first computer. (Hello WinZip, you old friend.) However, that early familiarity means the assumptions about compression often go unquestioned and its value for the enterprise is underestimated. After all, compression trades performance for efficiency…everybody knows that. But does that assumption still hold true today?

As we work to build more sustainable IT, we need to revisit some of our assumptions, including those about data compression. The reality is every organization’s path is different and personal. The implementation of data compression isn’t a Point A-to-Point B path. Instead, it works like the popular Choose Your Own Adventure books. In those, the reader is the main character. Every few pages, you’re offered a choice and turn to another page based on that choice to read the next part of the story. You have multiple paths. So, let’s make things a bit clearer by busting three of the most common myths about data compression, each with a strategy to counter it.

Myth #1 – There’s Just One Path to Data Compression

Enterprises often believe there is one path for data compression. They may think that data compression is done exclusively in software on the host CPU. Because the CPU does the processing, there is the risk of a performance penalty under load, making it a non-starter for performance-critical workloads.

In reality, just as the data pipeline within your organization is unique and tailored to your requirements, architecting how data is compressed offers plenty of options. Data compression can be done in many ways, and the choice of how and where compression is processed can lead to benefits that cascade throughout the architecture. For example, there is untapped potential in flash technology that is currently bottlenecked by the effects of write amplification. Performing compression directly on the SSD in hardware can improve latency consistency, capacity, and endurance, in some cases several times over, while offloading CPUs and GPUs.

Mythbusting strategy: Consider data compression early and evaluate all the workloads that could benefit from it from both performance and lifespan perspectives. There is a data compression approach that can work for your unique needs.

Myth #2 – Data Compression Can’t Solve Business Problems

How can you improve the overall cost of ownership of your infrastructure? How can you increase storage and performance while decreasing power consumption? How can you make the data center more sustainable? When organizations try to solve these sorts of problems, data compression may not immediately leap to mind as the answer.

Data compression doesn’t get more attention because organizations simply aren’t thinking about it as a problem-solving tool. This becomes clear when you look at search trends related to data: “enterprise data compression” draws orders of magnitude less interest than a term like “data management.”

This is unfortunate because data compression is an easy, fundamental technology that can go a long way toward addressing these issues if done in a modern way. Nearly all data is somewhat compressible, and even a small compression ratio can significantly alleviate business burdens.
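How compressible a given dataset is can be checked before committing to any particular approach. The snippet below is a minimal sketch in Python that uses the standard-library zlib module as a software stand-in; it says nothing about how any hardware compressor behaves, and the `compression_ratio` helper and both sample inputs are assumptions made up for illustration. It also shows why the word "nearly" matters: random bytes, which stand in for encrypted or already-compressed data, do not shrink.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return original size divided by compressed size (higher is better)."""
    if not data:
        raise ValueError("data must be non-empty")
    compressed = zlib.compress(data, level)
    return len(data) / len(compressed)


# Illustrative inputs only, not real workload data.
log_like = b"2024-02-15 12:00:00 INFO request served in 12 ms\n" * 1000
random_like = os.urandom(50_000)  # stands in for encrypted or pre-compressed data

print(f"log-like text: {compression_ratio(log_like):.1f}x")
print(f"random bytes:  {compression_ratio(random_like):.2f}x")
```

Running the same measurement on a representative sample of your own data gives a rough first estimate of the ratio to expect.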

Mythbusting strategy: Data compression can solve many business problems when done right. To maximize the benefits, compression is best done in hardware, close to where the data resides. Build data compression considerations early into the design of your data pipeline.

Myth #3 – Compression Results In Performance Penalties

Contrary to long-standing wisdom, compression need not carry a performance penalty when it is done in hardware. It can act as an “accelerator” for applications by relieving bottlenecks in the overall system. A drive with built-in compression can optimize flash by compressing data during writes and decompressing it on reads without any host action.

Examples of the positive impact data compression can have on performance include:

  • Reading and writing fewer bits can raise sustained random write performance, improve read tail latency in mixed workloads, and reduce write amplification. This improves endurance and usable capacity, particularly in high-IOPS environments.
  • The ability to harness even minimal data compressibility can lead to significant gains in performance and endurance.
  • If the data is highly compressible and the workloads are heavy in mixed I/O, compression can extend usable capacity beyond the drive’s physical capacity, increasing storage density and reducing data storage costs, all while significantly increasing performance.
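The arithmetic behind these points can be sketched with a deliberately simple model. It assumes that a compression ratio r divides the bytes that reach flash by r and multiplies usable capacity by r. Real drives are more complicated (over-provisioning, garbage collection, and metadata all affect write amplification), so treat the numbers as back-of-the-envelope estimates. The function name and the sample figures are hypothetical.

```python
def flash_impact(host_writes_tb: float, physical_capacity_tb: float, ratio: float) -> dict:
    """Estimate flash writes and usable capacity under a simple linear model.

    Assumption: every host byte shrinks by the same compression ratio and no
    other drive-level effects are modeled.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return {
        "flash_writes_tb": host_writes_tb / ratio,
        "effective_capacity_tb": physical_capacity_tb * ratio,
    }


# Hypothetical drive: 8 TB of physical flash receiving 100 TB of host writes.
for ratio in (1.2, 2.0, 4.0):
    result = flash_impact(host_writes_tb=100.0, physical_capacity_tb=8.0, ratio=ratio)
    print(
        f"ratio {ratio:.1f}: "
        f"{result['flash_writes_tb']:.1f} TB written to flash, "
        f"{result['effective_capacity_tb']:.1f} TB usable"
    )
```

Even under this crude model, a modest ratio of 1.2 trims the data written to flash by about a sixth, which is the sense in which minimal compressibility can still pay off in endurance.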

Mythbusting strategy: Be thoughtful about the technology and how to apply it. Look beyond capacity to see how compression can extend the life of your hardware and reduce your power consumption.

Benefit from Data Compression in Your Organization

Data compression is a powerful tool once you bust through the myths. No single path exists to compression. It can solve numerous business problems and accelerate performance. Consider how compression can help lead you to a more sustainable data center – and to all the benefits that come with it.

About the author: JB Baker is the vice president of marketing at ScaleFlux. 
