June 9, 2022

Is Index-Free The Answer to the Looming Data Deluge Problem?

John Smith


The legacy log management industry is in trouble. With global data volumes expected to reach 181 zettabytes by 2025, many log management vendors are rapidly approaching a breaking point. Their prevalent technology is index-driven, which means their indices will soon dwarf the actual source data users are trying to collect!

In a recent Stack Overflow thread, I noted the following response to a question about why an Elasticsearch index was 30GB when the log source was only 3GB.

“If you’re working with a source of 3GB and your indexed data is 30GB, that’s a multiple of about 10x over your source data. That’s big, but not necessarily unheard of. If you’re including the size of replicas in that measurement, then 30GB could be perfectly reasonable.”

This reads like the definition of technical debt. The yield of this equation is only 10%: of every 30GB stored, just 3GB is the data you set out to collect. And we’re not just dealing with an increase in storage; that 30GB index will also significantly impact the hardware required to support it. What ends up happening is that the memory, I/O and cores you want to use for threat hunting, alerting and ad hoc queries are instead spent on servicing these indices.
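The arithmetic behind that 10% figure can be sketched in a few lines of Python. This is only an illustration of the ratio quoted above; the function name `index_overhead` and its parameters are hypothetical and do not come from Elasticsearch or any other product.

```python
def index_overhead(source_gb: float, indexed_gb: float) -> tuple[float, float]:
    """Return (expansion multiple, yield), where yield is the share of
    stored bytes that corresponds to the original source data."""
    if source_gb <= 0 or indexed_gb <= 0:
        raise ValueError("sizes must be positive")
    multiple = indexed_gb / source_gb        # how many times larger the index is
    yield_fraction = source_gb / indexed_gb  # source data as a share of what is stored
    return multiple, yield_fraction


# Figures from the Stack Overflow example: 3GB of logs, 30GB on disk.
multiple, yield_fraction = index_overhead(source_gb=3, indexed_gb=30)
print(f"{multiple:.0f}x expansion, {yield_fraction:.0%} yield")  # 10x expansion, 10% yield
```

Plugging in your own source and on-disk sizes gives a quick sense of how much of your storage footprint is the index rather than the logs themselves.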

This begins to resemble the poor debtor whose interest payments have become so enormous that they can never get out from under their obligation. Like our poor debtor, we’re left wondering how our legacy log management solutions went from racks to rows in our data center.

It’s not uncommon to have index-driven solutions that require hundreds of servers to accommodate existing logging needs, just to get to tens of terabytes of data ingestion. So what’s the solution to the data deluge? Do you add more nodes? Cores? Memory?

How Did We Get Here?


Let’s start by looking at the evolution of logs over the last 25 years. When my career started, I might purchase a server and run some flavor of Unix on it (AIX, Solaris, HP-UX, etc.). In addition to my routers and switches, I’d also send syslogs from this server to a SIEM or log management solution. Later, hardware dropped in price and the hypervisor arrived, and my single piece of iron was now running a hypervisor and 10-15 guest operating systems.

Today, we have Docker/Kubernetes, where a pod OS and hundreds of containers run on the same piece of iron. What was once a single source of logging now sends logs from hundreds of systems. Couple this with applications that embed their own logging capability, and with a number of connected devices that is expected to increase by 5x over the next 10 years. Welcome to the data deluge.

The Index-Free Answer

At a minimum, the time has come to investigate a hybrid solution that combines new and old approaches. In this scenario, you can have one system designed to handle the bulk of log collection while still giving teams the flexibility to forward important logs to their incumbent solutions. This enables teams to share the burden of log management with a better performing and less expensive alternative.

You may have heard the term “index-free” technology being discussed. But what exactly is it? And how can dropping indices lead to faster searches and reduced storage requirements?

Index-free is a combination of several different technologies that changes the way data is processed when it’s ingested. By removing indexing from the ingestion process, it opens up new ways for teams to relate to their data by speeding up search results and reducing costs.

A traditional log management approach requires writing the data, querying it and then populating the results to a dashboard. With an index-free approach, searching data stored on disk is limited to interactive, ad hoc queries, such as those run during an incident, and these queries leverage bloom filters.
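To make the bloom filter idea concrete, here is a minimal sketch of how a search could skip stored segments that cannot contain a term. It is a generic illustration, not a description of any vendor's implementation: the `BloomFilter`, `Segment` and `search` names, the whitespace tokenizer, and the filter sizing are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Small bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, token: str):
        # Derive several bit positions per token by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{token}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, token: str) -> None:
        for pos in self._positions(token):
            self.bits |= 1 << pos

    def might_contain(self, token: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(token))


class Segment:
    """A block of raw log lines plus a bloom filter over its tokens."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.filter = BloomFilter()
        for line in self.lines:
            for token in line.lower().split():
                self.filter.add(token)


def search(segments, token: str):
    """Return (matching lines, number of segments actually scanned)."""
    token = token.lower()
    hits, scanned = [], 0
    for segment in segments:
        if not segment.filter.might_contain(token):
            continue  # the token is definitely absent, so skip the scan
        scanned += 1
        hits.extend(l for l in segment.lines if token in l.lower().split())
    return hits, scanned


segments = [
    Segment(["login ok user=alice", "login ok user=bob"]),
    Segment(["access denied user=mallory"]),
]
hits, scanned = search(segments, "denied")
print(hits)  # ['access denied user=mallory']
```

The filter is far smaller than a full inverted index, and the trade-off is that any segment it cannot rule out still has to be scanned at query time.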


In addition, an index-free logging approach can be a supplementary solution that augments your existing investment and removes some of the burden of high-volume logs. With an index-free architecture operating at petabyte scale, you can finally say yes to the log sources you are currently saying no to, and use event forwarding to send important events to your index-driven solutions. This is huge for organizations that want to aggregate, manage and use log data to make real-time decisions across both the IT and business landscapes.
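The hybrid pattern described above, where everything lands in the bulk store and only important events are forwarded to the incumbent system, might look roughly like the following sketch. The `HybridRouter` class, the `level` field and the choice of which levels count as important are all hypothetical; a real deployment would use whatever forwarding rules its platform supports.

```python
from dataclasses import dataclass, field


@dataclass
class HybridRouter:
    """Keep every event in the bulk store; forward only important ones."""

    # Assumption: importance is decided by log level alone.
    forward_levels: frozenset = frozenset({"ERROR", "CRITICAL"})
    bulk_store: list = field(default_factory=list)  # stand-in for the index-free store
    forwarded: list = field(default_factory=list)   # stand-in for the incumbent solution

    def is_important(self, event: dict) -> bool:
        return str(event.get("level", "")).upper() in self.forward_levels

    def ingest(self, event: dict) -> None:
        self.bulk_store.append(event)      # everything is collected
        if self.is_important(event):
            self.forwarded.append(event)   # only a subset reaches the indexed system


router = HybridRouter()
for event in [
    {"level": "info", "msg": "health check passed"},
    {"level": "error", "msg": "disk quota exceeded"},
    {"level": "debug", "msg": "cache miss"},
]:
    router.ingest(event)

print(len(router.bulk_store), len(router.forwarded))  # 3 1
```

The point of the design is that the expensive, index-driven system only sees the fraction of events that justify its cost, while nothing is dropped at collection time.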

The data deluge is upon us. With indices soon to outgrow the data you’re trying to collect, your ability to interact with that data in an ad hoc fashion is limited, because the indices themselves become difficult to compute. Eventually, the size of the index degrades performance to the point that the very reason for having an index in the first place (faster queries) is undermined altogether.

The time has come to investigate alternatives to heavy indices and find a way to coexist with legacy log management solutions, so your DevOps, ITOps and SecOps teams can reclaim visibility of their infrastructure and handle the data deluge at scale.

About the author: John Smith is the director of technical marketing engineering at Humio, a CrowdStrike company. John has more than 20 years of experience in a variety of roles, from big data, DevOps and SecOps to sales, marketing and integration leadership. He has worked in security for more than 13 years, including pioneering work with event correlation, behavioral analytics and remote access.

Related Items:

Rethinking Log Analytics at Cloud Scale

Log Storage Gets ‘Chaotic’ for Communications Firm

Index-Free Log Management: Surf the Approaching Tidal Wave of Data Instead of Drowning in It

Applications: Enterprise Analytics

Technologies: Middleware

Vendors: Crowdstrike, Humio

Tags: index-free analytics, log analytics, log data
