August 11, 2021

Log Storage Gets ‘Chaotic’ for Communications Firm


As the dedicated WiFi provider for the Toronto subway system, BAI Communications generates a fair amount of log data, which it stores and analyzes using the Elastic family of products, including Logstash and Kibana. But when it was forced to bank a year’s worth of log data for regulatory purposes, it turned to an upstart software and services firm called ChaosSearch for a more affordable solution.

BAI Communications designs, builds, and operates communication networks for subway systems in major cities, including Toronto, New York City, and London. Before COVID, it was providing ad-supported Internet connectivity to 150,000 users per day on behalf of the Toronto Transit Commission. Worldwide, the number was close to 600,000 users per day.

In Toronto (where BAI is based), the company logs about 10,000 events per second across its network, amounting to about 50GB per day. This log data, which originates from routers, switches, firewalls, and access points, flows to an on-prem Elastic cluster, which the company uses to troubleshoot network problems, among other use cases, according to Jeremy Foran, BAI’s Head of Data Analytics, who installed the system.
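Those two round numbers can be related with some quick arithmetic. The sketch below is only illustrative: it assumes decimal gigabytes and treats both cited figures as steady daily averages, which the article does not say they are, so the implied per-event size is a rough indication at best.

```python
# Rough throughput arithmetic using the article's approximate figures.
# Assumptions: decimal GB, and both numbers treated as steady daily averages.
EVENTS_PER_SECOND = 10_000         # approximate event rate cited for Toronto
DAILY_VOLUME_BYTES = 50 * 10**9    # ~50GB per day
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400 seconds

events_per_day = EVENTS_PER_SECOND * SECONDS_PER_DAY
avg_event_bytes = DAILY_VOLUME_BYTES / events_per_day

print(f"{events_per_day:,} events/day, ~{avg_event_bytes:.0f} bytes/event on average")
```

At that rate the network produces well over 800 million events a day, which helps explain why retention, not ingestion, became the cost problem.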

“I’m a huge Elastic guy,” Foran says. “I have been an Elasticsearch guy since 2.4. I’m not trying to brag, but they’re at 7.x now.”

Foran discovered Elastic and the ELK Stack (now just called the Elastic Stack) back in 2015, when he was tasked with building BAI’s log management system. Having never built a log management system before, he did what any self-respecting technology professional would do: He Googled it. The search result for “best syslog server” directed him to a video by Logstash creator Jordan Sissel, and he was on his way.

“It was the guy who wrote Logstash, giving a demonstration about all the problems you face and how he solved them for me,” Foran tells Datanami. “That’s great. It was Logstash doing the heavy lifting, and Elastic [providing] the interface to investigate. So our roots are in Elasticsearch, and we have a ton of other use cases for Elasticsearch.”

BAI provides WiFi for the Toronto Subway (Iakov-Filimonov/Shutterstock)

Foran’s journey to the Elastic Stack was not unlike that of millions of others, except that BAI had no security use cases (it outsources security to an outside firm). Elastic grew so popular among IT professionals, security experts, and data analysts that the company went public in 2018, and today it sports a market capitalization of around $14 billion.

At some point, BAI’s demands evolved, and Foran was given another task: figure out a way to store all the syslog data for at least a year. The company’s status as a PCI- and ISO 27001-compliant company depended on building and maintaining this archive.

As Foran started running the numbers on that archival project, a problem emerged. Installing and running the new disk arrays needed to keep a year’s worth of data in the Elastic cluster was going to be pricey.

“We had some spinning disk, a few arrays,” Foran says. “We had to go from what we needed operationally, maybe two or three weeks’ [worth of data], to a year-plus. The cost of having that much logging went up dramatically. We weren’t going to be able to afford to buy all of those disks.”

Around that time, Foran started to hear about a new company called ChaosSearch. Founded by computer scientist Thomas Hazel, ChaosSearch essentially provides an abstraction layer between a customer’s Elastic Stack products and the NoSQL database that underlies the Elastic cluster. By storing log data in a highly compressed state in an AWS S3 data lake, while maintaining API compatibility with Elastic products, it lets customers essentially “lift and shift” their Elastic system to the cloud.

Foran admits that he was skeptical when he first heard about what Hazel, the CTO, was claiming ChaosSearch could do.

“When I first met Thomas, he said, ‘Oh, well, you can store it in S3 and it will save you money,’” Foran says. “And I was like, well if I’m throwing it in an S3 bucket, how will it save me money?

“He says, ‘Well, we have an 80% compression algorithm,’” Foran continues. “And I was like, well I don’t believe you, sir. People write PhDs on compression. And if you really were achieving that, you wouldn’t be here trying to flog me software. He said, no, no, no, trust me. And as we got into it, to kick the tires, he was right.”
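To see why that compression figure matters for a year-long archive, here is a back-of-the-envelope sketch. It assumes the roughly 50GB per day cited for Toronto stays constant over 365 days, and it reads “80% compression” as data occupying 20% of its raw size on disk. Both are simplifying assumptions, not figures published by BAI or ChaosSearch.

```python
from typing import Tuple

# Inputs are the article's round numbers plus stated assumptions,
# not figures published by BAI or ChaosSearch.
DAILY_RAW_GB = 50        # approximate daily log volume in Toronto
RETENTION_DAYS = 365     # "at least a year" of retention
SIZE_REDUCTION = 0.80    # assumption: "80% compression" = 80% smaller on disk


def retention_storage_gb(daily_gb: float, days: int, reduction: float) -> Tuple[float, float]:
    """Return (raw, compressed) storage in GB for a retention window."""
    raw = daily_gb * days
    compressed = raw * (1 - reduction)
    return raw, compressed


raw_gb, compressed_gb = retention_storage_gb(DAILY_RAW_GB, RETENTION_DAYS, SIZE_REDUCTION)
print(f"raw: {raw_gb / 1000:.2f} TB, compressed: {compressed_gb / 1000:.2f} TB")
```

Under those assumptions, a year of logs shrinks from roughly 18 TB to under 4 TB, before counting any replica copies a self-managed cluster would also keep on disk.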

Convinced that ChaosSearch would cut storage costs, Foran signed BAI up for the cloud data lake analytics service. The original idea was just to keep the data there for compliance purposes. But the company has found other uses for the data.

“People write things on Twitter, like ‘WiFi sucks,’” Foran says. “There’s really not a lot of rich troubleshooting information there, so we need to be able to go in with a system and effectively validate, has there been a change? And some trends you can’t detect over two or three days. You need to have a much broader approach.”

BAI Communications Head of Analytics Jeremy Foran

With its analytics team so steeped in the Elastic Stack, BAI is able to analyze the terabytes of log data it has stored in ChaosSearch using tools its staff already know. ChaosSearch’s cloud platform provides a familiar environment for BAI employees to work in.

“It turns out they’re using Kibana and Elasticsearch on top of the S3 bucket. They’ve written drivers to interact with their compressed data, so it’s a familiar interface,” Foran says. “We built dashboards over here [for the on-prem Elastic cluster]. We can build them over here [for the hosted ChaosSearch environment] as well. It is, in a way, standardized on Elasticsearch. It just so happens that the backend is cheaper because of ChaosSearch.”

ChaosSearch recently added an SQL interface, giving customers the ability to query their log data using familiar BI tools, like Google’s Looker and Microsoft’s Power BI. But you won’t catch Foran using the SQL interface, as he much prefers Elasticsearch’s JSON-based Query DSL (domain-specific language).

“Maybe somebody on my team would like to use that,” he says. “I’m more of a hardcore DSL guy, to get in there with the Elastic search queries. SQL is–I don’t know. It’s been around since the 70s. It doesn’t seem modern enough. I know there’s going to be some data scientists who want to stab me after that.”
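For readers who haven’t used it, the Query DSL expresses a search as a JSON body. The sketch below builds, as a Python dict, the kind of long-range trend query Foran describes: counting one device’s log events per week over the past year. The field names (`@timestamp`, `host.name`) and the device name `ap-union-station-01` are hypothetical placeholders, and the body is only printed, not sent to any cluster. The `calendar_interval` option applies to Elasticsearch 7.2 and later.

```python
import json


def weekly_trend_query(host: str, lookback: str = "now-1y/d") -> dict:
    """Build a Query DSL body that counts one host's events per week.

    Field names are hypothetical; adjust them to the actual index mapping.
    """
    return {
        "size": 0,  # return only aggregation buckets, not individual hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"host.name": host}},                 # exact match on the device
                    {"range": {"@timestamp": {"gte": lookback}}},  # date-math lower bound
                ]
            }
        },
        "aggs": {
            "events_per_week": {
                "date_histogram": {"field": "@timestamp", "calendar_interval": "week"}
            }
        },
    }


# Hypothetical access-point name, used only for illustration.
body = weekly_trend_query("ap-union-station-01")
print(json.dumps(body, indent=2))
```

A body like this would normally be sent to an index’s `_search` endpoint. The weekly buckets are what surface slow-moving trends that a two- or three-week operational window would miss.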

In addition to complying with industry data regulations, BAI has managed to save quite a bit of money by adopting the ChaosSearch system. The company still maintains the on-prem Elastic cluster, for the simple reason that an Internet-based analytics system is not much use for troubleshooting why the network is down. But for inspecting long-term trends in the data, as well as maintaining regulatory compliance, ChaosSearch provides an affordable vehicle that BAI intends to drive for a while.

“At the end of the day, if we wanted to put this in Elasticsearch, this would have been tens of thousands of dollars per month,” Foran says. “If we went to put it in ChaosSearch, it’s hundreds of dollars a month. It’s an order of magnitude difference. It’s the difference between getting an Uber and buying a car.”


Datanami