MinIO Enjoying Role in Emerging Cloud Architecture
In the post-Hadoop world, object storage systems have become the new favored place to park petabytes of data, with Amazon S3 leading the way among cloud providers. But when organizations are looking for on-premise object stores, one name that keeps popping up again and again is MinIO.
When MinIO debuted its S3-compatible open source object storage system two-and-a-half years ago, you may have wondered what Anand Babu (AB) Periasamy was talking about when he declared that he wanted to “solve storage.”
After all, the Hadoop Distributed File System (HDFS) had already proved that one could store petabytes of data on commodity hardware at relatively low cost. The momentum clearly belonged to Hadoop. An entire market emerged around Hadoop as the centerpiece for a next-gen data-centered operating system. Big data storage had already been solved.
Fast-forward to 2019. Demand for Hadoop is dropping while Amazon S3 just keeps growing. In the battle of big data architectures, Hadoop-style computing is losing and Amazon’s cloud architecture is winning. The market has spoken, and the new cloud architecture, which combines S3-compatible object storage with Kubernetes for managing compute containers, is currently trouncing Hadoop’s combination of HDFS and YARN.
Periasamy’s prediction was eerily prescient. Maybe he should take credit for predicting the demise of the yellow pachyderm. But the truth is that Periasamy was caught off guard, like most of us were, at how rapidly Hadoop fell apart.
“Hadoop imploded, creating a vacuum and pulling us in,” he tells Datanami. “There are a lot of things I did not know. I did not know the Hadoop situation would happen for us. There was no way that anyone from outside of the Hadoop community could go and say that there is something better than HDFS. There was no hope for us. But how the market evolved.”
Developers Love S3
The market for big data software certainly did evolve, and the shift has been quite beneficial to MinIO. As the co-creator of GlusterFS years earlier, Periasamy knew a thing or two about building performant and scalable distributed file systems. As he saw AWS take off, he realized that the market was responding very positively to the Amazon experience, but that there was a gap when it came to on-prem deployments.
“What customers are saying is I want it to look like AWS where storage and compute are disaggregated,” he says. They want the simplicity of an S3 data lake that can handle a variety of data types, scales infinitely, and is accessed via a popular API, and a compute environment like EC2 that’s stateless, containerized, orchestrated, elastic, and multi-tenant.
At MinIO, Periasamy developed an object storage system that was designed from the beginning around the S3 API, not modified after the fact to support S3 (which he says is the case with most of his competitors). He also designed it to be performant enough to be used for production applications, such as the new class of data-driven applications on the Web, for analytic data warehouses, and even for transactional systems that would normally call for a SQL database. In that last case, MinIO is paired with Kafka, which functions as the memory journal. “It was surprising to me, but it’s being done,” he says.
Periasamy wrapped all of this up in a container-friendly package and made it available as open source under the Apache 2 license. Companies that want technical support can purchase subscriptions for MinIO. According to the company, 12 of the 15 largest banks in the United States are running MinIO, and 13 of the 15 largest European banks are too.
Open Source Success
Since it became available in 2017, MinIO has become one of the more popular open source projects, with more than 400 contributors. The software is being downloaded at the rate of 85,000 downloads per day. It has more than 247 million Docker pulls and nearly 18,000 stars on GitHub.
According to Periasamy, MinIO is being used as the backend in other popular products, including Splunk and Elasticsearch. Even MPP databases, like Vertica and Greenplum, are being used with MinIO back-ends. VMware’s new cloud storage offering uses MinIO under the covers, and if you read the TensorFlow documentation, it recommends MinIO, he says. Nutanix and Weka.io both use MinIO under the covers. “Most of the [distributed file system] vendors actually bundle MinIO along with the system because they’re asking for S3 APIs,” Periasamy says.
The object storage market will eventually converge to just a handful of projects and products offering an S3-like experience for on-premise. “There’s basically a winner-take-all model,” Periasamy says. “We focused on being the most widely adopted object store. Today by the numbers, we are way ahead of these other players.”
When Hadoop was riding high and developers were writing their applications to work with HDFS, it was common to see S3 adapters being used when a customer wanted to use S3 instead of HDFS. That dynamic has flipped, and today HDFS adapters are being used for newer applications that were designed to use S3 but may need to pull data from “legacy” systems, Periasamy says.
“The emerging [applications] are all going straight to S3,” he says. “An object API is more modern than the HDFS API, which looks closer to a file system. We are seeing that even proprietary enterprise-class products, from Splunk to Vertica to Teradata, they are actually going native S3. They’re not even looking at the HDFS API anymore.”
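To make the object-versus-file-system distinction concrete, here is a toy, in-memory sketch in Python. It is not the S3 API, nor MinIO's implementation; the class and method names (`ToyObjectStore`, `put_object`, `get_object`, `list_objects`) and the bucket and key names are hypothetical, chosen only for illustration. It assumes the core idea of an object API: a flat namespace of keys inside a bucket, where whole objects are written and read by key and "folders" are nothing more than shared key prefixes.

```python
# Toy illustration only: hypothetical names, not the real S3 or MinIO API.
from typing import Dict, List


class ToyObjectStore:
    """Flat namespace: bucket -> {key: bytes}. No real directories exist."""

    def __init__(self) -> None:
        self._buckets: Dict[str, Dict[str, bytes]] = {}

    def put_object(self, bucket: str, key: str, data: bytes) -> None:
        # Whole-object write; a key like "logs/2019/06/a.json" is just a string.
        self._buckets.setdefault(bucket, {})[key] = data

    def get_object(self, bucket: str, key: str) -> bytes:
        # Whole-object read by exact key.
        return self._buckets[bucket][key]

    def list_objects(self, bucket: str, prefix: str = "") -> List[str]:
        # "Folders" are emulated by filtering keys on a shared prefix.
        keys = self._buckets.get(bucket, {})
        return sorted(k for k in keys if k.startswith(prefix))


store = ToyObjectStore()
store.put_object("events", "logs/2019/06/a.json", b'{"click": 1}')
store.put_object("events", "logs/2019/07/b.json", b'{"click": 2}')
store.put_object("events", "images/cat.png", b"\x89PNG")

june_keys = store.list_objects("events", prefix="logs/2019/06/")
all_logs = store.list_objects("events", prefix="logs/")
```

A file system API, by contrast, typically exposes directories, renames, and partial in-place writes, which is part of what Periasamy means when he says the HDFS API “looks closer to a file system.”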
New Data Paradigm
As object stores become more widely used, the type of data that people are putting in them is changing too. In the old days, object stores were often used for storing photos and videos, Periasamy says. But these days, object stores are increasingly being used to store huge amounts of event data that needs to be accessed quickly and reliably.
“Some people call it time-series, some call it logs,” Periasamy says. “They’re analyzing it with all kinds of data processing and machine learning, and the output is fed back into the application.”
The volume of this event data is tremendous, with large retailers creating petabytes of data in just a few hours. Whereas organizations may have sought to create ways for humans to analyze the data in the past, today they’re looking for machines to automatically deliver the insights. The folks building these systems are counting on the new cloud architecture to automate many of the data engineering tasks for them and make them more productive.
“In the newer stack, the change is the machines produce the data and machines consume the data,” Periasamy says. “That is why we have more data and you need to have faster processing. Here it’s all API driven. That’s why the change in stack happened.”
Past Is Prologue
While the sun is shining on MinIO at the moment, Periasamy knows that market conditions and customer expectations can change at the drop of a hat. While Hadoop may not fit as well into enterprise analytics applications at the moment, it was the right technology at the right time. The Hadoop experiment was necessary.
“Hadoop’s idea became obsolete not because of Hadoop itself,” he says. “The requirement of the whole infrastructure changed. The data growth pattern changed. The cost of memory fell, networks became faster. All of this basically challenged the very fundamental idea of Hadoop, data locality and MapReduce theory.”
The market is demanding S3-compatible object stores and Kubernetes today. How long will it last? Nobody knows, not even Periasamy, who seemed to be a step ahead of everybody else in developing a container-friendly, open-source, on-premise version of S3.
“I can’t tell two years from now how the market can change,” he says. “I like it that way. Otherwise we’d be stuck with the old technology. Gluster was great for its time. When I saw that Amazon would convince the industry to let go of legacy interfaces like iSCSI and NFS — you can’t access that outside of your LAN. You cannot possibly think of cloud-like infrastructure with those systems. But if I stuck onto those old technologies, I would be letting our users down. And every time something new happens, I see that as an opportunity. And solving new problems is fun.”