Why Anaconda’s Data Science Tent Is So Big–And Getting Bigger
With more than 13 million downloads to date, Anaconda is blossoming into a real phenomenon in a crowded data science field. What made the collection of mostly Python-based tools so popular to data hackers—a dedication to openness, interoperability, and innovation—is also the strategy behind Continuum Analytics’ business expansion, and possibly even an IPO.
More than 400 people attended Continuum Analytics’ inaugural AnacondaCON event last week in the company’s hometown of Austin, Texas. By all measures, the two-day conference was a success (Wednesday night’s AnacondaCON Carne bar-b-que party was quintessentially Austin-esque), and planning is already underway for next year’s event, to be held in April.
In the meantime, the folks at Continuum, under new CEO Scott Collison, will be looking to build their data science platform into a true juggernaut that rivals the other big data ecosystems coalescing around the Apache Software Foundation. You may not know it, but Anaconda uses an open source business model similar to the ASF’s, all bound by a common BSD license.
“What we do is we look at the Python ecosystem, we identify gaps in the ecosystem, and then we’ll incubate an open source project,” Anaconda Business Unit EVP and Continuum CMO Michele Chambers told Datanami at last week’s conference.
“We mature it. We work with the community. We bring other people into the project and harden it,” she continued. “And then, once it becomes mature enough, we say, ‘Now we’re going to surface this up through our enterprise offering, our enterprise platform, and add bits that are proprietary that enterprises care about.’”
Not the Next Red Hat
Chambers is adamant that Continuum is not aiming to be “the next Red Hat of data science,” a comparison that is growing old. Instead, Chambers likens Continuum’s business model more to what Cloudera has done with Apache Hadoop. Cloudera develops proprietary add-ons in areas like management, monitoring, and security that aren’t apt to be tackled by the open source community, and bundles them in enterprise offerings, like Cloudera Manager.
Only a small fraction of the 13 million people who have downloaded Anaconda become paying customers, and that suits the company just fine, according to Continuum co-founder and CTO Peter Wang. “We’re not going to make every dollar there is to be made in data science, and I’m kind of okay with that,” Wang told Datanami. “Our company is different in that a lot of the core innovations are available free, for the most part.”
Anaconda is so remarkable because it includes more than 720 libraries, frameworks, and assorted other open source tools developed by data scientists and researchers working in “real” sciences like astrophysics, biology, etc. Most of these libraries are linked through their common use of Python, although there is a smattering of tools written in R too.
Wang and his Continuum co-founder Travis Oliphant, who is now the company’s chief data scientist, had the foresight to realize that, while these Python tools were powerful in and of themselves, they would become exponentially more powerful if they could all work together. That’s the genius behind Anaconda and the original conda package management tool.
“Let’s say you’re going to install the Scikit-learn package,” Chambers said. “First thing, it says, ‘Thanks for trying, but I’m dependent on 10 different other packages.’ And you have to get the right version and you have to work together and you have to do the build. We solved that problem… Travis and Peter, being visionaries, saw that that was going to be an impediment to the adoption of Python, so they solved that problem.”
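The dependency problem Chambers describes can be illustrated with a minimal Python sketch. It is not how conda itself works: the dependency map below is a hypothetical, simplified one that ignores version constraints, and the `install_order` helper is invented for illustration. It only shows the ordering half of the problem, which is that each package has to be installed after everything it depends on.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical, simplified dependency map: package -> packages it needs first.
# Real package metadata also carries version constraints, which are omitted here.
DEPENDENCIES = {
    "scikit-learn": {"numpy", "scipy", "joblib"},
    "scipy": {"numpy"},
    "numpy": set(),
    "joblib": set(),
}


def install_order(deps):
    """Return an order in which every package comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = install_order(DEPENDENCIES)
print(order)
```

In this toy graph, scikit-learn always lands last because it depends on everything else. A real package manager also has to pick mutually compatible versions and builds, which is the harder part of the problem Chambers is referring to.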
A Must-Have Data Science Toolset
Anaconda has quickly become a must-have tool for data scientists and budding quantitative analysts. Just consider the numbers:
At the end of 2015, Anaconda had been downloaded 3 million times, a number that exploded to 11 million by December 2016. Over the first six weeks of 2017 alone, there have been 2 million more downloads. If this rate of growth continues, Anaconda could be pushing 2 to 3 million downloads per month by the end of the year. As CEO Collison, a former Microsoft executive, told Datanami last month: “Even by Microsoft standards, that’s a lot of downloads.”
One AnacondaCON attendee had a story of a college student who was stumped by a data problem, only to be told by professors that they wouldn’t even discuss the problem until the student had downloaded and learned Anaconda. The product is practically a requirement these days, and it is approaching the hallowed data grounds long dominated by Excel. It’s even been adopted by IBM, Intel, and Microsoft, all of which have incorporated Anaconda into their own machine learning and deep learning packages.
Anaconda’s amazing success has turned Oliphant, who received his PhD in advanced biomedical imaging at the Mayo Clinic in Rochester, Minnesota and created the popular NumPy package to build better computer vision algorithms, into a data science celebrity. During the AnacondaCON Carne party, young data analysts were eager to shake his hand and ask him about various statistical packages.
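For readers who want to check the pace implied by those figures, here is a quick back-of-the-envelope calculation in Python. The download count and the six-week window come from the numbers above; the 52/12 weeks-per-month conversion is an assumption made for this sketch.

```python
# Figures quoted in the article.
downloads_2017_so_far = 2_000_000  # downloads in the first six weeks of 2017
weeks_elapsed = 6

# Assumption: an average month is 52/12 weeks long.
weeks_per_month = 52 / 12

weekly_rate = downloads_2017_so_far / weeks_elapsed
monthly_rate = weekly_rate * weeks_per_month

print(f"{monthly_rate:,.0f} downloads per month at the current pace")
```

That works out to roughly 1.4 million downloads a month today, so reaching 2 to 3 million a month by year's end depends on the pace continuing to accelerate rather than merely holding steady.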
It’s hard to find a single individual who has done so much to move Python from its roots as a scripting language used by Web developers into one of the most popular languages for data science. “When I came to Python one of the first things I loved was how quickly I could iterate, because it’s interpreted,” Oliphant said during his keynote address at AnacondaCON last Thursday. “I didn’t have to compile, wait for a while, lose my train of thought.”
Today, the Anaconda tent is big, and it’s getting bigger, thanks in large part to NumFOCUS, the non-profit organization founded by Oliphant to nurture open source projects. In addition to things like Pandas and NumPy, it’s home to a growing number of other tools, like Spyder, Holoviews, DataShader, PyParallel, DyND, Bokeh, Phosphor JS, and Dask. The list goes on and on, and all of it is there, free of charge. It will remain open and free of charge forever, according to Wang.
“The core innovation work we’re doing in visualization, distributed computing and all that stuff–we don’t want to chain people,” Wang said. “We don’t want to charge rent because people have a whole bunch of stuff they’re stuck with. At any point in time, our customers have a pretty good BATNA.”
BATNA, of course, stands for “best alternative to a negotiated agreement,” and it’s something that Oracle database customers don’t have much of, because Oracle controls access to their own data. Anaconda users, on the other hand, can always walk away, and the thing they built with Anaconda will still work, still process data, still provide value to the user.
It’s a decidedly different business model, but it’s one that resonates very well with Anaconda’s user base, a few hundred of whom need the enterprise management and security capabilities that Continuum charges actual money for. If Continuum ever got greedy and clamped down on the open source innovation in an attempt to squeeze more gold from the Anaconda community, the golden goose would probably die.
“That’s what we care about,” Wang said. “We want to see the woods grow. We want to make sure that we’re good stewards of the beautiful forest and all the kinds of things that happen in it. If we can’t keep our business going the way it’s going, then a lot of energy that goes in that ecosystem will be drawn away, and we don’t want that to happen.”
Software veteran Collison was recently brought in to run the business as CEO, leaving Oliphant and Wang more time to work in the data science and Anaconda communities, build out the Blaze group of data science tools at Anaconda, and take Python to the next level. “I have a vision for the future of array computing,” Oliphant said during his keynote.
Together with Chambers—another software industry vet who most recently worked at R-backer Revolution Analytics before it was bought by Microsoft—the duo are aiming Anaconda along the rocket-ship path forged by other high-flying tech companies.
“Anaconda today has the right breadth of capability, and now it’s just a matter of building out the depth around the platform,” Chambers said. “With Python, we have an opportunity for massive disruption.”
It probably wouldn’t be wise to bet against them.