June 14, 2018

Anaconda: Data Science Exiting Hadoop for the Cloud

George Leopold

Data scientists are embracing cloud-native frameworks as they move on from on-premises data infrastructure previously dominated by Hadoop, concludes a survey on the state of data science.

The shift is driven in part by the enterprise transition from merely managing big data to using machine learning and other connected data tools to glean insights in real time, according to the data science survey released this week by Python platform specialist Anaconda Inc. Cloud-native technologies such as application containers and the Kubernetes cluster orchestrator are growing at the expense of traditional big data technologies such as Hadoop and Apache Spark, the survey of more than 4,200 data scientists found.

The vendor survey, which mostly addresses the growing popularity of the Python data science platform (Anaconda reports 2.5 million downloads per month), also acknowledges that Hadoop-style big data approaches are less appealing to data scientists. One reason may be a youth movement: 26 percent of respondents were identified as "students," who are accustomed to cloud services.

Anaconda executives noted that cloud-native technologies, which also include API-based applications, are helping drive the enterprise transition to cloud analytics.

“More software developers [are] using the Anaconda platform as machine learning is becoming pervasive and will be integrated with every application,” said Mathew Lodge, Anaconda’s senior vice president for products and marketing.

As containers move into production, the survey notes that data volumes at the time of Hadoop's emergence in 2005 "now fit easily into a single server's memory and there is a plethora of alternatives to building a data lake." As a result, the survey found a growing preference among data scientists for Docker containers (19 percent) over Hadoop and Spark (15 percent).

Kubernetes, the de facto standard for container orchestration, was cited by nearly 6 percent of those surveyed.

As data scientists shift operations to the cloud, Kubernetes developer Google (NASDAQ: GOOGL) was ranked highest for its cloud data services, ahead of public cloud rivals Amazon Web Services (NASDAQ: AMZN) and Microsoft Azure (NASDAQ: MSFT). Google Cloud Platform’s “focus on data services is paying off with the Anaconda community,” the survey found.

As with most vendor surveys, Anaconda used the results to toot its own horn as the "data science community's de facto platform for data processing, visualization and machine learning [and] AI." A key reason is that Anaconda's open source distribution of the Python and R programming languages is free.

Indeed, the survey found that open-source licensing ranked relatively low in importance to data scientists mainly interested in easy-to-use platforms. (Anaconda, formerly Continuum Analytics, offers an enterprise version of the Python data science platform. It claims more than 6 million users.)

Along with the shift to cloud data science, the survey found that NoSQL databases rank just behind cloud services, “demonstrating their value for storing and processing semi-structured data,” Anaconda said.

The data science report can be downloaded here.

Anaconda, based in Austin, Texas, said its survey was conducted between March 22 and April 30, 2018.
