People to Watch 2018

Wes McKinney
Creator of Pandas
Author of “Python for Data Analysis”

Wes McKinney is the creator of the Python pandas project and a PMC member for the Apache Arrow and Apache Parquet projects. He published the book “Python for Data Analysis” in 2012, with an updated 2nd edition released in 2017. He was the co-founder and CEO of DataPad, an analytics company later acquired by Cloudera in 2014. At Cloudera, he focused on engineering efforts to bridge the Python, Hadoop, and Spark ecosystems. He now works at Two Sigma in New York City as a software architect focused on data science tools.

Datanami: Congratulations on being named a Datanami Person to Watch in 2018! To kick things off, were you surprised at all at the huge success of the Pandas library? What were you expecting?

Wes McKinney: I was mostly relieved! When I got interested in building data tools for Python in 2008, it wasn’t obvious that the Python community would be able to develop the robust data community that we have now. Some of the biggest hurdles to community growth were basic data access and tabular data wrangling, problems that pandas made significantly easier for newcomers. I spent most of 2011 and 2012 focused (with the help of Adam Klein and Chang She) on making pandas a viable tool for real-world data analysis, and writing my book “Python for Data Analysis” in the process. The turning point for the project came around the end of 2012. In 2013, when I got busy with my company DataPad, I handed off the reins of maintenance and growth to Jeff Reback and the rest of the pandas core team, who’ve done an outstanding job keeping the project alive and healthy over the last 5 years. We just recently crossed 1,000 unique contributors on GitHub, a huge milestone for any open source project.

Datanami: You’re only 32. Do you think being a bit on the younger side gives you a unique perspective into today’s data science problems?

I was fortunate to have gotten involved in data science tools when I was 22, before it was even called data science! A lot of people my age and younger have gravitated toward web technologies in the JavaScript ecosystem and newer programming languages. As time has passed, I have been doing more and more infrastructure-level systems engineering in C and C++ for data processing. I believe there are still many important problems to solve, particularly at the systems level, and as a community we need to do what we can to make the engineering side of data science more attractive and interesting to the upcoming generations of developers.

Datanami: Python has become an incredibly popular language for data science over the last few years. Do you think Python’s rapid ascent has had an impact on the field of data science, and if so, what was the impact?

Python and R in particular have had a huge impact on the field in some key ways. Because both are open source, anyone can install the software and get up and running for free. Additionally, the communities have focused on usability, education, and developer experience to enable individuals to become productive very quickly. In recent years, Python has emerged as the “user interface of choice” for many cutting-edge machine learning projects, like TensorFlow and PyTorch. Engineering teams have been successful implementing systems code in lower-level languages like C or C++ and exposing the functionality to users through Python bindings. Python’s strength as a “glue language” is probably the main reason that it developed a numerical computing community back in the 1990s, and this remains a part of its success today.

Datanami: What do you hope to see from the big data community in the coming year?

I have personally spent the last several years focused mainly on the Apache Arrow open source project, a cross-language in-memory computing and data interoperability platform. As many big data systems have grown more mature in recent years, I hope we see increased ecosystem-spanning collaborations on projects like Arrow to help with platform interoperability and architectural simplification. I believe that this “defragmentation,” so to speak, will make the whole ecosystem more productive and successful using open source big data technologies.

Datanami: Outside of the professional sphere, what can you share about yourself that your colleagues might be surprised to learn – any unique hobbies or stories?

In the late 1990s, I helped operate a “speed run” competition website for the video game GoldenEye 007. I guess you could say it was my first experience with managing a distributed online community, and it was surprisingly good preparation for some of the challenges that come with open source community development.


