People to Watch 2023 – Maxime Beauchemin
You’ve created two successful open source projects, Apache Superset and Apache Airflow. What do you attribute the success to? What made them successful?
Most people are familiar with the idea of “product market fit” (PMF), a term coined by Marc Andreessen more than 15 years ago, and I like to think of a proxy for it in open source that I’d call “project community fit” (PCF). So it’s not just about the quality of the project, or how much you invest into it, it’s about building the right thing at the right time for the right people, and riding the momentum. I think reading about PMF and doing the mind exercise to translate the ideas to an open source project is fairly straightforward and informs finding PCF fairly well. The dynamics aren’t identical but they’re similar. If anything, open source has better network effects (because it’s free by definition, and welcomes contributions) and snowballs better than a product in a market.
In any case, the ideas behind PMF were foreign to me when I started both projects at Airbnb back in 2014/2016. I just wanted to build something that was going to be useful at Airbnb, and put it out there just in case someone outside of Airbnb might be interested in picking it up and collaborating, or even just using it. My thinking was “if I’m building something for Airbnb that’s not a competitive advantage, why limit my impact to Airbnb?” Looking back, I think what worked for me was to build with passion, and to engage as directly as possible with anyone showing any kind of interest, whether it’d be on GitHub, email, Slack, or looking for conversation. For a long time, I honored and handled every single touch point. I also went beyond just writing software and did a lot of things that I’d now call “product marketing”: finding good names for the project, doing some decent messaging/positioning, building half-decent websites with nice screenshots, maintaining decent docs, …
Both projects hit a point where I couldn’t keep up. From that point on, the projects have a life of their own. That’s OSS “escape velocity.” Feels great to reach this point!
Do you think data engineering gets the respect it deserves? Why does it seem perpetually overlooked in the data space?
The world isn’t always a fair place, but I think generally things (people, ideas, concepts, projects) tend to get the respect they deserve over time. In many ways, historically, data engineering (maybe thinking about the pre-pipeline-as-code era, call it the drag-and-drop ETL days) didn’t show a lot of self-respect either, especially when measured from the perspective of software engineering.
Arguably data engineering didn’t come into being until the mid-2010s, tried to catch up on and integrate software engineering practices, and while doing so missed out on the DevOps movement, only to try to catch up on some of that over the past five years or so through the lagging DataOps movement. I think the gap in respect is reasonable when measured against software engineering practices, but is that fair!? We don’t measure other functions by SWE standards.
In the end, respect should be based on business impact, not solely around code/PDLC rigor and maturity. On the impact front, there are some real problems too. I talk about it in an article titled “The Downfall of the Data Engineer,” and some of these problems are preventing data engineering from delivering more impact and getting respect from the organization as a whole.
Is it getting easier or harder to be a data engineer in 2023?
Clearly easier: the role is better defined, the stack/tooling has evolved, best practices are increasingly well defined, and expectations around the role are clearer than ever before. Oh and the modern data stack is amazing: you can get started in minutes, get a world-class, scale-to-infinity cloud data warehouse set up in minutes, set up Apache Superset instantly on top of it using Preset, do data integration with Airbyte or Fivetran without a hitch, and set up Airflow through Astronomer or use dbt Cloud. All this infrastructure is at your fingertips, pay-as-you-go and frankly amazing! The pool of articles and resources around best practices is only increasing too, communities exist now, … So much easier than it used to be.
Outside of the professional sphere, what can you share about yourself that your colleagues might be surprised to learn – any unique hobbies or stories?
I’m a huge snowboarder. Grew up riding 50 days a year in the Quebec City scene in the 90s, and recently moved to Tahoe to be able to get back into riding regularly. Before the move, going to ride from the Bay Area while having three young kids was very difficult, so I didn’t ride much for the past decade. But now I’m back on the mountain! Oh and the kids are getting good now, so we often ride together!