
People to Watch 2023 – Maxime Beauchemin
You’ve created two successful open source projects, Apache Superset and Apache Airflow. What do you attribute the success to? What made them successful?
Most people are familiar with the idea of “product market fit” (PMF), a term coined by Marc Andreessen more than 15 years ago, and I like to think of a proxy for it in open source that I’d call “project community fit” (PCF). So it’s not just about the quality of the project, or how much you invest into it, it’s about building the right thing at the right time for the right people, and riding the momentum. I think reading about PMF and doing the mental exercise of translating the ideas to an open source project is fairly straightforward and informs finding PCF fairly well. The dynamics aren’t identical but they’re similar. If anything, open source has better network effects (because it’s free by definition, and welcomes contributions) and snowballs better than a product in a market.
In any case, the ideas behind PMF were foreign to me when I started both projects at Airbnb back in 2014/2016. I just wanted to build something that was going to be useful at Airbnb, and put it out there in case someone outside of Airbnb might be interested in picking it up and collaborating, or even just using it. My thinking was “if I’m building something for Airbnb that’s not a competitive advantage, why limit my impact to Airbnb?” Looking back, I think what worked for me was to build with passion, and to engage as directly as possible with anyone showing any kind of interest, whether it’d be on GitHub, email, Slack, or anyone looking for conversation. For a long time, I honored and handled every single touch point. I also went beyond just writing software and did a lot of things that I’d now call “product marketing”: found good names for the projects, did some decent messaging/positioning, built half-decent websites with nice screenshots, maintained decent docs, …
Both projects hit a point where I couldn’t keep up. From that point on, the projects have a life of their own. That’s OSS “escape velocity.” Feels great to reach this point!
Do you think data engineering gets the respect it deserves? Why does it seem perpetually overlooked in the data space?
The world isn’t always a fair place, but I think generally things (people, ideas, concepts, projects) tend to get the respect they deserve over time. In many ways, historically, data engineering (maybe thinking about the pre-pipeline-as-code era, call it the drag-and-drop ETL days) didn’t show a lot of self-respect either, especially when measured from the perspective of software engineering.
Arguably, data engineering didn’t come into being until the mid-2010s, tried to catch up on and integrate software engineering practices, and while doing so missed out on the DevOps movement, only to try to catch up on some of that over the past five years or so through the lagging DataOps movement. I think the gap in respect is reasonable when measured against software engineering practices, but is that fair!? We don’t measure other functions by SWE standards.
In the end, respect should be based on business impact, not solely on code/PDLC rigor and maturity. On the impact front, there are some real problems too. I talk about it in an article titled “The Downfall of the Data Engineer,” and some of these problems are preventing data engineering from delivering more impact and getting respect from the organization as a whole.
Is it getting easier or harder to be a data engineer in 2023?
Clearly easier. The role is better defined, the stack/tooling has evolved, best practices are increasingly well defined, and expectations around the role are clearer than ever before. Oh, and the modern data stack is amazing: you can get started in minutes, get a world-class, scale-to-infinity cloud data warehouse set up in minutes, set up Apache Superset instantly on top of it using Preset, do data integration with Airbyte or Fivetran without a hitch, set up Airflow through Astronomer, and use dbt Cloud. All this infrastructure is at your fingertips, pay-as-you-go and frankly amazing! The pool of articles and resources around best practices is only increasing too, communities exist now, … So much easier than it used to be.
Outside of the professional sphere, what can you share about yourself that your colleagues might be surprised to learn – any unique hobbies or stories?
I’m a huge snowboarder. I grew up riding 50 days a year in the Quebec City scene in the 90s, and recently moved to Tahoe to be able to get back into riding regularly. Before the move, going to ride from the Bay Area while having three young kids was very difficult, so I didn’t ride much for the past decade. But now I’m back on the mountain! Oh, and the kids are getting good now, so we often ride together!