A ‘Glut’ of Innovation Spotted in Data Science and ML Platforms
These are heady days in data science and machine learning (DSML) according to Gartner, which identified a “glut” of innovation occurring in the market for DSML platforms. From established companies chasing AutoML or model governance to startups focusing on MLops or explainable AI, a plethora of vendors are simultaneously moving in all directions with their products as they seek to differentiate themselves amid a very diverse audience.
“The DSML market is simultaneously more vibrant and messier than ever,” a gaggle of Gartner analysts led by Peter Krensky wrote in the Magic Quadrant for DSML Platforms, which was published earlier this month. “The definitions and parameters of data science and data scientists continue to evolve, and the market is dramatically different from how it was in 2014, when we published the first Magic Quadrant on it.”
The 2021 Magic Quadrant for DSML is heavily weighted toward companies on the right side of the chart, which anybody who’s familiar with Gartner’s quadrant-based assessment method knows represents greater “completeness of vision.” No fewer than 13 of the 20 vendors to make the quadrant’s cut landed on the right side, which indicates active innovation.
Generating new DSML features and exploring new DSML methods is the name of the game in this fast-moving business, Gartner says. “There remains a glut of compelling innovations and visionary roadmaps,” the analysts wrote. “…[V]endors are heavily focused on innovation and differentiation, rather than pure execution. Innovation remains key to survival and relevance.”
The Connecticut-based analyst firm did not sound surprised to conclude that the cloud biggies have moved strongly into the space. “The long-expected gigantic presence in this market of Google and Amazon is now easily felt as they compete with Microsoft for supremacy in terms of DSML capabilities in the cloud,” the analysts wrote.
However, that does not mean that they are sucking all the air out of the room, as smaller companies have found success in the market, with a few achieving what Gartner termed “hypergrowth.” A few well-established leaders from the previous generation of statistical tools, like SAS, MathWorks, and IBM (SPSS), are also doing well, Gartner notes. In fact, those three vendors are collectively doing better than AWS, Google, and Microsoft when it comes to the ability to execute.
The DSML market is young and vibrant, and there are ample revenue and funding opportunities for companies that differentiate themselves on the product side, Gartner says. There is just a “moderate” level of M&A activity at this time, which indicates a growing market. With that said, the vendors who made Gartner’s cut had to prove themselves by meeting certain customer-count and financial performance criteria. And of course, they had to have a product that meets the definition of a DSML platform.
Which raises the question: Just what is a DSML platform? Gartner defines it as a place “to source data, build models and operationalize machine learning,” either by certified, card-carrying data scientists or by people who are doing data science work, i.e. citizen data scientists, data engineers, or ML specialists.
Beyond that broad definition, Gartner identified 13 other capabilities that may (or may not) exist in a given DSML platform, including: data ingestion; data preparation; data exploration; feature engineering; model creation and training; model testing; deployment; monitoring; maintenance; data and model governance; explainable artificial intelligence (XAI); business value tracking; and collaboration.
Here’s a brief description of the pros and cons provided for each of the vendors listed in the Magic Quadrant, courtesy of Gartner:
Leaders Quadrant
Databricks Unified Data Platform
Pros: Scalable multi-cloud support; empowerment of data scientists; execution and expansion.
Cons: Lack of support for citizen data scientists; need for governance and responsible AI; growing cloud competition.
Dataiku Data Science Studio
Pros: Support for citizen data scientists; focus on business value; market traction.
Cons: Heavy use of extensions and plugins; emerging story around “XOps” (i.e. unified management of data, ML, models, and platforms); pricing for smaller teams.
IBM Watson Studio on IBM Cloud Pak for Data
Pros: Support for multiple personas; composite AI vision; responsible AI and governance.
Cons: Scope of AutoAI features; doubts about Watson brand; lack of clarity in product bundling.
MathWorks MATLAB
Pros: Robust composite AI capabilities; integrated domain knowledge; verifiable and reliable ML.
Cons: Interface lacks usability among non-engineers and non-scientists; interpretability of ML models; lack of augmented DSML capabilities.
SAS
Pros: Market understanding and presence; cloud-native architecture and open source integration; automated feature engineering and modeling.
Cons: Perceived high cost; product bundling; marketing strategy.
TIBCO Software (various products)
Pros: Leading edge DSML capabilities; integration of DS and BI/analytics; support for collaboration and applied analytics.
Cons: Limited ModelOps capabilities; lack of support for citizen data science; financial growth in 2020.
Visionaries Quadrant
AWS (various products)
Pros: Breadth and depth of cloud platform; performance and scalability; data labeling and human-in-the-loop capabilities.
Cons: Lack of attention on citizen data scientists; rapid rollout of products and their maturity; maturity of on-prem, hybrid, and multi-cloud support.
DataRobot Enterprise AI Platform
Pros: Sales strategy and execution; high-touch customer service; successful acquisitions.
Cons: Complexity of product portfolio; resource-heavy onboarding; capability gaps.
Google Cloud AI Platform
Pros: Responsible AI vision and capabilities; research contributions; cohesion and simplification of consolidated products.
Cons: Rapid pace of change; steep learning curve; lack of capabilities for on-prem, hybrid, and multi-cloud deployments.
KNIME Analytics Platform
Pros: Breadth and depth of DSML capabilities; commitment to open source; visual workflow coherence.
Cons: Limitations in enterprise deployments; responsible AI vision; low market traction.
Microsoft Azure Machine Learning
Pros: Strong support for enterprise DS; support for multiple personas; openness and partnerships.
Cons: Required use of other Azure services; immaturity of on-prem, hybrid, and multi-cloud capabilities; lack of support for augmented DSML capabilities.
RapidMiner (various products)
Pros: Support for multiple personas; “clear vision and delivery of aligned features”; expandability and governance.
Cons: Growth rate; average advanced analytics capabilities; academic perception of product.
H2O.ai (various products)
Pros: Vision for value creation; extensive automation; rich AI explainability features (XAI).
Cons: Lack of some data access and data prep features; OEM partner strategy; collaboration and cohesion.
Challengers Quadrant
Alteryx Analytics Process Automation
Pros: Support for multiple personas; product packaging and go-to-market strategy; customer support.
Cons: Changing product portfolio; high cost; lack of innovation.
Niche Players Quadrant
Alibaba Cloud’s Platform for AI (PAI) Studio and Data Science Workshop
Pros: Strong community in China; advanced use-case modeling; seamless integration.
Cons: Focus on Asia; lack of product vision; narrow usage and focus on professional data scientists.
Altair Knowledge Studio and Knowledge Works
Pros: Ease of use; support for data pipelines; customer satisfaction.
Cons: Functional gaps in lineup; limited rollouts in some industries; relatively slow growth.
Anaconda
Pros: Trusted and flexible platform; based on open source; culture of collaboration.
Cons: Focus on technical audience; lack of model operationalization functions; runtime stability.
Cloudera Data Platform
Pros: Native Spark on Kubernetes; support for complex data workloads; metadata support for DataOps and MLOps.
Cons: No GUI for development; lack of coherence of products; domain-specific solutions.
Domino Data Lab Data Science Platform
Pros: Support for large, expert teams; mature MLOps capabilities; support for on-prem, hybrid, and multi-cloud.
Cons: Support for small, immature DS teams; low market visibility; open source vision.
Samsung SDS Brightics AI
Pros: Comprehensive ecosystem vision; data access, prep, and visualization; ease of use and collaboration.
Cons: Limited adoption outside of Asia; gaps in product vision; limited capabilities in ModelOps and explainability.
This is indeed a great time to be in the data science and machine learning business. Whether you’re a user of these tools or helping to develop them, the rapid pace of innovation is not only exciting but good for business as a whole.