November 8, 2023

Monte Carlo Announces Support for Apache Kafka and Vector Databases to Enable More Reliable Data and AI Products

SAN FRANCISCO, Nov. 8, 2023 — Monte Carlo, a data observability leader, today announced a series of new product advancements to help companies tackle the challenge of ensuring reliable data for their data and AI products.

Among the enhancements to its data observability platform are integrations with Kafka and vector databases, starting with Pinecone. These forthcoming capabilities will help teams tasked with deploying and scaling generative AI use cases ensure that the data powering large language models (LLMs) is reliable and trustworthy at each stage of the pipeline. With this news, Monte Carlo becomes the first data observability platform to announce data observability for vector databases, a type of database designed to store and query high-dimensional vector data, typically used in RAG architectures.

To help these initiatives scale cost-effectively, Monte Carlo has released two first-of-their-kind data observability products, Performance Monitoring and Data Product Dashboard. Performance Monitoring makes it easy for teams to detect and resolve inefficiencies in cost-intensive data pipelines, while Data Product Dashboard allows data and AI teams to track the reliability of multi-source data and AI products, from business-critical dashboards to assets used by AI.

The Future of Enterprise AI Hinges on Data Reliability

Today’s announcements come amid the rapid growth of generative AI initiatives and a subsequent focus on democratizing access to data. According to Databricks’ State of Data and AI report, the number of companies using SaaS LLM APIs (used to access services like ChatGPT) grew 1,310% between the end of November 2022 and the beginning of May 2023. Additionally, the report cited a 411% increase in the number of AI models put into production during the same period.

According to a 2023 Wakefield Research survey, data and AI teams spent twice as much time and resources on data downtime as they did the year before, owing to an increase in data volume, pipeline complexity, and organization-wide data use cases. The same survey found that time-to-resolution was a major culprit in the rise of data downtime, increasing 166% on average from last year.

To combat the trend of data downtime and help teams resolve data quality issues quickly and collaboratively, Monte Carlo’s newest product enhancements unlock operational processes and key business SLAs that drive data trust, including optimizing cloud warehouse performance and cost and maximizing the reliability of revenue-driving data products.

Powering the Future of Trusted AI with Kafka and Vector Database Support

Apache Kafka, an open-source data streaming technology that enables high-throughput, low-latency data movement, is an increasingly popular architecture with which companies are building cloud-based data and AI products. With Monte Carlo’s Kafka integration, customers can ensure that the data fed to AI and ML models in real time for specific use cases is reliable and trustworthy.

Another critical component of building and scaling enterprise-ready AI products is the ability to store and query vectors, or mathematical representations of text and other unstructured data used in retrieval-augmented generation (RAG) or fine-tuning pipelines. With this capability arriving in early 2024, Monte Carlo is the first data observability platform to support trust and reliability for vector databases, such as Pinecone.

“To unlock the potential of data and AI, especially large language models (LLMs), teams need a way to monitor, alert to, and resolve data quality issues in both real-time streaming pipelines powered by Apache Kafka and vector databases powered by tools like Pinecone and Weaviate,” said Lior Gavish, co-founder and CTO of Monte Carlo. “Our new Kafka integration gives data teams confidence in the reliability of the real-time data streams powering these critical services and applications, from event processing to messaging. Simultaneously, our forthcoming integrations with major vector database providers will help teams proactively monitor and alert to issues in their LLM applications.”

Expanding end-to-end coverage across batch, streaming, and RAG pipelines enables organizations to realize the full potential of their AI initiatives with trusted, high-quality data.

Both integrations will be available in early 2024. Visit the Monte Carlo blog for frequent updates.

Alongside these updates, we’re partnering with Confluent to develop an enterprise-grade data streaming integration for Monte Carlo customers. Built by the original creators of Kafka, Confluent Cloud provides businesses with a fully managed, cloud-native data streaming platform to eliminate the burdens of open source infrastructure management and accelerate innovation with real-time data.

Operationalizing Data Observability for Data and AI Products at Scale

As enterprises look to incorporate generative AI capabilities for both internal and external use cases, the need to build, refine, and fine-tune the underlying data pipelines only grows. To help companies achieve reliable, enterprise-ready AI and trustworthy data products, Monte Carlo is excited to announce new performance monitoring and data product dashboards to easily track the health of their most critical assets.

  • Performance Monitoring: When adopting data and AI products, efficiency and cost monitoring are critical considerations that impact product design, development, and adoption. Our new Performance dashboard helps customers avoid unnecessary cost and runtime inefficiencies by making it easy to detect and resolve slow-running data and AI pipelines. Performance allows users to easily filter queries related to specific DAGs, users, dbt models, warehouses, or datasets. Users can then drill down to spot issues and trends and determine how performance was impacted by changes in code, data, and warehouse configuration.
  • Data Product Dashboard: Data Product Dashboard allows customers to easily define a data product, track its health, and report on its reliability to business stakeholders via direct integrations with Slack, Teams, and other collaboration channels. Customers can now easily identify which data assets feed a particular dashboard, ML application, or AI model, and unify detection and resolution for relevant data incidents in a single view.

With this added ability to track key reliability SLAs at the individual data product level, Monte Carlo becomes the first data observability platform to enable observability at the organizational, domain, and data/AI product levels.

Learn more about Performance Monitoring and Data Product Dashboard via docs.

All Eyes on AI at IMPACT 2023

Today’s announcements were made at Monte Carlo’s third annual IMPACT Data Observability Summit. In addition to being the world’s only data observability conference, this year’s IMPACT Summit brought together some of the foremost experts on the topic of reliable data and AI, including:

  • Eli Collins, VP of Product, Google DeepMind
  • Nga Phan, SVP of Product Management, Salesforce AI
  • Neta Iser, VP of Data and Integration, Navan (formerly TripActions)
  • Oliver Gomes, VP of Analytics and Strategy, Fox
  • Krishnan Parasuraman, VP, Head of the Office of the Field CTO, Snowflake
  • Craig Wiley, Sr. Director of Product for AI/ML, Databricks
  • Tristan Handy, Co-Founder and CEO, dbt Labs
  • George Fraser, CEO and co-founder, Fivetran
  • Prukalpa Sankar, Co-Founder, Atlan
  • And more

To learn more about Monte Carlo’s vision for enterprise-ready AI and data observability, visit or request a demo.

About Monte Carlo

As businesses increasingly rely on data to power digital products and drive better decision-making, it’s mission-critical that data and AI are accurate and reliable. Founded in 2019, Monte Carlo, the data observability leader, works with Fox, Comcast, CreditKarma, Roche, and hundreds of companies to help them achieve trust in data. Named a “New Relic for data” by Forbes, Monte Carlo is rated as the #1 Data Observability solution by G2 Crowd, GigaOm, and Ventana Research, and is consistently cited as a data observability leader by Gartner and Forrester.

Source: Monte Carlo