April 26, 2023

Arize AI Debuts Phoenix, the 1st Open Source Library for Evaluating LLMs

BERKELEY, Calif., April 26, 2023 — Arize AI, a market leader in machine learning observability, debuted deeper support for generative AI on the Arize platform and a first-of-its-kind open source observability library for evaluating large language models (LLMs) at its Arize:Observe 2023 summit.

The launch comes at a critical moment for the future of AI. Generative AI is fueling a technical renaissance, with models like GPT-4 showing sparks of artificial general intelligence and new breakthroughs and use cases emerging daily. On the other hand, most leading large language models are black boxes that have known issues around hallucination and problematic biases.

Available today, Arize Phoenix is the first open source observability library specifically built to help data scientists evaluate outputs from LLMs like OpenAI’s GPT-4, Google’s Bard, Anthropic’s Claude, and others. Leveraging Phoenix, data scientists can visualize complex LLM decision-making, monitor LLMs when they produce false or misleading results, and narrow in on fixes to improve outcomes.

“A huge barrier to getting LLMs and generative agents deployed into production is the lack of observability into these systems,” says Harrison Chase, Co-Founder of LangChain. “With Phoenix, Arize is offering an open source way to visualize complex LLM decision-making.”

“Phoenix is a much-appreciated advancement in model observability and production,” says Christopher Brown, CEO and Co-Founder of AI-focused consulting firm Decision Patterns and a former Computer Science lecturer at UC Berkeley. “The integration of observability utilities directly into the development process not only saves time but encourages model development and production teams to actively think about model use and ongoing improvements before releasing to production. This is a big win for management of the model lifecycle.”

“Despite calls to halt AI development, the reality is that innovation will continue to accelerate,” said Jason Lopatecki, CEO and Co-Founder of Arize AI. “Phoenix is the first software designed to help data scientists understand how GPT-4 and LLMs think, monitor their responses and fix the inevitable issues as they arise.”

Phoenix is instantiated by a simple import call in a Jupyter notebook and is built to interactively run on top of Pandas dataframes. The tool works easily with unstructured text and images, with embeddings and latent structure analysis designed as a core foundation of the toolset.
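As a minimal sketch of that workflow, the snippet below loads a toy Pandas dataframe of prompts and responses and launches the Phoenix app from a notebook. The column names are hypothetical, and the exact Schema, Dataset, and launch_app arguments are assumptions based on early Phoenix releases; check them against the Phoenix documentation.

    # Minimal sketch, assuming the early Phoenix notebook API; column names
    # are hypothetical and Schema/Dataset/launch_app arguments should be
    # verified against the Phoenix docs.
    import pandas as pd
    import phoenix as px

    # Toy dataframe of LLM prompts and responses
    df = pd.DataFrame({
        "prompt": ["Summarize the quarterly report.", "What is the capital of France?"],
        "response": ["Revenue grew in the third quarter ...", "Paris."],
    })

    # Wrap the dataframe in a Phoenix Dataset and open the interactive app
    # directly from the notebook
    session = px.launch_app(px.Dataset(dataframe=df, schema=px.Schema()))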

Leveraging Phoenix, data scientists can:

  • Evaluate LLM Tasks: Troubleshoot tasks such as summarization or question/answering to find problem clusters with misleading or false answers.
  • Detect Anomalies: Surface anomalies using LLM embeddings.
  • Find Clusters of Issues to Export for Model Improvement: Find clusters of problems using performance metrics or drift. Export clusters for fine-tuning workflows.
  • Surface Model Drift and Multivariate Drift: Use embedding drift to surface data drift for generative AI, LLMs, computer vision (CV) and tabular models.
  • Easily Compare A/B Datasets: Uncover high-impact clusters of data points missing from model training data when comparing training and production datasets (see the sketch after this list).
  • Discover How Embeddings Represent Your Data: Map structured features onto embeddings for deeper insights into how embeddings represent your data.
  • Monitor and Analyze to Pinpoint Issues: Monitor model performance and track down issues through exploratory data analysis.
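
For the A/B dataset comparison referenced above, a hedged sketch might pass production data as the primary dataset and training data as the reference dataset. The dataframes here are toy data, the schema mapping is left empty, and the argument names are assumptions to be verified against the Phoenix documentation.

    # Illustrative only: toy dataframes and an empty schema mapping; verify
    # argument names against the Phoenix documentation.
    import pandas as pd
    import phoenix as px

    train_df = pd.DataFrame({"prompt": ["What is 2 + 2?"], "response": ["4"]})
    prod_df = pd.DataFrame({"prompt": ["Summarize this support ticket."],
                            "response": ["The customer reports a billing error ..."]})

    schema = px.Schema()  # map prompt/response/embedding columns here

    # Production data as the primary dataset and training data as the reference,
    # so clusters present in production but missing from training can be surfaced
    session = px.launch_app(
        primary=px.Dataset(dataframe=prod_df, schema=schema),
        reference=px.Dataset(dataframe=train_df, schema=schema),
    )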

About Arize AI

Arize AI is a machine learning observability platform that helps ML teams deliver and maintain more successful AI in production. Arize’s automated model monitoring and observability platform allows ML teams to quickly detect issues when they emerge, troubleshoot why they happened, and improve overall model performance across both structured and unstructured data. Arize is a remote-first company with headquarters in Berkeley, CA.


Source: Arize AI