

DataStax just announced the general availability of its vector search capability in Astra DB, its DBaaS built on Apache Cassandra.
Vector search is a must-have capability for building generative AI applications. In machine learning, vector embeddings are the distilled representations of raw training data and act as a filter for running new data through during inference. Training a large language model results in potentially billions of vector embeddings.
Vector databases store these embeddings and perform a similarity search to find the best match between a user’s prompt and the vectorized training data. Instead of searching with keywords, embeddings allow users to conduct a search based on context and meaning to extract the most relevant data.
There are native databases specifically built to manage vector embeddings, but many relational and NoSQL databases (like Astra DB) have been modified to include vector capabilities due to the demand surrounding generative AI.
This demand is palpable: McKinsey estimates that generative AI could potentially add between $2.6 and $4.4 trillion in value to the global economy. DataStax CPO Ed Anuff noted in a release that databases capable of supporting vectors are crucial to tapping into the potential of generative AI as a sustainable business initiative.
“An enterprise will need trillions of vectors for generative AI so vector databases must deliver limitless horizontal scale. Astra DB is the only vector database on the market today that can support massive-scale AI projects, with enterprise-grade security, and on any cloud platform. And, it’s built on the open source technology that’s already been proven by AI leaders like Netflix and Uber,” he said.
DataStax says one advantage of vector search within Astra DB is that it can help reduce AI hallucinations. LLMs are prone to fabricating information, called hallucinating, which can be damaging to business. This vector search release includes Retrieval Augmented Generation (RAG), a capability that grounds search results within specific enterprise data so that the source of information can be easily pinpointed.
Data security is another factor to consider with generative AI deployment, as many AI use cases involve sensitive data. DataStax says Astra DB is PCI, SOC2, and HIPAA enabled so that companies like Skypoint Cloud Inc., which offers a data management platform for the senior living healthcare industry, can use Astra DB as a vector database for resident health data.
“Envision it as a ChatGPT equivalent for senior living enterprise data, maintaining full HIPAA compliance, and significantly improving healthcare for the elderly,” said Skypoint CEO Tisson Mathew in a statement.
To support this release, DataStax also created a Python library called CassIO aimed at accelerating vector search integration. The company says this software framework easily integrates with popular LLM software like LangChain and can maintain chat history, create prompt templates, and cache LLM responses.
The new vector search capability is available on Astra DB for Microsoft Azure, AWS, and Google Cloud. The company also says vector search will be available for customers running DataStax Enterprise, the on-premises, self-managed offering, within the month.
Matt Aslett of Ventana Research expects generative AI adoption to grow rapidly and says that through 2025, one-quarter of organizations will deploy generative AI embedded in one or more software applications.
“The ability to trust the output of generative AI models will be critical to adoption by enterprises. The addition of vector embeddings and vector search to existing data platforms enables organizations to augment generic models with enterprise information and data, reducing concerns about accuracy and trust,” he said.
Related Items:
Vector Databases Emerge to Fill Critical Role in AI
DataStax Bolsters Real-Time Machine Learning with Kaskada Buy
DataStax Nabs $115 Million to Help Build Real-Time Applications
July 3, 2025
- FutureHouse Launches AI Platform to Accelerate Scientific Discovery
- KIOXIA AiSAQ Software Advances AI RAG with New Version of Vector Search Library
- NIH Highlights AI and Advanced Computing in New Data Science Strategic Plan
- UChicago Data Science Alum Transforms Baseball Passion into Career with Seattle Mariners
July 2, 2025
- Bright Data Launches AI Suite to Power Real-Time Web Access for Autonomous Agents
- Gartner Finds 45% of Organizations with High AI Maturity Sustain AI Projects for at Least 3 Years
- UF Highlights Role of Academic Data in Overcoming AI’s Looming Data Shortage
July 1, 2025
- Nexdata Presents Real-World Scalable AI Training Data Solutions at CVPR 2025
- IBM and DBmaestro Expand Partnership to Deliver Enterprise-Grade Database DevOps and Observability
- John Snow Labs Debuts Martlet.ai to Advance Compliance and Efficiency in HCC Coding
- HighByte Releases Industrial MCP Server for Agentic AI
- Qlik Releases Trust Score for AI in Qlik Talend Cloud
- Dresner Advisory Publishes 2025 Wisdom of Crowds Enterprise Performance Management Market Study
- Precisely Accelerates Location-Aware AI with Model Context Protocol
- MongoDB Announces Commitment to Achieve FedRAMP High and Impact Level 5 Authorizations
June 30, 2025
- Campfire Raises $35 Million Series A Led by Accel to Build the Next-Generation AI-Driven ERP
- Intel Xeon 6 Slashes Power Consumption for Nokia Core Network Customers
- Equal Opportunity Ventures Leads Investment in Manta AI to Redefine the Future of Data Science
- Tracer Protect for ChatGPT to Combat Rising Enterprise Brand Threats from AI Chatbots
June 27, 2025
- Inside the Chargeback System That Made Harvard’s Storage Sustainable
- What Are Reasoning Models and Why You Should Care
- Databricks Takes Top Spot in Gartner DSML Platform Report
- LinkedIn Introduces Northguard, Its Replacement for Kafka
- Change to Apache Iceberg Could Streamline Queries, Open Data
- Agentic AI Orchestration Layer Should be Independent, Dataiku CEO Says
- Fine-Tuning LLM Performance: How Knowledge Graphs Can Help Avoid Missteps
- The Evolution of Time-Series Models: AI Leading a New Forecasting Era
- Top-Down or Bottom-Up Data Model Design: Which is Best?
- Stream Processing at the Edge: Why Embracing Failure is the Winning Strategy
- More Features…
- Mathematica Helps Crack Zodiac Killer’s Code
- ‘The Relational Model Always Wins,’ RelationalAI CEO Says
- Confluent Says ‘Au Revoir’ to Zookeeper with Launch of Confluent Platform 8.0
- The Top Five Data Labeling Firms According to Everest Group
- DuckLake Makes a Splash in the Lakehouse Stack – But Can It Break Through?
- Solidigm Celebrates World’s Largest SSD with ‘122 Day’
- Toloka Expands Data Labeling Service
- Supabase’s $200M Raise Signals Big Ambitions
- With $17M in Funding, DataBahn Pushes AI Agents to Reinvent the Enterprise Data Pipeline
- Databricks Is Making a Long-Term Play to Fix AI’s Biggest Constraint
- More News In Brief…
- Astronomer Unveils New Capabilities in Astro to Streamline Enterprise Data Orchestration
- Seagate Unveils IronWolf Pro 24TB Hard Drive for SMBs and Enterprises
- Databricks Unveils Databricks One: A New Way to Bring AI to Every Corner of the Business
- Gartner Predicts 40% of Generative AI Solutions Will Be Multimodal By 2027
- BigBear.ai And Palantir Announce Strategic Partnership
- Databricks Donates Declarative Pipelines to Apache Spark Open Source Project
- Deloitte Survey Finds AI Use and Tech Investments Top Priorities for Private Companies in 2024
- Code.org, in Partnership with Amazon, Launches New AI Curriculum for Grades 8-12
- Atlan Launches AI Data Quality Studio for Snowflake
- Databricks Launches Lakebase, a New Class of Operational Database for AI Apps and Agents
- More This Just In…