April 2, 2024

OctoAI Unveils OctoStack for Seamless Deployment of Generative AI Models Across Environments

SEATTLE, April 2, 2024 — OctoAI (formerly OctoML) today announced OctoStack, the industry's first complete technology stack to serve generative AI models anywhere. With OctoStack, organizations have a turnkey production platform that delivers highly optimized inference, model customization and asset management at enterprise scale. Companies can now achieve total AI autonomy when building and running generative AI applications directly within their own environments.

OctoStack offers businesses a complete, self-contained solution for deploying generative AI models in their own environment, such as a virtual private cloud (VPC) on any cloud provider, on-premises, or both. Organizations can run any model or fine-tuned model in their preferred location, adjacent to their enterprise data and within their security guardrails. The solution includes state-of-the-art model serving technology that is optimized at every layer, from data input to GPU code.

Innovative businesses like Otherside AI (HyperWrite) are already moving from proprietary LLM providers to their own fine-tuned open-source models on OctoAI, reporting speedups and savings of up to 12X. OctoStack brings these same efficiencies to generative AI deployments within customer environments, with 4X better GPU utilization and an estimated 50 percent reduction in operational costs compared with best-in-class do-it-yourself (DIY) deployments.

“Enabling customers to build viable and future-proof Generative AI applications requires more than just affordable cloud inference,” said Luis Ceze, Co-Founder and CEO of OctoAI. “Hardware portability, model onboarding, fine-tuning, optimization, load balancing: these are full-stack problems that require full-stack solutions. That's exactly what we've built at OctoAI. With OctoStack, customers can leverage our industry-leading GenAI systems on their own terms, with control over their data, models and hardware.”

Key Benefits of OctoStack:

  • Run Any Model, Fast: Businesses select their ideal mix of open-source models (e.g., Llama, Mistral, Mixtral and others), custom models, and proprietary models while maximizing performance.
  • Run In Any Environment: Run in a virtual private cloud (VPC) on your cloud of choice: AWS, Microsoft Azure, CoreWeave, Google Cloud Platform, Lambda Labs, OCI, Snowflake and more.
  • Choose Any Hardware Target: Run models on a broad range of hardware, including NVIDIA and AMD GPUs, AWS Inferentia and more.
  • Expertise and Innovation: Benefit from OctoAI’s unmatched expertise in hardware-independent, full-stack inference optimization, honed over years of dedicated research and development.
  • Continuous Optimization: On-premises customers receive continuous updates comparable to those of a SaaS offering, including a subscription to newly optimized models and support for additional hardware types, keeping their AI capabilities current.

The OctoAI SaaS platform is already in use by high-growth generative AI companies, including Otherside AI, Latitude Games, and Capitol AI. OctoStack enables customers to bring this reliable, customizable, efficient infrastructure directly into their own environment. OctoStack has been designed to place companies firmly in control, delivering AI autonomy with a seamless, maintenance-free serving stack for running generative AI applications.

Apate AI is a global service that combats telephone scams using generative conversational AI. With highly customized models supporting multiple languages and regional dialects, the company used OctoStack to efficiently run its suite of LLMs across multiple geographies.

“For our performance and security-sensitive use case, it is imperative that the models that process call data run in an environment that offers flexibility, scale and security,” said Dali Kaafar, Founder and CEO at Apate AI. “OctoStack lets us easily and efficiently run the customized models we need, within environments that we choose, and deliver the scale our customers require.”

About OctoAI

OctoAI (formerly OctoML) is on a mission to make AI more accessible and sustainable so it can be used to improve lives. The OctoAI platform delivers a complete stack for app builders to run, tune, and scale their AI applications in the cloud or on-premises. With blazing-fast inference APIs for popular models like SDXL, Mixtral, and Llama 2, end-to-end developer solutions, and world-class ML systems under the hood, businesses can focus on building apps that wow their customers without becoming AI infrastructure experts. OctoAI is based in Seattle, Washington and is backed by Madrona Venture Partners, Amplify Partners, Tiger Global, and Addition Capital.

Source: OctoAI