November 29, 2023

OctoML Launches OctoAI Text Gen Solution

SEATTLE, Nov. 29, 2023 — OctoML announced the launch of its OctoAI Text Gen Solution to empower application builders to run and scale applications on their choice of Llama 2 Chat, Code Llama Instruct and Mistral Instruct models—all on one unified API endpoint.

The new release offers the fastest fleet of accelerated open source LLMs, including numerous configurations of Llama 2 and Mistral, as well as the option to bring your own fine-tuned Llama 2 models. OctoAI’s Text Gen Solution, together with the OctoAI Image Gen Solution, now offers a flexible “model-cocktail” alternative to monolithic multi-modal models, enabling developers to build highly composable multi-modal applications.

“There’s no one-size-fits-all approach to building text generation applications,” said Luis Ceze, CEO of OctoML. “And not every use case calls for an inefficient, costly mega-model. There are many instances where a smaller, fine-tuned model can get the job done with less overhead. OctoAI Text Gen gives app builders the flexibility to mix their own model cocktail using OSS models, or run their own model variant if that’s the best fit.”

With the OctoAI Text Gen Solution, developers can now easily run inferences against multiple OSS model families, sizes, and variants—all through one scalable, production-grade API endpoint. This allows models to be swapped with minimal code changes, a seamless approach that has resonated with early adopters given today’s focus on evaluating and combining multiple OSS models. In addition, OctoAI’s enterprise tier lets customers work with the team on contractual latency SLAs and on private network connectivity to their environments.
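
Because the endpoint is unified across model families, switching from one model to another is essentially a one-string change in the request. The sketch below is a minimal illustration of that pattern, not OctoML’s official client code: the endpoint URL, the model identifiers, the OCTOAI_TOKEN environment variable, and the exact request/response schema are assumptions for illustration and may differ from OctoAI’s documentation.

```python
# Minimal sketch of calling a unified text-gen chat endpoint and swapping models.
# Assumptions (not confirmed by this announcement): the endpoint URL, the model
# identifiers, and an OCTOAI_TOKEN environment variable holding an API key.
import os
import requests

ENDPOINT = "https://text.octoai.run/v1/chat/completions"  # assumed URL


def chat(model: str, prompt: str) -> str:
    """Send one chat-completion request and return the generated text."""
    response = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {os.environ['OCTOAI_TOKEN']}"},
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 256,
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]


# Swapping model families or sizes is a one-line change to the model string.
print(chat("llama-2-13b-chat", "Summarize what a unified LLM endpoint is."))
print(chat("mistral-7b-instruct", "Summarize what a unified LLM endpoint is."))
```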

Benefits and features:

  • Unparalleled Speed and Cost Efficiency: Early results show speeds of up to 169 tokens per second on the popular Code Llama 34B model, with no quantization and before applying optimizations like batching—all at the best per-token prices available today.
  • Broadest Optionality with OSS LLMs: The most comprehensive set of production-ready LLMs, including your choice of Llama 2, Code Llama, and Mistral variants—all delivered on one unified API endpoint.
  • Robust Delivery and Proven Scalability: More than a billion customer inferences served, with individual customers running more than one million inferences per day, and 10X usage surges handled reliably with proven performance.
  • Flexible “model-cocktail” approach to multi-modal needs: The Text Gen Solution complements OctoAI’s recently launched Image Gen Solution and all the models available in the OctoAI compute service, empowering customers to easily build multi-modal applications using their preferred mix of OSS models, as demonstrated in the OctoStudio demo application walkthrough (see the sketch after this list).
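
As a rough illustration of the “model-cocktail” pattern referenced above, the sketch below chains a text-generation call into an image-generation call, each behind its own endpoint. It is not the OctoStudio demo itself: both endpoint URLs, the model identifier, and the response fields (for example, image_b64) are assumptions made for illustration.

```python
# Illustrative "model cocktail": a text model writes an image prompt, then an
# image-generation endpoint renders it. URLs, model names, and payload fields
# below are assumptions for illustration, not OctoML's documented API.
import base64
import os
import requests

HEADERS = {"Authorization": f"Bearer {os.environ['OCTOAI_TOKEN']}"}

# Step 1: text generation (e.g. a Mistral Instruct variant) writes a scene prompt.
text_resp = requests.post(
    "https://text.octoai.run/v1/chat/completions",  # assumed endpoint
    headers=HEADERS,
    json={
        "model": "mistral-7b-instruct",  # assumed model identifier
        "messages": [
            {"role": "user",
             "content": "Write a one-sentence prompt for a cozy winter cabin scene."}
        ],
    },
    timeout=60,
)
text_resp.raise_for_status()
scene_prompt = text_resp.json()["choices"][0]["message"]["content"]

# Step 2: image generation (e.g. SDXL) renders that description.
image_resp = requests.post(
    "https://image.octoai.run/generate/sdxl",  # assumed endpoint
    headers=HEADERS,
    json={"prompt": scene_prompt, "num_images": 1},
    timeout=120,
)
image_resp.raise_for_status()
image_b64 = image_resp.json()["images"][0]["image_b64"]  # assumed response field
with open("scene.png", "wb") as f:
    f.write(base64.b64decode(image_b64))
```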

“The LLM landscape is changing almost every day, and we need the flexibility to quickly select and test the latest options,” said Matt Shumer, CEO of Hyperwrite. “OctoAI made it easy for us to evaluate a number of fine-tuned model variants for our needs, identify the best one, and move it to production for our application.”

OctoAI Text Gen customers can also bring their own fine-tuned Llama 2 variant or checkpoint and run it at low-latency at massive scale. This BYO model capability allows for a high degree of customization to align with specific requirements of customer projects.

About OctoML

OctoML is on a mission to make AI more accessible and sustainable so it can be used to improve lives. Our platform, OctoAI, delivers generative AI infrastructure for app builders to run, tune, and scale the models that power AI applications. With the fastest foundation models on the market, including Llama 2, WhisperX, and SDXL, end-to-end solutions, and world-class ML systems under the hood, developers can focus on building apps that wow their customers without becoming AI infrastructure experts.


Source: OctoML
