Equall Introduces Expanded Saul Family of Legal LLMs with 54b and 141b Models
Aug. 5, 2024 — Earlier this year, Equall released SaulLM, a state-of-the-art 7 billion (7b) parameter large language model (LLM) adapted specifically for the legal domain, and the first open language model for law. Since then, Equall has continued to scale and develop Saul.
Equall is excited to introduce the Saul family of legal language models, now available in 54 billion (54b) and 141 billion (141b) sizes. Achieving state-of-the-art performance on legal reasoning benchmarks, Saul shows substantial performance gains over similarly sized models, and outperforms much larger commercial models, including GPT-4, across an array of legal tasks.
Equall is committed to making legal intelligence broadly accessible. The company is openly releasing the full family of Saul models, and sharing its research and training methodologies with the community, with the aim of spurring further innovation, research, and value creation in the legal industry.
Introducing the Saul Family
Saul is a family of language models specialized for legal reasoning. The family now includes three sizes (7b, 54b, and 141b), each built on a general-purpose Mistral LLM architecture. To maximize performance, Equall used an adaptation process that combined continued pretraining, instruction fine-tuning (IFT), and preference alignment using domain-specific optimization (DSO). The latest technical report describes Equall’s data composition and adaptation methodologies in more detail.
A few key details include:
- In February of this year, Equall released Saul-7b, the first free and publicly available LLM for law. As part of Equall’s domain-adaptation process, Saul-7b was trained using a very large corpus (30 billion tokens, or roughly 15 million pages) of legal data.
- Equall’s training process resulted in an average 6% performance gain over Mistral-7b across a range of legal tasks as measured by LegalBench, a leading benchmark used to evaluate legal reasoning in LLMs.
- Building on insights from this initial training, Equall then trained Saul-54b and Saul-141b on an expanded and further curated pretraining corpus of legal text.
- Each model was trained using AMD GPUs on Adastra, a French supercomputer renowned for its energy efficiency.
Equall is releasing Saul-54b and Saul-141b, joining the initial 7b model to form the Saul family of open models for law.
What’s Next?
The entire Saul family of models is now available on Equall’s Hugging Face page. In the spirit of fostering innovation and increasing access to legal resources, Equall is releasing the Saul models openly, for free and responsible use, under the permissive MIT license. As it continues to develop and scale the models, Equall hopes that the SaulLM project will contribute meaningfully to ongoing research and collaboration across the legal and NLP communities.
For questions about Equall’s model training research or work in building intelligent workflow systems, please contact the team at [email protected].
About Equall
Equall builds specialized AI systems that can take live inputs and large amounts of data and, under lawyer supervision, solve intricate legal problems from beginning to end. The company’s goal is to make legal risk simple by producing actionable outputs that decision-makers across organizations can leverage in real time.
Source: Pierre Colombo, Equall