July 21, 2022

BLOOM Large Language Model Breaks Down the Walled Garden

Oliver Peckham

For the past couple of years, everyone from AI experts to the general public has been entranced by the often astonishing output of large language models (LLMs) like GPT-3 and related generative models like DALL•E 2. Given plain-language prompts, these models can produce everything from convincing artificial images to stories and poetry. However, these models have also largely been built by large companies like Google (PaLM) and OpenAI (GPT-3), which routinely restrict access to their full models for a variety of business and ethical reasons. Now, the BigScience research workshop, a group of more than 1,000 volunteers, is looking to change that status quo with a new LLM: BLOOM.

BigScience was born in 2021 out of discussions between researchers from Hugging Face, Inc. (a Brooklyn startup focused on democratizing AI, whose platform hosted the popular DALL•E Mini image generation demo, since renamed Craiyon) and representatives from GENCI and IDRIS, two French supercomputing organizations. BigScience eventually secured a grant for five million CPU-hours on the Jean Zay supercomputer, which weighed in at around 14 aggregate peak petaflops ahead of a planned upgrade this year.

The goal: to democratize AI through the introduction of “the world’s largest open multilingual language model.” That ambition gave the model its name: the “BigScience Large Open-science Open-access Multilingual language model,” or “BLOOM.”

“Large language models (LLMs) have made a significant impact on AI research,” BigScience’s announcement reads. “These powerful, general models can take on a wide variety of new language tasks from a user’s instructions. However, academia, nonprofits and smaller companies’ research labs find it difficult to create, study, or even use LLMs as only a few industrial labs with the necessary resources and exclusive rights can fully access them.”

BLOOM, they said, was being released “to change this status quo” and was “the result of the largest collaboration of AI researchers ever involved in a single research project.”

And, as a result, BLOOM is no slouch, even compared to the big guns. The LLM can generate text in 46 human languages and 13 programming languages, and it contains 176 billion parameters: not quite the 540 billion found in Google’s PaLM model, but just ahead of the 175 billion in GPT-3. What’s more, for “almost all” of the languages it supports, including major languages like Arabic, French and Spanish, BLOOM is the first language model with more than 100 billion parameters.

Example BLOOM inputs and outputs. Image courtesy of BigScience.

Accomplishing this was no small feat: BLOOM was trained on Jean Zay over a 117-day period using that five-million-CPU-hour grant.
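To put the parameter counts above in hardware terms, here is a back-of-the-envelope Python sketch (not from BigScience) that estimates how much memory it takes just to store each model's weights. It assumes 2 bytes per parameter, as with 16-bit formats such as bfloat16, and ignores activations, caches and runtime overhead, so real-world requirements are higher.

```python
# Back-of-the-envelope estimate of weight storage for large language models.
# Assumption: 2 bytes per parameter (16-bit floats such as bfloat16).
# Activations, caches and framework overhead are deliberately ignored.

BYTES_PER_PARAM_16BIT = 2

# Parameter counts as cited in the article.
MODELS = {
    "GPT-3": 175e9,
    "BLOOM": 176e9,
    "PaLM": 540e9,
}


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_16BIT) -> float:
    """Return the memory needed to hold the raw weights, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, params in MODELS.items():
        # Format with a thousands separator, e.g. "1,080".
        print(f"{name}: ~{weight_memory_gb(params):,.0f} GB of weights")
```

Under those assumptions, BLOOM's weights alone come to roughly 352 GB, versus about 350 GB for GPT-3 and 1,080 GB for PaLM, which is why models at this scale are typically split across many accelerators rather than run on a single GPU.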

BLOOM is now available for researchers to download, run and study under the terms of BigScience’s Responsible AI License (RAIL). Ethics were a central concern for the group, as they have been for corporations and the public alike: the often convincing output of LLMs lends itself to unsavory applications such as producing realistic fraudulent media or text. While developing BLOOM, BigScience also created data governance structures for LLMs, as well as the RAIL itself. The RAIL prohibits unlawful and otherwise harmful uses, and it specifically bars use of BLOOM for controversial applications such as “fully automated decision making that adversely impacts an individual’s legal rights or otherwise creates or modifies a binding, enforceable obligation” and “medical advice and medical results interpretation.”

BigScience intends to update BLOOM, as well. “This is only the beginning,” they wrote. “BLOOM’s capabilities will continue to improve as the workshop continues to experiment and tinker with the model.” Items on the agenda include easier instructability and compression. “BLOOM is the seed of a living family of models that we intend to grow, not just a one-and-done model, and we’re ready to support community efforts to expand it.”

While these efforts are unlikely to eclipse those of Google, Meta or OpenAI any time soon, one thing is for sure: the walls of the LLM garden are slowly but surely coming down. Only time will tell whether the benefits of open research on LLMs outweigh the costs of misuse.
