Salesforce Taps LLM for Programming Boost with CodeGen
Large language models (LLMs) like GPT-3 are capturing the imaginations of data scientists around the world, thanks to their advanced capability to understand and generate text. Now researchers at Salesforce have leveraged an LLM to build CodeGen, which can understand source code and even generate its own code in different programming languages.
To get the low-down on CodeGen, we turned to Salesforce Chief Scientist Silvio Savarese, who was kind enough to answer a few questions about the research project, how it was developed, and its possible role at Salesforce in the future.
The original inspiration for CodeGen came more than a year ago, when Savarese’s team “envisioned a conversational system that is capable of having a discussion with the user to solve a coding problem or task,” Savarese said via an email Q&A with Datanami.
“With CodeGen, this discussion takes the form of an English text-based discourse between a human and a machine,” he continued. “The result is AI-generated code that solves the discussed problem.”
Just as language models have demonstrated a capability to understand William Shakespeare’s writings and even to generate prose that closely resembles the Bard’s, CodeGen has the capability to understand the various textual components of a programming language and to generate code that matches the syntax, rules, and constraints of that language.
“CodeGen establishes a bridge between natural language and programming language,” Savarese said. “CodeGen can help democratize programming much like low-code tools do by lowering the barrier to entry for non-developers.”
CodeGen’s model has a GPT-style architecture, and was trained from scratch on Google’s TPU-v4, Savarese said. At this point, CodeGen remains mostly a research project, according to Savarese, although it is being tested with a small group of users.
Up to this point, demonstrations of CodeGen have focused primarily on interactive data science scenarios, such as working in Jupyter notebooks. It has also been used for context-sensitive code completion within common development environments, he said.
“Since CodeGen is a flexible, foundational model, it can be applied broadly,” Savarese said. “For example, it helps us better understand existing code. It helps detect bugs in code that humans have written, estimate risks, and even summarize a code’s functionality to help new developers understand it. CodeGen even translates code across programming languages–another benefit when dealing with legacy code that may still have value but is difficult to maintain.”
CodeGen excels at frequently used programming patterns, Savarese said, such as “known efficient implementations of algorithms, file operations, data manipulation, custom analytics tools on top of platforms like Tableau, Web development and design, or the construction of larger programs composed of many ‘simpler’ steps,” he continued. “Programs for which entirely new algorithms to solve a problem are needed or code written in less common programming languages may be less approachable.”
The model could enable users with no experience to develop simple programs, Savarese said, while more complex programs would still require some development experience. It could accelerate the progression of IT users with some experience, such as administrators, who want to become full-blown developers, and it could be a time-saver for expert coders who want to eliminate redundant or repetitive tasks, he said.
CodeGen is available on GitHub.
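For readers who want to experiment, the released checkpoints can also be loaded through the Hugging Face transformers library. The sketch below is a minimal example, assuming the `Salesforce/codegen-350M-mono` checkpoint (the smallest Python-only variant) is available and transformers is installed; the prompt and function name are purely illustrative.

```python
# Sketch: completing Python code from a natural-language comment with a
# released CodeGen checkpoint, via Hugging Face transformers.
# Assumptions: the "Salesforce/codegen-350M-mono" checkpoint name and the
# example prompt are illustrative, not from the article.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Salesforce/codegen-350M-mono"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# An English comment plus a function signature serves as the prompt;
# the model continues it with a candidate implementation.
prompt = "# return the sum of squares of a list of numbers\ndef sum_of_squares(nums):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Larger variants of the model generally produce more reliable completions, at the cost of memory and latency; the generated code should still be reviewed before use.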