AI Experts Discuss Implications of GPT-3
Last July, GPT-3 took the internet by storm. The massive 175 billion-parameter autoregressive language model, developed by OpenAI, showed a startling ability to translate languages, answer questions, and – perhaps most eerily – generate its own coherent passages, poems, and songs when given examples to process. As it turns out, experts were captivated by these abilities, too: captivated enough, in fact, that researchers from OpenAI and a number of universities met several months ago to discuss the technical and sociopolitical implications of the platform.
The summit, helmed by OpenAI in partnership with Stanford’s Institute for Human-Centered Artificial Intelligence, convened in October. Beyond those two institutions, the participants remain unknown to the public, as the meeting was held under the Chatham House Rule, whereby attendees may share what was discussed but may not reveal who attended or who said what.
On the table: two key questions on the future of large language models like GPT-3. First: what are the technical capabilities and limitations of those models? Second: what are the societal effects of widespread use of those models?
The summary of the summit was written by Alex Tamkin, Miles Brundage, Jack Clark, and Deep Ganguli, who characterized the conversation as “collegial and productive,” but added that “there was a sense of urgency to make progress sooner than later in answering these questions.”
So, what’d they talk about?
Exponential expansion. GPT-3 is larger than its predecessor, GPT-2, by more than two orders of magnitude (175 billion parameters versus 1.5 billion parameters). One participant said that the growth in model capabilities feels as reliable as the laws of physics or thermodynamics, and several said they expected these rapid scaling trends to continue.
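The "more than two orders of magnitude" figure can be sanity-checked with a quick back-of-the-envelope calculation (an illustrative sketch, not from the summit paper):

```python
import math

# Published parameter counts for the two models.
gpt3_params = 175e9  # 175 billion
gpt2_params = 1.5e9  # 1.5 billion

ratio = gpt3_params / gpt2_params
orders_of_magnitude = math.log10(ratio)

print(f"ratio: {ratio:.0f}x")                             # ~117x larger
print(f"orders of magnitude: {orders_of_magnitude:.2f}")  # ~2.07
```

A ratio of roughly 117x is just over 10², consistent with the article's "more than two orders of magnitude."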
Necessary collaboration. In one case, a participant compared large language models to particle accelerator experiments, citing how both tools require experts from a wide range of backgrounds to construct and operate.
An unclear bar for “understanding.” The classic AI question: what is intelligence, and how intelligent is “intelligent?” The participants were divided on how to assess GPT-3’s intelligence: some argued it should be defined by its ability to respond to requests in the real world; some said it needed to overcome hurdles intended to confuse AI; some countered that causality, rather than correlation, was necessary; and yet others argued against a binary understanding of intelligence altogether, the importance of intelligence as a concept, and even our ability to understand GPT-3. “GPT-3 is completely alien,” one participant said. “It’s the first thing I’ve seen where it’s not a dumb thing to ask whether it’s [artificial general intelligence].”
A bright future for multimodal models. Participants generally agreed that multimodal language models – which also train on data from images, audio, and so on – will become more prevalent and enable more functionality. Some participants even suggested that GPT-3 is already multimodal, since the forms of data it processes are so diverse.
A need to align with human values. There was discussion of how large language models could better incorporate human values into their decision-making: humans, some participants explained, care far more about errors with value implications than about mere grammatical errors.
Access concerns… Currently, OpenAI restricts access to GPT-3’s API. Some participants questioned this: who receives access, and why? How can the model be effectively evaluated for biases if it’s not available at scale?
… and concerns over responsible deployment. GPT-3’s language generation is, in many cases, incredibly convincing. Participants discussed how that ability could be abused, including through disinformation or propaganda campaigns whose output appears to have been written by a human. Others were less worried, expressing confidence that modern society has encountered enough analogues in other media (e.g. Photoshop) to be appropriately skeptical of fabricated content.
Ideas for identifying real information. Some participants floated ideas for tackling AI-generated language: one floated laws requiring disclosure when AI is used to generate text, while another suggested increased investment in cryptography to authenticate human-generated media.
A need for OpenAI to set norms. Participants agreed that OpenAI’s lead is unlikely to last more than a handful of months before others catch up and, as a result, stressed the need for OpenAI to set socially healthy norms around the use of large language models while it still can.
Worrying biases. Since GPT-3 learns its generated language from observed language, it exhibits many of the demographic biases that pervade much modern language, media, and conversation. Participants noted the difficulty of addressing these biases in any systematic way, since biases are often highly contextual. Many suggested that the most harmful biases, at least, should be proactively addressed, but others countered that OpenAI should not be the one to make those decisions. Participants suggested solutions ranging from better-labeled data and reinforcement learning to suites of bias tests and carefully targeted deployment.
Potential impacts on jobs and the economy. As GPT-3’s abilities approach the demands of rote writing and moderation work, it is increasingly likely that AI models will begin to replace some of those jobs. Participants discussed how some of those jobs may be more or less desirable than others, raising the question of which jobs should be off-limits to AI replacement.
This left the summit with many urgent questions and very few answers. Will the models continue to scale at their current rates? How should access to these models be granted? How can we understand, let alone mitigate, the biases they demonstrate? While the era of large language models is only just beginning, the hurdles they present are rapidly approaching.
To read more about the discussion, see the full paper.