Are We Nearing the End of ML Modeling?
Josh Tobin, the co-founder and CEO of machine learning tool provider Gantry, didn’t want to believe it at first. But Tobin, who previously worked as a research scientist at OpenAI, eventually came to the conclusion that it was true: The end of traditional ML modeling is upon us.
The idea that you no longer need to train a machine learning model, and can get better results by just using off-the-shelf models without any tuning on your own custom data, seemed wrong to Tobin, who spent years learning how to build these systems. When he first heard the idea after starting Gantry, which he co-founded in 2021 with fellow OpenAI alum Vicki Cheung, he didn’t want to believe it.
“The first four or five times I heard that, my thinking was like, okay, these companies just don’t know what they’re doing,” Tobin said. “Because obviously I’m trained more classically in machine learning, and so my worldview is training models is really important.”
But the more he heard the refrain–particularly when it involved using large language models (LLMs) to build predictive natural language processing (NLP) systems, but not exclusively–the more he came to the conclusion that it was true.
“Once I heard it enough times, I realized that there’s a shred of truth in this,” he said. “There’s a new stack that’s being built around how do you just squeeze the most out of GPT-3 or a model like that without actually needing to fine-tune it on your data at all.”
We’re in the midst of a revolution in how ML models get built. The pace of change is greatest in the LLMs, such as OpenAI’s ChatGPT, Google’s BERT, and GitHub’s Copilot, which are already trained on enormous amounts of general-purpose data culled from the Internet and are gaining traction for their ability to generate useful text based on some other text input, or prompt.
“It’s pretty profound when you think about it,” Tobin told Datanami recently. “It’s now orders of magnitude more complicated to fine-tune a model, even on an existing data set. It’s so much cheaper and easier if you don’t have to fine-tune models.”
That’s not to say that developers aren’t tweaking things with LLMs. But instead of lopping off the top of a transformer model and retraining it with your own data–which has been the accepted pattern for data scientists and machine learning engineers to be productive with deep learning since the ResNet model was trained on the ImageNet corpus of image data around 2015–users have different ways of getting the results they want.
Tobin explained: “You’re still injecting context-dependent data into it. You’re just not doing it by training. You’re doing it by prompting, effectively. And I think it’s a much faster and easier way of injecting your domain-specific data into models.”
By injecting data into the model, or prompting it, the developer is telling the model what they want it to do. Users who have interacted with ChatGPT will recognize the prompt as the question you ask it, which gives the model the context necessary to generate the response. There are similar approaches for other models that use an API rather than a text prompt in a UI. According to Tobin, the API is critical for making this work in other data products.
“The pattern for how people are building these systems is they’re not actually training models on that data,” he said. “What they’re doing is they’re basically creating a corpus of embeddings for each of those documents and they’re searching over those embeddings. So they’re saying ‘Hey, when a user asks a question, let me find the documents that seem the most relevant.’”
It’s similar to a search function. But instead of explicitly training the model on a certain piece of data, the modern LLM developer will select a handful of relevant pieces of data and dump them into the model’s prompt via the API.
“The relevant information is being injected by heuristics or some similarity search that you’re using to say, of this corpus of knowledge, here are the things that are probably most helpful to solve the task that the user wants you to solve,” Tobin said.
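As a rough illustration of the pattern Tobin describes, the Python sketch below ranks a small corpus against a user question and pastes the best matches into a prompt. It is a minimal sketch under stated assumptions, not how any particular product works: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and `CORPUS`, `top_k`, `build_prompt`, and the prompt wording are all hypothetical names invented for this example. The resulting string is what a developer would send to an LLM through its API; no model is called here.

```python
import math
import re
from collections import Counter

# Hypothetical company documents; none of this text comes from the article.
CORPUS = [
    "Refunds are issued within 14 days of a return request.",
    "The office is closed on public holidays.",
    "Shipping to Canada takes five business days.",
]


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(question, corpus, k=2):
    """Return the k documents whose vectors are most similar to the question."""
    query_vec = embed(question)
    indexed = [(doc, embed(doc)) for doc in corpus]
    ranked = sorted(indexed, key=lambda pair: cosine(query_vec, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(question, corpus, k=2):
    """Inject the retrieved documents into the prompt instead of training on them."""
    context = "\n".join(f"- {doc}" for doc in top_k(question, corpus, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    print(build_prompt("How long do refunds take?", CORPUS, k=1))
```

In a real system the word-count vectors would presumably be replaced by embeddings from a hosted or open-source model, and the sorted list by a vector index, but the shape of the approach stays the same: search first, then prompt.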
There are several reasons why this approach works. For starters, the LLMs have already been pre-trained on an enormous corpus of data–essentially the entire Internet–and that allows them to spot patterns. The models, especially ChatGPT, have also shown themselves able to generalize to a wide degree. They’re also good at few-shot learning, he said.
“When you’re doing this context engineering process, what you’re doing is you’re giving the model examples or relevant context that it can use to answer the questions,” he said. “But that relevant context may be nothing that it’s ever seen before. That could be specific information for your company that, in an ordinary way of building machine learning models, you’d have to fine-tune the model on.
“But since these LLMs have learned these general-purpose patterns about how to take a document and pull out relevant information, or how to spot a pattern in how people are asking it to answer a question and repeat the pattern, those general patterns are learned by the models,” he continued. “And then what you’re doing is providing a specific example of the pattern you want it to follow up.”
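The few-shot idea can be sketched in a few lines of Python. The example below is an illustration only: the `few_shot_prompt` helper, the `Input:`/`Output:` layout, and the support-ticket examples are assumptions made up for this sketch, not a format Tobin or any vendor prescribes. It shows how a handful of worked examples can be placed in front of a new input so the model can repeat the pattern, with no training step involved.

```python
def few_shot_prompt(instruction, examples, query):
    """Build a prompt that shows worked examples before the new input.

    `examples` is a list of (input_text, expected_output) pairs. The final
    "Output:" is left blank for the model to complete.
    """
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical domain-specific examples a company might supply.
    examples = [
        ("I was charged twice this month.", "billing"),
        ("The dashboard will not load.", "technical"),
    ]
    print(
        few_shot_prompt(
            "Classify each support ticket as billing or technical.",
            examples,
            "The app crashes on login.",
        )
    )
```

Changing the system's behavior here means editing the examples in the list rather than retraining a model, which is the shift in workflow the article describes.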
While the change in development technique is most profound with the LLMs, it’s not restricted to them, Tobin said.
“I think there’s a similar phenomenon that’s happening outside of it, but it’s most acute with transformer-based models,” he said. “I think the importance of modeling has greatly decreased in pretty much every branch of machine learning.”
When Tobin started working in machine learning back in 2015, it took a lot of very specialized work. “You’re banging your head against CUDA drivers and trying to install Caffe and things like that,” he said. “Over the years it’s gotten to where…you just pull a model off the shelf and you call out an API. Just throw some data in there and fine-tune it. It basically works off the shelf.”
Now, with the LLMs, there’s no need to fine-tune the model at all.
There are several consequences of this new approach. For starters, users no longer need the deep level of technical skill to develop a working system. Since you’re not training a model in the classic sense, you don’t need as much data, and you don’t need to wrangle and clean all the data (although data labeling is still important in some contexts, Tobin said). There’s also no need to build and manage different versions of models, eliminating a lot more hassle.
“You still have to put data in so the model knows what problem you’re trying to solve, but the way that you put it in is even easier and even cheaper and requires even less machine learning skills than fine tuning a model that you got from a library,” Tobin said. “So the implication is it’s going to be a skill that’s accessible to a much wider range of companies and much wider range of developers within those companies, because they need to know a lot less of the specialized machine learning skillsets to do it.”
Instead of a full-blown data scientist with deep knowledge of a specific domain and mad technical skills, like the ability to code in Assembler or C, the new generation of prompt engineers won’t need nearly as much formal training. They’ll need to learn a few tricks to make the systems fast, but the focus on prompt engineering rather than training ML models with massive data sets will have big implications, Tobin said.
“Some people think that prompt engineering is the next hot job. I think there’s maybe two people with the title of prompt engineer in existence,” he said. “There’s always going to be a need for people who can do the lower-level stuff…But for most things that you’re building most of the time, just understanding the higher level framework is enough to be really productive.”
Prompt engineering will take some getting used to. It’s a decidedly different approach to getting the machine learning models to do what you want them to do. But it’s one that is bound to catch on as LLMs like Copilot, LaMDA, and ChatGPT are adopted and begin having a wider influence on software design.
“It’s been really fun to watch, especially over the past year or so, and kind of humbling, as a former ML researcher who’s devoted a lot of time to training large models myself,” he said. “It’s amazing. Many of us spent a lot of years in school learning how to train machine learning models, and it’s not really how the next generation of models are going to get built.”