March 13, 2023

Multi-modal GPT-4 Rumored To Be Released This Week


Rumors are swirling around the anticipated release of OpenAI’s GPT-4 this week. German publication Heise reported that Microsoft Germany CTO Andreas Braun spilled the tea at an AI kickoff event last week.

“We will introduce GPT-4 next week, there we will have multi-modal models that will offer completely different possibilities–for example, videos,” Braun is quoted as saying by Heise. Microsoft is the leading investor in OpenAI, having reportedly put $10 billion into the company for a 49% stake, and has integrated OpenAI’s technology into its Bing search engine.

Multi-modal AI systems can process data in the form of text, images, audio, and video. ChatGPT, the famed chatbot powered by OpenAI’s Generative Pre-trained Transformer (GPT) large language models, can only process text. It was fine-tuned from GPT-3.5, a version that was quietly released in November.

Excited users expect that, with new multi-modal capabilities, ChatGPT will understand and synthesize the meaning and context of multiple types of data at once. Text-to-video capabilities are possible, as are other modes. For instance, a user might show ChatGPT a picture and ask it questions about how people or objects in the image are connected. Or they could perhaps ask it to write a story based on the contents of an image or video.

OpenAI began training GPT-4 in September. An image sweeping across Twitter claims that GPT-4 will have 100 trillion parameters versus GPT-3’s 175 billion, which would make GPT-4 roughly 570 times larger by parameter count. However, in an interview with StrictlyVC last month, OpenAI CEO Sam Altman called the image accompanying the 100 trillion parameters rumor “complete bullshit.”
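As a quick check on the rumored figures, the ratio of the two parameter counts can be computed directly. The short Python sketch below uses only the numbers quoted in the viral image (the 100 trillion figure is an unverified rumor that Altman has dismissed); the variable names are chosen here for illustration. Note that a parameter count describes a model's size, not how capable it is.

```python
# Parameter counts as quoted in the article.
# The GPT-4 figure is an unverified rumor, not an official number.
GPT3_PARAMS = 175_000_000_000               # 175 billion
RUMORED_GPT4_PARAMS = 100_000_000_000_000   # 100 trillion

# How many times larger the rumored model would be, by parameter count.
ratio = RUMORED_GPT4_PARAMS / GPT3_PARAMS

print(f"Rumored GPT-4 to GPT-3 parameter ratio: about {ratio:.0f}x")
```

By this arithmetic the rumored model would be about 571 times the size of GPT-3.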

This is an example of an image circulating on Twitter that claims GPT-4 has 100 trillion parameters, a claim OpenAI CEO Sam Altman has called “complete bullshit.” (Source: Twitter)

“The GPT-4 rumor mill is a ridiculous thing,” Altman said in the interview. “People are begging to be disappointed, and they will be.” Asked when GPT-4 will come out, he said it will be released when it is safe and responsible to do so.

Another rumor making Twitter rounds is that Microsoft will announce GPT-4’s release during a live stream event at 8:00 a.m. PT this Thursday, March 16. The event is called “Reinventing Productivity: The Future of Work with AI,” and will feature Microsoft CEO Satya Nadella and head of Microsoft 365 Jared Spataro as they discuss how “AI will usher in a new way of working for every person and organization.” Though it is hypothesized to be the GPT-4 launch event, there is nothing official from Microsoft to support this assertion.

During last week’s Microsoft event where Braun announced GPT-4’s impending arrival, Clemens Siebler, a senior AI specialist with Microsoft Germany, discussed which use cases will be possible with GPT-4. One example is speech-to-text, where customer calls could be transcribed and summarized automatically so that call center agents no longer have to type up the summaries by hand. But he cautioned that hallucinations within the technology still abound and that verification and validation of facts will still be necessary. According to Heise, Microsoft is working on creating confidence metrics to address this issue. Siebler mentioned the company is building a feedback loop with a thumbs up and thumbs down feature, which he called an “iterative process.”

The hype for GPT-4 has prompted quite a few hyperbolic memes, like this one referencing the movie Terminator. (Source: Twitter)

Though the GPT-4 hype is currently deafening, Altman’s promise of disappointment may come true. At their core, LLMs are probability machines: they predict the most likely next word in a sequence based on statistical patterns learned from their training data. However much people may wish otherwise, these models do not form thoughts of their own, which would be a hallmark of artificial general intelligence.
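To make the “probability machine” idea concrete, here is a deliberately tiny Python sketch. It is not how GPT models are built (they use large neural networks operating on tokens); it is a toy word-pair counter, with a made-up corpus and hypothetical function names, that shows the basic principle of predicting what comes next from patterns seen in earlier text.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, word):
    """Turn the follow-up counts for `word` into probabilities."""
    following = counts.get(word, Counter())
    total = sum(following.values())
    if total == 0:
        return {}  # word never seen with a successor
    return {w: c / total for w, c in following.items()}


# Toy corpus, invented for this illustration.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)
probs = next_word_probs(model, "the")

# Print candidates from most to least likely.
for candidate, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"the -> {candidate}: {p:.2f}")
```

In this corpus, “the” is followed by “cat” twice and “mat” once, so the toy model rates “cat” as the more likely next word. It has no notion of what a cat is; it only reflects the frequencies it has counted.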

Famed linguist Noam Chomsky wrote about this last week in the New York Times: “However useful these programs may be in some narrow domains (they can be helpful in computer programming, for example, or in suggesting rhymes for light verse), we know from the science of linguistics and the philosophy of knowledge that they differ profoundly from how humans reason and use language. These differences place significant limitations on what these programs can do, encoding them with ineradicable defects.”
