The AI That Writes Fake News
Following the 2016 presidential election, real journalists worked to expose the fake news that human charlatans wrote and spread across the Internet for the LOLs and for profit. But the bigger threat on the fake news front could be made-up text generated by AI models, including one recently unveiled by OpenAI.
OpenAI yesterday published a blog post that describes GPT-2, a natural language processing (NLP) model capable of generating text “of unprecedented quality.” The unsupervised model, which was trained on 8 million Web pages (about 40GB), was designed to do one thing: predict the next word, given all of the preceding words in a piece of text.
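That training objective can be illustrated without any neural network at all. The toy sketch below stands in for the idea with a simple bigram count model: given the previous word, it predicts the continuation seen most often in a tiny made-up corpus. GPT-2 learns the same objective with a large neural network over 8 million Web pages; everything here (the corpus, the helper names) is illustrative only.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the continuation most frequently seen in training."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat slept on a mat"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # prints "cat" -- it follows "the" most often
```

Repeatedly feeding each predicted word back in as the new context is what lets a model trained this way generate whole passages, which is exactly how GPT-2 spins a prompt into a full article.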
In a paper that accompanied the blog post, OpenAI researchers noted how brittle most supervised machine learning language models become when real-world data deviates even slightly from the training set.
“Current systems are better characterized as narrow experts rather than competent generalists,” the researchers write. “We would like to move towards more general systems which can perform many tasks – eventually without the need to manually create and label a training dataset for each one.”
The researchers designed GPT-2 to “connect the dots” between traditional supervised learning approaches that are brittle and newer unsupervised “multi-task” methods that are still nascent. Specifically, it’s based on a type of neural network called a transformer, which employs a “self-attention mechanism” that directly models relationships between all words in a sentence.
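The self-attention idea can be sketched in a few lines: every word’s vector attends to every other word’s vector, so relationships between all positions are modeled directly rather than passed along step by step. This is a minimal sketch with made-up dimensions; the learned query/key/value projections and the causal mask GPT-2 actually uses are omitted for brevity.

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over a sequence X of shape
    (seq_len, d). Queries, keys, and values are all X itself here;
    real transformers learn separate projection matrices for each."""
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)                   # pairwise word-word affinities
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over each row
    return weights @ X                              # each output mixes all words

X = np.random.default_rng(0).normal(size=(4, 8))    # 4 "words", 8-dim vectors
out = self_attention(X)
print(out.shape)                                    # one output vector per word
```

Because every pairwise score in `scores` can be computed at once as a matrix product, the whole sequence is processed in parallel, which is the property that makes transformers cheaper to train than the sequential RNNs discussed next.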
Transformer models, including the original transformer architecture that Google Brain researcher Ashish Vaswani and his colleagues described in 2017, are simpler than recurrent neural networks (RNNs) and convolutional neural networks (CNNs), are more easily adapted to parallel computation, and require significantly less time and compute resources to train.
Apparently, GPT-2 is really, really good at generating human-like text in a variety of areas without being specifically trained to do so. The researchers say that, even with its generalized learning approach, GPT-2 delivered top scores on seven out of eight benchmarks in a “zero-shot” manner.
But in addition to writing a “really competent, really well-reasoned essay,” as OpenAI Vice President of Engineering David Luan told The Verge, GPT-2 can also write words that are libelous, slanderous, and completely untrue.
When fed the words “Russia has declared war on the United States after Donald Trump accidentally fired a missile in the air,” GPT-2 went on to generate a larger story about this supposed incident, including details of Russia’s response and background of Russia-US relations.
That’s one of the reasons why OpenAI decided not to release GPT-2 – at least not the fully trained model.
“Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version of GPT-2 along with sampling code,” OpenAI stated in its blog. “We are not releasing the dataset, training code, or GPT-2 model weights.”
OpenAI was founded four years ago as a non-profit organization by technologist Elon Musk and others to conduct research towards “safe artificial general intelligence.” The San Francisco, California group’s founders were struck by both the positive and negative potential of AI.
“It’s hard to fathom how much human-level AI could benefit society,” the founders wrote in 2015, “and it’s equally hard to imagine how much it could damage society if built or used incorrectly.”
OpenAI is modeling its slow release of GPT-2 on the “responsible publication” standards that have been exercised by biotechnology and cybersecurity firms, where the potential for abuse or misuse is weighed against the potential good a technology can do. The company says it hopes its “experiment will serve as a case study for more nuanced discussions of model and code release decisions in the AI community.”
Potential benefits of a powerful generalized language model cited by OpenAI include better chatbots, the advent of AI writing assistants, improved language translation, and more accurate speech recognition systems. Potential negatives include creation of fake news, impersonation of people online, generation of abusive or fake social media content, and automated spam and phishing content.
Balancing the pluses and the minuses is not an easy task. By releasing some of the code behind GPT-2, OpenAI hopes it can help accelerate progress toward the positive uses while minimizing the negatives.
“It’s very clear that if this technology matures—and I’d give it one or two years—it could be used for disinformation or propaganda,” Jack Clark, the policy director at OpenAI, tells MIT Technology Review. “We’re trying to get ahead of this.”