Common Sense Makes Progress with Deep Learning
We’ve witnessed incredible progress in the capability of deep learning models to not only understand text, but to generate it too. While the generated text is grammatically sound, the actual meaning of the words leaves something to be desired. Now researchers at Salesforce report that applying commonsense reasoning techniques is driving a 10% improvement in the accuracy of neural network-based natural language processing (NLP) models. What’s more, the technique bolsters explainability, which is currently a thorn in the side of many AI practitioners.
Commonsense reasoning, as the field of study is called, has long been a branch of AI. For decades, researchers have sought ways to imbue software or robots with capabilities that humans just take for granted, like knowing what happens to an object that’s pushed off a table (it falls), or being able to infer how someone feels when her significant other hurls profanities in public (she’s embarrassed). Computers excel when programmed for specific tasks, but they’re generally poor at using contextual clues to find the right answer.
“Commonsense reasoning has been one of the Holy Grails of the fields of AI for a long time,” says Richard Socher, the chief scientist at Salesforce Research. “Despite a lot of other research progress in deep learning…we’ve not been able to really make good use of neural networks in commonsense reasoning before.”
That might be about to change. Socher and his colleagues at Salesforce Research have devised a novel method of merging new commonsense reasoning techniques with existing neural network-based NLP models. The researchers spoke with Datanami recently about their work, which appeared in a June 2019 paper published by the Association for Computational Linguistics (ACL) titled “Explain Yourself! Leveraging Language Models for Commonsense Reasoning.”
The Salesforce researchers, including Socher, Nazneen Fatema Rajani, Bryan McCann, and Caiming Xiong, deviated from the usual path trod in the field of commonsense reasoning. Instead of building a logic model that lists all the commonsense facts known in the world, they sought a way to allow a model to derive commonsense facts by inferring relationships indirectly from text that’s composed of questions and human-generated answers.
Here’s what they did:
First, the researchers obtained CommonsenseQA, a highly curated dataset that lists a series of commonsense questions and the human-generated answers for them. Next, they generated human explanations for the answers in the CommonsenseQA dataset, which they dubbed Common Sense Explanations (CoS-E). Then they used CoS-E to create Commonsense Auto-Generated Explanation (CAGE), which works with existing deep learning language models to automatically generate commonsense explanations.
The training actually consisted of two stages. First, the researchers provided a CommonsenseQA example alongside the corresponding CoS-E explanation to a deep learning NLP model. The model “learns” about the question and answer choices from the example, and is trained to generate the CoS-E explanation.
Here’s an actual example from the CommonsenseQA training set:
Question: After getting drunk people couldn’t understand him, it was because of his what?
Choices: Lower standards, slurred speech, or falling down
CoS-E: People who are drunk have difficulty speaking
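To make the first stage more concrete, here is a minimal Python sketch of how one CommonsenseQA example and its CoS-E explanation might be flattened into a single training string for a language model. The `CosEExample` class, the two function names, and the "Question:"/"Choices:" labels are hypothetical, and the prompt wording is an assumption borrowed from the phrase quoted later in this article; the exact template used in the paper may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CosEExample:
    """One question, its answer choices, and a human explanation (hypothetical container)."""
    question: str
    choices: List[str]
    explanation: str


# Assumed prompt wording; the template used in the paper may differ.
PROMPT = "My common sense tells me"


def build_prompt(question: str, choices: List[str]) -> str:
    """Text the language model is conditioned on: question, choices, then the prompt."""
    if len(choices) > 1:
        options = ", ".join(choices[:-1]) + ", or " + choices[-1]
    else:
        options = "".join(choices)
    return f"Question: {question}\nChoices: {options}\n{PROMPT}"


def build_training_text(example: CosEExample) -> str:
    """Prompt followed by the human explanation the model is trained to generate."""
    return f"{build_prompt(example.question, example.choices)} {example.explanation}"


example = CosEExample(
    question="After getting drunk people couldn't understand him, it was because of his what?",
    choices=["lower standards", "slurred speech", "falling down"],
    explanation="People who are drunk have difficulty speaking",
)
print(build_training_text(example))
```

At generation time, only the output of `build_prompt` would be fed to the fine-tuned language model, which would then be expected to continue the text with an explanation of its own.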
This first phase generated its own improvement. “When I trained a deep network on these explanations just by using the human commonsense explanations as training, we got significant improvements in performance, even though it did not have these explanations at that time,” Rajani says. “Although we still don’t understand why we caused that performance improvement, we can speculate that the network learned to reason about how the world works using this information in the explanation and use it as the ability to perform better at test time.”
The second phase of training involved the CommonsenseQA training and validation data sets. “These CAGE explanations are provided to a second commonsense reasoning model by concatenating it to the end of the original question, answer choices, and output of the language model,” the authors write in the paper. “The two-phase CAGE framework obtains state-of-the-art results outperforming the best reported baseline by 10% and also produces explanations to justify its predictions.”
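The concatenation step in the second phase can be illustrated with a short Python sketch. It assumes the second model scores one text per answer choice and the highest score wins; the separator string, the function names, and the `score` callable are all hypothetical stand-ins, not the researchers' actual classifier or input format.

```python
from typing import Callable, List

# Hypothetical separator; a real model would use its own tokenizer's conventions.
SEPARATOR = " | "


def build_inputs(question: str, choices: List[str], explanation: str) -> List[str]:
    """Append the generated explanation to the question, once per answer choice."""
    return [
        f"{question}{SEPARATOR}{explanation}{SEPARATOR}{choice}"
        for choice in choices
    ]


def pick_answer(
    question: str,
    choices: List[str],
    explanation: str,
    score: Callable[[str], float],
) -> str:
    """Return the choice whose concatenated input receives the highest score."""
    scores = [score(text) for text in build_inputs(question, choices, explanation)]
    best_index = max(range(len(choices)), key=scores.__getitem__)
    return choices[best_index]


def placeholder_score(text: str) -> float:
    """Placeholder scorer for demonstration only; a trained model would go here."""
    return 1.0 if "speech" in text else 0.0


answer = pick_answer(
    question="After getting drunk people couldn't understand him, it was because of his what?",
    choices=["lower standards", "slurred speech", "falling down"],
    explanation="People who are drunk have difficulty speaking",
    score=placeholder_score,
)
print(answer)
```

Passing the scorer in as a callable keeps the sketch independent of any particular model library. In practice the placeholder would be replaced by a trained multiple-choice classifier.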
In essence, the CAGE framework was able to generate answers to questions on topics for which it had no previous training. The framework was able to infer the correct answers based on previous answers to multiple choice questions on other topics that were related, but not directly, to the topic in the question at hand.
As the researchers trained the CAGE-boosted NLP models, they would periodically prompt them with the phrase “My common sense tells me.” The progress was fascinating to watch, according to McCann.
“At first, if you just take this language model at its initial stage, it will generate things that are somewhat reasonable grammatically but are quite far from human explanations. It doesn’t really know what we mean by asking it for common sense,” he says.
“What we found was, after a course of a couple hours of training, we are able to get it to be quite close to our CoS-E datasets, the human explanation. This is really interesting, because we’ve actually taken a language model that apparently has just read a lot of stuff but doesn’t really know how to connect it or provide it in a useful way to us. And we’ve fine-tuned it to provide a sense of what we want for an explanation.
“It’s not just answering the question directly, in a lot of cases,” McCann continues. “It’s actually providing whatever information it knows about the things that appear in that context. So if it’s talking about lamps, it might mention lighting, or if it’s talking about cups, it might mention liquids that go in cups. Or for dark places, it will mention things around darkness. In that way these explanations are tapping into the latent information inside the language model built up after reading about all these words in different context, and pulling that out in the form of explanations that we can appreciate as humans.”
The technology is still in its nascent stages. However, there are several promising directions that this approach to commonsense reasoning could take. There are obvious uses for bolstering the explainability of AI for NLP use cases. But it could also potentially be used for computer vision, the researchers say.
“We’re trying to look into different modalities as well, such as visual commonsense reasoning, where there might be an image or a theme you have to reason over, and our language model would have to ground its common sense in that,” McCann says. “We’re hoping in the end that it makes it into … any setting where an algorithm is possibly dealing with someone who’s using language who’s a little more subtle, a little more ambiguous, and they don’t realize that they’re using common sense allusions, but the model will now be able to pick up on that information.”