Your Big Data Will Read To You Now
One of the biggest challenges in big data analytics is communicating insights back to users. Many organizations rely on dashboards and visualization tools to display the analytic results. Now a new crop of tools is using big data to generate written narratives that explain to users what’s happening in plain English.
Advances in artificial intelligence have led to a wave of new applications that can automatically analyze massive amounts of data and write stories about them, and do so at scale and on par with human ability. You may have heard about the robot that writes baseball game recaps that rival those of sports journalists, or the machine that media firms use to generate earnings reports for publicly traded companies. Those are the types of systems we’re talking about.
Narrative Science developed the StatsMonkey program in 2011 to automatically generate baseball game recaps that were as good as those written by the average sports journalist (and better than those by the lazy one who buried the lead about the perfect game). The software is based on technology that Narrative Science co-founder and chief scientist Kris Hammond helped develop while teaching at Northwestern University during the 2000s.
“The first version of [StatsMonkey] was so successful and so human in terms of this ability to express that we took pause and…realized it’s not about baseball or sports or media–it’s about explaining the world through the lens of data and the lens of analytics applied to that data,” Hammond tells Datanami. “We can explain things based on this incredibly rich mass of data that people are amazingly frustrated with because of the gap between their data and their ability to understand it.”
One of the early adopters of Narrative Science’s flagship product, called Quill, is Credit Suisse. The Swiss investment bank offered its customers a dashboard that would present a series of charts and graphs that summarized assessments and predictions of how publicly traded companies were performing.
The problem was, the charts and graphs were so complex that even the developers who created the backend analytics had no idea how to use them. So the bank brought in Narrative Science and fed Quill the data that was previously going into the dashboards. Now, instead of having to interpret charts and graphs, the bank’s users can push a button on the screen and read a Quill-generated story that boils the data down to the pertinent parts.
“They’re not looking at charts anymore. They’re just reading a story,” Hammond says. “Rather than force everybody to become data scientists, we are actually providing people insight into what’s happening in the world by having Quill do that work.”
Quill has two main parts: the analytics portion that works against structured data, and the storytelling algorithms that output text. Hammond, a Ph.D. who cut his teeth on artificial intelligence at the Yale labs in the 1980s, initially layered the storytelling component atop a search substrate but shifted it to analytics following advances in the field.
The software doesn’t just spit out a collection of words based on the data used as input. Instead, it uses specific sets of data to answer questions about the state of the world for its user. It’s a combination of ETL and analytics technologies along with a dose of natural language generation, all running on the Amazon Web Services cloud (although one of Narrative Science’s 50 customers is running it on-premises).
“In order for Quill to articulate, it needs to understand what’s happening. It needs to know what to say,” Hammond says. “We do a lot of work in terms of the analytics with regard to building up profile based on histograms, time series analytics, comparative analytics, clustering and aggregation, ordering and ranking. All that analysis is aimed at getting information out of the data that is required to tell the story.”
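To make the two-part design concrete, here is a minimal sketch in Python of an analytics step followed by a storytelling step. It is not Quill's actual implementation, which the article does not describe in detail. The function names (`analyze`, `narrate`), the company name, the revenue figures, and the sentence templates are all hypothetical, chosen only to illustrate how comparison and ranking over structured data can decide what a generated story says.

```python
def analyze(quarters):
    """Analytics step: derive facts from structured data.

    `quarters` is a chronological list of (label, revenue) tuples
    with at least two entries. All figures are illustrative.
    """
    labels = [label for label, _ in quarters]
    values = [value for _, value in quarters]
    latest, previous = values[-1], values[-2]
    # Comparative analytics: change versus the prior period.
    change_pct = (latest - previous) / previous * 100
    # Ordering and ranking: where the latest period sits (1 = highest).
    rank = sorted(values, reverse=True).index(latest) + 1
    return {
        "label": labels[-1],
        "latest": latest,
        "change_pct": change_pct,
        "rank": rank,
        "count": len(values),
    }


def narrate(company, facts):
    """Storytelling step: turn the derived facts into sentences."""
    pct = facts["change_pct"]
    if pct > 0:
        movement = f"rose {pct:.1f}%"
    elif pct < 0:
        movement = f"fell {abs(pct):.1f}%"
    else:
        movement = "was unchanged"
    sentences = [
        f"{company}'s revenue {movement} in {facts['label']}, "
        f"coming in at ${facts['latest']:.1f} million."
    ]
    # Only mention the ranking when it is noteworthy.
    if facts["rank"] == 1:
        sentences.append(
            f"That was the strongest of the last {facts['count']} quarters."
        )
    elif facts["rank"] == facts["count"]:
        sentences.append(
            f"That was the weakest of the last {facts['count']} quarters."
        )
    return " ".join(sentences)


# Hypothetical input data, in millions of dollars.
quarters = [("Q1", 100.0), ("Q2", 110.0), ("Q3", 105.0), ("Q4", 126.0)]
print(narrate("Acme Corp", analyze(quarters)))
```

The point of the split is the one Hammond makes: the text generator never looks at raw numbers. It only sees facts the analytics step has already decided are worth reporting.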
The stories generated by Quill are as good as those written by a firm’s top analyst or author, the company claims. That claim would seem to have some truth to it, considering that Forbes has adopted Quill for writing basic earnings stories, such as earnings previews. If you search Google News for “Narrative Science” (with the quotes), you’ll see that the software is generating dozens of stories for Forbes.com every day.
Another storied media firm, the Associated Press, adopted similar technology from a company called Automated Insights this June. “For many years, we have been spending a lot of time crunching numbers and rewriting information from companies to publish approximately 300 earnings reports each quarter,” the AP’s vice president and managing editor Lou Ferrara wrote on the AP blog. “We discovered that automation technology…would allow us to automate short stories–150 to 300 words–about the earnings of companies in roughly the same time that it took our reporters. And instead of providing 300 stories manually, we can provide up to 4,400 automatically for companies throughout the United States each quarter.”
The new robot-writer didn’t lead AP to axe any of its human editors or reporters, according to Ferrara. In fact, the robot is helping the remaining editors and reporters to be more productive and take on higher-level work. “This is about using technology to free journalists to do more journalism and less data processing,” he wrote.
That sentiment is echoed by Hammond, who is on extended leave from his job as a journalism and comp-sci professor at Northwestern so he can develop Quill and Narrative Science. “The clock is not ticking on your job,” he says. “We’re working side by side and helping people who are already in the trenches with regard to data. And even though [data analysis] is a skill they have, it’s a skill nobody wants to spend four hours a day doing. We free them so they can do the higher end work, so they can work at the top of their game.”