Text Analytics Firm Gets Seed Money to Build ‘Cuddle’
Sensai emerged from stealth mode yesterday to begin selling a textual analytics solution that uses machine learning technology to find insights hidden in unstructured data such as news articles, call transcripts, legal and regulatory filings, and tweets.
Many companies are building the same type of text analytics solutions to glean insights from the huge amount of unstructured data they’ve already stockpiled or that flows across the Internet day and night, says Sensai CEO and co-founder Jonas Lamis.
“Every company has that same problem and they’re all in a battle now for talented people in order to solve it,” Lamis says. “About 80 percent of what they’re doing is the same thing. They’re all essentially re-inventing the wheel.”
So Lamis co-founded Sensai with former Google staff developer Monica Anderson and Michael P. Gusek, who worked with Anderson at Syntience, where Anderson is still the CEO. A well-respected researcher in the fields of artificial intelligence and machine learning, Anderson came up with the idea behind Sensai’s linguistically oriented data flow language, called Cuddle.
“Her observation was that she could write a new type of language, and it would abstract the heavy lifting away from the analyst so that they could harness the powers of all the algorithms and the in-memory technologies that are emerging to analyze the data they cared about, without upfront heavy development,” Lamis says. “It’s a non-trivial problem, as you know.”
What Anderson developed is called Content Discovery Language (CDL, or “Cuddle”), a linguistically oriented data flow language that allows users to find patterns hidden in data. CDL currently has about 40 operators, including clustering algorithms that users can bring to bear on the data, as well as data cleansing and visualization functions, with more operators on the way.
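The article does not show CDL's syntax, so the following Python sketch only illustrates the general idea of a data flow built from chained operators. The operator names (`clean`, `tokenize`, `group_by_keyword`) and the sample documents are hypothetical and are not CDL operators; the keyword grouping is a deliberately simple stand-in for a real clustering algorithm.

```python
import re
from collections import defaultdict
from functools import reduce


def clean(docs):
    """Toy cleansing operator: lower-case and collapse whitespace."""
    return [re.sub(r"\s+", " ", d.lower()).strip() for d in docs]


def tokenize(docs):
    """Split each cleaned document into alphanumeric tokens."""
    return [re.findall(r"[a-z0-9]+", d) for d in docs]


def group_by_keyword(keywords):
    """Build an operator that groups document indexes by keyword.

    This is a stand-in for a clustering operator, not a real one.
    """
    def op(token_lists):
        groups = defaultdict(list)
        for index, tokens in enumerate(token_lists):
            for keyword in keywords:
                if keyword in tokens:
                    groups[keyword].append(index)
        return dict(groups)
    return op


def pipeline(*ops):
    """Chain operators so each one consumes the previous one's output."""
    return lambda data: reduce(lambda acc, op: op(acc), ops, data)


# Hypothetical sample data, for illustration only.
docs = [
    "Acme announces  MERGER with Beta",
    "Beta issues product recall",
    "Acme merger clears review",
]

flow = pipeline(clean, tokenize, group_by_keyword(["merger", "recall"]))
result = flow(docs)
print(result)
```

The point of the sketch is the composition: the analyst describes which operators to apply and in what order, and the plumbing between them is handled by the pipeline.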
“We have built a data science platform to handle the world’s textual information, so analysts in large companies can quickly uncover insights that drive performance and growth and compliance without necessarily the full cost, time, and technical background that are required today to build their own data science solution,” Lamis says.
With CDL doing the grunt work behind the scenes, the Sensai platform is designed to make it easier for a user to understand big, fast-moving data sets. Some of the earliest users are on Wall Street, where Sensai ingests and analyzes SEC filings, ticker symbols, Twitter and news articles in search of alpha. WorldQuant, a hedge fund based in Connecticut that uses Sensai to analyze about 80 different sources, and Swiss bank UBS are early adopters of Sensai.
Once Sensai has been programmed and trained, the system can alert the analyst when a new cluster emerges that they otherwise wouldn’t have known about until it reached a higher noise level in the general press, Lamis says.
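The article does not describe how Sensai detects an emerging cluster. As a rough illustration of the alerting idea only, the Python sketch below flags terms that appear in a noticeably larger share of recent documents than of baseline documents. The function names, thresholds, and sample headlines are all assumptions made for this example, and a simple frequency-spike heuristic like this is far cruder than a real clustering approach.

```python
import re
from collections import Counter


def term_counts(docs):
    """Count how many documents each term appears in (document frequency)."""
    counts = Counter()
    for doc in docs:
        counts.update(set(re.findall(r"[a-z]+", doc.lower())))
    return counts


def emerging_terms(baseline_docs, recent_docs, min_docs=2, ratio=3.0):
    """Return terms whose recent document rate is at least `ratio` times
    their smoothed baseline rate. Thresholds are illustrative assumptions."""
    base = term_counts(baseline_docs)
    recent = term_counts(recent_docs)
    n_base = max(len(baseline_docs), 1)
    n_recent = max(len(recent_docs), 1)

    flagged = []
    for term, count in recent.items():
        if count < min_docs:
            continue  # ignore terms seen in too few recent documents
        recent_rate = count / n_recent
        # Add-one smoothing so unseen baseline terms do not divide by zero.
        base_rate = (base.get(term, 0) + 1) / (n_base + 1)
        if recent_rate / base_rate >= ratio:
            flagged.append(term)
    return sorted(flagged)


# Hypothetical headlines, for illustration only.
baseline = [
    "Acme reports quarterly earnings",
    "Acme opens new plant",
    "Acme hires new CFO",
    "Acme earnings beat estimates",
]
recent = [
    "Acme faces supplier strike",
    "Strike halts Acme plant",
    "Acme earnings call scheduled",
]

alerts = emerging_terms(baseline, recent)
print(alerts)
```

Here a term like the company name is common in both periods and is not flagged, while a term that is new to the recent window and recurs across several documents is.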
“If I’m a stock trader or an analyst, and I have a list of companies that I track, [Sensai would help] me know when things are happening in the data sets that would be positive or negative momentum indicators,” he says. “It has the effect of potentially signaling things that might be, rather than what already has been.”
Wall Street isn’t the only industry for Sensai. Another early adopter is the global manufacturing powerhouse Siemens, whose auditing department pointed Sensai at a library of more than 1.4 million documents sitting in Microsoft SharePoint. Once Siemens learned the language and programmed the machine, Sensai was able to parse the data and pull the pertinent bits of information out of sentences, paragraphs, and lists.
It’s similar to how Palantir or IBM Watson work, but without the long implementation times and high licensing fees, says Lamis. “We want to provide faceted search and advanced algorithms,” he says. “We want to take those cool new technologies and apply them to internal data, things that give them a proprietary advantage. The incumbents–except for high-end products like Palantir and IBM Watson–are only focused on public data stuff.”
Sensai is just getting started. This week it announced that it received $900,000 in seed funding from Andreessen Horowitz, Formation8, Chris Kelly, ValueStream Labs and others, and it expects to complete a larger Series A round of venture funding later this year. That will give the company, which currently has only seven employees, the capability to grow and expand its product.
“One of our main directions is evaluating many different algorithms and releasing the most relevant ones to our platform,” Lamis says. “Nobody is demanding more advanced algorithms. But I think in 2016 and 2017, as [data science] teams get more advanced, we’ll pull in deep learning and automated learning technology. Those will appear in our language.”