MIT Spinoff Mines Text With Analytics Engine
A data analytics spinoff from the Massachusetts Institute of Technology’s Media Lab has developed a new text analytics engine it claims could give computers the ability to understand humans the way people understand each other.
The startup, Luminoso, says it is attempting to give computing devices “a structural foundation of common-sense reasoning that sits at the forward edge of text analytics.” Luminoso’s founders spent years at MIT’s Media Lab building a cloud-based, multi-lingual tool for machine learning and, ultimately, “understanding text.”
One goal is to extract concepts that “dart and lurk” within text. According to the company, the engine would let big data techniques pick up nuances of meaning and turn them into actionable business intelligence.
So far, companies like General Mills, Intel, and Sony have reportedly signed up to use Luminoso’s text engine software.
The startup argues that text analytics systems that traditionally use ontologies are fundamentally limited by their own pre-existing categories. The company claims its engine overcomes this limitation through “embedded understanding” that allows the text engine to make connections and figure out the intended meaning of words and sentences.
The text engine tackles unstructured data with an analysis that begins without rules or columns of keywords. “You can concept mine a consumer survey, find a verbatim response that’s representative and generate the statistics to prove it,” the company claims.
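The sketch below is a rough illustration of what finding a “representative verbatim response” can mean. It picks the survey answer whose word-count vector is closest, by cosine similarity, to the centroid of all answers, then computes a simple supporting statistic. This is a generic bag-of-words technique and not Luminoso’s engine. The function names and sample responses are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercased words, stripping simple punctuation."""
    words = (w.strip(".,!?;:\"'").lower() for w in text.split())
    return Counter(w for w in words if w)


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[w] for w, count in a.items())  # missing words count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_representative(responses):
    """Return the response whose vector is closest to the centroid of all responses."""
    vectors = [bag_of_words(r) for r in responses]
    centroid = Counter()
    for v in vectors:
        centroid.update(v)  # summed counts point in the same direction as the mean
    best = max(range(len(responses)), key=lambda i: cosine(vectors[i], centroid))
    return responses[best]


def share_mentioning(responses, word):
    """Fraction of responses that contain `word` at least once."""
    hits = sum(1 for r in responses if bag_of_words(r)[word.lower()] > 0)
    return hits / len(responses)


# Hypothetical survey answers.
survey = [
    "The cereal tastes great",
    "Great taste and crunchy cereal",
    "Shipping was slow",
]
representative = most_representative(survey)
cereal_share = share_mentioning(survey, "cereal")
```

With these three answers, the function returns “Great taste and crunchy cereal”, and two of the three answers mention “cereal”. A production system would use richer representations than raw word counts.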
Luminoso’s text engine products include a dashboard billed as allowing rapid extraction of business insights without manual updates or the “training” of data. The dashboard also provides conceptual maps designed to help users discover, for example, themes and drivers of consumer interest.
Before analyzing text, users load their data into the dashboard as a comma-separated values (“.csv”) file, a format commonly used with spreadsheets.
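Here is a minimal sketch of such a loading step using Python’s standard `csv` module. It assumes a CSV with a hypothetical `response` column holding free-text answers. The article says nothing about Luminoso’s import format beyond the “.csv” extension.

```python
import csv
import io


def load_responses(csv_file, column="response"):
    """Read one free-text column from a CSV file object, skipping empty cells."""
    reader = csv.DictReader(csv_file)
    texts = []
    for row in reader:
        value = (row.get(column) or "").strip()  # short rows yield None
        if value:
            texts.append(value)
    return texts


# In practice csv_file might come from open("survey.csv", newline="", encoding="utf-8")
# ("survey.csv" is a hypothetical name); an in-memory string keeps this self-contained.
sample = io.StringIO(
    "id,response\n"
    '1,"The cereal tastes great"\n'
    '2,"Shipping was slow, though"\n'
    "3,\n"
)
responses = load_responses(sample)
```

The quoted field keeps its embedded comma, and the blank third row is dropped. This leaves two responses.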
The text engine then pulls out all documents relevant to the keywords a customer is interested in, not just the documents that contain those keywords. That way, the company said, everything contextually relevant to a search is identified.
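Word vectors are a common way to retrieve documents by relevance rather than by keyword match. Each document is mapped to a point in a vector space and compared with the query by cosine similarity. The sketch below assumes a tiny hand-made vector table (`TOY_VECTORS`, hypothetical) and an arbitrary similarity threshold of 0.8. It shows the general idea only and is not Luminoso’s algorithm.

```python
import math

# Hypothetical 3-dimensional word vectors; real systems learn
# hundreds of dimensions from large text corpora.
TOY_VECTORS = {
    "breakfast": (0.9, 0.1, 0.0),
    "cereal": (0.8, 0.2, 0.0),
    "milk": (0.7, 0.3, 0.0),
    "shipping": (0.0, 0.1, 0.9),
    "delivery": (0.1, 0.0, 0.9),
    "late": (0.0, 0.2, 0.8),
}


def embed(text):
    """Average the vectors of known words; return None if no word is known."""
    known = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not known:
        return None
    return tuple(sum(dims) / len(known) for dims in zip(*known))


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def concept_search(docs, query, threshold=0.8):
    """Return docs whose averaged vector is close to the query's vector."""
    q = embed(query)
    if q is None:
        return []
    hits = []
    for doc in docs:
        v = embed(doc)
        if v is not None and cosine(v, q) >= threshold:
            hits.append(doc)
    return hits


docs = ["cereal with milk", "late delivery", "breakfast cereal"]
keyword_hits = [d for d in docs if "breakfast" in d.split()]
concept_hits = concept_search(docs, "breakfast")
```

A plain keyword match for “breakfast” finds only “breakfast cereal”. The vector search also returns “cereal with milk”, which never uses the word.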
The other component of the text engine is Luminoso’s application programming interface (API), which allows “text understanding” to be embedded within existing tools and services such as classification, conceptual search and predictive modeling. The API is offered as software as a service.
In one use case, organizations have responded to growing volumes of unstructured data with tagging, a technique used to organize and filter data. In-house human tagging is expensive and pulls staff away from less menial tasks. Outsourced human tagging is cheaper but does not scale.
Luminoso argues that auto-tagging not only speeds up the process but also makes it easier to generate labels and analyze data. The startup’s auto-tagging approach, which relies on vector-based algorithms, is also multilingual.
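The article does not say which vector-based algorithm Luminoso uses. Nearest-centroid classification is one simple, well-known instance. It builds one word-count vector per tag from a few labeled examples, then assigns each new text the tag whose vector is most similar. The tags, examples, and function names below are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Lowercased word counts with simple punctuation stripped."""
    return Counter(w.strip(".,!?").lower() for w in text.split() if w.strip(".,!?"))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[w] for w, count in a.items())  # missing words count as 0
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def build_centroids(labeled):
    """Sum the word vectors of the example texts for each tag."""
    centroids = {}
    for text, tag in labeled:
        centroids.setdefault(tag, Counter()).update(vectorize(text))
    return centroids


def auto_tag(text, centroids):
    """Assign the tag whose centroid is most similar to the text."""
    v = vectorize(text)
    return max(centroids, key=lambda tag: cosine(v, centroids[tag]))


# Hypothetical hand-tagged examples.
labeled = [
    ("Delivery was late", "shipping"),
    ("Package arrived damaged", "shipping"),
    ("Tastes great with milk", "taste"),
    ("Too sweet for me", "taste"),
]
centroids = build_centroids(labeled)
tags = [auto_tag(t, centroids) for t in ["The package was late", "Great taste, too sweet"]]
```

Once the centroids exist, new texts are tagged without human review. A small hand-tagged set is reused for any number of documents, which is the scalability argument the company makes.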
Luminoso claims its text engine can understand “cultural connotations and idiomatic associations” in English, Spanish, French, German, Italian, Portuguese, Japanese and Mandarin.