Beyond Social: How Text Analytics Can Improve Business
A lot of time and attention has been devoted to mining social media with big data techniques. With more than 400 million tweets a day, you’re almost certain to find a trending Twitter topic that’s relevant to you. But text analytics’ potential is much bigger than social media, and some of its applications may surprise you.
Humans can read and process only so much text, but computers can process enormous volumes of words. Provalis Research’s recently released WordStat for Stata product, for example, can analyze up to 20 million words per minute. Thanks to the power of predictive algorithms, text analytics engines can tease out useful patterns and trends that are hidden within huge swaths of text.
Depending on the context, these patterns can give you important pieces of information about your customers, businesses, products, or prospects that you wouldn’t otherwise have. For example, when Provalis Research used its software to analyze tens of thousands of airline safety records for Airbus A320s, it was able to apply big data techniques, such as clustering and regression algorithms, to identify how certain processes and procedures could be changed to improve safety.
Retailers are the biggest users of text analytics, accounting for one-third of the market, which could be worth $6.5 billion by 2020, according to a recent Allied Market Research study. But it’s touching just about every industry, including airlines, banks, military agencies, and insurance companies, all of which have used Provalis products to pull actionable pieces of information out of mounds of text. Another airline even uses Provalis to analyze maintenance records, according to Provalis CEO Normand Péladeau. They’re “trying to predict failure of specific parts based on maintenance reports before waiting for it to break in an airplane. You don’t want that,” he says.
Provalis’ tools have also been used in schools, where student feedback informs the administration about students’ feelings toward teachers, and words like “boring, unfair, disorganized, and rude” may be indicators of a problem. What’s more, the machine learning algorithms in WordStat can actually be used to grade students’ papers, provided they can be trained on a large enough sample of human-graded work.
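To make the feedback-screening idea concrete, here is a minimal sketch of dictionary-based flagging, assuming feedback arrives as (teacher, comment) pairs. The watch list reuses the example words quoted above; the function name `flag_feedback` and the `threshold` parameter are hypothetical, and this is not how WordStat itself works, only an illustration of the general technique of counting words from a category dictionary.

```python
import re
from collections import Counter

# Hypothetical watch list built from the example words in the text above.
PROBLEM_WORDS = {"boring", "unfair", "disorganized", "rude"}

def flag_feedback(comments, watch_words=PROBLEM_WORDS, threshold=2):
    """Count watch-list words per teacher; return teachers at or above threshold.

    comments: iterable of (teacher, comment_text) pairs.
    Matching is exact on lowercase tokens (no stemming), so "rudeness"
    would NOT count as "rude" in this simplified sketch.
    """
    hits = Counter()
    for teacher, text in comments:
        tokens = re.findall(r"[a-z]+", text.lower())
        hits[teacher] += sum(1 for t in tokens if t in watch_words)
    return {teacher: n for teacher, n in hits.items() if n >= threshold}

# Illustrative, made-up feedback records.
feedback = [
    ("Smith", "Lectures are boring and grading feels unfair."),
    ("Smith", "Often rude in office hours."),
    ("Jones", "Clear, organized, and helpful."),
]

print(flag_feedback(feedback))  # {'Smith': 3}
```

A real deployment would typically add stemming, negation handling ("not boring"), and a much larger category dictionary, but the core step of mapping words to categories and counting them per subject is the same.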
“Of course it becomes controversial when you have a computer that does that,” Péladeau tells Datanami. “I would not want my essay to be graded by a computer. But indeed, what we found is maybe I should have all my essays graded by computer simply because it’s more accurate. It’s less prone to be tired or to be drunk or somehow impaired by any other factor.”
Tapping Into External, Unstructured Data
There are rich supplies of text to analyze both inside and outside the enterprise. With tools such as Provalis’, it’s possible to analyze an entire edition’s worth of New York Times articles in a minute. Some customers in the financial services industry are tapping into LexisNexis archives to feed text into their predictive algorithms.
“Some people will analyze financial reports, to find how news or financial releases are worded, and try to predict future outcomes” from that, Péladeau says. “Or sometimes it will be used to simply differentiate companies based on how big they are on corporate responsibility or the environment. Those are the various ways that text analytics has come to be used. We have all of those in research.”
Another big source of external data is Avention, which recently changed its name from OneSource. The Concord, Massachusetts, company makes a living by selling its customers highly tailored feeds of data about tens of thousands of businesses all over the world. It takes raw data from more than 70 partner data sources (such as Dun & Bradstreet, Thomson Reuters, Morningstar, and LexisNexis), correlates it against other sources such as SEC filings and 30,000 news and blog sites, and sells the packaged result to customers for use in targeted sales and marketing campaigns.
“Avention has built a higher order, or 2nd order, set of information that’s derived from this core foundation, and we’ve done that by applying big data techniques and technologies that find insight in all sorts of structured and unstructured content to derive calculations or scores for various characteristics of the company’s business,” says Avention’s vice president of products Ray Renteria. “Avention has introduced the concept of business signals to the business information community, where a business signal is a discrete measurement of one part of a company’s business: whether a company is expanding, whether it’s hiring, whether it’s an importer or exporter, whether it has a social presence on the Internet.”
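Renteria describes a business signal as a discrete measurement such as “is hiring” or “is an exporter.” The sketch below shows one simple way such yes/no signals could be derived from unstructured news text using keyword rules. The signal names, cue patterns, and the `business_signals` function are all hypothetical; Avention’s actual framework is not described in the source and likely relies on far more sophisticated models.

```python
import re
from dataclasses import dataclass

# Hypothetical cue patterns per signal (illustrative only; a production
# system would use trained classifiers and many more data sources).
SIGNAL_CUES = {
    "hiring": [r"\bhiring\b", r"\bjob openings?\b", r"\bnew hires?\b"],
    "expanding": [r"\bexpand(s|ing|ed)?\b", r"\bnew (office|facility|plant)\b"],
    "exporter": [r"\bexport(s|ing|ed|er)?\b"],
}

@dataclass
class Article:
    company: str
    text: str

def business_signals(articles):
    """Return {company: {signal: bool}}; a signal is True if any article
    about that company matches at least one of the signal's cue patterns."""
    signals = {}
    for art in articles:
        flags = signals.setdefault(art.company, {name: False for name in SIGNAL_CUES})
        for name, patterns in SIGNAL_CUES.items():
            if any(re.search(p, art.text, re.IGNORECASE) for p in patterns):
                flags[name] = True
    return signals

# Made-up example articles.
articles = [
    Article("Acme Corp", "Acme is hiring 200 engineers for its new plant."),
    Article("Acme Corp", "Acme exports turbines to Europe."),
    Article("Globex", "Globex reported flat quarterly revenue."),
]

for company, flags in business_signals(articles).items():
    print(company, flags)
```

The design choice here is to aggregate evidence per company across many documents, so a single matching article flips the signal on; a scored variant could count matches or weight them by source reliability instead of returning a plain boolean.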
Most of Avention’s customers mix this procured data feed with their own internal data to boost their business returns. So when a sales rep calls on his next prospect, he already knows that the prospect is a subsidiary of a much larger firm that recently sold a division or hired a new CMO, for example. “They will understand recent news and product launches,” Renteria says. “They’ll also have enough information to validate [and] to navigate the account by understanding who their contact reports [to] and also understanding the corporate hierarchy.”
Avention employs all sorts of big data technologies and techniques in its own data centers, including text analytics. Cassandra, Elasticsearch, and Spark all play a role in helping to cobble together the “Frankenstein record” from its various constituents and to keep the quality of the data stream high.
“We have a lot of analytics customers that don’t want to invest in an infrastructure that allows them to consume unstructured content (news, blogs, business publications, and whatnot), but they’d like to be able to extract metadata from the corpus of the natural language that’s written about those organizations,” Renteria says. “We’ve opened up our business signal development framework that will allow customers to extract metadata from unstructured content without our customers having to stand up their own search engine and associate the unstructured types of data that we basically do for a living.”
Unfriending Social Media
Although Avention pulls social media profile information from Twitter and other sources, it doesn’t currently include social media mentions in its data products, but it is considering adding them later this year. “The Twitter data is valuable to be sure,” Renteria says. “But what we don’t want to do is simply pass through tweets where there are @mentions or something that sparked the timeline.”
Provalis, likewise, has played in the social media space, but its CEO, Péladeau, thinks social media serves as a distraction from a better and broader application of text analytics. “A lot of companies are trying to jump on the text mining bandwagon, but they’re doing the same thing: sentiment analysis on social media or doing some topic [analysis] on social media,” he says. “Although we do have people who have done that, it has never been our main focus.”
According to Péladeau, companies’ own internal data repositories can provide a rich source of data for text analytics engines. “We’re probably focusing too much on social media. Many companies will benefit more if they analyze the data they already have,” he says. “For example, if they have a chat room, they could see what is being said there, listening to the voice of the customer, trying to see what they want, what they don’t want, what they don’t like.”
Ten years ago, companies didn’t see the point of analyzing text, Péladeau says. “But now they get it. We just have to tell them they can do more than sentiment analysis from social media,” he says. “Rather than trying to analyze text that’s from outside, like Twitter, Facebook, etc., companies could probably benefit more if they were analyzing their customer feedback.”