June 8, 2015

Four Ways Your Data is Lying to You

Howard Lau

Every second, new online data emerges in the form of posts, tweets, emails, and comments from your customers, clients and constituents to provide insights into economic trends, customer behavior and competitive threats. Measuring in the billions, data points provide an endless and ongoing stream of valuable opportunities for organizations to optimize their relationships, products and operations, making the Internet essentially the world’s largest focus group.

The reams of data available today are probably our most mission-critical and valuable connection to consumers and constituents, but this comes at a high cost. Recent research from Aberdeen has shown that the more data sources a company uses, the lower its trust in the data becomes. To add to the challenge of data interpretation, not all data management tools are created equal, and many boil data down into skewed or sometimes false conclusions.

What this tells us is that before organizations can take full advantage of the opportunities that the endless supply of ever-changing data (and sources) provides, they must establish a solid foundation of data management – complete with the ability to distill the “right” data from the “wrong”.

When it comes to validating data, we suggest considering the following four ways your data is lying to you as a way to determine whether your data is telling you the whole truth:

Lie #1: Bad means bad, unless it means good

Here’s an understatement: There is a massive amount of online data available today. In fact, it is estimated that more than 2.5 exabytes of data are created daily [IBM]. At first glance, the volume looks like a windfall – who can resist crunching the feedback and opinions of billions of bytes into useful business strategy? But with volume come challenges.

There is so much data to crunch that many available tools have to default to oversimplified searches, vacuuming up text data and then segmenting it into “black or white” or “yes or no” results just to keep pace. These good/bad, positive/negative buckets at best provide a primitive summary of the data collected. At its worst – and more likely – the segmentation of data into a limited number of categories can be erroneous and lead brands down incorrect, misguided paths of inference and strategy.

In order for data to be strategically actionable, organizations have to be able to say at a glance, “What are our customers and competitors talking about? What are they latching on to? What are the conversations that people are having?”

For example, the differences between Apple the electronics company and apple the fruit are intuitive to most of us. The same goes for Sprint the company versus sprint the verb. Or even tossing the word “sick” into a Tweet poses problems, because most programs won’t know whether it’s referring to an illness or something that is “awesome.” Same goes for the adjective “bad”: is it referring to the good ‘bad’ or the bad ‘bad’?

These common phrases are just a few examples of language having multiple sides and meanings. While the correct interpretation may be obvious to the human eye, no organization has the resources to employ enough human eyes to digest, process and analyze the streams of data spewing each second from millions and millions of online sources in real time. Instead, our technology needs to be able to keep up with the way language unfolds online and “read” data with an ear for its correct, intended meaning.

Discovering the context behind data is key to turning it into something upon which companies can act. Are people posting, for example, about long lines to buy a particular product at a carrier’s store, or about a new diet craze? (Hint: Apple).

So our first example of lying data is this: Any data that is guided by technically strict, myopic and literal absolutes, instead of demanding the inclusion of a “human” touch (meaning the contextual environment around it), needs to be viewed with a highly skeptical eye. It’s critical at the outset to determine the correct inflection or meaning of the data you collect.

Data needs context for accuracy, because only when baseline data is interpreted within the context in which it was created can it be analyzed truthfully and used for strategic decision-making.
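As a rough illustration of reading a term in context, the sketch below scores each possible sense of an ambiguous word by the cue words that appear around it. The cue lists, the sense labels and the `disambiguate` function are all hypothetical, invented for this example; a real system would learn such cues from data rather than hard-code them.

```python
import re

# Hypothetical cue words per sense (an assumption for illustration only).
CONTEXT_CUES = {
    "apple": {
        "company": {"iphone", "ipad", "mac", "store", "launch", "line"},
        "fruit": {"pie", "orchard", "juice", "diet", "eat"},
    },
    "sprint": {
        "company": {"carrier", "plan", "coverage", "phone", "bill"},
        "verb": {"ran", "race", "finish", "track", "meters"},
    },
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def disambiguate(term, text):
    """Guess which sense of `term` a post uses, based on surrounding words.

    Returns a sense label, "ambiguous" when no sense clearly wins,
    or None when the term is unknown or absent from the text.
    """
    senses = CONTEXT_CUES.get(term)
    tokens = set(tokenize(text))
    if senses is None or term not in tokens:
        return None
    # Score each sense by how many of its cue words appear in the post.
    scores = {sense: len(cues & tokens) for sense, cues in senses.items()}
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return "ambiguous"
    return winners[0]


print(disambiguate("apple", "Huge line outside the Apple store for the iPhone launch"))
# → company
print(disambiguate("sprint", "I had to sprint to finish the race"))
# → verb
```

Returning "ambiguous" instead of forcing a choice is deliberate: a post with no usable context is better routed to review than silently dropped into the wrong bucket.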

Lie #2: Don’t live in the past; it’s all about tomorrow!

Actually, it’s not… When analyzing social media results, another important notion to consider is time. Historical data allows us to create a complete story and develop an outline of when a customer’s history began, when it will peak, and when it is likely to end, and why.

What businesses should seek from their analytics results is a story: one with a beginning, middle and end. Discover this, and you are more likely to predict the end and base business decisions on that timeline. Such predictions can alert a business to what is coming and what the results, whether good or bad, may be. With historical data (another form of context) figured into the mix, businesses are able to set up a sort of “early warning system” for emerging trends and market shifts so that they can stay ahead of events. With the real-time component built in, businesses should be able to find out what’s trending now and adjust campaigns accordingly, while also accurately segmenting customer emotional and behavioral patterns across millions of data sources and various languages.
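One simple way to picture such an “early warning system” is to compare each day's mention count against its own recent history. The sketch below flags days that rise far above the trailing average; the `flag_spikes` name, the seven-day window and the threshold of three standard deviations are assumptions chosen for illustration, not a recommended configuration.

```python
from statistics import mean, stdev


def flag_spikes(daily_counts, window=7, threshold=3.0):
    """Return the indices of days whose count spikes above recent history.

    A day is flagged when it exceeds the mean of the previous `window`
    days by more than `threshold` sample standard deviations.
    """
    if window < 2:
        raise ValueError("window must be at least 2 to compute a deviation")
    flagged = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any increase as a spike.
            if daily_counts[i] > mu:
                flagged.append(i)
            continue
        if (daily_counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Made-up daily mention counts with one obvious jump on day index 8.
mentions = [100, 102, 98, 101, 99, 103, 97, 100, 250, 101]
print(flag_spikes(mentions))
# → [8]
```

Because each day is judged only against the days before it, the same check works on a historical archive and on a live feed, which is the point of combining past and present data.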

Lie #3: Spam is only for email

It’s not just our inbox; analytics reports contend with vast amounts of spam. We’ve allowed machines to assist with data classification since the early days of email, but spam filtering of online text data analytics absolutely needs to be expanded to include trolls and even unrelated terms.

Part of the problem of getting a good picture from data is the sheer amount of spam, such as clickbait and tweets about great prices on the iPhone 6, Samsung Galaxy giveaways and the like. Simple metrics on mentions aren’t enough without sufficient context. It is estimated that as much as 30% to 50% of data content is spam, which means you can’t even rely on the metrics that you get back, when one-third to half might not really be mentions of your brand or products.

Again, this is where recognizing the context – for example, when a mention of “sprint” refers to a race and when it refers to the wireless carrier – becomes critical.

Without appropriate filters, data results are often watered down or even drastically skewed. Depending on your business, you should choose to apply strict spam filter(s) to fit your contextual requirements, rather than a one-size-fits-all filter.
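To make the idea of a business-specific filter concrete, here is a minimal sketch in which each business supplies its own blocked phrases and its own required context words. The `SpamFilter` class, its two rule sets and the sample posts are hypothetical; real spam detection would rely on far richer signals than keyword matching.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SpamFilter:
    """A per-business filter (hypothetical rule sets, for illustration)."""
    # Phrases that mark a post as promotional spam.
    blocked_phrases: set = field(default_factory=set)
    # Words that must appear for a post to count as on-topic.
    required_context: set = field(default_factory=set)

    def is_spam(self, text):
        lowered = text.lower()
        if any(phrase in lowered for phrase in self.blocked_phrases):
            return True
        tokens = set(re.findall(r"[a-z0-9]+", lowered))
        # With context rules configured, off-topic posts are filtered too.
        if self.required_context and not (tokens & self.required_context):
            return True
        return False


def clean_mentions(posts, spam_filter):
    """Return the posts that pass the filter and the share that was dropped."""
    kept = [post for post in posts if not spam_filter.is_spam(post)]
    spam_share = 1 - len(kept) / len(posts) if posts else 0.0
    return kept, spam_share


# A filter tuned for a wireless carrier; another business would use other rules.
carrier_filter = SpamFilter(
    blocked_phrases={"giveaway", "click here"},
    required_context={"carrier", "coverage", "plan", "network", "bill"},
)
posts = [
    "Sprint coverage dropped again on my commute",
    "Free Samsung Galaxy giveaway, click here!",
    "Had to sprint for the bus this morning",
    "Switching my plan to Sprint next month",
]
kept, spam_share = clean_mentions(posts, carrier_filter)
print(len(kept), spam_share)
# → 2 0.5
```

In this made-up sample, half of the raw mentions turn out to be either promotional spam or the wrong kind of "sprint", so the raw mention count would overstate brand activity by a factor of two.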

Lie #4: Data that only tells you what you already – or want to – believe

One of the greatest advantages of social media analytics is discovery. In other words, finding out what you don’t already know. With the right tools, social media can lead you to amazing discoveries, including everything from finding out who your top influencers are to navigating consumer complaints to discovering millions of dollars in fraudulent activity. Data analysis has become our most accessible “early warning system” for emerging trends and market shifts.

Several variables tie into the discovery component, including the previously mentioned context and sentiment, but knowing where to look and what to look for in order to identify the right trends is key. Enterprise organizations can assume that today’s social customers and observers are actively talking about their products and services on millions of sites across the web – what they shouldn’t assume is what’s being said, or even the conversational direction.

This is a challenge because many analytical monitoring tools typically look for what they’re told to find, thereby missing important data. It’s like only looking at one suspect in a murder case – while you might want to believe the dastardly ex-convicts did it, it might be that the two sweet aunts next door are actually the culprits.

Therefore, one of the biggest contributors to false data is predisposition. Or even prejudice.

If organizations instead analyze data to understand conversation threads emerging from a topic, they would be better equipped to understand their data beyond simplistic keywords and see big pictures and implications for their market. An open-minded approach will reap more valuable insights and have a more significant positive impact on business.
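A small sketch of what looking beyond the keywords you already track might mean in practice: count the words that appear alongside a brand but are not on the existing watch list. The `discover_terms` function, the stopword list, the "acme" brand and the sample posts are all invented for illustration.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; a real one would be much longer.
STOPWORDS = {
    "a", "an", "the", "is", "at", "on", "me", "but", "new", "still",
    "last", "to", "of", "and", "in", "my", "for",
}


def discover_terms(posts, brand, watched, top_n=3):
    """Surface terms that co-occur with `brand` but are not yet watched.

    Each term is counted once per post, so the result is the number of
    posts mentioning it alongside the brand.
    """
    counts = Counter()
    for post in posts:
        tokens = set(re.findall(r"[a-z]+", post.lower()))
        if brand not in tokens:
            continue
        counts.update(tokens - STOPWORDS - set(watched) - {brand})
    return counts.most_common(top_n)


posts = [
    "Acme charged me twice, still waiting on a refund",
    "Love the new Acme price but the refund process is slow",
    "Acme support promised a refund last week",
    "Great price at Acme today",
]
# The team already tracks "price" and "support"; what else is being said?
print(discover_terms(posts, "acme", watched={"price", "support"}, top_n=1))
# → [('refund', 3)]
```

Here the tool was only told to watch "price" and "support", yet the most common companion term is "refund", a conversation the team was not looking for.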

After all, let’s face it: unless a company is able to identify trends at their onset, “see” what’s on the minds of customers and apply context for data accuracy, they are probably missing out on valuable discoveries of intelligence that could present competitive advantages, or even provide a jump on preventing negative threats.

Knowing what your customers are saying about you, tracking your competition and staying on top of trends in your industry are all major factors that can shape the success of your business. But it all relies on accurate data.

Delving into the world of online and offline data can be daunting, so when you do, keep in mind that truth in data should be a top priority. Otherwise, all those numbers are merely ill-substantiated or false conjecture.

About the author: Howard Lau serves as Chairman and CEO for Attensity, the provider of solutions for Global 1000 companies to surface business intelligence through real-time discovery.
