August 1, 2017

Is Your Smartphone Spying On You?

(Valery Brozhinsky/Shutterstock)

That little phone in your pocket or purse is generating a massive amount of data – much more than you probably realize. And thanks to services offered by the phone makers and network providers like Apple and Verizon, it’s easier than ever to capitalize on all that data. But are the data giants doing enough to ensure privacy? Continuum’s Peter Wang weighs in.

One of the newest predictive analytic offerings to hit the street is Apple’s Core ML. Unveiled in June, the new service enables software developers to incorporate Apple’s machine learning technology into the iOS applications they develop for iPhones and iPads without requiring them to develop their own data science expertise.

With just a few lines of code, developers can enable their applications to automatically detect things like human faces, landmarks, and text through the use of deep learning models. Apple, which acquired machine learning software company Turi last year, says the data is analyzed in place utilizing the CPU and GPU capacity of the devices themselves.

This easy onramp into the world of data science will become more common in the future, says Peter Wang, the CTO and co-founder of Continuum Analytics, which develops and packages open source data science tools.

“They’re going to encourage you to use their predictive services as opposed to trying to gather your own data and go through the incredible pain of buying your own data from third party vendors and assembling your predictive services,” Wang tells Datanami. “It’s much easier for me to pay the .02 cents to make this API call to Apple’s predictive thing.”

Google is doing much the same thing with its Android operating system, although the Alphabet subsidiary has access to a much greater amount of data thanks to its more diversified offerings. “Google has all the data,” Wang says. “Google has every keystroke, every email. Google knows every query, every aborted query, every aborted page load in the Chrome browser.”

The mobile network and cable carriers also have access to vast amounts of data generated by the nearly 330 million smartphones in the United States – more than one for every man, woman, and child. Verizon, AT&T, T-Mobile, and other telecommunication firms stockpile this data and sell aggregated and anonymized versions of it to marketing firms so they can better target their chosen demographic.


Trails of Desire

Today’s Americans leave a digital trail wherever they go, and that trail offers a tremendous capability for companies to observe details of our lives. Some of us may welcome the better offers we receive as a result of our digital and physical wanderings, while others may not be comfortable with it.

As with almost everything, there are plusses and minuses to big data from smartphones. It’s not all Big Brother-esque snooping and wanton privacy intrusion. Electric utilities, for example, can use data from smartphones to predict demand on the grid with a greater degree of accuracy, thereby improving reliability. Road designers can also use location data culled from smartphones to better predict the flow of vehicles at rush hour, thereby improving traffic flow.

But the line between big data benefits and abuse is a fuzzy one. The telecom firms that stockpile location and demographic data of smartphone users are mindful of the potential for abuse, which is why they take pains to water down the impact of their data warehouses.

“They know what’s at stake enough to not give any of that away,” Wang says. “In fact they spend most of the time trying to scrub the data to make it more anonymous. One of the telcos was saying, ‘Our problem is not that we can’t build data products around our data set. It’s that, if the data products are too good, then there may be a congressional hearing.’”

The majority of Americans have no idea how much insight big companies have into personal details of their lives, Wang says. That ignorance is a problem.

“Every single aspect of human life across billions of people is being recorded by several of these large companies and they essentially hold all of that,” he says. “If most people realized how much data they’re just shedding about their daily lives, and more importantly, how singularly well they can predict human behavior on the basis of a large enough sample size – if people realized that, they would be really freaked out.”

The Creepy Factor

When enough data is collected from a big enough population, any deviations from the norm stand out like a sore thumb. In statistics, it’s called the bias-variance trade-off, Wang says.

“Once you get enough signal, then you can subtract out that bias and you get extremely crisp visibility into each individual’s little idiosyncrasies,” he says. “That’s when people start feeling weirded out by how accurate the target data is.”

People are generally boring, and that’s borne out in the baseline of data. “The baseline for most people in a city of more than a couple of thousand people is really good,” Wang says. “It lets us say ‘Here’s what people should be doing in this town. And here are all the weirdos doing weird things.’ That’s not good. It completely destroys the notion of a private life.”
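To make Wang’s point concrete, here is a minimal Python sketch of how a population baseline can make one individual stand out. Everything in it is an assumption for illustration: the data is synthetic, the “daily trips” measure is hypothetical, and the z-score test is just one simple way to flag deviations, not a description of what any phone maker or carrier actually runs.

```python
import random
import statistics


def flag_outliers(values, threshold=3.0):
    """Return the indices of values whose z-score against the
    population baseline (mean and standard deviation) exceeds threshold."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        # Everyone behaves identically, so nobody deviates from the baseline.
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


# Synthetic, hypothetical data: "daily trips" for a town of 5,000 residents.
rng = random.Random(0)
trips = [rng.gauss(4.0, 1.0) for _ in range(5000)]

# One resident (index 42, chosen arbitrarily) with a very unusual routine.
trips[42] = 15.0

outliers = flag_outliers(trips)
print(f"{len(outliers)} of {len(trips)} residents flagged as unusual")
```

With a few thousand people in the sample, the baseline is stable enough that the one unusual resident is flagged along with a handful of ordinary statistical stragglers, which is roughly the effect Wang describes.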

American companies have a tendency to jump headlong into new technology and see how far it can go without necessarily stopping to consider the ethics involved, Wang says. In the case of data analytics from smartphones and social media sites, companies have pushed it “way beyond what society is comfortable with,” he says. “We just haven’t had a hard conversation about it.”

Remediation or Regulation

To Wang’s credit, he’s not afraid to start the conversation, and to raise tough questions about how companies are using technologies that the data science community is creating.

“It’s become more and more apparent to me that … so much of the return on capital investment in technology right now is tied to exploiting knowledge of people, their behaviors, their sentiments, their whatever have you,” he says. “They’re not aware they’re being exploited this way.”

Continuum Analytics CTO and co-founder Peter Wang

In Europe, the forthcoming General Data Protection Regulation (GDPR), which goes into effect next May, is expected to give consumers more power to control how their data is collected and used. The United States doesn’t have a regulation equivalent to the GDPR, and experts say it isn’t likely that we’ll have one with the current administration and the agenda of congressional leadership.

In the meantime, we essentially allow phone makers, telecom providers, and social media firms to police themselves when it comes to how far they will push the analytics on data collected from smartphones. That raises the likelihood that companies will push the limit of what the technology can do.

“They say, ‘Oh we won’t let you target just on the basis of these demographics,’” Wang says. “But if they wanted to, they absolutely could. ‘Oh you’re only looking for single divorcees with more than two kids,’ or ‘You’re only looking for black people of a certain income in this area.’ They have all this data. But they know there would be an absolute uproar if they did this too much.”

The long-term degenerative impact that big data will have on our privacy will eventually require a reckoning, Wang says. Whether government steps in to apply regulations, or whether the industry can regulate itself, is up in the air. What’s certain is that privacy is something that’s worth protecting in the United States.

“The more I see it, the more I think about it, the more deeply I believe that what makes our society unique, and what makes American society great, is we create the space for people to have an individual purpose and individual meaning, and that’s only possible if you can have a private life,” he says. “More and more of the mechanisms we have are destroying that ability to have privacy, and I think that has a greater social impact.”
