October 1, 2015

Big Data Really Freaks This Guy Out

Alex Woodie

The calendar said October 1, but the tone of Maciej Ceglowski’s keynote at Strata + Hadoop today was perhaps better suited for the holiday at the end of the month. Some may disagree with his apocalyptical vision of a big data world gone mad, but it’s tough to ignore.

In Ceglowski’s view, the widespread abuse of big data technology across all levels of business and government threatens to bring about a Three Mile Island moment. While the data industry may couch itself in bucolic terms like streams, logs, lakes, silos, and clouds, Ceglowski equates data to “evil radioactive waste” that nobody knows how to contain.

“A singular problem that nuclear power had was it generated these deadly waste products,” the operator of the bookmarking site Pinboard said. “The problem is their lifespan is longer than that of any institution we can figure out to guard them with.”

A bucolic big data scene

Today’s massive data sets have a similar property to radioactive waste, he said. “It has a lifespan that’s longer than any institution that manages it. We have data that is very sensitive and we recognize that it’s sensitive–private emails, financial records, healthcare records. But then we also have this vast mass of bulk that we’re collecting that also turns out to be chock-full of secrets, but we just don’t guard it.”

That data exists for years after it’s collected, and that is of great concern to Ceglowski. “In a world where everything is trapped and kept forever, you become hostage to the worst thing you’ve ever done or the most embarrassing thing you’ve ever done. And the people who have information about you have great power over you whether or not they choose to exercise it.”

Is big data really more like hazardous waste?

He brought up Eric Schmidt, the former CEO of Google (now Alphabet), who in 2009 told CNBC’s Maria Bartiromo: “If you have something that you don’t want anyone to know, maybe you shouldn’t be doing it in the first place.” “In principle he’s right,” Ceglowski said. “But with one problem: Sometimes you don’t know in advance what’s going to be bad.”

Ceglowski cited several historical American figures who abused the data they had. Hollywood actors who thought nothing of hanging out with socialists and communists in the 1940s were hunted down a decade later by J. Edgar Hoover’s FBI and his blacklist. “Imagine if we had Instagram back then, what would have happened,” he said.

He also brought up former President Richard Nixon, a man who “just had great criminal instincts” and a thirst to steal confidential data. “I want you to go through a visualization exercise with me,” Ceglowski said. “Nixon’s in your data center! He’s got your laptop out! He’s logging in! He’s got root! What does he find? If you’re not breaking into a cold sweat, then congratulations: You’re a good steward of data. But if … Tricky Dick in your data center scares you, then think about what you’re saving.”

Maciej Ceglowski operates the bookmarking site Pinboard and is fearful of the impact of big data

And where would a counter-culture argument about the dangers of big data be without a reference to the Vietnam War? “I think there should be a law that every Hadoop cluster that is installed needs to have a framed photograph of Robert McNamara hanging above it,” Ceglowski said. “This is a reminder to us of the dangers of the data religion.”

Ceglowski, who comes across a bit like a paranoid Ray Romano, also has an issue with the validity of the findings from big data. “There’s a con going on here,” he said. “On the data side they say, ‘Hey just collect everything. Collect all the data and we have these magical algorithms that will find everything in it for you.’ But on the algorithm side, where I am, they tell us ‘Throw any code you have at it–we have this awesome training data and we have enough of it that you’re sure to surface something interesting.'”

The problem, Ceglowski said, is that any big data analysis that involves people has a built-in self-destruct mechanism. “Human beings always ruin everything,” he said. “As soon as your model of the world starts to include human beings, all bets are off, because they’re not dumb and they’re going to notice what you’re modeling and react to it.”

For example, the use of surveillance systems and in-truck devices to monitor how long truckers can drive and how fast they can go is causing truckers to evade the monitoring, and perhaps put lives in danger on the highway.

“They enforce things like how much time you can drive between rest periods,” he said. “What do you do if you’re 10 miles away from your hotel and your time is up? Truckers realized [the device] only measured speed one time every minute, at the top of the minute. So they can drive 45mph, slow down at the minute, and as long as you stay below the threshold…on the minute mark, you’re fine and you can go as long as you want. So you have these exhausted truckers looking at their phones as they drive slowly, stop and go, on the highway late at night.”

Big data has launched a sort of arms race that pits companies and governments against people, he said. “We’ve seen it reach the point of absurdity in the online advertising industry, which unfortunately is also the economic cornerstone of the Web,” he said. “Advertisers have built a huge surveillance apparatus following their dreams of perfect knowledge, but only to find that it’s turned into a hall of mirrors where they can’t even distinguish people from traffic.”

The bottom line: “Your industry is too scary,” Ceglowski said. Just as the U.S. nuclear power business hasn’t recovered since the Three Mile Island accident, big data is on the verge of its own accident that will cause people to wake up and take notice.

People didn’t use to fear nuclear energy. “I have to remind you that radioactivity used to be really, really cool,” he said. “One of the first things we discovered is it fought cancer. We had radioactive face powder, radioactive toothpaste. We had delicious radium chocolate. Who can’t remember the cool satisfying flavor of a radium cigarette? And we had radium condoms that glowed in the dark and of course it went well with your radium underpants.

“Right now we’re in the radium underpants stage” with big data, he said. “Much of what we’re working on is silly. Some of it is useful. And some of it is downright harmful. There are people in this audience who are working on our version of the radium cigarettes.”

There are trade-offs with data collection that people may not get, he said. “It’s not just a paradise. It hurts the people whose data you collect, but it also hurts your ability to think clearly,” he said. “Make sure that it’s worth it. I’m not going to claim that the sponsors of this conference are selling you a bill of goods. I’m just going to heavily imply it.”

Ceglowski then shared some advice with his audience. “When it comes to data, number one, just don’t collect it,” he said. “On a similar principle of that, if you don’t have any money, you’re not going to get mugged or worry about getting mugged.”

His second piece of advice: “If you have to collect it, don’t store it! Think in terms of flows and sampling instead of stocks and mining,” he said. “Figure out how much you can do with ephemeral data. If you can convince and reliably promise the users that you are using ephemeral data, that you really aren’t storing it – they’re willing to give you much more.”

His last piece of advice: “And finally if you have to store it, don’t keep it forever! Don’t sell it to Acxiom! Don’t put it in Amazon Glacier and forget the password and that it’s even there. Don’t do any of these things.”

The current model of total surveillance and permanent storage is not tenable, he said. “If we keep it up, we’re going to have our own version of Three Mile Island. Some widely publicized failure galvanizing popular opinion against a technology. And then people who are angry or mistrustful and may not understand anything about computers will regulate your industry into the ground. You workers left will be like those poor saps in a nuclear reactor who can’t sharpen a pencil without filling a form in triplicate. You don’t want that to happen. Even I don’t want that to happen.”

We can have a radiant future powered by big data, “but it’s going to require self-control, circumspection and much more control for safety than so far we’ve been willing to show,” he said. “It’s time that we all take a deep breath and pull up our radium underpants.”
