Who Controls Our Algorithmic Future?
The accelerating pace of digitization is bringing real, tangible benefits to our society and economy, which we cover daily in the pages of this site. But increased reliance on machine learning algorithms brings its own unique set of risks that threaten to unwind progress and turn people against one another. Three speakers at last week’s Strata Data Conference in New York put it all in perspective.
The first Strata speaker with a dire warning about the potential for algorithms to go haywire was Cathy O’Neil, who holds a PhD in mathematics from Harvard University and is the founder of the website mathbabe.org.
After working as a quant in the finance industry for several years, O’Neil became disenchanted with the field and how it used data science to rack up profits at customers’ expense. In 2016 she wrote “Weapons of Math Destruction” to raise awareness of the problems she was seeing. During her keynote at the Javits Center last week, O’Neil talked about how WMDs are hurting many people.
“We are hiding behind mathematics as a shield, pretending the stuff we’re doing is good for everyone, objective and unbiased,” she said. “The people who create those algorithms and are secretly choosing definitions of success and penalties for failures are pretty homogeneous in general. We don’t have a very diverse crowd thinking through these questions. It’s a problem.”
There are three qualities that turn a regular run-of-the-mill algorithm into a WMD, according to O’Neil. First, they’re widespread. “They’re making important decisions about a lot of people. Getting a job, getting a credit card, getting insurance, going to college, going to prison.”
Second, they’re secret. “People don’t understand how they’re being scored,” O’Neil said. “They can’t appeal. They often don’t even understand that they’re being scored.”
Lastly, they’re unfair to the individual. They impact “hundreds of thousands of individuals [who are] unfairly being denied something they deserve by a secret algorithm that they can’t understand and cannot appeal.”
O’Neil related a story about a teacher accountability test that was brought to her attention by a friend, the principal of a Brooklyn high school. “She said her teachers were at risk of not getting tenure based on a secret scoring algorithm that she couldn’t understand,” O’Neil said. “I said, show me the formula and I’ll explain it to you. I’m a mathematician. She said, well, I asked for the formula and they told me it was math and I wouldn’t understand it.
“That is a bad sign,” O’Neil said. “It’s a weaponization of mathematics. People trust math and they’re afraid of math. It reminded me of finance. I looked into it and found there were all sorts of crazy things happening.”
It turned out the model was primarily based on scores students received on standardized tests—specifically, the difference between students’ expected scores at the beginning of the year and their actual scores at year’s end. The low sample size and likelihood of statistical error led O’Neil to classify it as a WMD.
“It’s almost a random number generator,” O’Neil said. “When I talked to data scientists building this, they said it’s not as bad as a random number generator. That’s true, it’s a little bit more informative than that. But that’s what’s weird about data science… that we haven’t established standards for what is good enough. In finance, if I had a model that predicted the market 51% of the time, I would make money. But if you have an algorithm that finds bad teachers sometimes, that’s not good enough if you’re going to get them fired.”
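O’Neil’s small-sample point can be illustrated with a quick simulation (my own sketch, not her analysis of the actual model; all the numbers below are made-up assumptions): even when every teacher has a stable true effect, a value-added score averaged over one class of about 25 students is dominated by per-student noise, so the same teacher’s score barely correlates from one year to the next.

```python
import random
import statistics

random.seed(42)

N_TEACHERS = 500
CLASS_SIZE = 25      # students per class -- the small sample O'Neil warns about
TEACHER_SD = 2.0     # spread of true teacher effects, in score points (assumed)
STUDENT_SD = 15.0    # per-student noise in (actual - expected) scores (assumed)

def value_added(effect):
    """One year's score for a teacher: class average of (actual - expected)."""
    return statistics.mean(
        effect + random.gauss(0, STUDENT_SD) for _ in range(CLASS_SIZE)
    )

# Score the same teachers in two consecutive years
effects = [random.gauss(0, TEACHER_SD) for _ in range(N_TEACHERS)]
year1 = [value_added(e) for e in effects]
year2 = [value_added(e) for e in effects]

def pearson(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) *
           sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

r = pearson(year1, year2)
print(f"year-over-year correlation of teacher scores: {r:.2f}")
```

Under these assumptions the sampling noise per class (15/√25 = 3 points) outweighs the real spread in teacher quality (2 points), so the year-over-year correlation lands around 0.3: a score that is, as O’Neil put it, not much better than a random number generator.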
Nobody sets out to make a bad algorithm, O’Neil said. But if the data scientists writing algorithms aren’t careful, they can not only fail at their primary goal, but create unintended consequences that negatively impact a large number of people.
“Machine learning doesn’t make things fair,” O’Neil concluded. “It repeats patterns. It automates the status quo. If the status quo was perfect, we’d want that… But every corporate culture has implicit bias that we have to acknowledge and we have to keep track of it.”
Skewing Training Data
Machine learning algorithms themselves are potential harbors of hidden biases, which can hurt society when poorly crafted models are applied broadly. But there’s also the potential for third parties to actively subvert the public good by consciously polluting the training data on which algorithms base their decisions. This was the message that Danah Boyd of Microsoft Research brought to Strata last week.
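Boyd’s warning about polluted training data can be made concrete with a toy sketch (my own hypothetical example, not anything from her talk): a nearest-centroid classifier on a single numeric feature, where an attacker who injects mislabeled examples into one class drags that class’s centroid far enough to flip the model’s decision on a clean input.

```python
def centroid(points):
    return sum(points) / len(points)

def classify(x, pos, neg):
    # Nearest-centroid classifier on a single numeric feature
    return "pos" if abs(x - centroid(pos)) < abs(x - centroid(neg)) else "neg"

pos_train = [8.0, 9.0, 10.0]   # clean "positive" training examples
neg_train = [0.0, 1.0, 2.0]    # clean "negative" training examples

clean = classify(6.0, pos_train, neg_train)   # "pos": 6.0 is closer to 9.0 than to 1.0

# An attacker floods the positive class with mislabeled low-valued examples,
# dragging its centroid from 9.0 down to roughly -3.2
poisoned_pos = pos_train + [-5.0] * 20
after = classify(6.0, poisoned_pos, neg_train)  # now "neg"

print(clean, "->", after)  # prints: pos -> neg
```

The model’s code never changes; only the data it learns from does, which is why Boyd frames this as a security problem rather than a modeling one.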
Boyd brought several colorful examples of how small groups of people use the Internet to achieve their ends. Her first example was from 2003, when Pennsylvania Senator Rick Santorum publicly compared homosexuality to bestiality, upsetting the LGBT community. Boyd says a member of that community created a website where the word Santorum was falsely given a new definition.
“To the senator’s horror, countless members of the public jumped in and linked to that website in an effort to influence the search engine,” Boyd said. It would become one of the first instances of “Google bombing,” a form of media manipulation intended to mess with data and impact society, Boyd says.
Another case arose 10 years ago, when a group of Web enthusiasts thought it would be funny to surreptitiously replace links to various Web pages with a YouTube recording of Rick Astley’s 1987 hit song “Never Gonna Give You Up.” Instead of going to a Web page to see the trailer to a new movie or to a politically controversial website, for example, they instead got Astley’s catchy crooning.
This practical joke, dubbed “rickrolling,” peaked in 2008 but its impact had a wider arc. “They thought that rickrolling would just be fun. It would be entertaining for lots of people,” Boyd said. “But through this practice lots of people learned how to manipulate systems. They learned that going viral is something they can use strategically. In other words they learned how to hack an attention economy.”
More recently, the practice has evolved into the distribution of fake news on social media sites like Facebook. Boyd said that large networks of people looking to toy with the information ecosystem have created sets of false accounts called “sock puppets” that they use to “subtly influence journalists by asking them questions, pointing to blog posts, uploading YouTube videos, and creating a culture where people had to look for new information.”
“The goal isn’t actually to convince journalists that it’s true,” Boyd continued. “To the contrary: The goal is actually to get them to negate the content, to negate the theory, and in doing so what happens when people don’t trust the media is that more people believe there’s something to that conspiracy, so they feel the need to go and self-investigate. That’s exactly what happened.”
The phenomenon has snared some journalists in traps set by “radical trolls [who] know how to manipulate that system and play with the infrastructure and cause trouble,” Boyd said. “A researcher I work with…to this day can’t go on Amazon or Netflix or YouTube without being recommended all sorts of neo-Nazi music and videos.”
“Media manipulation is not new,” she continued. But “we are currently seeing an evolution in how data is being manipulated. We need to start building the infrastructure necessary to limit the corruption and abuse of that data, and how bias and problematic data might make its way through the technology … to the foundations of our society. In short we need to start thinking seriously about what it means to deal with security in a data-driven world.”
Better AI Genies
The last word on the matter fell to Tim O’Reilly, the founder of O’Reilly Media, which was a co-sponsor of the Strata Data Conference. He used his keynote to showcase some of the ideas he explores in his new book “WTF: What’s the Future and Why It’s Up To Us,” which is actually published by HarperCollins.
O’Reilly is mostly bullish on the potential for AI to positively impact humanity. “We are entering a new period where we will be surprised, where we will have the WTF of amazement, not the WTF of dismay…where we see things that we couldn’t imagine on our own,” he said. It will be as remarkable to us as when the Wright Brothers first took to the skies more than a century ago, he added.
But getting to that promised land will take more work. “We really have to understand these algorithms do exactly what we tell them to do, but we don’t always understand what it is that we told them,” he said. “We all think that we’re doing something with our algorithms, but we don’t always get what we asked for. Just like the people in the stories from One Thousand and One Nights, we don’t quite understand how to talk to our genies, how to ask them the right wish.”
We should all be excited about the prospects of AI to improve mankind, even if it appears to be taking jobs. Technology, he said, doesn’t seek to eliminate jobs or make a small number of people rich while making the rest of society poor. “Cognitive augmentation is the key part of the future of our economy,” he said. “Technology is the solution to human problems. We won’t run out of work until we run out of problems.”
O’Reilly then implored the room full of technologists to take more care in how they build cognitive systems. “If we’re going to build a better world, we have to remember to tell these systems that we are building what it is that we want,” he said. “We have to, as a society, take all the lessons that all of you in this room are applying to your businesses…We have to apply all that ingenuity to rethink the rules that govern our society, that govern our economy, that are choosing the paths to prosperity that we opened to ourselves.”
This will eventually be a political debate, but for now it’s something that O’Reilly wants technologists in the private sector to embed into their day-to-day actions. “This is a process for all of us in business to understand our value, and what are we encoding into the systems we build,” he said. “What is the wish we’re giving to the genie that we are about to unleash on the world?”