August 21, 2015

Conference Takes Aim at Data Enrichment Challenges

Data scientists who are interested in learning about the latest data enrichment tools and techniques may want to check out the Rich Data Summit, a one-day conference taking place in San Francisco this fall. It’s the first event dedicated to the topic of data enrichment, according to event host CrowdFlower.

Companies across all industries are jumping into data analytics and machine learning with the hope of harnessing big data for a competitive advantage. But before they can become data-driven, companies must first learn how to enrich their data. That’s where many of them get tripped up, says Lukas Biewald, the co-founder and CEO of CrowdFlower.

“It’s not a sustainable world where data scientists are so expensive to hire and then spend 80 percent of their time on data cleanup,” Biewald tells Datanami. “It’s not anything they want to do, but it’s something they have to do to make the data projects work.”

That’s especially true of machine learning, the core technology underpinning the types of predictive analytic applications that are in such high demand at the moment. As a provider of crowdsourced data cleaning and enrichment services, CrowdFlower does its fair share of prep work for the algorithms.

“The dirty secret of machine learning is you need more and more training data. You need almost an unlimited amount of training data to make them work really effectively,” Biewald says. “One of the issues [with machine learning] is that it tends to be good to 80 percent accuracy. But for most business processes, you really need like 99 percent accuracy for it to be viable.”

There’s no shortage of big data conferences making the rounds, but Biewald didn’t see any that focused on the data enrichment challenges that data scientists meet on a daily basis. “There’s a bazillion data conferences out there, but I wanted a conference that I would actually want to go to, that would be really useful to me,” he says. “I’m pretty excited to do a conference around it. I think it’s going to be really cool to see all the different vendors talking about practical ways to streamline that process.”

Headlining the event, which takes place October 14 at The Village, are keynote speakers Nate Silver, the founder and editor of FiveThirtyEight; Monica Rogati, vice president of data at Jawbone; and Beth Noveck, a professor at NYU Poly, founder of The GovLab at NYU, and former head of President Obama’s Open Government Initiative.

Biewald admits to being a big fan of Silver’s. “He’s kind of my hero,” he says. “He did a great job of making data analysis cool and relevant to the world. I was so excited we’re able to get him. He’s a big advocate of taking messy, unusable data and turning it into rich or useful data.”

The format of the event will tend toward shorter talks and informal gatherings, with a laser focus on the data-enrichment challenges facing data scientists. “We want to make it useful and interesting,” Biewald says. “A lot of people say, ‘Hey, I go to a conference and the best part is in the lobby, meeting people.’ I figure, if we’re going to have talks, we might as well make those talks good. So we really put a lot of work into making the [content] really relevant to people.”

Other speakers will include Joe Hellerstein, the founder and CSO of Trifacta; Anthony Goldbloom, the founder and CEO of Kaggle; Ben Lorica, chief data scientist at O’Reilly Media; James Rubinstein, a data scientist at Pinterest; and Josh Wills, director of data science at Cloudera.

Registration has been strong, but there are still tickets available to the event, which is expected to attract more than 500 people. Datanami readers can receive a special discount on the registration fee by entering the code “Datanami20” at checkout. For more information see the event’s website at