Tracking the Data Science Talent Gap
If your company is looking to hire a data scientist right now, good luck. Five years after Harvard Business Review first shone the spotlight on the data scientist shortage, the gap between data science supply and demand remains substantial. In fact, the gap may be getting bigger.
How big is the data science skills gap? There are several ways to attack that question, and a number of smart people at renowned organizations have attempted to put numbers to it.
Back in 2012, the research firm Gartner said there would be a shortage of 100,000 data scientists in the United States by 2020. A year earlier, McKinsey put the national gap in data scientists and others with deep analytical expertise at 140,000 to 190,000 people by 2017, resulting in demand that’s 60 percent greater than supply. In 2014, the consulting firm Accenture found that more than 90 percent of its clients planned to hire people with data science expertise, but more than 40 percent cited a lack of talent as the number one problem.
Compensation is another way to track the skills shortage. In O’Reilly’s 2015 Data Science Salary Survey, data scientists in the United States earned a median base salary of $104,000, while a Glassdoor survey found the national median to be about $113,000, nearly double the median for a regular programmer.
However, if you’re working in a major metro area such as New York City or Silicon Valley, the starting salary for a data scientist can exceed $200,000, a Stanford University director recently told Bloomberg.
Clearly, the demand for data scientists is very high at the moment, and according to a recent survey by CrowdFlower, the demand isn’t letting up. In its 2016 Data Science Report, which the company released earlier this week, CrowdFlower found that 83 percent of respondents said there weren’t enough data scientists to go around, an increase from the 79 percent reported a year earlier.
How Did We Get Here?
The huge gap in trained data scientists is largely a result of our fast-changing culture and the relentless pace of technological innovation. The advent of cheap distributed storage systems like Hadoop lets companies store and analyze the data exhaust that previously was thrown away. That data exhaust is piling up by the ton, but too few people have the necessary skills to make sense of it.
“The thing that’s really hard for people to get their heads around is the big data technologies are 20 to 50 times cheaper than traditional data warehousing technology,” says Bill Schmarzo, the EMC CTO who’s been called “The Dean of Big Data.” “The economics of big data is what’s making the change.”
If data science were baseball, full-fledged data scientists would be the five-tool players who can do it all–field, throw, run, hit for average, and hit for power. No other class of worker combines the math/statistics, programming/computer science, and business expertise required to turn big data straw into gold.
The huge demand for data scientists is a boon for data science consultancies. “Everybody we talk to–big banks, hedge funds, oil and gas companies, consumer goods and Internet companies–is excited about the potential for getting insights out of their data,” says Travis Oliphant, the CEO of Continuum Analytics, which develops data analytics tools and provides data science consulting.
“They have a few programs that have generated a little bit of success, and now they’re saying ‘We’ve got to grow this program’ but they’re struggling to find people,” he says. “There’s only a certain number of people, and they all want the best.”
Down on the Farm
American universities have been ramping up their postgraduate data science programs ever since the data scientist shortage was first identified and DJ Patil and Tom Davenport labeled data scientist “the sexiest job of the 21st century.”
Today, there are a dozen or so PhD-level data science programs around the country, and another dozen or so established computer science programs with an emphasis in data science, according to Kennesaw State University Professor Jennifer Priestley, who is credited with forming the country’s first formal data science doctorate program.
“I think we have a short-term misalignment of what academia can provide to the marketplace and what the marketplace is demanding,” Priestley tells Datanami. “That’s what’s driving the buzz right now. Nobody can find the talent because the farm program can’t do it right now. But I think that’s changing. We’re seeing more PhD programs in data science, and heaven knows we’re seeing an explosion in masters-level data science programs, and I think that’s a good thing.”
Priestley is currently going through the second round of applicants for the KSU data science program, which has five openings for a two-year program. As with the data science talent crunch felt in the marketplace, demand for the KSU program exceeds supply.
“I have never been so popular,” Priestley says. “I wish I had had this much attention in high school. My phone rings all day long and I probably get an email every six hours from somebody somewhere on the planet who either wants to apply to our PhD program or can’t because we’re not online.”
While the misalignment may be short-term, the long-term trend is clear: data science is here to stay. Colleges across the nation are investing in postgraduate data science programs, including the University of Michigan, which recently announced it’s spending $100 million over the next five years to launch the Michigan Institute for Data Science (MIDAS).
It will take a while for academia to ramp up its data science programs and start turning out data scientists in large numbers. In the meantime, the data science shortage has spurred the creation of a number of “bootcamp”-style data science classes, which typically last about 12 weeks and involve several dozen students at a time.
Last week, the aptly named Los Angeles-based data science consultancy Data Science announced its DS12 program, a data science residency that will focus on training prospective data scientists to use Spark and Scala against real-world data to solve real-world problems.
The program (which is free and pays a stipend) is aimed at accelerating the skills of, and finding jobs for, people who have a natural knack for “thinking through statistical problems,” says Chris McKinlay, the senior data scientist at Data Science who is leading the program.
“A good potential candidate is someone with coding skills and some degree of math and statistics,” says McKinlay, who is famous for reverse-engineering OkCupid’s algorithm. “A formal PhD or Master’s is not required. We’re testing on skills, not degrees.”
Earlier this month, Continuum Analytics launched the Anaconda Skills Accelerator Program (ASAP), a 12-month “finishing school” for prospective data scientists, says CEO Oliphant. Participants in the program, which costs $5,000, will have their data science skills assessed and receive the training they need to be marketable in the real world.
“It’s hard to simulate the actual pressures involved in an academic setting,” Oliphant says. “You get tied up into solutions that seem right, but then in the real world actually aren’t useful. Say you have this really important machine learning program, but you actually don’t need a fancy deep learning algorithm here, just a regression model that’s upgraded regularly. It’s little simple things like that that don’t come out until you really solved a problem for somebody that they are paying for.”
The data scientist shortage is alive and well, despite some claims to the contrary. While software is getting better every day, there’s still no way to replace the skills and experience that a full-fledged data scientist can bring to the table. Keep your eyes out for our next story on possible long-term strategies for solving the data science skills shortage.