Rise of the Big Data Engineer
Over the past three years, businesses have scrambled to hire data scientists who can spin big data into gold. But it turns out the person best equipped to get data analytics applications into production may be a good data engineer.
The surge in demand for data scientists over the past few years has been well documented. McKinsey set the ball rolling with a report in 2011 that found the US faced a massive shortage of data scientists and others with deep analytical skill. In 2012 the Harvard Business Review threw gas on the fire when it anointed data scientists “the sexiest job of the 21st century.”
Suddenly, everybody seemed to want either to hire a data scientist or to become one. The race to recruit and retain the vaunted unicorns boosted salaries into the $200,000 range for experienced data scientists, while universities shifted their data science programs into overdrive. If big data analytics could unlock new levels of profitability and efficiency, we all thought, then data scientists held the keys.
At least, that was the conventional thinking. The thing is, that view of the world may no longer correspond with reality. Signs are beginning to emerge that the data scientist bubble is popping. The promise of big data analytics remains too great to pass up, but what’s changing are the titles of the personnel who can guide us to the big data promised land.
While highly skilled data scientists remain at the top of the big data food chain, it may be the versatile data engineer who’s poised to have a bigger impact on a company’s data analytics aspirations. Increasingly, companies are turning to data engineers to help them build and run big data platforms.
A data engineer is the all-purpose everyman of a big data analytics operation, working between downstream analysts on the one hand, and upstream data scientists on the other. They will often come from programming backgrounds, and are experts in big data frameworks, such as Hadoop. They’re called on to ensure that data pipelines are scalable, repeatable, and secure, and can serve multiple constituents in the enterprise.
Sean Kandel, the CTO and co-founder of data transformation startup Trifacta, noticed an uptick in interest around data engineers. “I’m definitely seeing that title pop up more, and seeing more postings for it,” Kandel told Datanami.
Figures from job posting websites show much higher demand for data engineers than for data scientists:
- LinkedIn currently has nearly 21,000 postings for jobs with “data engineer” in the title, compared to just over 11,000 for jobs with “data scientist” in the title.
- Indeed.com shows more than 90,000 hits for data engineer jobs, compared to about 11,000 for data scientist.
- Dice.com has more than 13,000 postings for data engineer, compared to only about 500 for data scientist.
- CareerBuilder.com has fewer than 1,000 postings for data scientist positions in the past 30 days, compared to more than 16,000 for data engineers.
Data engineers are instrumental in rolling out data analytics applications, Kandel said. “They help facilitate getting data from a variety of different sources, getting it in the right formats, assuring that it adheres to data quality standards, and assuring that downstream users can get that data quickly so they can perform whatever downstream tasks that is, whether reporting or exploratory analytics or, in the case of a data scientist, it might be building a new model or a recommendation algorithm,” he said.
Data engineers are not merely watered-down data scientists, doing the grunt work that data scientists don’t want to do. Rather, the two positions complement each other, Kandel said.
“As we’ve seen data scientists being successful working with data and building models that have high impact on an organization, we’re seeing a thirst for data in other business units and analysts, and the data engineer acts as that middleman between access to data and performing analysis or getting to some new insight,” he said. “Data engineers have more of an eye toward how to take insights and put them into production, how to operationalize these things.”
Data engineers form the backbone of the team of big data professionals at Think Big Analytics, a 75-person services firm that was bought by Teradata last week. While the company has its share of data scientists who know R and SAS and can build complicated models, it’s the data engineers who create the foundation for the work, according to Ron Bodkin, founder and CEO of the San Francisco-based company.
“There is a lot of demand for data scientists. We’ve been able to ramp up and build our data science team,” Bodkin told Datanami. “But I often think that too often people look to data scientists and ignore the fact that in order to be successful in data science, you need to have an effective data platform. We actually see the biggest skill gap is in high quality data engineers who can build these new data applications and organize the data. And once you have that in place, then the demand for data science really picks up in a meaningful way.”
The demand for big data analytics services is huge at the moment, but the new technologies and skillsets involved so far have proved difficult for traditional systems integrators to master. Teradata bought Think Big Analytics after struggling to create its own services team to fulfill demand for consultants who can build big data analytics applications.
“We were getting people but they’re hard to keep and hard to find,” said Teradata’s vice president of products and services marketing Chris Twogood. “It just wasn’t happening fast enough for us. There are very few companies that are focused on how do I take multiple technologies and stitch them together to solve a business problem. That’s one of the key interests we had in Think Big.”
Think Big’s emphasis on data engineers grounded in the real world, as opposed to data scientists heavy on theory, also attracted Teradata, Twogood said. “One of the things we liked about Think Big is they hire software engineers as the baseline because they have the aptitude for this, rather than taking people who were well-versed in data modeling and data,” he said. “They have a great recipe for how to get good talent in.”
The versatility of engineers makes them good candidates for big data work, Bodkin said. “We look for high quality software engineers who have experience learning new technologies, can work with a variety of approaches, and have worked with data in a variety of contexts,” he said. “We hire from a range of experienced software architects and distributed systems architects…then we’re able to teach them what they need to be effective.”