Room to Grow on the Big Data Maturity Curve
If accurate self-reflection is the mark of a mature man, then we are collectively still boys when it comes to big data, according to an AtScale report unveiled today that detected a significant discrepancy between how we describe our big data prowess versus our actual capabilities.
“There is an overconfidence and a lack of context for a lot of enterprises we survey,” says Bruno Aziza, chief marketing officer for AtScale, which develops OLAP-style analytic tools for Hadoop and other big data platforms. “When we ask people what they say their big data maturity is, 78% say it’s medium or high. But in reality, when we look at the methodology we laid out, only 12% have high-level maturity.”
That was just one takeaway from AtScale’s Big Data Maturity survey, in which it asked more than 5,500 people over the course of three years what they’re doing with regards to big data tech, including Hadoop, advanced analytics, BI tools, and cloud deployments. The report uncovered some interesting nuggets that highlight where customers stand at this stage in the game.
Here’s another: the cloud is dominating big data conversations, and Google BigQuery is slaying it.
When AtScale asked about cloud deployments in 2016, some users said they were looking into it. Now, 59% of respondents say they have actually deployed into the cloud, up from 53% in 2017. Only 23% of respondents say they will not use the cloud at all, down from 28% last year.
Multi-cloud is also big. “We talked to about 3,000 people across Gartner [BI Summit] and Strata last week and we asked: What’s your cloud strategy?” Aziza said in an interview last week. “Eighty percent of people said they have a multi-cloud strategy. Nobody is betting it all on one. We thought that was interesting.”
However, the move to the cloud is seen as taking a toll on self-service. The percentage of respondents who reported having self-service access to big data analytics declined from 47% in 2017 to 42% this year.
The rate of adoption of Google BigQuery also surprised AtScale, which conducted the survey with assistance from Cloudera, Hortonworks, The Linux Foundation, MapR, and Tableau. Out of all the respondents, 11% said they were planning on using BigQuery.
While the SQL data warehousing offerings from Amazon Web Services and Microsoft Azure have bigger customer bases, AtScale found the rate of growth highest for the Google offering. “We were surprised because Google BigQuery came on the market last year really,” Aziza said. “That’s a big number for a technology that was nowhere a year ago.”
Here’s another takeaway: Data governance is a growing concern.
When AtScale asked users to rank their top big data concerns, governance ranked last out of five challenges in 2016. In 2018, it ranked second, behind skill set, which is a perennial concern.
What’s driving that? Well, the looming deadline for compliance with the General Data Protection Regulation (GDPR) is one factor. The migration to the cloud is another, according to Aziza.
“As people are moving data platforms from on-prem to the cloud, they’re being challenged to provide self-service capabilities the way they did on premise, and that’s being seen in two places: one is pure self-service access, and the other is governance,” he says. “It’s not just security – it’s about data access, who should have access to what data, when, across what reports. It’s really hard to manage when your data is in multiple formats and multiple places.”
As we move from an on-premise to a hybrid data storage world, it should come as no surprise that data activities become decentralized. And while we can still hear the Hadoop mantra of “centralize your data” echoing across cyberspace, the number of data silos used by enterprises is increasing, not decreasing.
“The big change over the past year or two years is the realization that there’s not a single data platform that’s going to handle all the use cases,” says Dave Mariani, AtScale’s CEO. “And while everybody thought that Hadoop was going to be good at everything, I think they figured out exactly what it’s good at, which means they still need other data stores and other platforms.”
Hadoop is still a much cheaper place to park large amounts of data. And according to AtScale, companies that have built up Hadoop expertise are still expanding their use of the technology. But AtScale’s data suggests the pipeline of new Hadoop users is shrinking for several reasons, with the availability of cloud platforms being arguably the biggest.
“The cloud definitely makes [integration challenges] better,” Mariani said. “Each cloud has its own set of tools that work [well] together. Another reason why people want to move to the cloud is that integration has somewhat been solved for them versus having to run on prem,” says Mariani.
However, Hadoop’s cost advantage is not as big a factor as it used to be. While cost savings was the second most commonly cited reason for adopting Hadoop in 2017, it fell to number five in 2018.
“I think there’s been a realization that Hadoop is a data platform, not the data platform where you can unplug all the legacy stuff,” Mariani said. “It no longer is a cost savings play. It’s more a capabilities play. At Yahoo, the reason we invented Hadoop to begin with was we wanted to capture all our log files from our Web storage and our ad servers. I just couldn’t afford to jam that into an Oracle database. It allowed us to have a new capability – to scale out — but in a way where we’re not throwing data away.”
The last interesting tidbit from AtScale’s report is the maturation of Microsoft’s PowerBI. Just as Excel was seen (and is still seen) as the default tool for basic manipulation of smaller data sets, PowerBI is beginning to occupy that spot as a low-cost, powerful visualization tool for bigger data sets.
According to the report, PowerBI moved up four positions in its ranking to number two, right behind Tableau. PowerBI has two things going for it, according to Aziza. First, it meets 80% of people’s needs. Second, it costs only $10 per user per month.
Mariani relates the experience of an AtScale customer, one of the biggest car makers in the world. “They had Tableau, and Tableau was their tool of choice. But the customer decided it was too expensive to roll out in a broad fashion,” he said. “So he saw PowerBI as sort of for the masses, and Tableau for the data ninjas.”
AtScale will share its findings in a webinar on March 27, which you can register for here.