Why 2018 Will Be All About the Data
Since big data roared onto the public stage in 2012, we’ve seen an accelerating pace of technological progress and prowess. Along with it, tech trends have come and gone. Once a rising star, Hadoop is settling into a supporting data role while AI now soaks up all the attention.
But this year, there’s good reason to think that many organizations will dedicate a sizable chunk of their time and resources to just getting a handle on the data. Now that many have been burned by advanced data projects that didn’t pan out, it’s becoming increasingly obvious that organizations are not ready to partake of data’s profitable bounty until they master some of the more basic requirements.
Here are some observations and predictions from industry thought-leaders about the need to pull back from unicorn dreams of magic algorithms and instead focus on the challenges of managing, moving, and manipulating data.
Our first “data truther” is Ted Dunning, a chief application architect for MapR, who says this year organizations will recognize that 90% of the success behind machine learning applications lies in the logistics surrounding data, not with algorithms and models.
“To run successful machine learning systems in the real world,” Dunning says, “it is essential to manage input data and multiple models across a complete life cycle, including model development, evaluation, and ongoing maintenance in production.” The need for efficient machine learning logistics, he says, will drive a trend toward streaming architectures and data fabrics.
Many of the data-oriented goals that organizations are seeking can be neatly summed up in two words: digital transformation. But succeeding with digital transformation will require a fundamental rewrite of the data infrastructure, argues Couchbase SVP of Engineering and CTO Ravi Mayuram.
“Businesses have begun to understand the linkage between customer engagement and digital transformation, and in turn they’ve realized that using old infrastructure will not help them achieve this transformation,” Mayuram writes. “Therefore, more and more businesses will evolve their business models by fundamentally rethinking their data: how it is managed, how it is moved, and how it is presented to the customer. This fundamental rethink begins at the data infrastructure level, enabling the agility that will ultimately lead businesses to reach their digital transformation goals.”
Dan Sommer, a senior director and market intelligence lead for Qlik, says data literacy will be front and center as companies seek to do more with their data. “Gartner estimates that by 2020, 80% of organizations will initiate deliberate competency development in the field of data literacy, acknowledging their extreme deficiency,” he says.
Data engineers have been in high demand for years, and that trend will likely continue in 2018. According to Kelly Stirman, vice president of strategy and CMO of Dremio, we’ll see another position rise to prominence this year: the data curator.
“Organizations are now identifying the need for a new role, the data curator, who sits between data engineers and data consumers, and who understands the meaning of the data as well as the technologies that are applied to the data,” Stirman writes. “The data curator is responsible for understanding the types of analysis that need to be performed by different groups across the organization, what datasets are well suited for this work, and the steps involved in taking the data from its raw state to the shape and form needed for the job a data consumer will perform.”
Will 2018 be the year we finally wake up to the costs of dirty data? Jon Lee, co-founder and CEO of ProsperWorks, is betting that it will be.
“Whether it’s erroneous sales forecasts and poor customer relationship management, or the time lost having to manually check and correct inaccurate system information, the cost of dirty data is not cheap,” he writes. “In 2018, companies will begin focusing on ways to ensure data quality and data standardization across their organizations, realizing that clean data powers everything from fiscal year estimates to the machine-learning algorithms powering enterprise software.”
It’s unfortunate, but true: data scientists spend most of their time doing data janitorial work instead of experimenting with algorithms. According to Anand Raman, the big data practice head at Impetus Technologies, retrofitting legacy tools will only exacerbate the problem, which shows that a new approach is required.
“It is imperative to build an intelligent metadata model that works with big data technologies, and development tools like Spark are helping to make it easier to do that,” Raman writes. “Where traditional tools are rule-based, the newer tools provide more intelligent, self-learning data management models that build a richer metadata. These tools can also help achieve data lineage in near real-time while providing new ways to wrangle and cleanse data using data science algorithms.”
Organizations want to start using AI to improve various aspects of their operations, but the reality is that the step is too steep for most. This “organizational deficiency” is preventing widespread adoption of AI, and is something that the folks at Collibra are tracking.
“Organizations are discovering there’s a dark side to AI, despite its many benefits,” Collibra CEO Felix Van de Maele says. “In 2018, AI will expose data deficiencies and reveal where data processes fall apart. For instance, organizations will struggle to answer simple questions such as: Where is the data I need for my AI project? What is the data’s quality – and is it reliable? Who owns the data – and who can fix it? And how do I embed the outputs from AI into day-to-day business operations?”
In the epistemology of digital life, data is the rootstock of information, which ultimately is what we all desire. With that in mind, John Bates, the CEO of TestPlant, makes one particularly bold prediction for 2018: the return of the Chief Information Officer.
“As companies realise that their data (i.e. information) is their most valuable asset, but that it’s now fragmented across functions who’ve all deployed independent SaaS solutions, they will look to the CIO to bring their data together,” he writes. “So the CIO will once again be the centre of information, but whereas before it was about bringing systems together, now it’s going to be about data.”