January 4, 2018

Why 2018 Will Be All About the Data

Alex Woodie

Since big data roared onto the public stage in 2012, we’ve seen an accelerating pace of technological progress and prowess. Along with it, tech trends have come and gone. Once a rising star, Hadoop is now settling into a supporting data role while AI now soaks up all the attention.

But this year, there’s good reason to think that many organizations will dedicate a sizable chunk of their time and resources to just getting a handle on the data. After getting burned by advanced data projects that didn’t pan out, many organizations are finding that they are not ready to partake of data’s profitable bounty until and unless they master some of the more basic requirements.

Here are some observations and predictions from industry thought-leaders about the need to pull back from unicorn dreams of magic algorithms and instead focus on the challenges of managing, moving, and manipulating data.

Our first “data truther” is Ted Dunning, a chief application architect for MapR, who says this year organizations will recognize that 90% of the success behind machine learning applications lies in the logistics surrounding data, not with algorithms and models.

“To run successful machine learning systems in the real world,” Dunning says, “it is essential to manage input data and multiple models across a complete life cycle, including model development, evaluation, and ongoing maintenance in production.” The need for efficient machine learning logistics, he says, will drive a trend toward streaming architectures and data fabrics.

Many of the data-oriented goals that organizations are seeking can be neatly summed up in two words: digital transformation. But succeeding with digital transformation will require a fundamental rewrite of the data infrastructure, argues Couchbase SVP of Engineering and CTO Ravi Mayuram.

“Businesses have begun to understand the linkage between customer engagement and digital transformation, and in turn they’ve realized that using old infrastructure will not help them achieve this transformation,” Mayuram writes. “Therefore, more and more businesses will evolve their business models by fundamentally rethinking their data: how it is managed, how it is moved, and how it is presented to the customer. This fundamental rethink begins at the data infrastructure level, enabling the agility that will ultimately lead businesses to reach their digital transformation goals.”

Data management will re-emerge to be a big differentiator this year

Dan Sommer, a senior director and market intelligence lead for Qlik, says data literacy will be front and center as companies seek to do more with their data. “Gartner estimates that by 2020, 80% of organizations will initiate deliberate competency development in the field of data literacy, acknowledging their extreme deficiency,” he says.

Data engineers have been in high demand for years, and that trend will likely continue in 2018. According to Kelly Stirman, vice president of strategy and CMO of Dremio, we’ll see another position rise to prominence this year: the data curator.

“Organizations are now identifying the need for a new role, the data curator, who sits between data engineers and data consumers, and who understands the meaning of the data as well as the technologies that are applied to the data,” Stirman writes. “The data curator is responsible for understanding the types of analysis that need to be performed by different groups across the organization, what datasets are well suited for this work, and the steps involved in taking the data from its raw state to the shape and form needed for the job a data consumer will perform.”

Will 2018 be the year we finally wake up to the costs of dirty data? Jon Lee, co-founder and CEO of ProsperWorks, is betting that it will be.

“Whether it’s erroneous sales forecasts and poor customer relationship management, or the time lost having to manually check and correct inaccurate system information, the cost of dirty data is not cheap,” he writes. “In 2018, companies will begin focusing on ways to ensure data quality and data standardization across their organizations, realizing that clean data powers everything from fiscal year estimates to the machine-learning algorithms powering enterprise software.”

It’s unfortunate, but true: data scientists spend most of their time doing data janitorial work instead of experimenting with algorithms. According to Anand Raman, the big data practice head at Impetus Technologies, retrofitting legacy tools will only exacerbate the problem, which shows that a new approach is required.

“It is imperative to build an intelligent metadata model that works with big data technologies, and development tools like Spark are helping to make it easier to do that,” Raman writes. “Where traditional tools are rule-based, the newer tools provide more intelligent, self-learning data management models that build a richer metadata. These tools can also help achieve data lineage in near real-time while providing new ways to wrangle and cleanse data using data science algorithms.”

Data engineers and data curators will be in-demand jobs in 2018 (Dean Drobot/Shutterstock)

Organizations want to start using AI to improve various aspects of their operations, but the reality is that the step is too steep for most. This “organizational deficiency” is preventing widespread adoption of AI, and is something that the folks at Collibra are tracking.

“Organizations are discovering there’s a dark side to AI, despite its many benefits,” Collibra CEO Felix Van de Maele says. “In 2018, AI will expose data deficiencies and reveal where data processes fall apart. For instance, organizations will struggle to answer simple questions such as: Where is the data I need for my AI project? What is the data’s quality – and is it reliable? Who owns the data – and who can fix it? And how do I embed the outputs from AI into day-to-day business operations?”

In the epistemology of digital life, data is the rootstock of information, which ultimately is what we all desire. With that in mind, John Bates, the CEO of TestPlant, makes one particularly bold prediction for 2018: the return of the Chief Information Officer.

“As companies realise that their data (i.e. information) is their most valuable asset, but that it’s now fragmented across functions who’ve all deployed independent SaaS solutions, they will look to the CIO to bring their data together,” he writes. “So the CIO will once again be the centre of information, but whereas before it was about bringing systems together, now it’s going to be about data.”

Applications: Artificial Intelligence, Data Mining, Enterprise Analytics

Technologies: Frameworks, Middleware

Sectors: Biosciences, Financial Services

Vendors: Collibra, Couchbase, Dremio, Impetus Technologies, MapR, ProsperWorks, Qlik, TestPlant

Tags: 2018 predictions, AI, data cleaning, data engineer, data management, machine learning
