November 3, 2021

Finding the Needle in Your Data Haystack

Lance Walter


As more and more of everyday life is lived online, the amount of data generated has grown exponentially. While technologies for storing and analyzing data have evolved, companies’ overall approach to harnessing big data to solve business problems has largely remained the same. This does not necessarily mean organizations need to throw out the tools they are used to and start from scratch. What is really needed is a shift in how IT leaders approach the data, starting by asking the right questions to solve business problems and drive value for customers. Without this change in perspective, businesses are doomed to repeat the mistakes of data eras past.

Each evolution of the data landscape – from the introduction of the data warehouse to the Big Data boom – has focused on finding a depository big enough to dump all available data into. From there, enterprises have attempted to dig through that entire haystack of data in hopes of finding the needle that will both reveal a yet-undefined business problem and identify the solution.

This process is not only time- and resource-consuming, but it’s also clearly not working. Gartner estimates about 85% of projects with this approach fail. Yet businesses continue to operate this way because it is what they know, and they’ve already invested too much to abandon the technologies and processes they’ve adopted up until now.

Repeating History

Traditional data warehouses have proved ill-suited for today’s big data challenges

To understand why Big Data initiatives have gotten such a black eye over the last few years, we need only look at how Data Warehousing 1.0 turned out. The general approach was to stick every scrap of data into the warehouse and figure out what to actually do with it later. Data professionals just assumed that the bigger and more comprehensive the haystack, the more likely it would be that they would magically find the right needle in it eventually. If you’re trying to understand your customer with a “Customer 360” application, wouldn’t you want to dump every bit and byte of sensor data related to that customer into one place? As it turns out, probably not.

In the early days of data warehousing, companies went all in and invested tons of money in adopting this new architecture, and suddenly “data haystacks covered the open plains of this new frontier.” Things soon turned sour, however, when finding an undefined needle proved a lot harder than anticipated. Data warehouse managers went from being the irreplaceable rockstars of the IT team to holding the job nobody wanted as it became increasingly clear that the warehouse wasn’t delivering its expected value, despite still requiring real work to maintain.

The problem wasn’t the concept of data warehousing. Take Snowflake, for example, which is successfully warehousing critical enterprise data in the cloud for more than 3,500 companies. The real problem was that organizations were so focused on data management that they lost sight of the value the data would actually bring. This created a disconnect with end users, who cared less about sticking a bunch of data in one place than about learning more from the data than they already knew and answering questions they couldn’t always see.

Big Data continues to repeat the same mistakes, just with more data and different technologies. For example, a 747 aircraft generates approximately 4TB of sensor data from the landing gear during every single landing. While platforms like Hadoop can handle data volumes like this far more elegantly than traditional data warehouse platforms, that’s still not enough to guarantee that analyzing that data will improve the experience of your passengers, the maintenance of your planes, or the efficiency of your operations. It’s surprising the data industry hasn’t learned better by now. Correcting course will require a shift in focus to starting the process with a clearly defined business challenge and then harnessing the right data to find answers.

Putting the Problem First

Today’s IT departments are expected to move at breakneck speed, but if companies are really going to put data value front and center, IT leaders need to first make sure their approach is aligned with the business users’ definitions of success. Starting each project by asking what output is needed to make it worth the time and resources will give the project direction. This responsibility is especially important for IT leaders to prioritize as they work to control costs, because most outside software vendors surely won’t be the ones to make sure business needs are clear before taking the money.

Knowledge graphs stress the interconnectedness of individual data points (Shutterstock)

Defining the questions and problems to be solved by data can sometimes be impeded by our own imaginations. Part of this perspective shift from data volume to data value will require IT leaders to get out of their own heads about what is possible. Teams will sometimes avoid asking certain business questions when they believe, perhaps from previous experience, that their traditional data platform can’t find an answer. Instead, they need to get used to ideating around the problems they want to solve without being hamstrung by the known limitations of current technologies. Much like “where there’s a will, there’s a way,” when we state a complex question, there is likely an opportunity to cobble together different technologies and platforms to answer it in a creative way.

The technology market has evolved to the point where there are a plethora of free cloud offerings to test out on the cheap to meet these unique needs. Historically, you’d have to deal directly with a vendor salesperson, get a trial license, and potentially spend six figures to get a pilot off the ground. Now you can easily take a credit card to hundreds of provider websites and just test out different data services at will. Users can prove a concept or just get familiar with a technology in a friction-free way before making big investments of time or money. This provides a lot more flexibility to experiment – as long as they avoid the trap of falling in love with any one technology before actually deciding what business problems they are trying to solve. The playgrounds are out there, but buyers should only play around with options when they have a clear direction.

Another hurdle to defining business questions may be that teams don’t know what they don’t know. Sometimes there are questions hiding in blind spots. Technology that reveals the relationships between data points, such as knowledge graphs, can help uncover unknown business problems as well as be part of finding data-driven solutions. The relationships within data were always there, but traditional platforms weren’t equipped to find them along with all of the other tasks they were completing. New insights are available when relationships are evident, and that understanding of data relationships is native to graph databases. Additionally, the graph model is very intuitive even for non-technical business users, making the eventual answers to business problems much more digestible for everyone involved regardless of data expertise.
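To make the idea of relationships between data points concrete, here is a minimal sketch in plain Python. It is not how any particular graph database works internally, and every customer, product, and supplier name in it is invented for illustration. It stores facts as (subject, relationship, object) triples and uses a breadth-first search to surface an indirect link between two records that would be easy to miss when the same facts sit in separate tables.

```python
from collections import defaultdict, deque

# Hypothetical facts, invented for illustration: (subject, relationship, object).
TRIPLES = [
    ("Customer A", "PURCHASED", "Product X"),
    ("Customer B", "PURCHASED", "Product X"),
    ("Customer B", "FILED", "Support Ticket 17"),
    ("Support Ticket 17", "CONCERNS", "Supplier Z"),
]


def build_graph(triples):
    """Index every triple in both directions so links can be followed either way."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
        graph[obj].append((relation, subject))
    return graph


def find_connection(graph, start, goal):
    """Breadth-first search; returns the shortest chain of nodes, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for _relation, neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


graph = build_graph(TRIPLES)
path = find_connection(graph, "Customer A", "Supplier Z")
print(" -> ".join(path))
```

In this toy data, Customer A has no direct record mentioning Supplier Z, yet following the relationships connects the two through a shared product and another customer's support ticket. A question like "which customers might be exposed to a supplier issue?" is the kind of blind-spot question the paragraph above describes.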

The Final Touch: The Right Team

Once businesses know what questions they will be approaching the data with, the final necessity is assembling the right team to find and execute on the answers. Value can only come from the data if insights are used to further business objectives and the company’s bottom line. This success will require people with the right technical expertise, acute understanding of the business problem, and executive sponsorship to make things actually happen on a meaningful timeline. (It also helps to be partnered with a vendor that is focused on that team’s success rather than just collecting a logo for their customer page.) When a team is missing even one of these key ingredients, the project will never be able to reach full data value.

With the right team, the right questions and the right data, businesses can finally find success in their big data projects. And this success won’t require any major upset to their technology budget – only a shake-up of the way they approach their data. With data assets growing rapidly, businesses can no longer afford to spend so much time and manpower sifting blindly through data for answers. The prep work of clearly defining a business challenge to be solved and then purposefully mapping the data toward a solution is like taking a metal detector to the massive haystack of data rather than sifting through by hand.

About the Author: Lance Walter is the Chief Marketing Officer at Neo4j and has more than two decades of enterprise product management and marketing experience. Lance started his career in technical roles at Oracle supporting enterprise relational database deployments. Since then, Lance has worked at industry leaders like Siebel Systems and Business Objects, as well as successful startups including Onlink (acquired by Siebel Systems), Pentaho (acquired by Hitachi Data Systems), Aria Systems and Capriza.
