Which Came First, Data or Applications?
There has been an ongoing debate in the world for many years: “Which came first, the chicken or the egg?” In the technology industry, a similar type of debate has been going on for more than 20 years: “Which came first, data or applications?”
For most of that time, the restrictions of rigid database schema requirements and the cost of developing ETL processes to make data usable in a data warehouse led to one answer: the application and, more importantly, the database came first. In most cases, the actual data was just an afterthought. People who needed the data across business sectors got only what was left over after the data was normalized, schematized and cleansed. This yielded significantly less value from the data and put businesses in a reactive mode, with the past driving the discussion.
However, in the last five years, this has become less and less the case. The data from applications, systems, and networks can be written in any format and structured as needed. Then, the new data technologies deal with it from there.
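As a minimal sketch of this store-first, structure-later idea (often called schema-on-read), the Python snippet below keeps raw events exactly as they arrive and applies structure only when a question is asked. The event formats, the field names, and the helper functions `parse_event` and `query` are hypothetical illustrations, not the API of any product mentioned here.

```python
import json

# Raw events kept exactly as written by their sources.
# These formats and field names are hypothetical examples.
RAW_EVENTS = [
    '{"user": "alice", "action": "login", "ms": 120}',
    'user=bob action=purchase amount=19.99',
    '{"user": "carol", "action": "purchase", "amount": 5.0}',
]


def parse_event(line):
    """Interpret one raw line at read time: JSON if possible, else key=value pairs."""
    try:
        parsed = json.loads(line)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass
    fields = {}
    for token in line.split():
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key] = value
    return fields


def query(raw_events, **criteria):
    """Return parsed events whose fields match every criterion (compared as strings)."""
    results = []
    for line in raw_events:
        event = parse_event(line)
        if all(str(event.get(k)) == str(v) for k, v in criteria.items()):
            results.append(event)
    return results


# Structure is imposed only now, when the analyst asks a question.
purchases = query(RAW_EVENTS, action="purchase")
total = sum(float(e["amount"]) for e in purchases)
print(len(purchases), round(total, 2))
```

Nothing had to be normalized before storage here. A new source with a new format only needs `parse_event` to learn how to read it, and the stored events remain untouched.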
These new technologies, such as Cassandra and Hadoop, have changed the natural order of things by enabling more people to access a richer data set. This is fundamentally changing the way we do business. The handcuffs of rigid schemas have been shed, and data warehouses are taking new shape, much to the disappointment of legacy database vendors. Instead of a database schema or a team of SQL developers dictating the formats in which businesses could collect and store data, the scale has now shifted and that concept has been flipped on its head. Business analysts can now work with developers and third-party resources to collect data from internal, external, social, mobile and emerging sources to create a true picture of their business.
This ease of collection, coupled with tooling that allows for real-time search and dashboards, has changed the way people interact with data. Nuggets of data that were previously inaccessible, or were ETL’d out of the stream, can now be discovered and used to help companies provide better customer service and increase their profit margins.
By using these tools, companies can remove the heavy costs of legacy data warehouse and ETL systems. They can replace legacy systems with a high-performance OLTP system like Cassandra and batch-oriented analytic systems like Hadoop. The added ability to replace SQL developers with developer tools (APIs) and an easy search UI puts the analyst or developer in the driver’s seat and allows for a true exploration warehouse approach. For many years, this was the dream system of many a legacy vendor and DBA. The constraints of the relational database world would never allow it to come to pass, which is why trailblazers like Google, Yahoo, and Amazon planted the seeds for these new systems. Thanks to papers from Google, such as the one on Bigtable, many of today's new data solutions exist.
These data solutions provide another key advantage: a shift in thinking, in which commodity hardware is now treated as a reliable storage layer that protects data within data centers without the expense of high-end SAN solutions. This has significantly changed the way that people spend on infrastructure and has freed up IT budgets to move toward the new, innovative solutions and away from site-to-site replicated SANs. Together, these two shifts in technology leave companies with more money to spend on data acquisition, data services, data retention and analysts, and less tied up in legacy hardware and software. Examining these shifts in detail explains many of the recent market trends and changes in legacy IT vendor behavior.
This ability to store data in a real-time index, an OLTP system, or a distributed file system, without needing a rigid schema or a team of SQL developers in any of these cases, is what has changed the paradigm so rapidly over the last few years. Now a business, technical or other leader can tell developers what data they want and how they want to see it. Together they can then spec an app and a data store based on those requirements. The developers are free to build the app to the specifications of the user and not to the specifications of the database that backs it.
Five years from now, people will be saying, “Of course the data comes first.” Data is changing the world, and will continue to change the world. What is more, relational databases will be discussed only when a system requires them for compliance reasons.
About the Author:
Eddie Satterly is the Chief Big Data Evangelist at Splunk and has served in a variety of roles, including developer, engineer, architect and CTO, over his 23-year career. Eddie has worked with Splunk since June 2012. He previously worked at CMI, Enterprise Architects, Onstart, HP and, most notably, Expedia, where he was in charge of implementing Splunk for the company. He played a key role in big data adoption at his former employers and currently plays a key role at Splunk in adoption of the product as well as in partnerships with the big data community.