Learning from Your Data: Essential Considerations
For any organization undergoing digital transformation, a primary consideration is how to find, capture, manage and analyze big data. Organizations are looking to big data and data science to surface insights that enable informed decision-making. CIOs have a responsibility to provide expertise in analytics, as well as an understanding of how to deliver data analytics through data science and algorithmic approaches.
However, it can be difficult, to say the least, to know how to work with big data and data science: to find the needle in the haystack; support mission-focused programs; provide appropriate intelligence; design and implement predictive models, algorithmic approaches and shareable models; and cut costs while producing bottom-line results.
So how can executive teams that already feel overburdened bring big data into their workflows, and how can they translate what looks like hieroglyphics into plain language for top-level executives? More importantly, how can your organization tell good data from bad, the data that doesn't give you the information you need to succeed? And how can you implement real-time analytics on streaming data?
Priority One: Know Your Data
Every organization seems to have its own definition of big data. A common data language will foster the growth of the best ideas shared across diverse internal teams and trusted partners. Taking this first step will determine how an organization harnesses the power of advanced analytics and benefits from big data.
Here are some of the prevailing definitions of big data and data sciences:
- An evolving concept that refers to the growth of data and how to curate, manage and process the data.
- A collection of data assets that require new forms of processing to enable enhanced decision making, to extract new insight or new discovery.
- Data sets characterized by high volume, high velocity, a variety of data structures and variable veracity.
- Data that cannot be processed using standard databases because it is too big, too fast-moving or too complex for traditional data processing tools.
New Tools to Know Your Data
The IoT, machine-to-machine communication, social networks and sensors are all contributing to this proliferation of data. Much of it is unstructured, less ordered and more interrelated than traditional data. What this means is that these new, massive data sets can no longer be easily managed or analyzed with traditional data management tools, methods and infrastructures.
Big data represents a significant paradigm shift in enterprise technology, from security to financial data and from mobile apps to software applications. It’s rapidly changing the traditional data analytics landscape across all industries. To meet these challenges, enterprises have begun implementing big data technologies.
Apache Spark and Apache Storm are among the new tools that have evolved to manage and analyze big data. A viable option may be an architecture that pairs Spark with Hadoop and NoSQL databases such as Cassandra and HBase, taking advantage of in-memory computing and interactive analytics.
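As a rough illustration of the in-memory, map-and-aggregate style these tools encourage, here is a plain-Python sketch (not Spark itself; the event records are invented for the example):

```python
from collections import Counter

# Invented sample of semi-structured event records, as might arrive
# from sensors or social feeds.
events = [
    {"source": "sensor", "status": "ok"},
    {"source": "social", "status": "ok"},
    {"source": "sensor", "status": "error"},
    {"source": "sensor", "status": "ok"},
]

# Map step: reduce each record to the key we care about.
keys = [(e["source"], e["status"]) for e in events]

# Aggregate step: count occurrences entirely in memory, the way a
# Spark job aggregates partitions before a final reduce.
counts = Counter(keys)

print(counts[("sensor", "ok")])  # 2
```

In a real Spark deployment the same map-then-aggregate pattern runs in parallel across a cluster, with the working set held in memory so that follow-up, interactive queries avoid repeated disk reads.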
Comparison to the Traditional EDW
A typical enterprise data warehouse (EDW) usually works with abstracted data that has been rolled up into a separate database for specific analytics. EDW databases are based on stable data models. They ingest data from enterprise applications like CRM, ERP and financial systems. Various Extract, Transform, Load (ETL) processes update and maintain these databases incrementally, typically on hourly, weekly and monthly schedules. A standard EDW runs from hundreds of gigabytes to multiple terabytes.
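The incremental ETL cycle described above can be sketched in a few lines; this is a minimal illustration using SQLite in place of a real CRM source and warehouse, with invented table and column names:

```python
import sqlite3

# In-memory stand-ins for a CRM source system and the warehouse
# (table and column names are invented for illustration).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER, amount_cents INTEGER, updated TEXT)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 1250, "2024-01-01"), (2, 990, "2024-01-02")])

dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE fact_orders (id INTEGER PRIMARY KEY, amount_usd REAL)")

# Extract: pull only the rows changed since the last batch window.
rows = src.execute(
    "SELECT id, amount_cents FROM orders WHERE updated >= ?",
    ("2024-01-01",),
).fetchall()

# Transform (cents to dollars) and Load (upsert into the fact table).
dw.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?)",
               [(oid, cents / 100.0) for oid, cents in rows])

total = dw.execute("SELECT SUM(amount_usd) FROM fact_orders").fetchone()[0]
print(total)  # 22.4
```

The batch window in the extract step is exactly where the latency described below comes from: rows written after the cutoff wait for the next scheduled run.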
There are drawbacks to working with EDWs, including:
- Degraded response times. Running ad-hoc analysis from time to time on top of regular operational reporting degrades system response times.
- Expensive change. Changes to the system and configuration are expensive due to rigid and inflexible designs.
- Availability and latency issues. Separating the database from operational data sources causes data availability issues. Batch window limitations also add to data latency.
The data deluge has pushed the traditional EDW to the breaking point. Data is coming in all varieties and formats, and new data collection processes are no longer centralized.
It is critical to ask some key questions once organizations begin to realize the amount of big data that has been collected:
- What data is actually relevant, and what isn’t?
- What is the end goal for the data collected?
- How will this data help achieve goals, whether it’s mobile, marketing or sales?
- Is the data at rest or in motion?
In taking these first exploratory steps, your company has an advantage when determining the best fit for extracting and using this data and its place in the overall roadmap.
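The triage questions above can be sketched as a simple routing rule; this is an illustrative toy, with invented dataset descriptors and goal tags:

```python
# A minimal triage sketch for the questions above; the dataset
# descriptors and goal tags are invented for illustration.
datasets = [
    {"name": "web_clickstream", "goal": "marketing", "in_motion": True},
    {"name": "legacy_audit_logs", "goal": None, "in_motion": False},
    {"name": "pos_transactions", "goal": "sales", "in_motion": False},
]

def triage(ds):
    """Keep a dataset only if it serves a stated goal, and route
    streaming ('in motion') data to a different pipeline than data
    at rest."""
    if ds["goal"] is None:
        return "archive_or_drop"
    return "stream_pipeline" if ds["in_motion"] else "batch_pipeline"

plan = {ds["name"]: triage(ds) for ds in datasets}
print(plan["legacy_audit_logs"])  # archive_or_drop
```

Even a crude rule like this forces the conversation about which data actually serves a goal before any tooling is purchased.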
That’s because a zealous vendor can load you down with millions of dollars in data analysis appliances and software. Consider this, however: if you have a large dataset, how much of it is really relevant to achieving your corporate goals? Half of the dataset may be needed to run transaction-based applications, while the other half could sit on low-latency, low-cost commodity hardware that serves researchers and scientists. This level of thinking gives your team a manageable dataset instead of trying to ingest and analyze years’ worth of data.
Each Data Platform is Unique
It is critical to understand the different types of data you’re working with – usually a combination of traditional structured data and relatively unstructured big datasets. Once you have sorted out what your data platform should look like, you will have a better understanding of how to manage and analyze the different types of data.
Next, you’ll need to create a data platform that complements your organization’s strengths and your existing technology footprint and uses the most effective tools to meet your data ingest and analysis needs. Typically, this will be a dynamic combination of legacy and new technology, off-the-shelf and open source licensing, and static and fluid data access methods.
An End-to-End Perspective
An important point to remember when analyzing data is that there is more than one piece of the data pie; data analytics professionals must consider, at a minimum, five aspects of the data lifecycle.
In terms of big data management, think of the data from an end-to-end perspective. Some companies may need to address all five aspects, while others may have everything covered except the democratization or interoperability of the data. In other words, make sure your plan includes all of these aspects, and determine your strengths and weaknesses before moving forward.
Big Data Solutions: Three Points to Consider
As your teams prepare to capture, control, manage and visualize the big data that matters most to your organization, these three key elements will help.
Strategize: Do an assessment to determine a strategy that works for your organization before making the move to big data. Consider bringing in a third-party vendor or someone from outside the organization to evaluate your current situation. Through internal support and feedback and external assessments and recommendations, you will be better able to determine where you are and what you need to advance the program.
Get support: Collaborate with stakeholders to establish a clear vision and mission. What are you trying to accomplish? Many organizations are jumping onto the big data bandwagon and ingesting terabytes of data, only to ask the question, “Now what?”
Ensure buy-in from the users by involving the teams that will actually benefit from the information. It will also provide a concise, well-thought-out plan instead of implementing technology just because it is available. Ultimately, if you build a program that doesn’t fit into your existing technology stack or doesn’t provide the information to advance your goals, the entire operation will fail.
Map out a plan: With a plan that is strategic and clear, you and your teams will be able to break down the tactical outcomes. For instance, a 36-month strategic roadmap gives you an opportunity to review and change course if necessary, and evaluating outcomes each quarter will help you build toward your goals.
Reactive and responsive implementations are quite different. Reactive mode can lead to solutions that require constant patching or updating, or worse, force-fitting a new solution into a legacy network. By being responsive instead, a big data or data science implementation can be a swift and smooth process.
Advance with Caution
Big data is a gold mine whose riches are just waiting to be extracted. But not just any tools will do, and would-be “miners” must have a detailed plan. Big data can get organizations into big trouble if they jump in without understanding that not all data is useful or necessary. Data-driven insights are intended to meet your strategic business goals, and the recommendations listed above will help you safely and sanely mine your data for its true value.
About the author: Nageswar “Nick” Yedavalli is Senior Vice President, Big Data and Data Sciences at DMI. Nick has extensive knowledge and experience in big data, big data analytics, business intelligence, data architecture, data governance, data management and IT business management. At DMI, Mr. Yedavalli is leading a big data & data sciences practice, delivering solutions to defense, federal, state and local clients using the latest and emerging technologies and energizing the human capital across IT, business and trusted partners.