Don’t Make These Data Science Mistakes in IoT
Data science is tough enough already. Whether you’re looking to act upon data collected from IoT sensors or human generators, don’t make it harder than it has to be by making these three common data science mistakes.
Failure, unfortunately, is not unusual when it comes to big data and data science — and it’s even more troublesome when dealing with large amounts of sensor data from the Internet of Things. When you consider the number of organizations with data science practices versus those that are getting a positive return on investment, it’s clear that many (if not most) organizations struggle to bring it all together before finding repeatable recipes for monetizing data.
Vamsi Sistla, the senior director of product innovation for machine learning, IoT and architecture at BSQUARE, has seen his share of data science failures. Over its 20-year history, the publicly traded company from Seattle, Washington, has served 1,100 customers, work that includes delivering industrial engineering and data analytics software and services for Fortune 500 firms.
Sistla recently shared with Datanami his insight into common data science mistakes that are made with big data and IoT use cases.
Mistake No. 1: Not Thinking Strategically with Data
The first mistake that organizations commonly make with data science projects is not sufficiently understanding the use case, Sistla says.
“Most companies approach IoT or data science as a technology project,” he says. “It’s really not a technology project. It’s a strategic initiative for businesses.”
Many organizations start their data science projects by collecting lots of data, and then hope that applying some machine learning to the data will magically show them the way. According to Sistla, organizations are much better off starting out with a more targeted approach.
“Having a clear business objective and planning it is one of the biggest challenges,” he says. “Clearly defining the business use case, that’s number one.”
The best data science projects start small with a discovery project, or a pilot project, Sistla says. For example, maybe a company has a manufacturing plant with large pieces of equipment that need to be brought down every few days, hurting productivity. Or maybe the company wants to reduce scrap or waste. These are good use cases for seeing how data analysis can yield insights that give a return on investment.
Part of thinking strategically is bringing the right people to the table. While the IT team definitely plays a major role in data science initiatives, organizations should include all relevant stakeholders in discussions early on in the game, Sistla says.
“You have to merge cross-function teams, bringing your IT as well as the operating team to bear,” he says. “And you also have to make sure that management fully endorses the objective and initiative.”
Mistake No. 2: Not Collecting the Right Data
Once a company has identified the initial use case, then they can set out to collect the appropriate data. This is another area where companies often make mistakes, Sistla says.
“What companies don’t realize is how much data they need to collect, and what type of data they need to collect,” he says.
For example, in an industrial setting, it’s important to collect data at the right intervals. It may be useless to collect data from temperature sensors on a one-second interval if the temperature readings don’t vary significantly during that time, Sistla says.
It’s important to match the data collection process with the patterns and anomalies that are likely to appear, because those are the data that the company will use to build its models. Sistla points out that, if the company isn’t collecting data with timestamps, it may be impossible to determine what the appropriate data-collection interval is.
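As a rough illustration of the interval question, the sketch below takes timestamped temperature readings and looks for the coarsest sampling interval at which consecutive readings still differ by no more than a chosen tolerance. Nothing here comes from Sistla or BSQUARE: the function names, the synthetic readings, the candidate intervals and the 0.5-degree tolerance are all hypothetical, chosen only to show why timestamps are needed to answer the question at all.

```python
from datetime import datetime, timedelta


def downsample(readings, interval_s):
    """Keep the first reading, then the first reading at least interval_s seconds after the last kept one."""
    kept = []
    next_time = None
    for ts, value in readings:
        if next_time is None or ts >= next_time:
            kept.append((ts, value))
            next_time = ts + timedelta(seconds=interval_s)
    return kept


def largest_jump(readings):
    """Largest absolute change between consecutive readings."""
    values = [value for _, value in readings]
    return max((abs(b - a) for a, b in zip(values, values[1:])), default=0.0)


def coarsest_safe_interval(readings, candidates_s, tolerance):
    """Largest candidate interval whose downsampled series never jumps by more than tolerance."""
    safe = [c for c in candidates_s
            if largest_jump(downsample(readings, c)) <= tolerance]
    return max(safe) if safe else None


# Hypothetical data: ten minutes of one-second readings, warming by 0.01 degrees per second.
start = datetime(2024, 1, 1)
readings = [(start + timedelta(seconds=i), 20.0 + 0.01 * i) for i in range(600)]

# Hypothetical candidates (seconds) and tolerance (degrees).
interval = coarsest_safe_interval(readings, [1, 10, 60, 300], tolerance=0.5)
print(interval)
```

This is a crude heuristic: it only compares the readings that survive downsampling, so a short spike between two kept samples would go unnoticed. A real project would check candidate intervals against the anomalies it actually needs to capture.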
“You need to understand what operational parameters that you need to see if you can understand the historical nature of these problems and capture those event data,” he says. “Then once you’re done, then with the help of your data science team and IoT team, you could build a model with a pilot project, and test it out. Through that process, you’re able to actually benefit from the intelligence that you built.”
Mistake No. 3: Not Following Your Data
Once companies have their first data science win under their belts, they’re eager to take the next step. But all too often, that next step can lead them into another common data science mistake: not following where the data leads them.
While organizations should start their IoT data science initiatives in a targeted manner, they should also be free enough to experiment with data and follow their IoT data’s lead, Sistla says.
“Data science is not like your typical software project,” he says. “It’s an adaptive software development process. You are adapting technology, your models, your frameworks based on the result that you are seeing. So there’s a bit of experimentation and ideation and exploration that goes on. That’s something that most companies don’t realize — that it’s not very well-defined.”
This may sound like a contradiction of rule number one, which is to think and act in a targeted and ROI-oriented manner. The key difference, however, is understanding that organizations are usually better off starting their data science journey from a known location and then broadening it out from there, rather than beginning from a broad point and then narrowing it down.
Once a company starts down the data science journey, they must open themselves to changing legacy business processes. It’s pointless to spend the time and money to collect relevant datasets and identify potentially useful patterns or anomalies if they’re not willing to make the changes to monetize those insights, Sistla says.
“Oftentimes when you’re dealing with mass quantities of data and very complex process manufacturing, the challenges are a lot more complex,” he says. “That’s where it’s so critical in data science to learn from it. Oftentimes they’ll end up getting new insights they haven’t even thought about up to that point.”
Sistla points out the high rate of failure that Gartner has identified for big data projects. Companies often end up storing massive amounts of data, but they fail to make adequate use of the data in a timely fashion.
“After all this data is collected, you want to quickly put something in place to make use of data, to extract actionable knowledge and intelligence from it,” he says. “Maybe you even write some rules or some orchestration workflows that say, any time this particular event happens, go ahead and send this information to this ERP system or send it to this SMS or mobile phones or a dashboard with the critical errors.”
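The kind of rule Sistla describes can be pictured as a small lookup from conditions to destinations. The following is a minimal sketch under assumed names: the `Rule` class, the thresholds, the event fields and the destination labels ("erp", "sms", "dashboard") are hypothetical, and no real ERP or SMS integration is shown.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # returns True when the event should trigger the rule
    destinations: list[str]            # labels only; real delivery code is out of scope


def route(event, rules):
    """Return (destination, rule name) pairs for every rule the event triggers."""
    return [(dest, rule.name)
            for rule in rules
            if rule.condition(event)
            for dest in rule.destinations]


# Hypothetical rules: thresholds and field names are illustrative only.
RULES = [
    Rule("overheat", lambda e: e.get("temperature_c", 0) > 90, ["sms", "dashboard"]),
    Rule("maintenance_due", lambda e: e.get("run_hours", 0) >= 500, ["erp"]),
]

event = {"machine": "press-7", "temperature_c": 95.2, "run_hours": 120}
deliveries = route(event, RULES)
print(deliveries)
```

Keeping the rules as data rather than hard-coded branches makes it easier to add or change a rule as the project learns which events matter.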
Planning it all the way through becomes very critical. “It’s a full contact sport, in my view, because it’s not just putting the IoT sensor itself in,” Sistla continues. “There’s so much more that has to go in to actually benefit from the IoT investment that you have made.”