Why Big Data Isn’t Changing Everything (At Least Not Yet)
Think everything is different in this new big data world, that data lakes, mobile devices, predictive analytics, and the cloud are rewriting the rules of business technology? Then think again, says Tony Fisher, who heads up Progress Software’s data integration business. While these technologies are absolutely allowing organizations to do new and creative things, they’re not impacting existing systems and older business models as much as you might think.
In an industry that’s rife with outrageous claims, it’s refreshing to hear somebody like Fisher tell it like he sees it. Instead of pitching how yesterday’s technology can only support yesterday’s business models, and how only the organizations who adopt the latest technologies will survive, Fisher takes a pragmatic approach that leans heavily on the experiences of his customers.
Take data lakes, for example. There are some outfits who espouse the elimination of older technologies in favor of massive data lakes, often running Hadoop and sometimes running in the cloud. While the data lake premise sounds great theoretically, that pig just doesn’t fly in Fisher’s world.
“You can’t get rid of the old stuff. You just can’t,” Fisher tells Datanami. “Those people who say you need to get rid of everything whole hog are probably not people who are actually in the real world doing the work.”
Progress Software has a number of businesses, including Fisher’s DataDirect division in Raleigh, North Carolina, where it develops tools that allow organizations to connect to, integrate, and cleanse their data for transactional and analytical purposes. Fisher oversees a number of tools in the DataDirect collection that move data into all manner of big data repositories, like Hadoop, Hive, HBase, Spark, and MongoDB; relational databases like Oracle, DB2, and SQL Server; and cloud stores like Salesforce.com and Marketo. Progress also sells a data preparation tool called Easyl that helps users cleanse their data prior to analysis.
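To make the connect-integrate-cleanse pattern concrete, here is a minimal sketch. It is not DataDirect code; it uses Python's standard-library sqlite3 as a stand-in for the kind of source a connector would normally reach over ODBC or JDBC, and the table, column, and function names are hypothetical:

```python
import sqlite3

# Stand-in source: an in-memory SQLite database playing the role of a
# transactional store that a connectivity tool would reach remotely.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "  Alice ", "ALICE@EXAMPLE.COM"),
     (2, "Bob", None),                      # record with a missing email
     (3, "carol", "carol@example.com")],
)

def extract_and_cleanse(connection):
    """Pull rows from the source and normalize them before analysis."""
    rows = connection.execute("SELECT id, name, email FROM customers")
    cleaned = []
    for cid, name, email in rows:
        if email is None:          # drop records that fail a completeness check
            continue
        # Trim stray whitespace and normalize casing before handing
        # the data to an analytical or transactional consumer.
        cleaned.append((cid, name.strip().title(), email.lower()))
    return cleaned

print(extract_and_cleanse(conn))
# [(1, 'Alice', 'alice@example.com'), (3, 'Carol', 'carol@example.com')]
```

The same extract-then-cleanse flow applies whatever the target happens to be, whether a Hadoop store, a relational database, or a SaaS application.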
Maintaining a Legacy
Fisher hears what his customers say about the challenges of big data. But the way he sees it, the big data explosion and advent of cloud-based software as a service (SaaS) offerings isn’t replacing “legacy” technology like the DataDirect tools. In fact, it’s making it even more necessary.
“There’s a lot of newer things happening in the market that require some of these newer technologies,” he says. “If you look at the massive amount of social data, the massive amount of IoT data–these are different. You cannot support them with the old standard technology.”
However, when it comes to existing systems such as CRM and ERP applications, those aren’t going to be running on Hadoop or NoSQL-powered data lakes any time soon. “You can’t just say, there’s a heck of a lot of data and therefore everything’s going to get treated in this new big data way. It just doesn’t work that way,” Fisher says. “Just because there are big data stores out there, and just because there are data lakes out there, doesn’t mean you can get rid of your ERP or CRM system.”
That’s not to say that Fisher doesn’t see exciting stuff happening in the world of big data. Only recently have shrink-wrapped predictive analytic platforms, running both on-prem and in the cloud, given organizations the capability to mine their data for insights without requiring huge upfront investments in data scientists, statistical software packages, and data storage infrastructure.
“Things like machine learning and the ability to do predictive analytics is now available to everybody. It’s cheap and prevalent, so anybody can get access to it,” Fisher says. “That was not true just three to four years ago….The new technologies are making things available to more people and making new things available to more people. But they’re not displacing a lot of the stuff we’ve historically done….This whole data integration and data connectivity business will be here for a long, long time.”
Data Integration Challenges
Instead of throwing all your structured, semi-structured, and unstructured data into a single massive data lake and organizing your analysis from there, Fisher foresees a much more jumbled world, where data sits tight where it is until you call on it to do something or go somewhere.
In support of that vision, Progress is investing heavily in two areas: data self-service and data virtualization. The company, which is headquartered in Bedford, Massachusetts, is a strong believer in the HTAP (hybrid transactional/analytical processing) style of data storage and processing.
“A lot of the data virtualization technology we’re building today has to do with leaving the data in situ and processing it when it’s required,” Fisher says. “That means I’m not putting data into a data lake.”
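The in-situ idea Fisher describes can be sketched in a few lines. This is an illustration of the general federated-query pattern, not Progress's implementation: two independent sources stay where they are, and a query-time function combines them on demand instead of copying everything into a lake. The source names and schemas here are invented for the example (Python, standard-library sqlite3 standing in for remote systems):

```python
import sqlite3

# Two independent "in situ" sources: an orders store and a CRM store.
# In a real deployment these would be separate systems reached through
# connectors; here both are in-memory SQLite databases for illustration.
orders_db = sqlite3.connect(":memory:")
orders_db.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total REAL)")
orders_db.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                      [(100, 1, 250.0), (101, 2, 80.0), (102, 1, 40.0)])

crm_db = sqlite3.connect(":memory:")
crm_db.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
crm_db.executemany("INSERT INTO customers VALUES (?, ?)",
                   [(1, "Alice"), (2, "Bob")])

def spend_by_customer():
    """Federate the two sources at query time; nothing is copied into a lake."""
    # Aggregate in the orders source, where that data already lives.
    totals = {}
    for customer_id, total in orders_db.execute(
            "SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id"):
        totals[customer_id] = total
    # Resolve names from the CRM source, again only when required.
    names = dict(crm_db.execute("SELECT customer_id, name FROM customers"))
    return {names[cid]: amount for cid, amount in totals.items()}

print(spend_by_customer())
# {'Alice': 290.0, 'Bob': 80.0}
```

The join happens only when a user asks the question, which is the trade-off Fisher is pointing at: no duplicate copy of the data to maintain, at the cost of doing resolution work at extract time.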
Making this data accessible in a self-service manner will be critical as IT resources get stretched thin. Progress foresees being able to allow users to see the data resources that an organization has–including the relationships it has to internal and external data sources–and then access it as needed. “This whole idea of virtualization of data and self-service are things that are going to be very big,” Fisher says. “Until we get to the point where we have a HANA environment everywhere, where it’s one big shining data store for everything.”
But we’re not going to be there for a while, he says. “At some point, we may have the processing power and backplanes and clusters that are fast enough to say, OK, I will leave the resolution and the data quality issues to extract time and data usage time, and provide the context when the user is actually extracting the data,” he says. “But we’re just not there yet. Even with all the advances that we’ve made in technology.”
Until then, Progress Software will continue selling the tools that stitch its customers’ old technologies to the new ones, because that’s going to be a good business for a long time yet.