Syncsort Tacks to Address 4 ‘Megatrends’
Syncsort’s ship got bigger last year when it acquired Vision Solutions, a provider of data availability and security tools. Now the New York software company is bringing that cargo to bear for its next voyage: helping customers cope with the data management ramifications of four converging “megatrends.”
Syncsort CTO Tendü Yoğurtçu shared some of her company’s plans with Datanami at the recent Strata Data Conference in San Jose, California. First and foremost, the company is looking to chart a reasonable course for its customers through the confluence of four industry megatrends, which include cloud computing, stream data processing, data governance, and data science.
“All of our investments fall…under those four megatrends,” she said. “We are seeing very intra-connected trends with cloud, streaming, data science, and also with data governance. It’s becoming more and more complicated.”
Many of Syncsort’s customers have multiple data lake implementations, usually Hadoop clusters for on-premises storage. Yoğurtçu tells us many of these customers also have one or two data lakes in the cloud, which usually sit on object stores. The company’s historical focus has been providing ETL tools for IBM mainframes, and with Vision Solutions, it now has thousands of customers who run critical business apps on the IBM i server.
The complexity of managing all of these diverse data stores – including making sure the data is well-integrated, well-governed, well-secured, de-duplicated, high quality, and highly available – is a potentially overwhelming task with multiple pitfalls that threaten to throw a company’s data strategy off course. Addressing that business risk for some of the world’s biggest companies represents a business opportunity for Syncsort and its collection of data management tools.
Syncsort is in the process of lining up its various software assets to deal with each of the challenges described above. The company already has a well-respected ETL engine in DMX-h, which can do the heavy-duty data lifting in a Hadoop cluster with Spark or MapReduce engines, or alternatively run on the Amazon cloud, or even Splunk clusters (with integration with Elastic expected later this week).
Its 2016 acquisition of Trillium gave Syncsort an entry into the data quality business. At the end of 2017, the company integrated the Trillium software with DMX-h. The resulting product, formally called Trillium Quality for Big Data, now uses DMX-h’s “intelligent execution” capability (the one that lets it run on MapReduce, Spark, or maybe even TensorFlow in the future) as its underlying engine.
“We invested enough in that intelligent execution under the ETL product to make sure that we can acutely handle multiple compute framework – Spark, MapReduce, standalone – without making design-time decision,” Yoğurtçu says. “At runtime customers are able to adjust across multiple compute frameworks.”
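The idea of designing a job once and picking the compute framework at runtime can be illustrated with a small sketch. It is not Syncsort’s implementation: the names `build_job`, `run_standalone`, `run_partitioned`, and `execute` are hypothetical, and the “partitioned” engine is only a single-process stand-in for a distributed framework such as Spark or MapReduce.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]
Step = Callable[[Record], Record]


def build_job() -> List[Step]:
    """Design time: describe the transformations, with no engine chosen yet."""
    return [
        lambda r: {**r, "name": r["name"].strip()},  # trim whitespace
        lambda r: {**r, "name": r["name"].upper()},  # normalize case
    ]


def run_standalone(steps: List[Step], records: Iterable[Record]) -> List[Record]:
    """Hypothetical engine 1: process one record at a time in a single loop."""
    out = []
    for record in records:
        for step in steps:
            record = step(record)
        out.append(record)
    return out


def run_partitioned(steps: List[Step], records: Iterable[Record],
                    partition_size: int = 2) -> List[Record]:
    """Hypothetical engine 2: split the input into partitions and apply each
    step to a whole partition, the way a distributed engine would."""
    rows = list(records)
    partitions = [rows[i:i + partition_size]
                  for i in range(0, len(rows), partition_size)]
    out: List[Record] = []
    for part in partitions:
        for step in steps:
            part = [step(r) for r in part]
        out.extend(part)
    return out


# Runtime: the engine is looked up by name, so the job definition never changes.
ENGINES = {"standalone": run_standalone, "partitioned": run_partitioned}


def execute(steps: List[Step], records: Iterable[Record], engine: str) -> List[Record]:
    return ENGINES[engine](steps, records)
```

Calling `execute(build_job(), rows, "standalone")` and `execute(build_job(), rows, "partitioned")` on the same input should give the same output. That is the point of the design: the job definition carries no engine-specific logic.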
On the governance front, Syncsort is partnering with Hadoop distributors Cloudera and Hortonworks to make sure its customers and its products work with their flagship governance tools, Cloudera Navigator and Apache Atlas, respectively.
Two weeks ago Syncsort announced an expansion of the work with Cloudera to provide field-level insight into all the transformations that occur to a given piece of data as it travels from its source to the data lake. In addition to giving customers access to that detailed data lineage information via Cloudera Navigator, it’s also accessible via a REST API.
“So if a customer already has something in place and they’re transitioning to a Hadoop-based metadata management solution or metadata repository, they can load that data and use our REST API to get that current visibility,” Yoğurtçu says.
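To make “field-level lineage” concrete, here is a minimal sketch of the concept: every transformation applied to a field is recorded alongside the data, so the history of any one field can be read back later. This is not the Syncsort REST API or the Cloudera Navigator data model. The `LineageEvent` and `TrackedRecord` classes and their methods are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LineageEvent:
    """One transformation applied to one field (hypothetical structure)."""
    field_name: str
    operation: str
    before: str
    after: str


@dataclass
class TrackedRecord:
    """A record that logs every field-level change made to it."""
    values: Dict[str, str]
    lineage: List[LineageEvent] = field(default_factory=list)

    def apply(self, field_name: str, operation: str,
              fn: Callable[[str], str]) -> None:
        # Transform the field and remember what was done to it.
        before = self.values[field_name]
        after = fn(before)
        self.values[field_name] = after
        self.lineage.append(LineageEvent(field_name, operation, before, after))

    def history(self, field_name: str) -> List[str]:
        # Ordered list of operations that touched this field.
        return [e.operation for e in self.lineage if e.field_name == field_name]


record = TrackedRecord({"email": "  Ada@Example.COM ", "country": "uk"})
record.apply("email", "trim", str.strip)
record.apply("email", "lowercase", str.lower)
record.apply("country", "uppercase", str.upper)
```

After these calls, `record.history("email")` lists the two operations that touched the email field, in order. A production lineage system would persist such events in a metadata repository rather than on the record itself.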
The company also has partnerships in place with Collibra for data cataloging and with mainframe utilities provider ASG Technologies for other pieces of the governance solution. It plans to take the encryption capabilities that it obtained in its merger with Vision Solutions to its broader customer base. Namely, it’s looking to bring Vision’s encryption technology into its data integration and quality offerings, Yoğurtçu says. The company is also keen to ramp up awareness of Vision’s existing database replication software, which delivers change data capture (CDC) capabilities for relational databases.
“Security, data quality, and lineage are the three areas under data governance we are very focused on,” Yoğurtçu says. “Regulatory compliance and the GDPR also are driving a lot of this in Europe and for global enterprises.”
While Syncsort doesn’t have a direct play on the data science side, it definitely sees a role for itself in the upstream systems that data scientists rely on to deliver data in a reliable, accurate, and secure manner. In that regard, the company is keeping an eye on Kafka, which has become the standard message queue for many enterprises.
“Data has to be prepared, data has to be cleansed, matched and ready for artificial intelligence and machine learning and deep learning,” Yoğurtçu says. “We are doing a lot in that pipeline, preparing and cleansing and matching the data.”
Bringing all these pieces together and hooking it all up so it works together is no easy task. “You have on-premise. You have legacy systems, mainframe and IBM i. You have data lake clusters on premise and in the cloud,” Yoğurtçu says. “Now publishing that visibility at the field level with everything that happens with the data as it moves or copied – it’s really a very strong value proposition.”
The company is currently developing new software that will add data profiling capabilities to the Trillium data quality offering. The new software will use supervised machine learning techniques in conjunction with an experienced human analyst, with the goal of getting the software to learn how to execute complex business rules for the cleansing of data.
It’s not about fully automating the data cleansing job, but making data scientists’ lives easier by making good recommendations on the business rules, Yoğurtçu says. That software is currently being incubated, with a possible delivery date at the end of the year.
In the meantime, the company is getting ready for a big branding overhaul in late May. “We have been talking about ‘big iron to big data’ for a while now,” Yoğurtçu says. “But ultimately we are advancing the data for customers and making the connections for the next generation of analytics environments.”