Talend Unveils Free Data Streaming Solution
Talend today launched a data streaming solution based on Apache Beam that is free of charge for users. The company says Talend Data Streams makes it easy for non-technical users to build data pipelines in the AWS cloud that tap into a variety of data assets, including cloud data stores, relational databases, and even Kafka topics.
Data pipelines are emerging as a favored method for data scientists and analysts to manage and manipulate data in complex environments that involve a variety of data stores, integration methodologies, and processing needs. For Talend, which made its reputation in the batch-oriented ETL landscape, the shift to real-time data streaming is a natural move.
To that end, Talend says its new Data Streams product makes it easy to build pipelines that can transform, enrich, and integrate cloud-based data in either batch or streaming modes. The software features a graphical interface that lets users select from available data sources and guides them through transforming and enriching that data with filters and with aggregations computed at specific intervals.
Data Streams addresses the ingestion and processing challenges of data pipelines. The company’s GUI builder lets users select data sources and destinations. It also includes lightweight processing capabilities and extensions that let users write custom transformations in Python.
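To make the filter, transform, and interval-aggregation steps concrete, here is a minimal sketch in plain Python. It does not use the Talend Data Streams or Apache Beam APIs; the event fields (`ts`, `sensor`, `value`), the function names, and the 60-second window are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # hypothetical aggregation interval


def keep_positive(event):
    """Filter step: keep only readings with a positive value."""
    return event["value"] > 0


def add_window(event):
    """Custom transformation: tag an event with the start of its fixed window."""
    start = event["ts"] - event["ts"] % WINDOW_SECONDS
    return {**event, "window_start": start}


def run_pipeline(events):
    """Filter, transform, then sum values per (window, sensor) pair."""
    totals = defaultdict(float)
    for event in filter(keep_positive, events):
        tagged = add_window(event)
        totals[(tagged["window_start"], tagged["sensor"])] += tagged["value"]
    return dict(totals)


# Hypothetical sample events; "ts" is seconds since the start of the stream.
events = [
    {"ts": 5, "sensor": "a", "value": 2.0},
    {"ts": 42, "sensor": "a", "value": 3.0},
    {"ts": 61, "sensor": "a", "value": 4.0},
    {"ts": 70, "sensor": "b", "value": -1.0},  # dropped by the filter
    {"ts": 119, "sensor": "b", "value": 1.5},
]

result = run_pipeline(events)
for (window_start, sensor), total in sorted(result.items()):
    print(window_start, sensor, total)
```

In a real streaming engine the events would arrive continuously and each window would be emitted when it closes; the sketch processes a finite list so the grouping logic is easy to follow.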
There are two main use cases for Data Streams, says Ciaran Dynes, SVP of Products at Talend. “First, for ad hoc integrators like data scientists and advanced analysts, Talend Data Streams will help them ingest & process data so they can use it for advanced analytics, modeling/machine learning discovery, etc,” he tells Datanami.
“The second major use case is for typical IT and data engineers who will benefit from Talend Data Streams’ ability to simplify complex problems and operationalize other’s work much faster, build pipelines with data that’s been validated and prepared by data experts,” he says.
Talend says that basing the software on Apache Beam, the open source version of Google's Dataflow streaming data framework, makes Data Streams extensible, flexible, and able to run on different cloud environments. The choice of Beam is not surprising, considering that Talend software architect Jean-Baptiste Onofré was instrumental in the development of Apache Beam.
“Since Apache Beam is an application layer, customers can connect to Spark, for example, for more micro-batch processing, or to Flink, where advanced streaming is more needed,” Dynes says. “Having Talend Data Streams based on Beam helps IT teams build an agile and flexible architecture that enables different use cases across different business needs.”
Talend will enable Spark with EMR and Google Dataflow as the initial runners, Dynes says. “But in the future we’ll be adding more runners to the cloud under the Apache Beam framework,” he says. “We will enable Beam through a local, single node Spark engine within the AMI version of the application.”
Talend Data Streams is free for single users on the Amazon cloud. A single-user edition has obvious limits, so it is no surprise that Talend plans to offer an enterprise version later this year.
Dynes says: “When Talend Data Streams joins the Talend Data Fabric platform later this year, customers will benefit from the ability for business and IT can collaborate on data integration and governance tasks, easily share work between apps, orchestrate stewardship tasks, increase confidence in their data and put more data to work, faster.”
Talend Cloud Data Streams is the enterprise/commercial version of the product, which will be released in the second half of 2018 and hosted in the Talend Cloud. The company says it is exploring an on-premises version of the solution for future releases to reach more customers.
Talend (NASDAQ: TLND) made the announcement today at Talend Connect US, its annual user conference that’s taking place in New York.