April 6, 2021

Data Engineering Cloud Launched by Trifacta

Alex Woodie

Trifacta today launched what might be the world’s first cloud designed specifically for data engineering. Running on AWS, Microsoft Azure, and Google Cloud, the new Trifacta Data Engineering Cloud provides a place for data engineers to improve the quality of data before it’s sent to downstream analytics, BI, and machine learning systems.

Trifacta is no newbie on the cloud. In fact, the data quality company has enjoyed an OEM partnership with Google Cloud for years, and its solutions have been widely deployed on all major clouds.

But with the launch of the Trifacta Data Engineering Cloud, the San Francisco-based company is taking the cloud into its own hands, so to speak, and giving data engineers and other data professionals a platform to perform a range of data preparation and quality-improvement tasks.

The new cloud offering is designed to facilitate data prep tasks in an open and collaborative environment. Trifacta is embracing modern DevOps processes and an open tooling environment to enable data engineers to improve the quality of raw data using a range of tools and technologies, the company says.

To that end, Trifacta’s cloud supports traditional ETL workflows, where data transformation processing takes place in an intermediate stage, as well as ELT, in which the data warehouse powers the transformation (in addition to the analytic workload). Transformation scripts developed in the Trifacta cloud environment can be materialized as SQL, Spark, Dataflow/Beam, or Python code.
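To make the ETL/ELT distinction concrete, here is a minimal Python sketch that uses the standard-library sqlite3 module as a stand-in for a data warehouse. The table names, sample rows, and the clean_name helper are hypothetical, and none of this reflects the code Trifacta actually generates; it only shows where the transformation runs in each pattern.

```python
import sqlite3

# Hypothetical raw records: names arrive with stray whitespace and mixed case.
RAW_ROWS = [(1, "  alice "), (2, "BOB"), (3, " Carol")]


def clean_name(name: str) -> str:
    """The 'transformation': trim whitespace and lowercase a name."""
    return name.strip().lower()


def run_etl(conn: sqlite3.Connection) -> list:
    """ETL: transform in an intermediate stage (here, Python), then load."""
    conn.execute("CREATE TABLE customers_etl (id INTEGER, name TEXT)")
    transformed = [(row_id, clean_name(name)) for row_id, name in RAW_ROWS]
    conn.executemany("INSERT INTO customers_etl VALUES (?, ?)", transformed)
    return conn.execute(
        "SELECT id, name FROM customers_etl ORDER BY id"
    ).fetchall()


def run_elt(conn: sqlite3.Connection) -> list:
    """ELT: load the raw data first, then let the warehouse engine transform it."""
    conn.execute("CREATE TABLE customers_raw (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO customers_raw VALUES (?, ?)", RAW_ROWS)
    # The transformation is expressed as SQL and executed by the database itself.
    conn.execute(
        "CREATE TABLE customers_elt AS "
        "SELECT id, LOWER(TRIM(name)) AS name FROM customers_raw"
    )
    return conn.execute(
        "SELECT id, name FROM customers_elt ORDER BY id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
etl_result = run_etl(conn)
elt_result = run_elt(conn)
```

Both paths end with the same cleaned table. The difference is which system does the work: in the ELT case the raw table also stays in the warehouse, and the warehouse’s compute handles the transformation alongside its analytic queries.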

It’s all about being flexible to adapt to customers’ varied data quality requirements, says Trifacta CEO Adam Wilson.

“Trifacta is addressing the needs of modern data workers by providing a collaborative, cloud environment where users of all skill levels can come together to improve data quality and streamline data operations as they on-board, assess, and refine raw data,” Wilson says in a press release. “Accelerating data preparation and democratizing ETL for these users and their cloud data warehousing projects requires an enterprise-grade data engineering platform that is open, intelligent, and self-service.”

As users interact with Trifacta’s user interface, ML algorithms observe the changes that users make to data, and those observations are used to recommend data transformations in the future.

In a separate announcement today, the company boasts a collection of 180 connectors, enabling users to move data into the Data Engineering Cloud from a wide variety of sources. That includes the usual suspects, like databases, Hadoop clusters, and cloud data warehouses, but also other systems with specific requirements, like Jira repositories, Excel spreadsheets, and even Google Analytics.

“At Trifacta, we realize the importance of providing broad connectivity in a secure, self-serve, and scalable manner,” Trifacta CTO and co-founder Sean Kandel says in a press release. “Universal data connectivity helps our customers address their requirements by enabling them to connect to different sources with ease, and at the speed and scale they require for their businesses.”

The new cloud offering builds on Trifacta’s flagship tool, which uses machine learning technology and other techniques to detect the underlying format of unstructured and semi-structured data and automatically develop transformation logic, all in the name of accelerating the data quality workflows that are critical to scaling advanced analytics and AI projects.

The new cloud bolsters that core capability with a new visual “guide and decide” interface that Trifacta says will make resolving transformation issues easy for all users, regardless of their technical skills. The offering also provides “active data quality” profiles that allow users to more easily discover and validate data quality issues, the company says. Users can pick and choose from a variety of pre-built features, including macros, shareable data flows, recipes, and templates, to accelerate their data prep tasks. Finally, the new cloud offering provides a “smart data pipeline” that helps users to model data flows while managing the relationships across data sets and recipes.
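As a rough illustration of the kind of information a data quality profile can surface, the following Python sketch summarizes a single column. The profile_column function, its output fields, and the sample values are all hypothetical; this is a generic profiling technique, not a description of how Trifacta’s “active data quality” profiles are implemented.

```python
from collections import Counter


def is_missing(value) -> bool:
    """Treat None and blank strings as missing (an assumption for this sketch)."""
    return value is None or str(value).strip() == ""


def profile_column(values: list) -> dict:
    """Summarize a column: row count, missing count, distinct values, type mix."""
    present = [v for v in values if not is_missing(v)]
    type_counts = Counter(type(v).__name__ for v in present)
    return {
        "total": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "types": dict(type_counts),
    }


# Hypothetical "age" column with a gap, a blank, and a stray text value.
ages = [34, 41, None, "n/a", 34, ""]
profile = profile_column(ages)
```

A mixed type count like the one this column produces (mostly integers plus a lone string) is the sort of signal that would prompt a user to inspect and validate the offending rows.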

Trifacta is using a usage-based pricing scheme to charge for the Data Engineering Cloud. Users can start using it without paying anything, and then pay as they ramp up usage. The company is hosting its Wrangle Summit 2021 this week, from April 7 to 9. You can find more information on that event and register here.
