February 24, 2021

Prophecy Spins Up Low-Code Data Pipeline Tool

Alex Woodie

(pluie_r-scaled/shutterstock)

In recent years, the shortage of data engineers has at times exceeded the shortage of data scientists. To help close the gap, a Silicon Valley startup called Prophecy today unveiled a low-code data engineering tool that enables users to build Spark-based data pipelines in a drag-and-drop manner.

One of the most popular uses of Spark is for creating ETL data pipelines that move and transform massive amounts of data from sources to destinations for the purposes of machine learning and SQL analytics. However, Spark is a complex framework that is not easy to master, which is why companies like Prophecy are building tools that mask its complexity.

Prophecy’s new SaaS offering functions as a low-code tooling layer that runs atop Spark and Airflow. Customers must procure their own Spark and Airflow environments and have access to a Git repository, which is where Prophecy stores the code that it generates.

Automation is the name of the game in Prophecy. The SaaS offering presents a visual editor that allows users to build their data workflows by dragging and dropping icons on a screen. When they’re done building the pipeline, they hit a button, and the underlying Spark cluster (supplied by the customer) automatically generates the Spark code, which is pushed to Git.

A video produced by Prophecy shows how the tool works at a more detailed level. Users start by selecting from among different data sources, such as files, data warehouses, or data catalogs. Supported formats include Parquet, ORC, CSV, JSON, plain text, and Databricks’ Delta format. Using drop-down menus, users can name their source files and define the schema and properties.

Users can then apply transformations such as joins (the software has options for inner joins, outer joins, and the various permutations thereof) and aggregations. For example, if a user wants to aggregate by customer name, the software presents options for defining that aggregate expression using SQL, Python, or Scala.

When the user is finished with her pipeline, she names the output file, specifies whether the pipeline is read-only or can overwrite itself, and presto – she’s presented with a finished data pipeline in her Git repository.
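To make the source, join, aggregate, and output steps concrete, here is a minimal plain-Python sketch of the same kind of pipeline. It is not Spark code and not what Prophecy generates; the sample data, the function names, and the overwrite flag are all hypothetical, chosen only to mirror the steps described above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data standing in for two source files.
CUSTOMERS_CSV = """customer_id,name
1,Acme
2,Globex
3,Initech
"""

ORDERS_CSV = """order_id,customer_id,amount
10,1,250.0
11,1,100.0
12,2,75.5
"""


def read_csv(text):
    """Source step: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def inner_join(left, right, key):
    """Join step: keep only the row pairs whose key appears on both sides."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def aggregate_sum(rows, group_by, value):
    """Aggregate step: total a numeric column per group."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_by]] += float(row[value])
    return dict(totals)


def write_target(store, name, data, overwrite=False):
    """Target step: refuse to replace an existing output unless overwrite is set.

    The dict-based store and the overwrite flag are assumptions for this sketch.
    """
    if name in store and not overwrite:
        raise FileExistsError(f"{name} already exists")
    store[name] = data


def run_pipeline(store, overwrite=False):
    """Chain the four steps: two sources, a join, an aggregate, and a target."""
    customers = read_csv(CUSTOMERS_CSV)
    orders = read_csv(ORDERS_CSV)
    joined = inner_join(customers, orders, key="customer_id")
    totals = aggregate_sum(joined, group_by="name", value="amount")
    write_target(store, "total_by_customer", totals, overwrite=overwrite)
    return totals
```

Calling run_pipeline({}) on this sample data returns {'Acme': 350.0, 'Globex': 75.5}. Initech drops out because the inner join keeps only customers with at least one order, and a second run against the same store fails unless overwrite=True is passed.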

Users do not need to know Spark to use Prophecy, although if they do know it, they can also code directly in the tool. In fact, a nifty feature of the tool is its “two-way” environment that allows users to switch back and forth between the visual editor and the code editor. Any changes made in one editor are immediately reflected in the other, the company says.

According to Prophecy CEO Raj Bains, the offering will uplift a wide spectrum of users with different skill levels.

“Visual ETL developers, data analysts, and data scientists can all be very productive with Prophecy,” he tells Datanami. “On the other end of the spectrum we have some Spark Scala developers who use Prophecy to improve productivity.”

Prophecy chose Spark because it’s very powerful, Bains says. “We get SQL with a quality optimizer like a data warehouse,” he says. “But then, we can go outside SQL if needed, to RDDs directly. Cases such as ML [machine learning] cannot be handled in SQL. Our customers also want to incrementally move to streaming, and being able to do batch and streaming using the same APIs is critical. Prophecy workflows can run on both.”

The open source nature of Spark and Airflow, which is used for coordinating data workflows, is also important because it reduces the potential for vendor lock-in, Bains says. “Since we started with selling to Fortune 500 enterprises, we see much higher desire to be based on open source standards rather than proprietary products,” he says.

Prophecy was founded in 2017 in Palo Alto, California, and has attracted $6 million in seed funding so far from SignalFire and an investment from Ross Mason, the founder of Mulesoft. Mason, who is currently at Dig Ventures, is impressed with the company.

“As enterprises of all sizes gear up to manage increasingly complex data coming from the full range of operational systems (such as credit card swipes, ERP systems, airline bookings, IoT) in the cloud, Prophecy helps accelerate the digital transformation of these enterprises. Prophecy’s unique low code approach enables enterprises to succeed with their current workforce.”

Prophecy is a Databricks partner, and the new SaaS offering runs atop Databricks’ hosted Spark environment and supports its Delta Lake environment, which runs in AWS, Microsoft Azure, and Google Cloud. It will also be supported on other Spark environments, including Cloudera and Amazon Web Services Elastic MapReduce (EMR), which is primarily a Spark runtime today.

The new SaaS version of Prophecy’s software is free for up to three users. Beyond that, customers must purchase a subscription, which starts at $299 per month. For more info, see www.prophecy.io.
