April 1, 2017

Apache Software Group Announces “Slug” for Line-by-Line Data Processing

NEW YORK, N.Y., April 1, 2017 — The Apache Software Group has revealed its newest open-source project, Apache Slug. Apache Slug provides a complete view into the governance and documentation process for the open-source software stack. The technology, built from the ground up on open-source tools, allows companies to make their analytics projects more transparent. Slug claims to be a platform to “allow modern organizations to see real-time, atomistic feedback on a variety of database and analytic operations.” Slug does this through continuous monitoring of microservices, which provide real-time feedback on the inner workings of data movement, transformations and advanced algorithmic computations. For example, in a Slug-enabled environment, users generating a new dataset with MapReduce on Hadoop or Scala on a Spark data frame can execute that code in microbatches, even at the row level. Slug returns each microbatch of information along with complete diagnostics, followed by a prompt to the user to move to the next step.

IT departments are expected to embrace Apache Slug, as it gives them complete transparency into the process by showing each element of a transformation as it is created. Apache Slug integrates with other security-related Apache projects, namely Sentry and Ranger. End users such as analysts and data science coders now have the flexibility to monitor each step in the process, down to how each line of code modifies each record. The new UI for Slug returns those microbatches for visual confirmation. Apache Slug is packaged with a full set of RESTful API calls that integrate with other Apache projects such as Apache Ambari.

The open-source community is already buzzing about Apache Slug. Florian Douetteau, CEO of Dataiku, whose Data Science Studio platform incorporates many of the Apache projects, said, “Apache Slug is a non-trivial leap forward in the governance capabilities of the large organization. By allowing each line of code to be run step by step, total transparency can be brought to the process.”

Analysts at a major research firm said, “We have had transparency like Slug in the single-user and non-parallel processing world for some time. Now that this microbatch, even row-level, processing is available in an MPP format, it’s potentially game-changing technology.”

Following the open-source adoption and success of Hadoop, Spark, Pig, Hive and Impala, Apache Slug appears poised to challenge the proprietary vendors who offer line-by-line processing and monitoring. The Apache Slug technology appears to meet a previously unmet need in the open-source stack and is sure to be a major player for years to come.

Download Apache Slug here, sign up for some training, and get in touch with us at [email protected] with any feedback or if you’re interested in participating in the project.


Source: Apache Software Group

Datanami