March 30, 2016

Architecting Immediacy: The Design of a High-Performance, Portable Wrangling Engine

Seshadri Mahalingam

At Strata + Hadoop World San Jose this week, my colleague Joe Hellerstein, a Trifacta co-founder, and I will present a session entitled “Architecting immediacy: The design of a high-performance, portable wrangling engine.”

A big part of our session will be discussing our new Photon Compute Framework, an enhancement at the core of Trifacta’s data wrangling interface. Photon is specifically architected to provide Trifacta’s users with a richly interactive and intelligent data wrangling experience on large, in-memory data sets.

Why is this a big deal?

Today, we’re used to receiving feedback fast, and at Trifacta we believe data wrangling shouldn’t be any different: performance is essential to the user experience we are pioneering.

Trifacta delivers immediate feedback and intelligent suggestions every time you interact with it. Data scientists and analysts are never removed from the flow of their work or forced to wait for processing to complete. Photon’s in-memory engine lets users interactively wrangle data volumes orders of magnitude larger than was previously possible, with rich visualizations to assess data quickly. We engineered Photon for speed, with critical in-memory performance features for modern architectures, including multi-threaded parallelism, columnar compression, pipelined data processing and the ability to leverage LLVM for compilation. Yet it requires only a minimal memory footprint.
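To make one of these techniques concrete, here is a minimal sketch of dictionary encoding, a common form of columnar compression. It is a generic illustration, not Photon's implementation; the function names and the sample column are hypothetical.

```python
from typing import Dict, List, Tuple


def dictionary_encode(column: List[str]) -> Tuple[List[str], List[int]]:
    """Encode a column as (distinct values, one small integer code per row)."""
    dictionary: List[str] = []      # distinct values, in first-seen order
    index: Dict[str, int] = {}      # value -> position in `dictionary`
    codes: List[int] = []
    for value in column:
        code = index.get(value)
        if code is None:
            code = len(dictionary)
            index[value] = code
            dictionary.append(value)
        codes.append(code)
    return dictionary, codes


def dictionary_decode(dictionary: List[str], codes: List[int]) -> List[str]:
    """Rebuild the original column from its dictionary and codes."""
    return [dictionary[code] for code in codes]


def count_matches(dictionary: List[str], codes: List[int], target: str) -> int:
    """Count rows equal to `target` by comparing integer codes, not strings."""
    if target not in dictionary:
        return 0
    target_code = dictionary.index(target)
    return sum(1 for code in codes if code == target_code)


# Hypothetical sample column with many repeated values.
states = ["CA", "NY", "CA", "TX", "CA", "NY"]
dictionary, codes = dictionary_encode(states)
```

A column with few distinct values stores each string once and keeps only small integers per row, which is why this kind of encoding helps both memory footprint and scan speed.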

The improvements offered by Photon are also a step forward for high-performance interoperability. As part of Photon’s development, Trifacta has been collaborating on the design of Apache Arrow with leading open-source organizations including Cloudera, Databricks, Twitter, MapR and Dremio. Arrow is an open-source in-memory representation that lets high-performance compute frameworks interchange data at the full speed of modern processors. In addition, Photon snaps into Trifacta’s Intelligent Execution architecture to run side-by-side with more resource-intensive distributed computing frameworks like Spark and MapReduce, which Trifacta supports for big data processing.

Want to learn more?

Trifacta will unveil Photon at Strata + Hadoop World in San Jose. The product will be launched at the “Architecting immediacy” session, during which Joe and I will discuss Photon in greater depth. The talk will highlight pain points endemic to data wrangling, including heavy string manipulation, data profiling and second-order transformations, and will demonstrate how we designed Photon for a fluid, immersive data wrangling experience.

The session is Thursday at 1:50.

About the author: Seshadri Mahalingam is a senior software engineer at Trifacta and has been with the company since January 2013. In addition to building out wrangle, Trifacta’s domain-specific language for expressing data transformation, he develops the low-latency compute framework that powers Trifacta’s fluid and immersive data wrangling experience. Seshadri holds a B.S. in EECS from U.C. Berkeley, where he co-taught a class on open-source software.

