March 18, 2021

AWS Tackles Real-Time Data Transformation with S3 Object Lambda

Alex Woodie

Cloud object stores like S3 have become the default storage repository for many companies. But operational challenges arise when one tries to use a single object store as a universal repository for multiple applications, each of which has its own specific data requirements. AWS addressed that challenge with today’s launch of S3 Object Lambda, which allows users to add their own custom code to S3 GET requests.

AWS bills the new S3 Object Lambda service as the best way to avoid the cost and complexity associated with the most common workarounds to the problems that arise when serving multiple applications from a single data set.

The first solution to this problem is the brute-force approach: make multiple copies of the data. That way, each application gets the exact view of the data that it needs. Of course, this increases storage requirements, and the user still has to write a data transformation pipeline and maintain the infrastructure that it runs on.

The second solution, according to AWS, is to build a proxy layer that sits in front of S3 and intercepts and processes data as it is requested in real time. But just like with the first solution, it puts the onus on the user to build the transformation system and manage the additional infrastructure that it requires to run.

AWS says S3 Object Lambda provides a better solution to this problem. The new offering enables users to run any Lambda function during S3 GET requests, thereby providing a more tailored data set to the application without significantly increasing the infrastructure behind it.

AWS Lambda, of course, is AWS’s serverless compute service that allows users to run code without the need to provision or manage servers, or even to scale the servers to meet demand. Users just present their code (it supports Node.js, Python, Go, Java, and other languages), and Lambda’s “workload-aware” clustering mechanism automatically runs it at the scale that’s required.

S3 Object Lambda can deliver customized data sets to each requesting application

With S3 Object Lambda, any data transformation routines that a user has written as a Lambda function can now be executed whenever the user requests data from S3, thereby enabling users to deliver customized data objects to different applications.
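To make that flow concrete, here is a minimal Python sketch of what such a Lambda function might look like. It is not taken from AWS's announcement. The event fields (`getObjectContext`, `inputS3Url`, `outputRoute`, `outputToken`) and the boto3 call `write_get_object_response` follow the AWS documentation as best understood here and should be checked against the current docs. The `transform` function and the injectable `s3_client` and `fetch` parameters are hypothetical, added only so the sketch can be exercised without an AWS account.

```python
import urllib.request


def transform(body: bytes) -> bytes:
    """Placeholder transformation (an assumption): upper-case the object's text."""
    return body.decode("utf-8").upper().encode("utf-8")


def handler(event, context, s3_client=None, fetch=None):
    """Sketch of an S3 Object Lambda handler.

    s3_client and fetch are hypothetical injection points for testing;
    when Lambda invokes the handler, both default to the real thing.
    """
    ctx = event["getObjectContext"]

    if fetch is None:
        # inputS3Url is a presigned URL for the original, untransformed object.
        def fetch(url):
            with urllib.request.urlopen(url) as resp:
                return resp.read()

    if s3_client is None:
        import boto3  # imported lazily so the sketch loads without boto3 installed
        s3_client = boto3.client("s3")

    original = fetch(ctx["inputS3Url"])

    # Hand the transformed bytes back to the application that issued the GET.
    s3_client.write_get_object_response(
        Body=transform(original),
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
    )
    return {"status_code": 200}
```

In this sketch, each application could be pointed at a function with a different `transform`, while the stored object itself is never copied or modified.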

According to a blog post by Danilo Poccia, AWS’s chief evangelist for EMEA, S3 Object Lambda can be used in a number of use cases, including:

  • Redacting personally identifiable information for analytics or non-production environments;
  • Converting across data formats, such as converting XML to JSON;
  • Augmenting data with information from other services or databases;
  • Compressing or decompressing files as they are being downloaded;
  • Resizing and watermarking images on the fly using caller-specific details, such as the user who requested the object.
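Two of the use cases above, redaction and format conversion, can be sketched as plain Python functions of the sort a Lambda function might call on an object's contents. These are illustrative only and do not come from AWS. The function names, the regular expressions, and the simplified XML handling are all assumptions; real PII detection is considerably harder than two patterns.

```python
import json
import re
import xml.etree.ElementTree as ET

# Hypothetical patterns: email addresses and US Social Security-style numbers.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_pii(text: str) -> str:
    """Replace anything matching the two patterns with a fixed marker."""
    text = EMAIL_RE.sub("[REDACTED EMAIL]", text)
    return SSN_RE.sub("[REDACTED SSN]", text)


def xml_to_json(xml_text: str) -> str:
    """Convert simple XML to JSON.

    Simplifications (assumptions): attributes are ignored, and if sibling
    elements share a tag name only the last one is kept.
    """
    def to_obj(element):
        children = list(element)
        if not children:
            return element.text or ""
        return {child.tag: to_obj(child) for child in children}

    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: to_obj(root)}, sort_keys=True)
```

Because both functions work on strings rather than on S3 objects, they can be tested locally before being wired into a Lambda function.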

Users can invoke S3 Object Lambda from the S3 management console with “just a few steps,” AWS says in a video posted to its Lambda website. The functions can also be called from AWS Command Line Interface (CLI) and AWS SDKs, the company says.
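For readers wondering what the SDK route might look like, the following is a rough sketch using the AWS SDK for Python (boto3). It assumes, per AWS's documentation as understood here, that an application requests the transformed object by passing the ARN of an S3 Object Lambda access point where a bucket name would normally go. The helper name `fetch_transformed`, the ARN, and the object key are hypothetical.

```python
def fetch_transformed(s3_client, access_point_arn: str, key: str) -> bytes:
    """Issue an ordinary GET; the Lambda function runs on the S3 side.

    s3_client is expected to behave like boto3.client("s3").
    """
    response = s3_client.get_object(Bucket=access_point_arn, Key=key)
    return response["Body"].read()


# Example usage (requires boto3, AWS credentials, and an existing access point):
#
#   import boto3
#   data = fetch_transformed(
#       boto3.client("s3"),
#       "arn:aws:s3-object-lambda:us-east-1:123456789012:accesspoint/my-access-point",
#       "reports/customers.csv",
#   )
```

From the application's point of view nothing else changes, which is the selling point: the same GET request returns a different view of the data depending on which access point it targets.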

There is a fee associated with S3 Object Lambda (of course). The fee varies by region, according to AWS. In the US East Region, users pay $0.0000167 per GB-second for the duration of their AWS Lambda function, and $0.20 per 1M AWS Lambda requests. Users also pay $0.0004 per 1,000 requests for all S3 GET requests that are invoked by a Lambda function, and a $0.005 per-GB fee for the data S3 Object Lambda returns to their application.
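To see how those four charges add up, here is a back-of-the-envelope estimator in Python that uses only the US East rates quoted above. The function name and the workload figures in the usage note are hypothetical, and the sketch assumes each request triggers exactly one Lambda invocation and one S3 GET. It ignores free tiers, storage, and any other charges.

```python
# US East rates as quoted in this article (March 2021); they vary by region.
LAMBDA_PER_GB_SECOND = 0.0000167
LAMBDA_PER_MILLION_REQUESTS = 0.20
S3_GET_PER_THOUSAND_REQUESTS = 0.0004
OBJECT_LAMBDA_PER_GB_RETURNED = 0.005


def estimate_monthly_cost(requests, avg_duration_s, memory_gb, gb_returned):
    """Return a cost breakdown in US dollars for one month of requests.

    Assumes one Lambda invocation and one S3 GET per request.
    """
    breakdown = {
        "lambda_compute": requests * avg_duration_s * memory_gb * LAMBDA_PER_GB_SECOND,
        "lambda_requests": requests / 1_000_000 * LAMBDA_PER_MILLION_REQUESTS,
        "s3_get_requests": requests / 1_000 * S3_GET_PER_THOUSAND_REQUESTS,
        "data_returned": gb_returned * OBJECT_LAMBDA_PER_GB_RETURNED,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown
```

For a hypothetical workload of 1 million requests a month, each running for 100 milliseconds with 128MB (0.125GB) of memory and returning 100GB in total, calling `estimate_monthly_cost(1_000_000, 0.1, 0.125, 100)` gives a total of about $1.31.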
