December 1, 2022

AWS Announces 5 New Database and Analytics Capabilities

LAS VEGAS, Dec. 1, 2022 — AWS has announced five new capabilities across its database and analytics portfolios that make it faster and easier for customers to manage and analyze data at petabyte scale. These new capabilities for Amazon DocumentDB (with MongoDB compatibility), Amazon OpenSearch Service, and Amazon Athena make it easier for customers to run high-performance database and analytics workloads at scale.

Additionally, AWS announced a new capability for AWS Glue to automatically manage data quality across data lakes and data pipelines. Finally, Amazon Redshift now offers support for a high availability configuration across multiple AWS Availability Zones (AZs). Today’s announcement helps customers get the most out of their data on AWS by empowering them to access the right tools for their data workloads, operate at scale, and increase availability.

“Data is inherently dynamic, and harnessing it to its full potential requires an end-to-end data strategy that can scale with a customer’s needs and accommodate all types of use cases—both now and in the future,” said Swami Sivasubramanian, vice president of Databases, Analytics, and Machine Learning at AWS. “To help customers make the most of their growing volume and variety of data, we are committed to offering the broadest and deepest set of database and analytics services. The new capabilities announced today build on this by making it even easier for customers to query, manage, and scale their data to make faster, data-driven decisions.”

  1. Amazon DocumentDB Elastic Clusters power petabyte-scale applications with millions of writes per second: Tens of thousands of customers use Amazon DocumentDB to run their document workloads because it is fast, scalable, highly available, and fully managed. While each Amazon DocumentDB node can scale up to 64 tebibytes of data and support millions of read requests per second, a subset of customers with extremely demanding workloads needs the ability to scale beyond these limits to support millions of writes per second and store petabytes of data.
  2. Amazon OpenSearch Serverless automatically scales search and analytics workloads: To power use cases like website search and real-time application monitoring, tens of thousands of customers use Amazon OpenSearch Service. Many of these workloads are prone to sudden, intermittent spikes in usage, making capacity planning difficult. Amazon OpenSearch Serverless automatically provisions, configures, and scales OpenSearch infrastructure to deliver fast data ingestion and millisecond query responses, even for unpredictable and intermittent workloads. With Amazon OpenSearch Serverless, data ingestion and search resources scale independently, allowing these operations to run concurrently without any performance impact. Customers using Amazon OpenSearch Serverless get access to serverless benefits (e.g., automatic provisioning, on-demand scaling, and pay-for-use pricing), along with Amazon OpenSearch Service features, such as built-in data visualizations, that help them understand log data, identify anomalies, and see search relevance rankings.
  3. Amazon Athena for Apache Spark accelerates startup of interactive analytics to less than one second: Customers use Amazon Athena, a serverless interactive query service, because it is one of the easiest and fastest ways to query petabytes of data in Amazon Simple Storage Service (Amazon S3) using a standard SQL interface. Many customers are looking for that same ease of use when it comes to using Apache Spark, an open-source processing framework for big data workloads that supports popular language frameworks (i.e., Java, Scala, Python, and R). While developers enjoy the fast query speed and ease of use of Apache Spark, they do not want to invest time setting up, managing, and scaling their own Apache Spark infrastructure each time they want to run a query. Now, with Amazon Athena for Apache Spark, customers do not have to provision, configure, and scale resources themselves. Interactive Apache Spark applications start in less than one second and execute faster than open source using AWS’s optimized Spark runtime.
  4. AWS Glue Data Quality automatically monitors and manages data freshness, accuracy, and integrity: Hundreds of thousands of customers use AWS Glue to build and manage modern data pipelines quickly, easily, and cost-effectively. Organizations need to monitor the quality of the information in their data lakes and data pipelines to ensure it is reliable before using it to power their analysis or machine learning applications.
  5. Amazon Redshift now supports multi-AZ deployments: Tens of thousands of AWS customers collectively process exabytes of data with Amazon Redshift every day. To support these customers’ mission-critical workloads, Amazon Redshift offers capabilities that increase availability and reliability, such as automatic backups and the ability to relocate a cluster to another AZ in minutes. Many databases today use a primary-standby replication mode to support high availability where a single database serves live traffic, and standby copies replicate data from the live version in case they need to replace it. Building on these capabilities, Amazon Redshift now offers a high-availability configuration to enable fast recovery while minimizing the risk of data loss.

Source: Amazon
