Google Cloud Spanner Gets Change Streams
Prospective Google Cloud Spanner users who want to suck data out of the globally consistent cloud database in real time without resorting to hacking the system will get their wish following Google Cloud’s announcement today of new change data capture (CDC) functionality in its database.
Cloud Spanner is unusual among relational databases in that it can deliver full ACID guarantees in a globally distributed, highly available manner. It achieves this feat of computer science by leveraging exotic hardware, including atomic clocks and GPS receivers, which keep clock uncertainty across data centers tightly bounded and so allow it to assign globally consistent timestamps to committed transactions.
While Cloud Spanner has no peers in its class, it has lacked some features that are commonly found in more Earth-bound databases, including CDC functionality that allows data to be pulled from the database as soon as it’s committed. Users couldn’t just plug a third-party CDC tool into Spanner and pull data out for real-time analysis. Because of that limitation, Google Cloud helped some users, mostly financial services firms, hack the database to get at that hot and fresh data.
“Because Spanner does do timestamp transactions and has this very special TrueTime with the GPS and the atomic clocks, we did work with some large customers to kind of hack a change stream-like capability in their applications, because they couldn’t wait,” said Andi Gutmans, the vice president and general manager of databases at Google Cloud. “So there were ways for them to kind of do this, but it was too hard and too specific.”
None of that hacking is necessary now that Google Cloud has unveiled Spanner change streams, which make CDC functionality a core part of the database. Change streams let customers track inserts, updates, and deletes as they happen, scoped either to specific tables and columns or to an entire database, the company says in its blog post.
“Now we’re giving them a very simple API. They can decide anything between a one-day and seven-day change data capture,” Gutmans said. “We also have automatic integration through Dataflow, which basically helps them push that to BigQuery and so on. So this is a complete out-of-the-box change data capture capability that is going to make life really easy for customers.”
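To make the scoping and retention options concrete, here is a minimal sketch in Python that assembles the kind of DDL statement used to define a change stream. It assumes the CREATE CHANGE STREAM syntax from Google's public Spanner documentation (FOR ALL or a list of tables and columns, plus a retention_period option); the helper name, the stream names, and the table and column names are hypothetical and used only for illustration. The script only builds the statement text and does not connect to Spanner.

```python
from typing import Mapping, Optional, Sequence


def change_stream_ddl(
    name: str,
    watched: Optional[Mapping[str, Sequence[str]]] = None,
    retention_days: int = 1,
) -> str:
    """Build a CREATE CHANGE STREAM statement (hypothetical helper)."""
    # The article quotes a retention window of one to seven days.
    if not 1 <= retention_days <= 7:
        raise ValueError("retention_days must be between 1 and 7")

    if not watched:
        # No tables given: watch the entire database.
        scope = "ALL"
    else:
        parts = []
        for table, columns in watched.items():
            # An empty column list means the whole table is watched.
            parts.append(f"{table}({', '.join(columns)})" if columns else table)
        scope = ", ".join(parts)

    return (
        f"CREATE CHANGE STREAM {name} FOR {scope} "
        f"OPTIONS (retention_period = '{retention_days}d')"
    )


if __name__ == "__main__":
    # Whole database, default one-day retention.
    print(change_stream_ddl("EverythingStream"))
    # Two columns of one table plus all of another, kept for seven days.
    print(
        change_stream_ddl(
            "OrdersStream",
            {"Orders": ["Status", "Total"], "Customers": []},
            retention_days=7,
        )
    )
```

The resulting statement would then be submitted as a schema update through whichever Spanner client or console the team already uses; that step is left out here because it depends on the deployment.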
In addition to pushing data from Spanner into Google Cloud properties, Spanner change streams will support third-party data integrations, Google Cloud said. Users can also use the API to build their own data integrations.
Google Cloud foresees several ways that customers will put change streams to use. Analytics is one of the most obvious ones, enabling users to push transactional data quickly into BigQuery, where it can be incorporated into the latest dashboard. It can also be used to feed event data into Google Cloud Pub/Sub, a real-time streaming system similar to Apache Kafka, where it can trigger downstream processes. Lastly, Google Cloud sees it being used for compliance purposes, whereby an entire change log is stored for a set period of time.
“This is one of our top asks on the Spanner side, from financial services and other customers,” Gutmans told Datanami. “I think that’s one of the things that really set us apart at GCP, is we’re trying to make sure that most of the plumbing is taken care of by us, whether that’s on the operational side, or how operational connects to analytics or to AI and ML, so customers can really focus on business value.”
Spanner change streams will be available in preview soon.