Building a Better (Google) Earth
About 10 years ago, the folks at Google finished indexing the Internet and turned their attention to indexing planet Earth. The resulting product, Google Earth, amazed nearly everybody who used it. But for individuals with geospatial backgrounds, the bubblegum and baling wire holding the product together signaled there had to be a better way.
Andrew Rogers was one of the Google engineers who worked on the first iterations of Google Earth. He helped look for efficient ways to add “layers” onto the screen that could show what the weather was like in a given area or how the traffic was moving or what people were saying on social media. It all appeared to be very cutting edge and real-time, but that image was largely a facade.
“We did everything to make it look as live as possible while working around the limitations of the system,” Rogers tells Datanami. “People would say, ‘Oh you’ve clearly solved this geo-spatial indexing problem.’ No, we just fake it really brilliantly.”
Google had taken its GIS systems, which were based in part on MySQL, PostGIS, and Bigtable databases, just about as far as they could go. While those technologies might let you manipulate and analyze upwards of 100 million objects at a time, Google saw that it would need to ingest and index trillions of objects generated by the emerging Internet of Things. So in 2006 it brought in experts from academia to advise it and chart a new path forward.
“They came back and said, in principle, ‘This index of reality is buildable, but to build it, you’d have to solve these four computer science problems that haven’t been solved,’” Rogers says. “And by the way, we can prove it looks nothing like Bigtable. We don’t know what it looks like, but it won’t look like Bigtable.”
Rogers, who didn’t have a background in geospatial systems before working on Google Earth, set off to solve those computer science problems at a company he founded, called SpaceCurve. “I basically solved all the computer science problems required to build an infrastructure that would allow you to index every data source in reality in real time,” he says.
Today, four years after founding SpaceCurve, the company announced the general availability of its parallel spatial data platform, also dubbed SpaceCurve. According to Rogers, the company built the parallel database from the ground up to ingest, index, analyze, and store massive amounts of fast-moving data. The software, which is optimized to run on solid state drives (SSDs), can ingest hundreds of thousands of GeoJSON documents per second per node while running SQL-based analytics on the data at the same time.
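For readers unfamiliar with the format, a GeoJSON document (standardized in RFC 7946) is just a JSON object pairing a geometry with arbitrary properties. The sensor ID and values below are hypothetical, but the structure is the standard one such a feed would emit:

```python
import json

# A minimal GeoJSON Feature: a point observation with a timestamp,
# the kind of per-record payload a telemetry feed might produce.
feature = {
    "type": "Feature",
    "geometry": {
        "type": "Point",
        "coordinates": [-122.33, 47.61],  # [longitude, latitude] — Seattle
    },
    "properties": {
        "sensor_id": "cell-tower-0042",   # hypothetical identifier
        "timestamp": "2015-05-05T12:00:00Z",
        "signal_dbm": -71,
    },
}

print(json.dumps(feature))
```

Note that GeoJSON puts longitude first in each coordinate pair, the opposite of the colloquial "lat/long" ordering.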
The idea behind SpaceCurve is to enable organizations to act upon fast-moving data generated by various sources, such as satellite imagery, social media, weather, and cell phone telemetry. No big data system built to date can act on this magnitude of time-series data in anything close to real time, Rogers says.
“This is one of the key problems that had to be solved when we were doing a lot of the analytics, even for Google Earth when we were trying to build these data layers,” he says. “If you look at the existing big data technologies, whether it’s Google’s stack internally or Hadoop, Spark, or Mongo–none of them were really designed to solve analytic problems or to scale out data models around spatial-temporal relationships. And it turns out this is actually a very hard problem to solve.”
Most big data platforms force a tradeoff between ingesting data at high speed and analyzing it, according to Rogers. There are systems that can ingest a petabyte of data, but because that data must be written to disk (no system has a petabyte of RAM), the analysis is going to take a while. Likewise, there are systems that can analyze data in real time as it comes in, but they can’t do so at high scale, he says.
SpaceCurve gets around this roadblock by using some clever spatial algorithms that break down the source data and allow it to be queried very quickly. “You don’t see the parallelism at the SQL interface level because we translate that into underlying algebra way out at the edge,” Rogers says. “We take your normal SQL statement, translate it into algebra that gets massively parallelized, then the data gets put back together and you don’t really see the parallelism.”
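SpaceCurve’s actual internals aren’t public, but the general pattern Rogers describes — a single SQL statement decomposed into sub-queries that run on each partition in parallel, with partial results merged afterward — is the classic scatter-gather approach. A minimal sketch, with made-up shard data, of how an aggregate like `SELECT COUNT(*) ... WHERE point IN bbox` might be split:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data shards: each holds (longitude, latitude) points.
partitions = [
    [(-122.3, 47.6), (-122.4, 47.5)],     # shard 1
    [(-73.99, 40.73), (-122.35, 47.62)],  # shard 2
    [(-0.12, 51.5)],                      # shard 3
]

def count_in_bbox(points, min_lon, min_lat, max_lon, max_lat):
    """Per-partition piece of the aggregate: count points in a box."""
    return sum(
        min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
        for lon, lat in points
    )

# Scatter: run the sub-query on every shard concurrently.
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(
        lambda pts: count_in_bbox(pts, -123.0, 47.0, -122.0, 48.0),
        partitions,
    ))

# Gather: merge the partial aggregates into one answer, as if the
# query had run on a single table.
total = sum(partials)
print(total)  # → 3 points fall inside the Seattle-area bounding box
```

The caller sees only the final number; the partitioning and parallelism stay below the query interface, which is the property Rogers is describing.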
The product also sports a new geometry engine that guarantees a very high level of precision, which is important when computing geospatial data across large regions of the Earth. The engine can take the curvature of the planet into account, hence the company’s name. “We built the first computational geometry engine that was designed to do extremely high-precision, non-Euclidean geospatial analytics,” Rogers says. “We carry through 1 part per trillion. We guarantee that sort of precision throughout the process.”
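To see why non-Euclidean geometry matters at continental scale, compare a great-circle distance against a naive flat-plane estimate. This spherical haversine sketch is far cruder than the ellipsoidal, 1-part-per-trillion engine Rogers describes — it’s only meant to show how badly flat math drifts over long distances:

```python
import math

def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance on a sphere via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

# Seattle to London.
d_sphere = haversine_km(-122.33, 47.61, -0.13, 51.51)

# Naive flat-plane estimate: treat degrees as x/y and scale by
# ~111 km per degree. Ignores longitude convergence toward the poles.
d_flat = math.hypot(-122.33 - -0.13, 47.61 - 51.51) * 111.0

print(round(d_sphere), round(d_flat))  # flat estimate is off by thousands of km
```

The spherical result lands near the true ~7,700 km, while the flat-plane figure overshoots by more than 5,000 km, because a degree of longitude at 50°N spans far less ground than one at the equator.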
Its computational geometry engine also enables users to run massively parallel joins, such as finding commonalities between two huge sets of polygons. “We actually came up with some new algorithms for doing very fast computational geometry on curved surfaces that actually didn’t exist in the literature,” he says. “We have at least one patent pending on how we do massive parallel joins, which has always been the Achilles’ heel of big data systems.”
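The patent-pending algorithms themselves aren’t public, but a common way to make a spatial join parallelizable is grid partitioning: bucket both sides into cells and only test pairs that share a cell, so each cell becomes independent work. A minimal sketch over hypothetical bounding boxes:

```python
from collections import defaultdict

def cells(box, size=1.0):
    """Grid cells overlapped by a (min_x, min_y, max_x, max_y) box."""
    x0, y0, x1, y1 = box
    return {(i, j)
            for i in range(int(x0 // size), int(x1 // size) + 1)
            for j in range(int(y0 // size), int(y1 // size) + 1)}

def overlaps(a, b):
    """Axis-aligned bounding boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def grid_join(side_a, side_b):
    """Return (name_a, name_b) pairs whose boxes intersect."""
    buckets = defaultdict(list)
    for name, box in side_b:
        for c in cells(box):
            buckets[c].append((name, box))
    matches = set()
    for name_a, box_a in side_a:
        for c in cells(box_a):          # probe only co-located buckets,
            for name_b, box_b in buckets[c]:  # not the whole other side
                if overlaps(box_a, box_b):
                    matches.add((name_a, name_b))
    return matches

side_a = [("A1", (0.0, 0.0, 2.0, 2.0))]
side_b = [("B1", (1.0, 1.0, 3.0, 3.0)), ("B2", (10.0, 10.0, 11.0, 11.0))]
print(grid_join(side_a, side_b))  # A1 intersects B1 only
```

Avoiding the all-pairs comparison is what makes the join tractable at scale: the cost grows with the number of co-located geometries rather than with the product of the two table sizes.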
In the past, few organizations had a reason to build big geometry systems, but that’s changing in the era of the IoT. “When you start doing billions or trillions of records and doing it fast and at massive scale, suddenly the cost of doing the geometry operations matters a lot,” he says. “Every part of the stack was purpose built for geospatial, and spatial analytics in particular.”
The Seattle, Washington-based company’s software has already been adopted by several firms and is being put through numerous tests and proofs of concept. With about 40 employees, the company is heavy on engineering talent and is now looking to ramp up its sales activity.
SpaceCurve CEO Dane Coyer, who has held executive positions at IBM, Loral, and Lockheed Martin, is confident the technology can help companies harness the power of real-time data. “Whether monitoring the flow of populations and commerce throughout a city in order to understand complex consumer behaviors, or using remote sensing platforms to detect risks in agricultural supply chains, or combining aircraft sensor data with atmospheric measurements to optimize the fuel economy of flights in real-time, the ability to contextualize almost any operational scenario by combining live sensor data with historical or slow moving data, at speed and scale, is game changing,” Coyer says.
Some of the company’s first customers are in the telecommunications space. Mobile carriers, in particular, have a wealth of telemetry data that they’re looking to exploit. The company is also looking to help retailers understand the physical movement of consumers to drive sales.
Analyzing real-time data is new ground for many companies–at least those outside the telecommunications, defense, and intelligence industries–so many clients will lean on SpaceCurve to help them build end-to-end solutions.
“We end up not only providing the technology platform to handle the raw data, but we also bring in a lot of the expertise,” Rogers says. “Customers may want to bring in real-time weather, the Twitter firehose, and carrier telemetry data that tracks aggregate people motion. They want to put it all together, mix it with operational data, and then understand all these things to help them figure out what question they can ask and what value they can uncover.”