The Here and Now of Big Geospatial Data
No matter how sophisticated information technology gets – and who can deny that IT is evolving exceptionally fast these days – there’s nothing that can replicate the combination of two unique pieces of data: time and place. That’s why the place for geospatial data is here, and the time is now.
The advent of big data analytics has enabled companies to answer all sorts of questions that they couldn’t have before, like precisely who bought exactly what, and when. As data practitioners get deeper into the technology – and especially as they dive into the world of real-time analytics driven by smartphones and other devices connected to the Internet of Things (IoT) – they’re increasingly turning to geospatial data to optimize the delivery of products and services for people as they move about the real world.
In a recent Forrester Wave report on geospatial tools, Forrester analyst Rowan Curran writes that geospatial insights are not only involved in collecting data for sales, CRM, customer support, HR, and marketing initiatives, but they’re also used in the delivery of services.
“These allow companies to use spatial data to drive unprecedented levels of understanding and analysis of users’ habits and behavior,” Curran writes, “but they also provide the platforms to deliver the messaging, content, and other actions directly to users in the most appropriate context.”
The potential applications of geospatial data are vast. Consider these recent real-world examples:
- Logistics: The United States Postal Service is using big geospatial analytics to optimize mail route planning and reduce delivery times;
- Fraud detection: By tracking the location of attempted credit card transactions—and specifically the physical distance between them—banks have a new tool for detecting fraudulent activity in real time;
- Retail: Chains like Macy’s are turning to location-sensing technology to deliver a better in-store experience to customers and challenge ecommerce sites for business;
- Finance: Investors are turning to satellite or drone-based imagery as a source of data to inform decisions, such as assessing the valuation of commodities trades or predicting consumer demand;
- Shipping: Tracking the movement of about 21 million shipping containers atop 100,000 ships in the maritime fleet and using machine learning algorithms to optimize their flow can save millions of dollars;
- Advertising: American Express generated promotions to customers based on purchase history and location, thanks to a geo-tagging solution from Foursquare;
- Entertainment: Pokémon GO showed how the overlay of cyberspace upon the real world can deliver a compelling augmented reality (AR) experience;
- Journalism: Reporters and editors are turning to advanced geospatial tools like OpenStreetMap to help tell compelling stories.
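The fraud-detection idea above – flagging card transactions that are physically too far apart, too close together in time, to be legitimate – is simple enough to sketch. Below is a minimal, illustrative Python version; the function names and the 900 km/h airliner-speed threshold are our assumptions, not any bank’s actual rules:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def is_impossible_travel(txn_a, txn_b, max_speed_kmh=900.0):
    """Flag a pair of transactions whose implied travel speed exceeds max_speed_kmh.

    Each transaction is a (lat, lon, unix_seconds) tuple; 900 km/h is roughly
    airliner speed, so anything faster suggests the card is in two places at once.
    """
    dist_km = haversine_km(txn_a[0], txn_a[1], txn_b[0], txn_b[1])
    hours = abs(txn_b[2] - txn_a[2]) / 3600.0
    if hours == 0:
        return dist_km > 0  # same instant, different place
    return dist_km / hours > max_speed_kmh

# A card swiped in New York, then in Los Angeles ten minutes later:
ny = (40.7128, -74.0060, 1_700_000_000)
la = (34.0522, -118.2437, 1_700_000_600)
print(is_impossible_travel(ny, la))  # True
```

In practice a bank would run this kind of check inside a streaming pipeline against the cardholder’s previous transaction, but the geometry at the core is just this distance-over-time test.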
There’s no doubt that geospatial data brings a big potential upside for delivering much-needed context to decision making in all sorts of areas. No matter how much of our lives now exist in cyberspace, our terrestrial ties make it important to know where and when people and things exist in the real world.
Big Geospatial Challenges
However, geospatial data presents a unique set of challenges as well. Depending on how often a tracked device emits its location, the sheer volume and velocity of the data can be the first barrier to successfully leveraging it. Traditional relational databases from the likes of Oracle and IBM support geographic data types and queries, often through extensions to the core database.
But these scale-up databases are largely seen as insufficient for the scale of emerging big data use cases. Increasingly, scale-out databases are being used to track big, high-velocity data. It’s no wonder that NoSQL databases are being asked to serve and process geolocation data.
MongoDB, for example, supports storing geolocation data in a JSON document, and also supports some geo-specific query types. Redis, the super-fast key-value store, has also proven itself adept at storing and serving the two key pieces of data required in geospatial computing – the X and Y coordinates – in what Redis calls a Geo Set. Geospatial capabilities can also be found in document and wide-column NoSQL databases from Aerospike, DataStax, and Couchbase, in addition to graph stores from the likes of Neo Technology and MarkLogic.
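Under the hood, many of these systems reduce a two-dimensional coordinate to a single sortable integer by interleaving the bits of longitude and latitude – the geohash-style trick behind Redis’s Geo Set, which keeps a 52-bit encoded score in a sorted set. Here is a simplified sketch of that interleaving in Python; it is illustrative only and not Redis’s exact encoding:

```python
def interleaved_score(lon, lat, bits=26):
    """Encode (lon, lat) into one integer by repeatedly bisecting each
    coordinate's range and interleaving the resulting bits.

    With bits=26 per axis this yields a 52-bit value, similar in spirit
    to the score Redis stores in the sorted set backing a Geo Set.
    Nearby points tend to share high-order bits, which is what makes
    range scans over the scores useful for proximity queries.
    """
    lon_min, lon_max = -180.0, 180.0
    lat_min, lat_max = -90.0, 90.0
    score = 0
    for _ in range(bits):
        # One longitude bit: which half of the current range is lon in?
        mid = (lon_min + lon_max) / 2
        bit = 1 if lon >= mid else 0
        score = (score << 1) | bit
        if bit:
            lon_min = mid
        else:
            lon_max = mid
        # One latitude bit, same bisection on the latitude range.
        mid = (lat_min + lat_max) / 2
        bit = 1 if lat >= mid else 0
        score = (score << 1) | bit
        if bit:
            lat_min = mid
        else:
            lat_max = mid
    return score

print(hex(interleaved_score(-122.4194, 37.7749)))  # San Francisco, as one 52-bit key
```

Because the encoding is just a one-dimensional integer, an ordinary sorted index – a B-tree, a skip list, a sorted set – can answer “what’s near here?” questions without any special spatial machinery.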
By some estimates, up to 80% of all data being generated today has a geospatial component. (This is likely because video – the biggest data of them all – is automatically geotagged when it’s shot on a smartphone.) In a business setting, companies may turn to Hadoop to extract insights from geospatial data.
This is precisely what the US trucking company US Xpress is doing. According to a Deutsche Bank white paper, US Xpress is using Hadoop to process and analyze a range of data collected from trucks, including geospatial data, as well as data from tire pressure monitors and engine monitors. According to the bank, the trucking firm is saving millions of dollars per year.
Specialized Geospatial Databases
General-purpose data systems like Hadoop, NoSQL, and relational databases, however, are not well-suited for many geospatial use cases. Increasingly, the difficulty of storing geolocation data in such systems has given rise to a collection of specialized databases geared specifically toward geospatial workloads.
The giant in the geographic information system (GIS) space is California-based Esri, whose ArcGIS product underlies many geo-powered applications. In the open source arena, PostGIS, which overlays a geospatial component atop the Postgres relational database, has a large following. Setting standards in the space is the Open Geospatial Consortium (OGC), whose goal is to “empower technology developers to make complex spatial information and services accessible and useful with all kinds of applications.”
Databases from Space-Time Insight, CARTO, and SpatialDB are also helping to make processing geospatial data easier. J. Andrew Rogers, who helped build Google Earth, found that the PostGIS tool was insufficient for the work he was trying to do, so he developed his own sharded geospatial engine called SpaceCurve.
Still, other vendors are taking entirely new approaches to ingesting and processing geospatial data at scale. One of the up-and-coming firms to keep an eye on is Kinetica (formerly GIS Federal). The company’s GPU-powered database, called GPUdb, has been adopted by the USPS, which recently installed tracking devices on about 200,000 mail delivery vehicles as part of its geospatial program. The devices emit a ping every minute, which adds up to about 250 million location data points collected each day. To enable queries on all that data, USPS tapped GPUdb, which runs on a cluster of about 200 nodes equipped with x86 CPUs and GPUs.
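Those USPS numbers are easy to sanity-check with back-of-the-envelope arithmetic: 200,000 vehicles pinging once a minute around the clock would produce 288 million points a day, so the reported figure of about 250 million implies the fleet pings for most, though not all, of each day. (The round-the-clock assumption below is ours, for illustration.)

```python
# Back-of-the-envelope check of the USPS data volume cited above.
vehicles = 200_000          # tracked mail delivery vehicles
pings_per_minute = 1        # one location ping per vehicle per minute
minutes_per_day = 24 * 60   # assumes round-the-clock pinging (an upper bound)

points_per_day = vehicles * pings_per_minute * minutes_per_day
print(f"{points_per_day:,} points/day")  # 288,000,000 points/day
```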
Another firm enabling customers to visualize big geospatial data on the fly is MapD. The company, which was spun out of Todd Mostak’s graduate project at MIT, fuses a GPU-based database with a collection of visualization tools to enable users to work with huge geospatial data sets at interactive speed (look for an upcoming feature story from Datanami on MapD).
As the cyber and physical worlds become more intertwined, we’ll increasingly look to geospatial data to track the location of people and things as they move and to power a new class of location-based services. However, some aspects of geospatial data make it difficult to work with. Companies that can master big geospatial data and integrate it with user-facing apps will hold a competitive edge for the foreseeable future.