February 28, 2014

These Cabbies Yield…for Big Data

Alex Woodie

Riding in a New York City cab can be a harrowing, life-or-death experience. But as the result of a recent study of taxi traffic patterns in the Big Apple, your wild urban adventure is helping to unearth clues about how best to crunch massive amounts of location data, and could pave the way for more elaborate ways that GPU computing can crunch big location data in the future.

University researchers from New York and Oklahoma recently dug into the location data collected from more than 170 million taxi rides taken in 2009. That year, the city collected location data from more than 13,000 GPS-equipped cabs, which made an average of 500,000 trips per day.

The researchers’ goals were twofold. First, they wanted to determine where, exactly, all those trips started and ended, for the purpose of assisting city planners (the results might surprise you). Secondly–and more importantly–they wanted to flesh out new massively data parallel techniques to speed up the complex spatial query tasks.

The amount of data here is so big that existing approaches to crunching location data, such as Geographical Information Systems (GIS), Spatial Databases (SDB), and Moving Object Databases (MOD), simply don’t work. “The huge data volumes have prevented these existing technologies, which are mostly designed for disk-resident systems based on serial algorithms and running on uniprocessors, from achieving performance close to real-time responses to support interactive inquiries,” write the three researchers: Jianting Zhang from City College of New York, Simin You of CUNY, and Le Gruenwald from the University of Oklahoma.

Trying to tackle this problem in a linear, head-on manner is a good way to induce blunt-force trauma of the big data variety. “Our previous experiments have shown that simply uploading the raw data of 170 million taxi trip records in a PostgreSQL database and creat[ing] a geometry column for the pickup locations would cost 100+ hours on a high-end workstation with 48 GB memory and reasonably up-to-date dual Intel Xeon quadcore,” the researchers wrote.

The Hadoop/MapReduce framework has become a popular place to crunch such data in a parallel way. But in this case, the researchers chose to turn this into a graphic processing equation, using GPUS for general purpose computing (or GPGPU).

Here’s how they approached it. First, they adopted land-use types (as determined by New York’s friendly planning commission) as a proxy for trip purposes at the pickup and drop-off locations. Then, they converted the land use types in NYC into three-quarters of a million tax lot polygons sporting 4.7 million vertices. Finally, they formulated a “large-scale nearest neighbor spatial query problem” based on point-to-polygon distance. Then they let the GPUs fly.

It worked. “Experiments have shown that our GPU implementations can complete such complex spatial queries in about 50-75 seconds using an inexpensive commodity GPU device,” the researchers say. “The performance is 10X-20X higher than the host machine with an Intel dual-core CPU when all the cores and hardware supported threads are fully used.”

Can the GPU approach scale? Up to a point, the researchers say. “While the shared-memory parallel hardware of GPU systems are generally considered lacking good scalability when compared with shared-nothing architectures, we argue that the large numbers of processing cores and the high memory bandwidth available on GPUs have made them competitive in solving big data problems up to a certain scale,” they write.

Going forward, the researchers will be looking to integrate the GPU approach into the Hadoop/MapReduce infrastructure to solve problems of a larger scale.

By the way, the researchers also figured out where all those cabs are headed in NYC. The top three pickup locations, as depicted according to land use types (LUTs), are Commercial & Office Buildings, Open Space & Outdoor Recreation, and Mixed Residential & Commercial Buildings, respectively. Those three LUTS are also the top drop-off points, in the same order.

The researchers’ paper, titled “High-Performance Spatial Query Processing on Big Taxi Trip Data using GPGPUs,” was recently published on the City University of New York’s website.

This is Your Brain on GPUs

NVIDIA GPUs Tackle Big Data Analytics

Applications: Data Mining

Technologies: Processors

Sectors: Academia

Tags: GPU, Hadoop

Only registered users may comment. Register using the form below.

Check off newsletters you would like to receive*
- HPCwire
- EnterpriseTech
- Datanami
- Technology Conferences & Events
- Advanced Computing Job Bank
- Technology Product Showcase
Email*
Name*
First Last
Organization*
Job Function*
Industry*
Country*
City*
State*
Province*
- Please check here to receive valuable email offers from Datanami on behalf of our select partners.

These Cabbies Yield…for Big Data

Join the discussion Cancel reply

Only registered users may comment. Register using the form below.

May 10, 2024

May 9, 2024

May 8, 2024

May 7, 2024

Sponsored Partner Content

Get your Data AI Ready – Celebrate One Year of Deep Dish Data Virtual Series!

Supercharge Your Data Lake with Spark 3.3

Learn How to Build a Custom Chatbot Using a RAG Workflow in Minutes [Hands-on Demo]

Overcome ETL Bottlenecks with Metadata-driven Integration for the AI Era [Free Guide]

Gartner® Hype Cycle™ for Analytics and Business Intelligence 2023

The Art of Mastering Data Quality for AI and Analytics

Leading Solution Providers

Tabor Network

Sponsored Whitepapers

Top 6 Strategies for Reducing Data Warehouse Costs

Building an Operational Data Warehouse for Real-time Analytics

Sponsored Multimedia

The Power of DataOps: Bring Automation to Life
No Comments

Tactical Steps for Cloud Migration
No Comments

Immuta Data Access Platform
No Comments

Data Mesh: Fact or Fiction?
No Comments

Contributors

Featured Events

AI & Big Data Expo North America 2024

CDAO Canada Public Sector 2024

AI Hardware & Edge AI Summit Europe

AI Hardware & Edge AI Summit 2024

CDAO Government 2024

These Cabbies Yield…for Big Data

Join the discussion Cancel reply

Only registered users may comment. Register using the form below.

May 10, 2024

May 9, 2024

May 8, 2024

May 7, 2024

Most Read Features

Most Read News In Brief

Most Read This Just In

Sponsored Partner Content

Leading Solution Providers

Tabor Network

Sponsored Whitepapers

Sponsored Multimedia

Contributors

Featured Events

Share

Copy short link