August 20, 2013

Providing Hidden Benefits With Predictive Analytics

Isaac Lopez

While analytics have been making an impact in the online world, the technology is now starting to filter into the physical world, promising to optimize the spaces around us. Technologies such as graph and predictive analytics are gaining more attention, with the promise of hidden benefits that make people happier as they move about, and maybe spend an extra dollar or two along the way.

Graph analytics are being used to help one major international airport improve its layout. The goal of this work, says Dr. Flavio Villanustre, a VP of Information Security at LexisNexis Risk Solutions’ HPCC Systems initiative, is to improve the passenger experience by reducing lines and optimizing everything from the parking lot to the boarding of the plane (as well as transfers between flights).

The data collection is an interesting and impressive undertaking in itself, as Villanustre explained. Datanami recently reported how retailers such as Nordstrom and Cabela’s are starting to use Wi-Fi tracking to follow customer movements in stores. A similar data collection strategy was put in place for this project, with sensors detecting the presence of passengers’ mobile devices as they move through the airport, picking up signals from their Wi-Fi phones and Bluetooth headsets. Additionally, says Villanustre, thermal images are being fed into the data collection pool to help build the graph.

“The sensors triangulate the presence of the device, so they know with a good degree of approximation where the device is located throughout the layout of the airport,” he explained. “You get a signal – a fingerprint of that device at intervals of time as it moves through the airport.”
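To give a feel for what locating a device from several sensors involves, here is a minimal Python sketch. It is not the airport's actual method: it assumes three sensors at known positions and assumes each one can estimate its distance to the device (for example from signal strength), neither of which is described in the article. The function name `trilaterate` is made up for illustration.

```python
def trilaterate(sensors, distances):
    """Estimate a device's (x, y) position from three sensors.

    sensors:   three (x, y) sensor positions (hypothetical floor-plan coordinates)
    distances: estimated distance from each sensor to the device
    """
    (x1, y1), (x2, y2), (x3, y3) = sensors
    d1, d2, d3 = distances

    # Each sensor defines a circle around itself. Subtracting the first
    # circle equation from the other two leaves two linear equations in x, y.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2

    # Solve the 2x2 system with Cramer's rule.
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("sensors are collinear; position is not unique")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y
```

With real, noisy distance estimates the circles rarely meet at a single point, which is one reason the quote speaks of "a good degree of approximation"; a production system would likely combine more than three sensors and fit the position statistically.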

The data was collected over several months and put into the HPCC Systems platform, where it was crunched up and distilled into second-by-second graphs that mapped the individual locations at any given time. “This is essentially an extremely large graph problem,” explained Villanustre. “This large graph is not just one graph, but a very large number of graphs with each one showing the location of each particular vertices in a particular location in the airport at a particular time.”
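The idea of one graph per second can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the input format (a timestamp, a device fingerprint and a named airport zone), the function names, and the choice to summarize movement as counts of zone-to-zone transitions. The real work was done on the HPCC Systems platform, not in Python.

```python
from collections import defaultdict


def build_snapshots(pings):
    """Group location pings into per-second snapshots.

    pings: iterable of (timestamp_seconds, device_id, zone) tuples
           (a hypothetical format, not the project's actual schema).
    Returns {second: {device_id: zone}}.
    """
    snapshots = defaultdict(dict)
    for timestamp, device, zone in pings:
        # The last ping seen for a device within a second wins.
        snapshots[int(timestamp)][device] = zone
    return dict(snapshots)


def zone_transitions(snapshots):
    """Count how many devices moved from one zone to another
    between consecutive snapshots. Returns {(from_zone, to_zone): count}."""
    edges = defaultdict(int)
    seconds = sorted(snapshots)
    for previous, current in zip(seconds, seconds[1:]):
        for device, zone in snapshots[current].items():
            before = snapshots[previous].get(device)
            if before is not None and before != zone:
                edges[(before, zone)] += 1
    return dict(edges)
```

Each snapshot is a picture of where every device is at one moment, and the transition counts act as weighted edges between zones, which is one simple way to see where traffic flows and where it piles up.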

Using the HPCC Systems platform, which includes a general processing data refinery called Thor (as in “wielding the mighty data crunching hammer of Thor”), and a rapid data delivery engine known as “Roxie” (Rapid Online XML Inquiry Engine), they were able to crunch the enormous amount of data collected by tracking travelers’ movements over time. Using a combination of statistical analysis and some graph traversal techniques, they modeled the static location of each traveler at any given time, building a predictive model that showed how travelers were distributed across time. Villanustre says that once the model was established, they were able to verify it by comparing it with actual real-time traffic.
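The article does not say which statistical techniques were used, so the following Python sketch swaps in about the simplest possible stand-in: predict how many people will be in a zone at a given hour by averaging past counts, then check the prediction against fresh observations using mean absolute error. The data layout and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_occupancy_model(history):
    """Average historical head counts per (hour, zone).

    history: iterable of (day, hour, zone, count) tuples (assumed format).
    Returns {(hour, zone): mean count}.
    """
    samples = defaultdict(list)
    for _day, hour, zone, count in history:
        samples[(hour, zone)].append(count)
    return {key: mean(values) for key, values in samples.items()}


def mean_absolute_error(model, observed):
    """Compare predictions with observed (hour, zone, count) tuples.

    Observations for an (hour, zone) the model has never seen are skipped.
    """
    errors = [
        abs(model[(hour, zone)] - count)
        for hour, zone, count in observed
        if (hour, zone) in model
    ]
    if not errors:
        raise ValueError("no observations overlap with the model")
    return mean(errors)
```

A small error on live traffic is the kind of evidence that would justify trusting a model before using it to move gates or shops around; a large one would send the analysts back to the data.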

The next step, which Villanustre says is currently underway, is to use the data to optimize the airport’s layout, achieving benefits like reducing the congestion points, diminishing delays and even changing some behavior to make the passenger more comfortable. “For example, passengers become more annoyed if they need to stand in a particular place for some time than if they were engaged with walking to some place for the same amount of time,” he says. “So even moving the arrival and departure gates of airplanes can change the perception even if it doesn’t make travel any faster.”

Other items under consideration include reorganizing the layout of shops, optimizing the geometry of the security area, and optimizing arrival and departure gates in relation to baggage check and pick-up. Villanustre says it’s all currently under review as the airport gears up for the holiday travel season.

While Hadoop gets a lot of attention in the parallel processing world, the open source HPCC Systems platform has been around longer, offering similar functionality using two separate parallel processing environments. The first is a general processing data refinery they call Thor (as in “wielding the mighty data crunching hammer of Thor”), which operates as a batch job execution engine very similar in function and environment to Hadoop, with perhaps fewer moving parts.

The second processing environment in the HPCC Systems architecture is a rapid data delivery engine known as “Roxie” (Rapid Online XML Inquiry Engine). Taking data crunched by Thor (in the form of distributed index files), Roxie provides a query execution engine similar in function to Hadoop with HBase and Hive capabilities layered on. Both environments use ECL, a data-centric parallel programming language developed in 2000 and aimed at bypassing certain inefficiencies of SQL by removing some of the lower-level decision-making.

The company recently released version 4.0 of the platform, which Villanustre says has been in development for the last two years. One of the chief enhancements, beyond performance improvements, is the expansion of its code base, which now supports Java, Python and R code where previously it supported only ECL and C++.

Using the system, organizations are able to solve problems that less well-equipped systems would struggle with, such as massive-scale fraud detection, which we covered earlier this year.

As the world plunges deeper into the big data technology trend, expect to hear more about how its benefits are filtering into the physical world around us in ways that are measurable on paper but not necessarily visible to the people walking around benefitting from them. We look forward to geeking out on and sharing the technology that makes it all possible.

Related items:

Big Data Backs World’s Largest Lie Detector 

Data Athletes and Performance Enhancing Algorithms 

On Algorithm Wars and Predictive Apps