NASA Aims High with Airline Data
Analytics does not necessarily deal with numerical data alone. Text-based analytics can also be a useful diagnostic tool, uncovering problems by analyzing human-generated written reports.
Ashok Srivastava, NASA’s Principal Scientist of Data Mining and Systems Health Management, is applying text-based analytics in the aeronautics industry to improve the safety of the more than three million flights that happen each year.
According to Srivastava, the rate of fatal airplane accidents has fallen from about one per hundred thousand flights in 1960 to about one per five million flights for US and Canadian carriers.
While that number is certainly low, given three million flights each year there is still roughly a 60% chance that a fatal accident will happen in a given year. Srivastava is focused on lowering that risk through increased automation, since the vast majority of incidents result from human error. “We’re trying to take all that data,” Srivastava says, “and we’re looking for what we call precursors to accidents.”
While completely automating the airline industry is out of the question today (although Srivastava mentions that it could be possible within fifty years or so), it is certainly possible to analyze data to predict when incidents may occur and avoid them.
NASA obtained a wealth of information for this project, including molecular data on the aircraft material, sensor data, software data, and even data on the pilots’ physical condition. It is well documented that a fatigued driver is often as dangerous as a drunk driver. NASA is working with British-based EasyJet.com to better determine pilot fatigue.
However, by far the most important data to Srivastava is the pilot-provided text reports. “Figuring out why something happened is very difficult to do from numerical data alone, and that’s where text starts to play an important role.”
It is much more difficult to teach computers to understand text and human language than it is to teach them numbers. Computers were built with numbers in mind, not words. However, according to Srivastava, the world has gotten to the point where the volume of text documents that need to be analyzed is too large for humans to take on by themselves.
“You might start with 100,000 reports total. You might boil it down to a few thousand reports for a specific airport (DFW in this case), you might boil it down to a few hundred but you’ve still got hundreds of documents that someone needs to analyze.” Srivastava focused his efforts on keyword recognition, categorizing words like runway, taxi, confusing, apologized, and others.
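The keyword-recognition step Srivastava describes can be sketched roughly as follows. The report structure, the airport filter, and the keyword list beyond the words named above are illustrative assumptions, not NASA's actual system:

```python
from collections import Counter

# Hypothetical precursor keywords of the kind the article mentions
# (runway, taxi, confusing, apologized); the rest are assumed additions.
KEYWORDS = {"runway", "taxi", "confusing", "apologized", "tower", "short"}

def keyword_counts(reports, airport):
    """Count flagged keywords across the free-text reports for one airport."""
    counts = Counter()
    for report in reports:
        if report["airport"] != airport:
            continue  # boil the corpus down to one airport, as in the quote
        for word in report["text"].lower().split():
            w = word.strip(".,;:!?\"'")  # drop trailing punctuation
            if w in KEYWORDS:
                counts[w] += 1
    return counts

# Invented sample reports, loosely modeled on the DFW example.
reports = [
    {"airport": "DFW", "text": "Tower cleared us to a short runway; taxi routing was confusing."},
    {"airport": "DFW", "text": "Controller apologized after the runway change."},
    {"airport": "CLE", "text": "Taxi instructions were confusing."},
]
print(keyword_counts(reports, "DFW"))
```

A real pipeline would go well beyond raw keyword counts (stemming, phrase detection, weighting), but the boil-down from thousands of reports to a handful of telling terms follows this shape.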
“There are two ways to look at the world with regard to this problem. One way is to say, ‘look, we’ve got data from the airplanes so let’s go analyze that. We’ve got a hundred thousand text reports, let’s analyze those, and let’s somehow put these two together to form a holistic picture of what’s going on. …The approach we’re taking is really different from that. What we’re saying is that, we’re going to take all of that and put it into a single algorithm that analyzes these things simultaneously. [As a result,] we can get very high accuracy rates as far as making predictions.”
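The fusion Srivastava describes, one algorithm seeing sensor data and text together, can be sketched as a single combined feature vector. The vocabulary, sensor values, and normalization here are hypothetical stand-ins, not NASA's actual method:

```python
# Assumed mini-vocabulary of report keywords; NASA's real feature set is not public here.
VOCAB = ["runway", "confusing", "apologized"]

def combined_features(sensor_values, report_text):
    """Concatenate normalized sensor readings with bag-of-words keyword counts,
    so one downstream model analyzes both data sources simultaneously."""
    max_v = max(map(abs, sensor_values), default=1.0) or 1.0
    numeric = [v / max_v for v in sensor_values]   # crude scale normalization
    words = report_text.lower().split()
    text = [words.count(term) for term in VOCAB]   # keyword counts per report
    return numeric + text

vec = combined_features([250.0, -50.0], "runway change was confusing confusing")
print(vec)  # numeric part first, then the keyword counts
```

The point of the combined vector is exactly the contrast drawn in the quote: instead of analyzing the two corpora separately and reconciling the results afterward, a single model is trained on the joint representation.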
The combined algorithm works to identify anomalies, or departures from what Srivastava calls the “safe state.” For the most part, flights progress from safe state in takeoff to safe state mid-flight to safe state in touchdown. But sometimes anomalies happen, at which point the state progresses to ‘compromised,’ where something odd is afoot but an accident is not imminent. This serves as a sort of yellow alert, where observers should take note of it but not act quite yet.
From there, the flight may reach the ‘anomalous’ state, where the discrepancies are large enough to pay attention to. According to Srivastava, this will still likely not lead to an accident, but it is worth studying. He explained the importance of recovering text reports from those anomalous flights: “Text starts to play a very crucial role in understanding what’s going on in this model. If something anomalous happens and the pilots note it and write it down, that gives us a wealth of information.”
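The safe / compromised / anomalous progression reads naturally as a thresholded state model. The anomaly scores and thresholds below are invented for illustration; they are not NASA's actual values:

```python
# Map a per-phase anomaly score to the three states described above.
# The 0.3 and 0.7 thresholds are hypothetical illustrations.
def flight_state(anomaly_score, compromised_at=0.3, anomalous_at=0.7):
    if anomaly_score >= anomalous_at:
        return "anomalous"    # large discrepancy: pull the pilots' text reports
    if anomaly_score >= compromised_at:
        return "compromised"  # yellow alert: observe, but do not act yet
    return "safe"

# Invented scores for the three phases of one hypothetical flight.
phases = {"takeoff": 0.1, "mid-flight": 0.45, "touchdown": 0.8}
states = {phase: flight_state(score) for phase, score in phases.items()}
print(states)
```

In this sketch the touchdown phase crosses into the anomalous state, which is precisely the point at which, per Srivastava, the text reports become the crucial source of explanation.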
It is important to note that NASA is not yet applying this analysis in a predictive fashion. All text analytics happen post-flight. However, they are key to identifying which airports are, in pilots’ experience, inefficient.
For example, pilots’ reports of anomalous activity at the Dallas-Fort Worth airport included many instances of the words runway, tower, and short, indicating aviation problems, as well as high speed, landed, and frequency, indicating communication problems. NASA can then take these findings to the airport and suggest they make improvements to their communication systems.
However, text reports may be affected by a collective bias. Srivastava notes that it is well known in the piloting community that Cleveland’s airport is confusing. This notion could lead to pilots scrutinizing the Cleveland airport more than they would others when they land and take off.
That being said, NASA’s techniques are already valuable and applicable in the real world, notably at Southwest Airlines. “Southwest Airlines has used our algorithms to analyze data from their systems and they’ve discovered what they call operationally significant events. The types of discoveries that they have made are being used on a daily basis.” Among those discoveries is the pilots’ unusual propensity for landing with the plane’s nose up, a clear concern when the aircraft is supposed to be descending.
Eventually, Srivastava will be able to transform this information into a truly real-time predictive system, which he says will be available on the Boeing 777 and 787 in 2015. Until then, text analytics remain a useful tool in guarding against potentially disastrous anomalies in aeronautics.