Sunday, April 27, 2014

Timely Data Acquisition for the Aviation Industry

originally posted on March 30, 2014 on csdoctorsister.blog.com

In the wake of #MH370 (classified as missing since March 8th), I posted to FB on March 13th: How do you LOSE A PLANE with all these technological mediums?!? Where the freak’n frack is Malaysian Airlines Flight 370?!? #hearthurts #disappointed

The fact that a plane carrying 227 passengers and 12 crew goes missing is unimaginable and unacceptable. As a #datahead, I immediately think about what data was available and not captured that could be useful in finding this plane. Unfortunately, #MH370 is a recent example of aviation’s need for better data understanding.

Here’s another recent example: in April 2011, an EF4 tornado ripped through St. Louis, right past Lambert-St. Louis International Airport, and at least one airplane was reported to have landed because Air Traffic Control (ATC) was unaware that the tornado was on the ground. In light of that tornado, my colleague, Mary Johnson, and I are working on ways to assist the aviation industry in handling its data.

Our efforts, approach and prior work are given below.

Unknown to most commercial airline passengers, extensive information is reviewed by the pilot and the dispatcher, who must both agree and sign off that the flight is ready for take-off. Both the pilot and the dispatcher must be FAA certificate holders to perform their jobs. ATC aggregates information from sources such as the National Weather Service (NWS), flight tracking websites, airport changes, official notices, pilots, and the FAA or similar agencies, in an effort to inform air transport personnel about flight scheduling viability. These data may not arrive in time to support flight dispatch decisions; dispatch has a role at take-off, in flight, and at landing in ensuring safety and operational control.

Twitter may provide a mechanism for accessing accurate data sooner, including data not normally available to ATC, dispatch services and/or pilots. Since Twitter is publicly accessible and contains user-generated content, the vulnerability, reliability and trustworthiness of its data must be assessed before introducing this information stream into an industry that requires accurate and timely information. This project investigates the use of Twitter to improve data timeliness and possibly increase the data coverage available to ATC for air transport personnel, pilots, aircraft dispatchers, airline managers, and airport managers. As part of the investigation, the project seeks to assess Twitter’s vulnerability, reliability and trustworthiness for use in Federal Aviation Administration controlled airspace, augmenting the data currently available to and used by aircraft dispatchers.

We have begun our inquiry by implementing a Twitter-based prototype (Marshall, Johnson et al., 2012) that considers 4 major commercial airline carriers and 30 US airports. During Hurricane Sandy in October 2012, the prototype captured up-to-the-minute aviation conditions via airport ICAO and IATA codes and keyword analysis (Marshall, Johnson et al., 2013). We implemented K-means clustering and computed two mutual information evaluations. The first experiment dealt with airport codes. Among the collected tweets containing airport codes during Hurricane Sandy, we observe that the airports in the largest cluster are not located directly in Hurricane Sandy’s path, indicating that intermediate airports were affected by a lack of aircraft and air transport system personnel. The second experiment dealt with keywords. Sample keywords and phrases include jet blue, delta, united, southwest, flight, cancel, flood, storm, weather, sandy, nor’easter, new york, new jersey, philadelphia, washington, west virginia, maryland, food shortage and electricity. Among the collected tweets containing these types of keywords, we noticed 2 relatively large groupings that include the airline keywords. As expected, the keywords related to Delta Air Lines and aviation/flight appear in nearly all the clusters. Surprisingly, the keywords related to Hurricane Sandy appeared in only one cluster, implying that the tweets centered on the consequences of the storm, not the progression or status of the storm itself.
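To make the airport-code experiment concrete, here is a minimal sketch of the general idea: pull airport codes out of tweets, turn each tweet into a binary vector, and group the vectors with K-means. It is written in Python for brevity and is not the prototype’s code. The four-airport lookup table, the sample tweets, the function names and the binary feature representation are all assumptions made for illustration, and the K-means is a bare-bones Lloyd’s algorithm with a deterministic seed.

```python
import re
from typing import Dict, List, Set

# Hypothetical lookup table (IATA -> ICAO). The prototype tracked 30 US
# airports; only four are listed here to keep the sketch short.
AIRPORTS: Dict[str, str] = {
    "ATL": "KATL",
    "JFK": "KJFK",
    "LGA": "KLGA",
    "ORD": "KORD",
}
ICAO_TO_IATA = {icao: iata for iata, icao in AIRPORTS.items()}

# Case-sensitive on purpose: lowercase three-letter words are too common.
TOKEN = re.compile(r"\b[A-Z]{3,4}\b")


def airport_codes(tweet: str) -> Set[str]:
    """Return tracked airports mentioned in a tweet, normalized to IATA."""
    found: Set[str] = set()
    for token in TOKEN.findall(tweet):
        if token in AIRPORTS:
            found.add(token)
        elif token in ICAO_TO_IATA:
            found.add(ICAO_TO_IATA[token])
    return found


def to_vector(codes: Set[str]) -> List[float]:
    """Binary feature vector with one slot per tracked airport."""
    return [1.0 if iata in codes else 0.0 for iata in sorted(AIRPORTS)]


def squared_distance(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, iterations: int = 20) -> List[int]:
    """Lloyd's algorithm, seeded with the first k distinct points."""
    centroids: List[List[float]] = []
    for point in points:
        if point not in centroids:
            centroids.append(list(point))
        if len(centroids) == k:
            break
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, point in enumerate(points):
            labels[i] = min(
                range(len(centroids)),
                key=lambda c: squared_distance(point, centroids[c]),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up tweets, not taken from the collected data set.
SAMPLE_TWEETS = [
    "Flight out of JFK cancelled, storm closing in",
    "KJFK ground stop, LGA also closed",
    "Stuck at ATL, crew never arrived from New York",
    "ATL delays piling up, no aircraft",
    "ORD to ATL cancelled",
]

if __name__ == "__main__":
    vectors = [to_vector(airport_codes(t)) for t in SAMPLE_TWEETS]
    for tweet, label in zip(SAMPLE_TWEETS, kmeans(vectors, k=2)):
        print(label, tweet)
```

On these sample tweets the two New York tweets land in one cluster and the three Atlanta-related tweets in the other. A real run would need a full airport table, some defense against ordinary words that happen to look like codes, and a more careful choice of k and of the initial centroids.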

The current prototype system is written in JavaScript on Node.js, a platform built on Chrome’s JavaScript runtime for easily building fast, scalable network applications. Between the end of October 2012 and the beginning of November 2012, about 13,000 tweets were collected using the Twitter REST API. Our database contains three types of clusters, namely Airport, Airlines, and Path.Airport. Our selected airlines generally provide flight schedules in one-month increments as downloadable PDFs, which require conversion to JSON.
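As an illustration of the schedule conversion step, the sketch below turns lines of text already extracted from a schedule PDF into JSON records. It is in Python rather than the prototype’s Node.js, and everything specific in it is an assumption: the row layout (flight number, origin, destination, departure, arrival), the field names and the function name are hypothetical, since each airline’s PDF is laid out differently.

```python
import json
import re
from typing import List

# Assumed row layout, e.g. "DL1234 ATL JFK 08:15 10:40".
# Real schedule PDFs differ per airline and would each need their own pattern.
ROW = re.compile(
    r"^(?P<flight>[A-Z0-9]{2}\d{1,4})\s+"
    r"(?P<origin>[A-Z]{3})\s+(?P<destination>[A-Z]{3})\s+"
    r"(?P<departs>\d{2}:\d{2})\s+(?P<arrives>\d{2}:\d{2})$"
)


def rows_to_json(lines: List[str]) -> str:
    """Convert extracted schedule rows to a JSON array, skipping other lines."""
    records = []
    for line in lines:
        match = ROW.match(line.strip())
        if match:  # page headers, footers and blank lines do not match
            records.append(match.groupdict())
    return json.dumps(records, indent=2)


if __name__ == "__main__":
    extracted = [
        "Schedule effective November 2012",
        "DL1234 ATL JFK 08:15 10:40",
        "B6123 JFK ORD 14:05 15:50",
        "Page 3 of 12",
    ]
    print(rows_to_json(extracted))
```

Lines that do not match the pattern are dropped silently, which keeps the example short; a production converter would more likely log them so that schedule rows in an unexpected layout are not lost.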

References
  1. Marshall, B., Johnson, M., Magikar, A., Ghanekar, A., Mathew, I., Delaveau, L., Budhiraju, R., and Chapparala, R. (2012). Flight Data Analyzer using Twitter. Journal of Emerging Trends in Computing and Information Sciences, 3(8):1226-1234.
  2. Marshall, B., Johnson, M., and Chunduru, N. (in press). Towards general aviation using Twitter as a virtual aircraft dispatcher. Proceedings of the Conference on Telecommunications and Information Technology. Murray, Kentucky: Information and Telecommunications Education and Research Association.
