
Analyse logique de donnees pour estimer le taux de presence des passagers en transport aerien [Logical analysis of data for estimating passenger show rates in air transport]

Posted on: 2011-06-03
Degree: M.Sc.A
Type: Thesis
University: Ecole Polytechnique, Montreal (Canada)
Candidate: Dupuis, Christine
Full Text: PDF
GTID: 2441390002963774
Subject: Engineering
Abstract/Summary:
The objective of this master's project is to build a model that improves the accuracy of show and no-show predictions for passengers, so that overbooking levels can be adjusted accordingly. The chosen method is known as the "Logical Analysis of Data", also referred to as LAD. Specifically, this method classifies all passengers into three groups: positive (shows), negative (no-shows) and unclassified. Each of these three groups has its own show rate. Weighting each group's show rate by the size of the group and summing gives the total show rate for the evaluated group of passengers.

The LAD method can be broken down into four phases. The first is to gather the data and ensure that it is all numerical, since the LAD method does not work on categorical attributes. The second phase is to build a system of cutpoints, or a grid of separation. The data is then indexed according to its position in the grid.

The third phase is the pattern generation. After scanning a randomly chosen half of the database, the LAD method proposes a list of patterns to the user. A pattern contains a number of conditions, or bounded attributes; the choice of boundaries is limited to the cutpoints. The user must set several parameters to guide the pattern generation: homogeneity (proportion of attending passengers in the group), prevalence (proportion of passengers included in the group versus all the passengers of this type) and degree (maximum number of conditions that can be used for one pattern). The other half of the database is then evaluated against the pattern list and classified into the three groups.

To implement the whole method, we used LAD Datascope V2.0, developed by Alexe Sorin at the RUTCOR center of Rutgers University in New Jersey. Although the first two phases can both be performed with the software, we chose to develop our own tools, using mostly Excel and Access. The software then proved very helpful for the pattern generation and the evaluation.
We also had to develop a few Visual Basic programs to make the comparisons with Air Canada's current forecasting methods and to classify new data. These programs can, for example, read the patterns and classify the data.

This approach was chosen not only for its originality, but also for its success in various sectors. It differs from conventional data mining methods in its ability to detect combinatorial information about the passengers. The input consists of a number of observations (passengers), each described by a vector of attributes derived from characteristics such as booking class, day of the week, departure time, itinerary origin, etc. The LAD method detects sets of conditions on attributes such that the group of passengers satisfying these conditions has a significantly higher or lower show rate.

Air Canada's tool for overbooking forecasts, PROS, is based on historical statistics for the flight. When compared to PROS, the LAD method appears to be very competitive. The sum of squared errors between actual observations and the LAD predictions is only half of that obtained with PROS. In addition, the standard deviation is reduced to two thirds of the PROS value. Examination of the correlation coefficients (R) also suggests that the LAD method is superior: the correlation between LAD predictions and actual observations is as high as 0.99635, while the PROS coefficient is only 0.97118.

To further reap the benefits of this study, it is strongly recommended to extend the application of this method to other routes (city pairs), but the method first needs to be refined. It is also not yet used to its full potential. There are still a few steps of the LAD method that we did not analyze in depth, such as cutpoint systems or the selection of patterns from the list to build the model. Another suggestion is to develop a new LAD model that would classify flights, rather than passengers, as shows or no-shows.
Flights could then be classified into different groups according to their show rate. (Abstract shortened by UMI.)
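
As an illustration only, the pattern parameters and the weighted show rate described in the abstract could be sketched as follows. This is not the LAD Datascope implementation or the thesis code: the attribute name `lead_days`, the sample passengers, the group sizes and rates, and the simple rule in `classify` (positive if covered only by show patterns, negative if covered only by no-show patterns, otherwise unclassified) are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Pattern:
    # attribute name -> (low, high) half-open interval; in LAD the bounds
    # would be restricted to the cutpoints (values here are hypothetical)
    bounds: dict
    positive: bool  # True for a "show" pattern, False for a "no-show" pattern

    def covers(self, passenger):
        """True if the passenger satisfies every condition of the pattern."""
        return all(lo <= passenger[attr] < hi
                   for attr, (lo, hi) in self.bounds.items())

    @property
    def degree(self):
        """Number of conditions used by the pattern."""
        return len(self.bounds)


def homogeneity(pattern, passengers, showed):
    """Proportion of attending passengers among those the pattern covers."""
    covered = [s for p, s in zip(passengers, showed) if pattern.covers(p)]
    return sum(covered) / len(covered) if covered else 0.0


def prevalence(pattern, passengers, showed):
    """Share of all passengers of the pattern's type that the pattern covers."""
    same_type = [p for p, s in zip(passengers, showed) if s == pattern.positive]
    if not same_type:
        return 0.0
    return sum(pattern.covers(p) for p in same_type) / len(same_type)


def classify(passenger, patterns):
    """Assumed rule: decide from the types of the patterns covering the passenger."""
    hits = {pat.positive for pat in patterns if pat.covers(passenger)}
    if hits == {True}:
        return "positive"
    if hits == {False}:
        return "negative"
    return "unclassified"


def total_show_rate(groups):
    """Size-weighted average of group show rates; groups maps name -> (size, rate)."""
    total = sum(size for size, _ in groups.values())
    return sum(size * rate for size, rate in groups.values()) / total


# Hypothetical data: days between booking and departure, and whether each showed up.
passengers = [{"lead_days": 30}, {"lead_days": 20}, {"lead_days": 2}, {"lead_days": 25}]
showed = [True, True, False, False]
long_lead = Pattern({"lead_days": (14, math.inf)}, positive=True)

print(homogeneity(long_lead, passengers, showed))
print(prevalence(long_lead, passengers, showed))
print(total_show_rate({"positive": (60, 0.95),
                       "negative": (10, 0.40),
                       "unclassified": (30, 0.85)}))
```

In this sketch a pattern would be kept only if its homogeneity and prevalence exceed the thresholds set by the user and its degree stays within the chosen maximum.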
Keywords/Search Tags: LAD method, Show, Passengers, PROS