The first publication of forecasting models for cancellations and no-shows was in 1958 by Beckmann and Bobkoski. They applied three different distributions for total passenger arrivals and made assumptions about demand arrival that may no longer be valid. In the almost sixty years since this publication, a lot has changed in the airline industry. Nowadays, Passenger Name Record-based cancellation and no-show forecasting is commonly viewed in the airline industry as one of the best methods available. These methods are part of revenue management. The objective of revenue management is to maximize profits. However, airline short-term costs are largely fixed and variable costs per passenger are small, so in most situations it is sufficient to seek booking policies that maximize revenues (McGill & van Ryzin, 1999). A revenue management system must take into account the possibility that a booking may be cancelled, or that a booked customer may fail to show up at the time of service (no-show), which is a special case of cancellation that happens at the time of service (Morales & Wang, 2008). Iliescu, Garrow and Parker (2006) studied airline passenger cancellation behaviour and found that leisure passengers, who are more likely to book further in advance of flight departure, are less likely to cancel than business passengers. However, as the flight nears departure, both leisure and business travellers become more likely to refund or exchange their tickets. The same study points out that cancellation proportions of 30% or more are not uncommon today. Cancellation forecasting is therefore an important aspect of revenue management. Accurate forecasts of the expected number of cancellations for each flight can increase airline revenue by reducing the number of spoiled seats (empty seats that might otherwise have been sold) and the number of involuntary denied boardings at the departure gate (Lawrence, Hong, & Cherrier, 2003). Another important aspect of revenue management in airlines is overbooking. Overbooking aims to increase revenues by deciding the number of seats to offer for sale (virtual capacity) so as to maximize the chance that the aircraft seats (physical capacity) are occupied when the flight departs (Talluri & van Ryzin, 2004). The overbooking levels are based on a cancellation forecast combined with service criteria that keep the risk of too many passengers showing up small. It is therefore critical to have accurate cancellation rates.
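As a rough illustration of how a cancellation forecast and a service criterion can be combined into an overbooking level, the sketch below assumes (our simplification, not a model from this report) that every booking shows up independently with the same probability `p_show`, so the number of show-ups is binomial. It returns the largest number of bookings whose risk of more than `capacity` show-ups stays at or below a tolerance `epsilon`. All names are hypothetical.

```python
from math import comb


def prob_too_many_shows(bookings: int, capacity: int, p_show: float) -> float:
    """P(more than `capacity` of `bookings` passengers show up) under a binomial model."""
    return sum(
        comb(bookings, k) * p_show**k * (1.0 - p_show) ** (bookings - k)
        for k in range(capacity + 1, bookings + 1)
    )


def overbooking_limit(capacity: int, p_show: float, epsilon: float, max_bookings: int = 1000) -> int:
    """Largest number of bookings whose overflow risk is at most `epsilon`.

    Assumes all bookings show up independently with the same probability
    `p_show`. `max_bookings` caps the search (and keeps comb() within float range).
    """
    limit = capacity
    for b in range(capacity + 1, max_bookings + 1):
        # The overflow risk grows with b, so stop at the first violation.
        if prob_too_many_shows(b, capacity, p_show) > epsilon:
            break
        limit = b
    return limit
```

For example, `overbooking_limit(100, 0.6, 0.05)` allows more than 100 bookings on a 100-seat cabin, whereas with `p_show = 1.0` (no cancellations or no-shows) it returns exactly 100. A real system would replace the single `p_show` with per-booking forecasts.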
The task of forecasting the probability of cancellation of a single booking can be modelled as a two-class probability estimation problem, the two classes being “cancelled” and “not cancelled” (Morales & Wang, Cancellation forecasting Using Support Vector Machine with Discretization, 2008). Different classification techniques can be used to solve this estimation problem, such as Decision Trees (DT), Logistic Regression (LR), Stochastic Gradient Descent (SGD), Support Vector Machines (SVM), Naive Bayes (NB) and Random Forests (RF). Morales and Wang (2008) showed that decision trees perform better than logistic regression, since logistic regression has practical limitations. Furthermore, a later study by Morales and Wang on airline data showed that dynamic decision trees outperform logistic regression in terms of both runtime and forecast accuracy (Morales & Wang, Identify Critical Values for Supervised Learning Involving Transactional Data, 2009). However, with recent developments in machine learning, logistic regression may now be able to outperform the dynamic decision tree, because modern machine learning techniques can handle a large number of dummy variables. KLM uses dynamic decision trees to forecast cancellation rates, but experiences some practical problems with these trees, namely: when is it better to make the decision tree more static or more dynamic, what are the best thresholds for creating a node, which attributes should be considered in the dynamic decision tree model, and what is the best pruning method? The objective of this report is to investigate whether a decision tree is the best model to predict cancellations. This is examined by comparing the decision tree model with the five other classification models mentioned earlier to see which one performs best.

The rest of this report is structured as follows. The second chapter describes the cancellation forecasting problem, gives detailed background information about previous research on this topic and introduces some basic terminology used throughout this report. The third chapter describes the real-world dataset used for this research and explains its attributes. Chapter 4 explains the five different methods commonly used for binary classification, as well as the accuracy measures and the significance test used to compare them. Chapter 5 discusses the main results, followed by the conclusion, discussion and possibilities for future research in the sixth and last chapter.
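To make the two-class probability estimation framing concrete, the sketch below combines two of the techniques listed above: a logistic regression fitted by stochastic gradient descent on binary dummy features, which outputs a cancellation probability per booking. It is a plain-Python illustration under our own assumptions (fixed learning rate, no regularization, invented toy data with two hypothetical dummies). It is not KLM's model or the exact setup used in this report.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_sgd(rows, labels, n_features, epochs=1000, lr=0.02, seed=0):
    """Fit weights w and bias b by SGD on the log-loss.

    rows:   list of sets with the indices of the dummy features equal to 1
    labels: list of 0/1 values (1 = cancelled)
    """
    rng = random.Random(seed)
    w = [0.0] * n_features
    b = 0.0
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            z = b + sum(w[j] for j in rows[i])
            err = sigmoid(z) - labels[i]  # d(log-loss)/dz for this booking
            b -= lr * err
            for j in rows[i]:  # inactive dummies (value 0) have zero gradient
                w[j] -= lr * err
    return w, b


def predict_proba(w, b, active):
    """Cancellation probability for a booking whose active dummies are `active`."""
    return sigmoid(b + sum(w[j] for j in active))


# Hypothetical toy data: dummy 0 = "flexible fare", dummy 1 = "weekend departure".
TOY_ROWS = [set(), set(), set(), set(), {0}, {0}, {0}, {0}, {1}, {1}, {0, 1}, {0, 1}]
TOY_LABELS = [1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0]

if __name__ == "__main__":
    w, b = fit_logistic_sgd(TOY_ROWS, TOY_LABELS, n_features=2)
    print(predict_proba(w, b, {0}), predict_proba(w, b, set()))
```

On this toy data the maximum-likelihood fit gives a cancellation probability of 2/3 for a booking with only the flexible-fare dummy active and 1/3 for a booking with no active dummies. SGD with a fixed learning rate hovers around these values rather than hitting them exactly. Only the active dummies are touched in each update, which is why this approach scales to many sparse dummy variables.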
To optimize the expected revenue of an airline company, it is essential to have an accurate passenger cancellation forecast. Combined with overbooking, this forecast reduces the risk of unnecessary empty seats on a flight. Overbooking is the practice of offering more seats for sale than the physical capacity of the aircraft. An optimized overbooking rate reduces both the expenses due to denied boardings and the revenue lost on seats that remain unsold although there is demand for them (Hueglin & Vannotti, 2001). In this chapter, we discuss papers on the different cancellation forecasting models proposed in the literature and introduce some basic definitions used in this report.
2.1 Existing Forecasting Models
Most of the proposed forecasting models in the literature focus on the no-show case. However, these models can also be used to forecast cancellation rates. Conventional forecasting
methods predict the number of cancellations using time-series methods such as taking the seasonally-weighted moving average of cancellations for previous instances of the same flight leg (Lawrence, Hong, & Cherrier, 2003). Time series forecasting looks at sequences of data points, trying to identify patterns and regularities in their behaviour that might also apply to future values (Lemke & Gabrys, 2008). Weatherford, Gentry and Wilamowski (2002) compared traditional forecasting methods such as moving averages, exponential smoothing
and regression against neural networks. Neural networks represent a promising generation of intelligent machines that are capable of processing large and complex forms of information (Weatherford, Gentry, & Wilamowski, 2002). Weatherford, Gentry and Wilamowski (2002) concluded that even the most basic neural network can outperform the traditional forecasting methods. Lawrence et al. (2003) used two
different passenger-based forecast models to predict no-show rates from Passenger Name Record (PNR) data and implemented these models with different classification methods, such as Naive Bayes, the Adjusted Probability Model (APM), which is an extension of Naive Bayes, ProbE (based on tree algorithms) and C4.5 (an algorithm for building decision trees). They showed that “models incorporating specific information on individual passengers can produce more accurate predictions of no-show rates than conventional, historical-based, statistical methods”. Neuling, Riedel and Kalka (2003) also used a C4.5 decision tree based on PNRs. Hueglin and Vannotti (2001) used classification trees and logistic regression models to predict the cancellation probability of passengers. They concluded that “the accuracy of no-show forecasts can be improved when individual passenger information extracted from passenger name records (PNRs) is used as input”. The three publications mentioned above thus agree that using PNR data improves forecasting performance. The PNR data mining approach models cancellation rate forecasting as a two-class probability estimation problem (Morales & Wang, Forecasting
Cancellation Rates for Services Booking Revenue Management Using Data Mining, 2009). Popular two-class probability estimation methods are tree-based methods and kernel-based methods. Probability estimation trees estimate the probability of class membership, in our case the probability that a booking will be cancelled. Quinlan (1993) developed C4.5, an algorithm that generates decision trees. The trees produced by C4.5 are small and accurate, resulting in fast and reliable classifiers, which makes decision trees valuable and popular methods for classification. In contrast, Provost and Domingos (2003) concluded that conventional decision-tree learning programs perform poorly when used to estimate class probabilities, and therefore made some modifications to the C4.5 algorithm. The resulting C4.4 algorithm uses the information gain criterion to split the tree nodes and applies no pruning. Fierens, Ramon, Blockeel and Bruynooghe (2005) concluded that overall the C4.4 approach outperforms the C4.5 approach, although the trees produced by C4.5 are much smaller than those produced by C4.4. C4.4 builds a single tree; random forests can improve on the predictive performance of a single tree by aggregating many decision trees. A random forest is a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest (Breiman, 2001). By the Strong Law of Large Numbers, the generalization error of a random forest converges as the number of trees grows, so adding more trees does not cause overfitting (Breiman, 2001). The idea is to decorrelate the individual trees and then reduce the variance by averaging them (Random Forests in R, 2017), which improves predictive performance.

Kernel-based methods use kernel functions that map input data points to a higher-dimensional space, such that a linear method in the new space corresponds to a non-linear method in the original space. These methods are therefore able to model non-linear relationships between dependent and independent variables (Morales & Wang, Forecasting Cancellation Rates for Services Booking Revenue Management Using Data Mining, 2009). One of the most popular kernel-based methods for class probability estimation is the Support Vector Machine (SVM). Given labelled data, an SVM can be used to generate multiple separating hyperplanes such that the data space is divided into segments and each segment contains only one kind of data (Machine Learning Using Support Vector Machines, 2017). The SVM finds the hyperplane that creates the largest margin between the training points of class 1 and class −1 (Hastie, Tibshirani, & Friedman, 2001).
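The kernel idea can be checked numerically. The sketch below is an illustration we chose, not one taken from the cited sources. It uses the degree-2 polynomial kernel `k(x, y) = (x · y)^2` on 2-D inputs, which equals an ordinary inner product after the explicit map `phi(x) = (x1^2, sqrt(2) x1 x2, x2^2)`. A linear rule in that 3-D feature space turns out to be a circle in the original space. Function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel k(x, y) = (x . y)^2."""
    return dot(x, y) ** 2


def poly2_feature_map(x):
    """Explicit feature map for 2-D inputs: (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def linear_rule_in_feature_space(x, radius=1.0):
    # Weights (1, 0, 1) define a hyperplane in the 3-D feature space; in the
    # original 2-D space this is the circle x1^2 + x2^2 = radius^2.
    w = (1.0, 0.0, 1.0)
    return 1 if dot(w, poly2_feature_map(x)) <= radius**2 else -1
```

A kernel SVM only ever evaluates `k`, never `phi` explicitly, which is what makes very high-dimensional feature spaces affordable.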
Caruana and Niculescu-Mizil (2006) evaluated the performance of SVMs, logistic regression, naive Bayes, random forests, decision trees and other supervised learning algorithms on binary classification problems. Of the five methods mentioned, random forests were the best learning method overall, followed by SVMs, while logistic regression, naive Bayes and decision trees performed worst. However, even the best models sometimes perform poorly, and models with poor average performance occasionally perform exceptionally well (Caruana & Niculescu-Mizil, 2006). Rish (2001) carried out an empirical study of the NB classifier and concluded that “despite its unrealistic independence assumption, the naive Bayes classifier is surprisingly effective in practice since its classification decision may often be correct even if its probability estimates are inaccurate”. In this report, we evaluate the performance of decision trees, logistic regression, support vector machines, naive Bayes and random forests on a real-world dataset to forecast cancellation rates. The next section gives the terminology used throughout this report.
In this section, we introduce some basic definitions used throughout this report: cancellation, no-show, show-up and denied boarding. Passengers are said to cancel when their confirmed seat reservations are freed and returned to the inventory for sale. For every passenger $k$, $C_k$ is the random variable associated with the realization of the cancellation indicator, which is equal to 1 if the passenger has cancelled the booking before departure and 0 otherwise. The passenger cancellation probability is denoted as $c_k = P(C_k = 1)$. Passengers are said to no-show when their confirmed booking was not cancelled, but they do not show up at the departure time. For every passenger, $N_k$ is the random variable associated with the realization of the no-show indicator, which is equal to 1 if the passenger does not cancel but does not show up at the time of boarding, and 0 otherwise. The passenger no-show probability is denoted as $n_k = P(N_k = 1 \mid C_k = 0)$. Passengers who neither cancel nor no-show are said to show up; in this case, both the cancellation indicator and the no-show indicator are equal to 0. We denote by $S_k$ the random variable associated with the realization of the show-up indicator. Hence, the passenger show-up probability is

$$P(S_k = 1) = P(N_k = 0 \cap C_k = 0) = P(C_k = 0)\,P(N_k = 0 \mid C_k = 0) = (1 - c_k)(1 - n_k).$$

Passengers who show up with a confirmed reservation but cannot be accommodated on the flight are said to be denied boarding.
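Given per-passenger forecasts c_k and n_k, the show-up probability above feeds directly into overbooking quantities. The sketch below adds an assumption that goes beyond the definitions in this section: passengers show up independently of each other, which ignores, for instance, passengers travelling together in one PNR. Under that assumption the number of show-ups follows a Poisson binomial distribution. A short dynamic program computes this distribution and, from it, the expected number of denied boardings for a given capacity. Function names are hypothetical.

```python
def show_up_probability(c_k: float, n_k: float) -> float:
    """P(S_k = 1) = (1 - c_k)(1 - n_k), with n_k conditional on no cancellation."""
    return (1.0 - c_k) * (1.0 - n_k)


def show_up_distribution(probs):
    """dist[s] = P(exactly s passengers show up), assuming independent passengers."""
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for s, q in enumerate(dist):
            new[s] += q * (1.0 - p)  # this passenger does not show up
            new[s + 1] += q * p      # this passenger shows up
        dist = new
    return dist


def expected_denied_boardings(probs, capacity: int) -> float:
    """E[max(S - capacity, 0)], where S is the number of show-ups."""
    dist = show_up_distribution(probs)
    return sum((s - capacity) * q for s, q in enumerate(dist) if s > capacity)
```

For example, a passenger with `c_k = 0.4` and `n_k = 0.1` has a show-up probability of 0.6 · 0.9 = 0.54.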
Information about bookings is available in the form of Passenger Name Records (PNRs), which are typically transferred to a PNR database from an airline’s flight reservation system.
A new PNR is generated whenever a customer makes a flight reservation and contains information such as the creation date, the number of passengers, departure date, ticketing status, price class and many other attributes about the booking. Each time a customer contacts the airline in order to change the state of the booking (confirmation, cancellation, etc.), an additional transaction record is written into the PNR and stored in the reservation system. A PNR may include more than one passenger flying the same itinerary. If one of the
passengers in a PNR decides to deviate from the existing itinerary, then the PNR is split. For this passenger, a new PNR is generated. Each PNR is tagged with a label indicating
whether the booking is cancelled or not; 1 for a cancellation and 0 otherwise. When a PNR is cancelled, all passengers in the PNR have cancelled. This label is used as the target variable
for modelling the cancellation probability.

The database investigated contains booking records of KLM and Air France flights. All booking records with a departure date between 01.10.2016 and 01.10.2017 are taken from the database, almost 92 million bookings in total. The datasets used in this report are random samples of those 92 million bookings. The PNRs in these samples were created between 22.05.2015 and 01.10.2017. Table 3.1
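summarizes the characteristics of the datasets. For all datasets, the mean cancellation rate is calculated as the fraction of cancelled bookings:

$$\bar{c} = \frac{1}{n} \sum_{i=1}^{n} y_i,$$

where $n$ is the number of bookings in the dataset and $y_i$ equals 1 if booking $i$ is cancelled and 0 otherwise.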
Note that the mean cancellation rate is more than 41% for all datasets. To investigate whether the models fitted on each dataset make the same predictions, another sample of 10561 observations is drawn from the 92 million bookings. Predictions are made on this out-of-sample set. Figure 3.1a shows the number of bookings per month and figure 3.1b the number of bookings per day for all bookings in dataset L with a departure date between 01.10.2016 and 01.10.2017. Most flights are booked in January, March and May; December is the least popular month for booking. The number of bookings is roughly the same on each weekday, whereas the number of bookings at weekends is much lower.

Attributes are used to predict whether a PNR is cancelled or not. Table 3.3 summarizes the set of attributes extracted from the PNR database. The class-label attribute, IsCancelled, indicates whether a booking is cancelled and has two values: 1 if the booking is cancelled and 0 if not. The remaining attributes are used to predict cancellations. Figure 3.2 visualizes the influence of three different attributes on the observed cancellation frequency.
The thickness of the bars in the bar charts indicates the relative number of observations in the dataset. For example, figure 3.2c shows that there are many observations with value N or V for the PricingClass attribute and only a few observations with value F, O or P. The number of observations is roughly the same for the different values of the DepartureMonth and DepartureDayofWeek attributes. All departure months have on average the same cancellation probability of about 40%, and the same holds for the departure day of the week. In contrast to these two attributes, the cancellation rate differs between price classes. For example, the Z price class, one of the cheapest seats in the business cabin with flexible cancellation conditions, has a higher cancellation rate than the G price class, the cheapest seat in the economy cabin without flexible cancellation conditions.
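The per-value cancellation rates plotted in figure 3.2 amount to a group-by mean of the IsCancelled label, and the bar thickness to each value's share of the observations. A minimal sketch, assuming each booking is a dict keyed by the attribute names of Table 3.3. The function name is hypothetical and the toy records are invented for illustration; they do not reflect the KLM data.

```python
from collections import defaultdict


def cancellation_rate_by_value(bookings, attribute, label="IsCancelled"):
    """Return {value: (count, share_of_bookings, cancellation_rate)} for one attribute.

    share_of_bookings is the relative number of observations (bar thickness);
    cancellation_rate is the mean of the 0/1 label within that value.
    """
    counts = defaultdict(int)
    cancelled = defaultdict(int)
    for row in bookings:
        counts[row[attribute]] += 1
        cancelled[row[attribute]] += row[label]
    total = sum(counts.values())
    return {value: (n, n / total, cancelled[value] / n) for value, n in counts.items()}


# Invented toy records, not KLM data.
TOY = [
    {"PricingClass": "Z", "IsCancelled": 1},
    {"PricingClass": "Z", "IsCancelled": 1},
    {"PricingClass": "Z", "IsCancelled": 0},
    {"PricingClass": "G", "IsCancelled": 0},
]
```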
Some of the attributes need more explanation. First, the attributes IsTrueLocal, TrueOriginAirport, TrueDestinationAirport, KarmaOriginAirport and KarmaDestinationAirport are explained. Suppose that a booking consists of two flight legs, for example LHR-CDG and CDG-FRA (London-Paris-Frankfurt). The TrueOriginAirport is always LHR and the TrueDestinationAirport is FRA. If both flight legs are operated by KLM or Air France, the IsTrueLocal attribute is 1, the KarmaOriginAirport is LHR (the same as the TrueOriginAirport) and the KarmaDestinationAirport is FRA (the same as the TrueDestinationAirport). If only the first flight leg is operated by KLM or Air France, the IsTrueLocal attribute is 0, the KarmaOriginAirport is LHR and the KarmaDestinationAirport is CDG. If only the last flight leg is operated by KLM or Air France, the IsTrueLocal attribute is also 0, the KarmaOriginAirport is CDG and the KarmaDestinationAirport is FRA.
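The rules in the example above can be written as a small function. This is a sketch under stated assumptions. The report only gives two-leg examples, so we assume that the Karma airports are the origin of the first and the destination of the last leg operated by KLM or Air France. We also assume that legs are identified by their IATA carrier codes (KL, AF); "XX" in the usage below stands for a hypothetical partner carrier. The names `Leg` and `derive_od_attributes` are hypothetical.

```python
from dataclasses import dataclass

# Assumption: IATA carrier codes of KLM (KL) and Air France (AF).
OWN_CARRIERS = {"KL", "AF"}


@dataclass
class Leg:
    origin: str
    destination: str
    carrier: str  # operating carrier code


def derive_od_attributes(legs):
    """Derive IsTrueLocal and the True/Karma origin and destination airports."""
    own = [leg for leg in legs if leg.carrier in OWN_CARRIERS]
    if not own:
        raise ValueError("booking contains no KLM or Air France leg")
    return {
        "TrueOriginAirport": legs[0].origin,
        "TrueDestinationAirport": legs[-1].destination,
        "IsTrueLocal": int(len(own) == len(legs)),
        "KarmaOriginAirport": own[0].origin,
        "KarmaDestinationAirport": own[-1].destination,
    }
```

For the LHR-CDG-FRA example with only the last leg on Air France, `derive_od_attributes([Leg("LHR", "CDG", "XX"), Leg("CDG", "FRA", "AF")])` gives IsTrueLocal 0 with Karma airports CDG and FRA, as described above.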
The IsOutboundFlow attribute is 1 if the passenger or passengers in the PNR are beginning their journey, and 0 otherwise. The NegoSpaceType attribute concerns a special kind of group booking made by a travel organization. Suppose that a travel organization makes a group booking for fifty passengers; then all these passengers are on the same flight with the same departure date. This booking is called the ‘master’ booking and its NegoSpaceType attribute has value 1. If, for example, four of these passengers want a different departure date, they are split from the ‘master’ booking and a new booking is created. Since this booking originates from the group booking made by the travel organization, its NegoSpaceType attribute has value 2. In all other cases, the NegoSpaceType attribute is 0.
Table 3.2 gives the ranges for the attributes LengthofStayRange, NbPaxRange and TimeFrameLabel. The TimeFrameLabel attribute of a booking is calculated as the departure date minus the demand date and gives the booking time in days before departure. Figure 3.3 shows the fraction of cancellations for each time frame; again, the thickness of the bars indicates the relative number of observations in the dataset. Many observations were booked between 31 and 90 days before the departure date, and far fewer were booked 1 day before departure or on the departure date itself. The cancellation rate is lower for bookings made closer to the day of departure. Note, however, that the time frame only records when a booking was made: a booking made 200 days before departure that is still active 5 days before departure is nevertheless unlikely to cancel.
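TimeFrameLabel is the number of days between the demand date and the departure date, grouped into ranges. A minimal sketch follows. The bucket boundaries are hypothetical placeholders, since the actual ranges are listed in Table 3.2; they were chosen only so that the same-day, 1-day and 31-90-day groups mentioned above appear. Function and constant names are likewise hypothetical.

```python
from bisect import bisect_left
from datetime import date

# Hypothetical inclusive upper bounds (in days) of each time frame;
# the real ranges are given in Table 3.2.
UPPER_BOUNDS = [0, 1, 7, 30, 90, 180]
LABELS = ["0", "1", "2-7", "8-30", "31-90", "91-180", ">180"]


def days_before_departure(demand_date: date, departure_date: date) -> int:
    """Departure date minus demand date, in days."""
    return (departure_date - demand_date).days


def time_frame_label(demand_date: date, departure_date: date) -> str:
    days = days_before_departure(demand_date, departure_date)
    if days < 0:
        raise ValueError("demand date lies after the departure date")
    # bisect_left finds the first bucket whose upper bound is >= days.
    return LABELS[bisect_left(UPPER_BOUNDS, days)]
```

For example, a booking made on 1 January 2017 for a flight on 1 March 2017 lies 59 days before departure and falls in the "31-90" group.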
All four datasets are stored as $n \times p$ matrices, where $n$ is the number of bookings and $p$ the number of attributes. These four datasets, also called the train sets, are used to fit the models. The out-of-sample set is stored as an $n \times p$ matrix as well and is used to estimate the prediction error of the models. Next, for both the train sets and the out-of-sample set, dummy variables are created for the explanatory attributes, i.e. all attributes except the class-label attribute IsCancelled. An explanatory attribute with $m$ levels yields $m - 1$ dummy variables.

Example: the explanatory attribute DepartureDayofWeek has 7 levels, namely Monday, Tuesday, Wednesday, Thursday, Friday, Saturday and Sunday. Turning this attribute into dummies creates 6 variables: DepartureDayofWeek$_i$ for $i = 2, \dots, 7$. At most one of these variables can have the value 1; all others have the value 0. If all six variables are 0, the departure day is Monday; if DepartureDayofWeek$_2$ is 1, the departure day is Tuesday, and so on. With these dummy variables, the train and out-of-sample sets become $n \times k$ matrices, where $n$ is the number of observations in the train or out-of-sample set and $k$ is the sum of (number of levels − 1) over all explanatory attributes. These matrices contain only zeros and ones. For faster computation in R, the zeros are not stored explicitly: both the train and out-of-sample sets are converted to sparse matrices. Using the sparse matrix of the train set, six different models are fitted, and the sparse matrix of the out-of-sample set is used to estimate the prediction error of each model. The next chapter describes the classification models used in this report. These models therefore have to work well with sparse matrices, or with many factor attributes that have a large number of levels.
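The dummy coding and sparse storage described above can be sketched in plain Python. This rests on several assumptions of ours. Records are dicts keyed by attribute name. Levels are sorted and the first one serves as the reference level, which matches the Monday example when days are coded 1 to 7. The sparse format is CSR-style: row pointers plus the column indices of the ones. The report's actual R code is not shown here, and the function names are hypothetical.

```python
def build_dummy_columns(train_records, attributes):
    """Map (attribute, level) -> column index, dropping one reference level per attribute."""
    col_index = {}
    for attr in attributes:
        levels = sorted({r[attr] for r in train_records})
        for level in levels[1:]:  # the first sorted level is the reference level
            col_index[(attr, level)] = len(col_index)
    return col_index


def column_names(col_index):
    """Names such as 'DepartureDayofWeek2', ordered by column index."""
    ordered = sorted(col_index.items(), key=lambda item: item[1])
    return [f"{attr}{level}" for (attr, level), _ in ordered]


def to_sparse_rows(records, attributes, col_index):
    """CSR-style storage of the 0/1 dummy matrix: only the positions of the ones.

    The ones of row i are indices[indptr[i]:indptr[i + 1]]. The reference level
    (and, in this sketch, any level unseen in the train set) produces no entry.
    """
    indptr, indices = [0], []
    for r in records:
        for attr in attributes:
            j = col_index.get((attr, r[attr]))
            if j is not None:
                indices.append(j)
        indptr.append(len(indices))
    return indptr, indices
```

Building `col_index` from the train set only and reusing it for the out-of-sample set keeps both matrices on the same $k$ columns. Together with a data array of ones, the `indptr` and `indices` lists have the layout that `scipy.sparse.csr_matrix((data, indices, indptr), shape=(n, k))` expects.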
Methodology and Techniques
In this chapter, the methodology and techniques used in this report are discussed. The first section gives an overview of the classification models that are compared with each other. The comparison uses five accuracy measures, which are discussed in the second section of this chapter. Finally, a test for significance is explained in the last section.
4.1 Classification models
The task of forecasting the probability of cancellation of a single booking can be modelled as a two-class probability estimation problem with the two classes being “cancelled” and “not cancelled” (Morales & Wang, Cancellation forecasting Using Support Vector Machine with Discretization, 2008). Classification is a process for predicting qualitative responses. Some of the methods commonly used for binary classification are:
1. Decision Trees
2. Logistic Regression
3. Support Vector Machines
4. Naive Bayes classifier
5. Random Forests
In this chapter, the above five models are described.
4.1.1 Decision Trees
The first model is the decision tree, a data mining technique. A decision tree is a structure that can be used to divide a large collection of records into successively smaller sets of records by applying a sequence of simple decision rules (Berry & Linoff, 2004). The model is constructed in two steps. In the first step, a tree is grown using a greedy heuristic to create the nodes. The algorithm starts with a root node containing the entire population and proceeds by recursively selecting an attribute and splitting each node into child nodes whose records share the same value of that attribute. Splitting continues until a termination criterion is met or there is nothing left to split. The attribute for each split is selected as the one that locally maximizes a heuristic criterion known as the gain function. In the second step, some of the tree's nodes and branches can be pruned. This phase
...
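The gain function of Section 4.1.1 is not specified in the part of the text shown here. One common choice is information gain, which according to Section 2.1 is also the criterion C4.4 uses: the entropy of the parent node minus the size-weighted entropy of the child nodes. A minimal sketch of this criterion and of the greedy attribute choice follows. It assumes categorical attributes stored in dicts with an IsCancelled label and multiway splits (one child per attribute value). The function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(records, attribute, target="IsCancelled"):
    """Parent entropy minus the size-weighted entropy of the child nodes,
    one child per value of `attribute` (a multiway split)."""
    parent = [r[target] for r in records]
    children = defaultdict(list)
    for r in records:
        children[r[attribute]].append(r[target])
    weighted = sum(len(ch) / len(records) * entropy(ch) for ch in children.values())
    return entropy(parent) - weighted


def best_split(records, attributes, target="IsCancelled"):
    """Greedy step: the attribute that locally maximizes the gain."""
    return max(attributes, key=lambda a: information_gain(records, a, target))
```

On four invented bookings where PricingClass perfectly separates cancelled from non-cancelled bookings and DepartureDayofWeek does not, the gain is 1 bit for PricingClass and 0 for DepartureDayofWeek, so `best_split` picks PricingClass.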
Essay Sauce, Overbooking. Available from: <https://www.essaysauce.com/miscellaneous-essays/overbooking/> [Accessed 26-08-19].