Framework for determining airport daily departure and arrival delay thresholds: statistical modelling approach

The study derives a framework for assessing airport efficiency by evaluating optimal arrival and departure delay thresholds. Assumptions underlying airport efficiency measurements, though based on minimum numeric values such as 15 min of turnaround time, cannot be extrapolated to determine the proportion of delay-days at an airport. This study explored the concept of a delay threshold to determine the proportion of delay-days, as an expansion of the theory of delay and of our previous work. A data-driven approach using statistical modelling was applied to a limited set of determinants of daily delay at an airport. To test the efficacy of the threshold levels, operational data for Entebbe International Airport were used as a case study. Findings show differences in the proportions of delay at departure (μ = 0.499; 95 % CI = 0.023) and arrival (μ = 0.363; 95 % CI = 0.022). A multivariate logistic model confirmed an optimal daily departure and arrival delay threshold of 60 % for the airport, given the four probable thresholds {50, 60, 70, 80}. The choice of threshold was based on the number of significant determinants, the goodness-of-fit statistics based on the Wald test and the area under the receiver operating characteristic (ROC) curves. These findings propose a modelling framework that generates information relevant to Air Traffic Management for planning and measuring airport operational efficiency.

delay thresholds and its effect on drawing such important conclusions about levels and differences between airports. Wesonga (2015) published the first study that attempted to analyse delay thresholds at an airport.
This study introduces the concept of a threshold, used to determine the minimum acceptable proportion of delay above which a day is declared a delay-day at an airport. It builds on our previous work (Wesonga et al. 2012).
In this paper, data modelling was performed through algorithm design to determine an acceptable threshold for an airport delay-day (Wong and Tsai 2013; Autey et al. 2013). Furthermore, data modelling was applied to a limited set of determinants of delay at an airport to test the efficacy of the threshold levels (Wang et al. 2012; Agustin et al. 2012), using Entebbe International Airport as a case study.

Data and methodology
Data for the period 2004 through 2008 were collected on the variables shown in Table 1. The aviation and aeronautical meteorology variables known to influence airport delay were carefully chosen and tested for autocorrelation before being used in the modelling process.
For each day at an airport, there are registered levels of delay. These vary in proportion over time, and an analysis would be misleading if it treated every positively registered delay at an airport as a delay in the real sense. Some delays are meant to enable an aircraft to perform more efficiently throughout its trajectory, with minimum disturbances and distortions such as being re-routed through other airports or even being cancelled. Therefore, if not all delays are bad in the real sense, the question of what proportion of delay should be treated as a threshold for computational and modelling purposes became pertinent and is the subject of this study.

Statistical model framework
Modelling was premised on the fact that different threshold levels could dynamically affect the statistical significance of the determinants of airport delay. Their levels of influence were studied using generalised linear models as demonstrated in Eqs. (1), (2) and (3). The dependent variable of the logistic regression model was a dummy coded '0' for an airport's daily on-time performance and '1' for daily airport delay (Konishi and Kitagawa 2007; Nerlove and Press 1973). Determining what threshold to apply in this generalised linear modelling was an area of interest for this study. An aircraft is said to be delayed if the difference between the actual and scheduled times of arrival or departure is positive. In this study, the value of the dependent variable changes with the threshold applied. Thresholds ranged from a proportion of 1 % up to 100 %: on any given day, the day was classified as a delay-day (DD) or a not-delay-day (NDD) according to whether its proportion of delay met the chosen threshold. The daily proportions of delay were obtained by dividing the number of aircraft that delayed their operation by the total number of such operations and multiplying by one hundred; the operations could be departures or arrivals.
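The classification rule described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the example counts are invented, and treating a day whose proportion exactly equals the threshold as a delay-day (`>=`) is an assumption, since the text does not settle the boundary case.

```python
def delay_proportion(n_delayed, n_total):
    """Daily percentage of delayed operations (departures or arrivals)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_delayed / n_total


def classify_day(n_delayed, n_total, threshold_pct):
    """Return 1 for a delay-day (DD) and 0 for a not-delay-day (NDD).

    Assumption: a day is a DD when its delay proportion is at least
    the threshold (boundary case included).
    """
    return 1 if delay_proportion(n_delayed, n_total) >= threshold_pct else 0


# Hypothetical day: 33 of 50 departures left later than scheduled.
pct = delay_proportion(33, 50)
print(pct, classify_day(33, 50, 60), classify_day(33, 50, 70))
```

The same day is therefore a DD under one threshold and an NDD under another, which is why the dependent variable has to be rebuilt for every candidate threshold.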
Furthermore, a logistic regression model was employed. Such a model estimates the probability that a certain event happens, or equivalently the probability that a sample unit with characteristics expressed by the categories of the predictor variables has the property coded 1, here an airport's delay-day. The probability was estimated by the logistic distribution as in Eq. (2), where the β's are the regression coefficients of the categories to which the sample unit belongs.
The following formulation was deemed appropriate for modelling departure and arrival delay:

$$\ln\left(\frac{\pi(X_i)}{1-\pi(X_i)}\right) = \beta_0 + \sum_{j=1}^{p} \beta_j X_{ij} \qquad (1)$$

$$\pi(X_i) = \frac{\exp\left(\beta_0 + \sum_{j=1}^{p} \beta_j X_{ij}\right)}{1 + \exp\left(\beta_0 + \sum_{j=1}^{p} \beta_j X_{ij}\right)} \qquad (2)$$

$$\frac{\pi(X_i)}{1-\pi(X_i)} = \exp\left(\beta_0 + \sum_{j=1}^{p} \beta_j X_{ij}\right) \qquad (3)$$

where $\beta_j$ represent the coefficients of the model and $X_i = (X_{i1}, X_{i2}, \ldots, X_{ip})$ represents a set of explanatory variables.
The logit $\ln\left(\frac{\pi(X_i)}{1-\pi(X_i)}\right)$ on the left-hand side of Eq. (1) is the logarithm of the odds of a DD, where $\pi(X_i)$ is the conditional probability of a DD given a set of explanatory variables; its determinants were subsequently tested for significance of the underlying relationship.
Therefore, the odds are an exponential function of $X_i$, which provides a basic interpretation of the magnitude of the coefficients. A positive $\beta_j$ implies an increasing rate while a negative $\beta_j$ implies a decreasing rate; either way, the magnitude of $\beta_j$ shows the effect or level of contribution towards determining a DD. If instead $\beta_j = 0$, the airport's DD is said to be independent of $X_{ij}$.
Note that the values $0 \le \pi(X_i) \le 1$ represent the probability of a delay-day based on a set of meteorological and aviation parameters as shown in Table 1.
Since the logistic regression model is known to exhibit a curve rather than a linear appearance, the logistic function implies that the rate of change in the probability $\pi(X_i)$ per unit change in an explanatory variable varies according to the relation $\beta\,\pi(X_i)\,(1-\pi(X_i))$. For example, if the proportion of delay is $\pi(X_i) = \tfrac{1}{2}$ and the coefficient of the number of 'scheduled flights' is $\beta = 0.46$, then the slope is $0.46 \times \tfrac{1}{2} \times \tfrac{1}{2} = 0.115$. The value 0.115 represents the change in the probability of departure delay, $\pi(X_i)$, per unit change in the number of 'scheduled flights'. In simpler terms, for every 100 scheduled flights at Entebbe International Airport, about 11 are delayed at departure.

The R platform for statistical computing (Chambers 2008; Dalgaard 2008) was used because of its known strengths, which include, but are not restricted to, a comprehensive set of standard statistical tests, models and analyses, as well as a comprehensive language for managing and manipulating data.
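The slope calculation in the worked example can be checked numerically. The sketch below is an illustration only: it assumes a single predictor with zero intercept so that the probability equals 1/2 at x = 0, and the helper names are hypothetical. It compares the closed-form slope β·π·(1 − π) with a finite-difference derivative of the logistic curve.

```python
import math


def logistic(eta):
    """Logistic function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def slope(beta, pi):
    """Rate of change of the probability per unit change in the predictor."""
    return beta * pi * (1.0 - pi)


beta = 0.46   # coefficient of 'scheduled flights' quoted in the text
pi = 0.5      # probability at which the slope is evaluated
analytic = slope(beta, pi)

# Central finite difference of pi(x) = logistic(beta * x) at x = 0,
# where the probability is exactly 1/2 (zero intercept is an assumption).
h = 1e-6
numeric = (logistic(beta * h) - logistic(-beta * h)) / (2.0 * h)

print(round(analytic, 3), round(numeric, 3))
```

The slope is largest at π = 1/2 and shrinks towards zero as π approaches 0 or 1, which is the curvature the paragraph above refers to.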

Data structure
Over the period under study, the total number of aircraft departing from and arriving at Entebbe International Airport was recorded every day. For each departure and arrival, the aircraft's operational performance was assessed in terms of its scheduled and actual times and categorised accordingly. Thus, on every day and for every $N$ aircraft at the airport, there were $N_D$ departures and $N_A$ arrivals. For every $N_D$ and $N_A$, the counts $N_{Dd}$ and $N_{Ad}$ represent delayed departures and arrivals, and $N_{Dt}$ and $N_{At}$ represent on-time departures and arrivals, respectively. Therefore, on the $i$th day, the proportions of delayed departures and arrivals were computed on a one-to-one basis as $100 \times N_{Dd}/N_D$ and $100 \times N_{Ad}/N_A$.

Subsequently, each $i$th day was categorised as a delay-day (DD) or a not-delay-day (NDD) based on a set of delay thresholds dT = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100}. The decision to declare a DD for departures and arrivals was based on a one-to-many comparison of the day's proportion with each threshold in dT.

The delay thresholds {10, 20, 30, 40, 90, 100} were found unsuitable for modelling: delay proportions below 50 % could imply that on-time performance exceeded delay, while 90 and 100 % tended to imply that all flights were delayed, which did not arise on any day in our case study.
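The one-to-many comparison can be illustrated with a short Python sketch. It is not taken from the study: the daily counts are invented, the names are hypothetical, and the `>=` boundary rule is an assumption. Each day's delay proportion is compared with every retained threshold, giving one 0/1 dummy per threshold.

```python
ALL_THRESHOLDS = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
# Keep only the thresholds the text retains for modelling.
CANDIDATES = [t for t in ALL_THRESHOLDS if 50 <= t <= 80]


def threshold_dummies(n_delayed, n_total, thresholds=CANDIDATES):
    """Map each threshold to 1 (DD) or 0 (NDD) for a single day."""
    pct = 100.0 * n_delayed / n_total
    return {t: int(pct >= t) for t in thresholds}


# Hypothetical days as (delayed departures, total departures).
days = [(12, 40), (27, 45), (33, 50), (41, 50)]
dummies = [threshold_dummies(nd, n) for nd, n in days]

# Share of delay-days under each candidate threshold.
share = {t: sum(d[t] for d in dummies) / len(dummies) for t in CANDIDATES}
for t in CANDIDATES:
    print(t, [d[t] for d in dummies], share[t])
```

Raising the threshold can only keep or reduce the number of delay-days, so the share of delay-days is non-increasing across the candidate set.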

Descriptive statistics for the dependent dummy threshold levels
To employ the logistic regression modelling approach, we created dummy variables for departure and arrival at each of the four candidate delay thresholds {50, 60, 70, 80}, denoted dT = {dT50, dT60, dT70, dT80} and aT = {aT50, aT60, aT70, aT80} respectively. Table 2 shows the descriptive statistics for the candidate departure and arrival delay thresholds. In examining these statistics, an unbiased threshold is one whose statistics lie as close to the middle values as possible. Where no candidate presented the exact middle values, the candidate whose values best approximated them was preferred. Therefore, preliminary findings based on the actual operational data at Entebbe International Airport, for both departure (X̄ = 0.499; SE = 0.012) and arrival (X̄ = 0.363; SE = 0.011), suggest a delay threshold of 60 % (Ivanov et al. 2012).
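The kind of descriptive statistic reported for each dummy variable can be sketched as below. This is an assumption-laden illustration: the function name and the synthetic 0/1 series are hypothetical, and the interval half-width uses a normal approximation (z = 1.96), which the text does not state explicitly.

```python
import math


def dummy_summary(dummy, z=1.96):
    """Mean, standard error and normal-approximation CI half-width
    of a 0/1 delay-day indicator."""
    n = len(dummy)
    mean = sum(dummy) / n
    variance = sum((d - mean) ** 2 for d in dummy) / (n - 1)  # sample variance
    se = math.sqrt(variance / n)
    return mean, se, z * se


# Synthetic indicator: alternating delay-days and not-delay-days.
indicator = [1, 0] * 50
mean, se, half_width = dummy_summary(indicator)
print(round(mean, 3), round(se, 3), round(half_width, 3))
```

A mean close to 0.5 says that the threshold splits the days roughly evenly into delay-days and not-delay-days, which is the "middle value" property used above to prefer one candidate over another.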

Algorithm for determination of thresholds for departure and arrival delays
Table 3 presents the set of processes in the algorithm that carries out the computational procedure of the study. Table 4 presents the adjusted odds ratios for the logistic models under the different prior thresholds, showing the levels of significance for the determinants of departure delay. A model was fitted at each of the four threshold values, and the Wald goodness-of-fit test statistic was computed for each. The areas under the ROC curves are also presented.
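The per-threshold fitting step can be sketched in Python with the standard library alone. This is not the authors' R implementation or the algorithm of Table 3: it assumes a single predictor, fits the model by Newton-Raphson, and uses invented data; all names (`fit_logit_slope`, `wald_test`, `flights`, `delay_pct`) are hypothetical. For each candidate threshold the dependent dummy is rebuilt, the model refitted and a Wald test computed for the slope.

```python
import math


def fit_logit_slope(x, y, max_iter=50, tol=1e-10):
    """Fit logit(pi) = b0 + b1 * x by Newton-Raphson; return (b1, se_b1)."""
    mean_x = sum(x) / len(x)
    xc = [xi - mean_x for xi in x]  # centring leaves the slope unchanged

    def score_and_information(b0, b1):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(xc, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        return g0, g1, h00, h01, h11

    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0, g1, h00, h01, h11 = score_and_information(b0, b1)
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break

    # Standard error from the inverse information at the estimate.
    _, _, h00, h01, h11 = score_and_information(b0, b1)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    return b1, se_b1


def wald_test(beta, se):
    """Wald chi-square (1 df) and its p-value."""
    z = beta / se
    return z * z, math.erfc(abs(z) / math.sqrt(2.0))


# Invented daily data: scheduled flights and percentage of delayed departures.
flights = [20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 48]
delay_pct = [45, 62, 48, 55, 72, 58, 65, 52, 85, 68, 75, 90]

results = {}
for t in (50, 60, 70, 80):
    y = [1 if pct >= t else 0 for pct in delay_pct]  # assumed boundary rule
    b1, se = fit_logit_slope(flights, y)
    results[t] = (b1, se) + wald_test(b1, se)
    print(t, [round(v, 4) for v in results[t]])
```

The study fits multivariable models and also counts the significant determinants at each threshold; the single-predictor loop above only shows how the dependent variable, the estimates and the Wald statistic all change with the threshold.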

Departure delay determinants
The effects of parameters on departure delays were examined as shown in Table 4. Model coefficients were examined for all determinants of departure delay at the four candidate thresholds (50, 60, 70, 80). The Wald test statistics were examined for each model for statistical significance at the four candidate threshold levels. The best model, and thus the most appropriate threshold level, was selected on the quality of the variables, the Wald test statistics and the area under the ROC curve as shown in Fig. 1. As a result, the delay threshold of 60 % was found to generate the best model, followed by 70, 50 and 80 % respectively.

Table 5 presents the models and levels of significance for the determinants of arrival delay. A logit model was estimated at each of the four threshold values (50, 60, 70, 80). The Wald test statistics were examined for each model, together with the statistical significance of the predictors at the four candidate threshold levels. The quality of the variables, the Wald test statistics and the area under the ROC curves as shown in Fig. 2 were applied to determine the best model. As a result, the delay threshold of 60 % was found to generate the best model, followed by 70, 50 and 80 % respectively.
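The area under the ROC curve used in this comparison can be computed from its rank interpretation: the probability that a randomly chosen delay-day receives a higher score than a randomly chosen not-delay-day. The sketch below is an illustration under stated assumptions. The data are invented, the names are hypothetical, and the predictor itself is used as the score, which gives the same AUC as the fitted probabilities of a single-predictor logistic model with a positive slope because AUC depends only on the ranking.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly;
    ties count as half."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Invented daily data: scheduled flights and percentage of delayed departures.
flights = [20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 48]
delay_pct = [45, 62, 48, 55, 72, 58, 65, 52, 85, 68, 75, 90]

auc_by_threshold = {}
for t in (50, 60, 70, 80):
    y = [1 if pct >= t else 0 for pct in delay_pct]  # assumed boundary rule
    auc_by_threshold[t] = roc_auc(y, flights)

best = max(auc_by_threshold, key=auc_by_threshold.get)
print(auc_by_threshold, best)
```

Ranking thresholds on AUC alone is a simplification; the selection described above also weighs the quality of the variables and the Wald test statistics.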

Discussions and conclusions
We explored a modelling approach premised on binary logistic regression to determine a level of delay threshold that better evaluates the dynamics of air traffic delay during departure and arrival at an airport (Santos and Robin 2010). Four different candidate thresholds (50, 60, 70 and 80 %) were assessed, and the 60 % threshold generated the best model for both departure and arrival delay (Wesonga and Nabugoomu 2014; Helmuth et al. 2011).

Table 3 General algorithm for determining suitable thresholds for departure and arrival delays
Step number | Step model description

These findings are significant in two ways. First, for air traffic flow managers, daily proportions of aircraft delay below the 60 % threshold level could be considered normal operations; such daily delays may be attributed to normal airport operations such as the turn-around time before actual departures and arrivals. Secondly, for other aviation stakeholders, including air passengers, a higher threshold level would indicate inefficiency of traffic flows. Comparing air traffic flow inefficiencies, the findings for departures give a threshold order of 60 % then 70 %, whereas arrivals give 60 % followed by 50 %, indicating that traffic flow at arrival was less inefficient than at departure, since arrivals permitted a lower threshold level than departures (Wesonga et al. 2013; Zheng et al. 2010).
Besides comparing aircraft flow performance between daily departures and arrivals, this framework is a candidate methodology for assessing and ranking airports by their departure and arrival operational efficiency. Airports with higher derived delay thresholds would be assessed as operationally more inefficient than those with lower delay thresholds (Chou 2009; Wei et al. 2011). Therefore, a multi-airport analysis based on this framework is recommended as a possible area of further analysis and application (Mukherjee and Hansen 2009; Bianco et al. 2001).