On studentized residuals in the quantile regression framework

Although regression quantiles (RQs) are increasingly popular, they, like other robust estimators, still play second fiddle to the ordinary least squares (OLS) estimator because of the perceived complexity of robust statistical methodology. To make robust estimators more attractive to statistical practitioners, some researchers have undertaken to studentize them. This paper suggests two versions of RQ studentized residual statistics, namely internally and externally studentized versions, based on the elemental set method. The preferred externally studentized version is compared with the diagnostic based on the standardized median absolute deviation (MAD) of residuals using a well-known data set from the literature. While the MAD-based outlier diagnostic appeared uniform across quantile levels and more aggressive in flagging outliers, the externally studentized RQ diagnostic exhibited a dynamic pattern consistent with RQ results.

Outliers (unusual observations in the Y-space) can adversely influence the fit of a regression model, thereby invalidating the pertinent statistical inferences (see e.g. Rousseeuw and Leroy 2003; Barnett and Lewis 1998). The Koenker and Bassett (1978) regression quantiles (RQs) are fairly robust to outliers because their influence functions are bounded in the Y-space. As a result, RQs have been employed not only as alternatives and complementary tools to the ordinary least squares (OLS) estimator but also in robust outlier detection techniques (Portnoy 1991). These detection methods follow two approaches: the "peeling" of observations fitted exactly by extreme RQs, and methods based on RQ computation, in which observations lying below the RQ hyperplane $q_{Y|x}(\tau)$ and/or above $q_{Y|x}(1-\tau)$, corresponding to $\hat\beta(\tau)$ and $\hat\beta(1-\tau)$, $\tau \in (0, 1)$, respectively (see expression (7)), may be identified as outliers. Applying OLS to the observations retained by the latter approach yields the Ruppert and Carroll (1980) regression trimmed mean estimator. Outliers in the X-space are referred to as high leverage points. When outliers are coupled with high leverage points in a data set, the outcome can be worse than when either aberration occurs alone, especially for RQs. This is because RQs are very susceptible to high leverage points, as their influence functions are unbounded in the X-space. This curtails their ability to detect outliers that are also high leverage points (outlier-leverage points), owing to the not yet well-understood trade-off between the RQs' strong affinity for high leverage points and their exclusion of (resistance to) outliers. Studentization may offer a solution, as it incorporates some X-space information.
Most existing outlier diagnostics in the RQ framework relate to the global orientation (centre) of the data rather than to each quantile level $\tau \in (0, 1)$, i.e., to a conditional quantile model $Q_{Y|X}(\tau)$, especially at extreme levels. Very few quantile-level-specific diagnostics exist. One such single case outlier diagnostic is based on the standardized median absolute deviation (MAD) of residuals (Huber and Ronchetti 2009). It is well known that regression outlier diagnostics do not always agree in flagging outliers, so the conventionally agreed practice is to employ a wide spectrum of diagnostics before arriving at a verdict; with so few diagnostics available, this practice cannot be followed in the RQ framework. This paper contributes new outlier diagnostics to the few existing ones in the RQ framework and brings the attractiveness of OLS to this framework via studentization of residual statistics. This approach is convenient because RQs share a link with the OLS estimator that can be fruitfully exploited. This link exists via the elemental set (ES) method (Hawkins et al. 1984). Studentized residual statistics for RQs are therefore suggested here based on the ES method.
An ES consists of exactly the minimum number (p) of observations needed to fit the regression model parameters. This proposal is motivated by the fact that the basic optimal solution of the linear programming (LP) problem giving a RQ coincides with the p points of an ES (see Koenker and Bassett 1978, Theorem 3.1; Ranganai 2016). Applying the OLS procedure to the p observations of an ES yields a specific elemental regression (ER). Thus the RQ leverage and residual statistics are identical to the corresponding ER ones. A deterrent to employing the ES method is the potentially huge computational load involved in computing all $K = \binom{n}{p}$ ESs.
However, the number of LP optimization solutions giving RQs is approximately n, which is smaller than K. Thus the ES approach benefits from the existence of efficient LP optimization algorithms that give RQs as solutions. It is also shown that the suggested RQ studentized residual statistics follow a t distribution, from which a wide spectrum of cut-off values can be obtained, as for their OLS-based counterparts. These are desirable attributes for the practitioner.
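As a rough illustration of the link between ESs and RQs (Koenker and Bassett 1978, Theorem 3.1), the sketch below computes a τth RQ by brute force: it fits every ES exactly via $\hat\beta_J = X_J^{-1}Y_J$ and keeps the fit with the smallest check-function loss $\sum_i \rho_\tau(y_i - x_i'b)$, with $\rho_\tau(u) = u\{\tau - I(u < 0)\}$. The helper names (`solve`, `check_loss`, `total_loss`, `rq_by_elemental_sets`) and the toy data are hypothetical, invented for illustration only. Enumerating all K ESs is exactly the load that LP algorithms avoid, so this is practical only for very small n.

```python
from itertools import combinations
from math import comb


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting.
    Returns None when A is (numerically) singular."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            return None
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def total_loss(X, y, beta, tau):
    """Sum of check losses of the residuals y_i - x_i' beta."""
    return sum(check_loss(yi - sum(a * b for a, b in zip(xi, beta)), tau)
               for xi, yi in zip(X, y))


def rq_by_elemental_sets(X, y, tau):
    """Brute-force tau-th RQ: exact fit on every elemental set J, keep the best."""
    n, p = len(X), len(X[0])
    best = None
    for J in combinations(range(n), p):
        beta = solve([X[i] for i in J], [y[i] for i in J])  # beta_J = X_J^{-1} y_J
        if beta is None:  # singular X_J: cannot be fitted exactly
            continue
        loss = total_loss(X, y, beta, tau)
        if best is None or loss < best[0]:
            best = (loss, J, beta)
    return best


# Toy data (illustrative only): intercept column plus one predictor.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
X = [[1.0, float(v)] for v in xs]
y = [1.2, 1.9, 3.4, 3.8, 5.1, 6.3, 6.8, 30.0]  # last response is an artificial outlier
loss, J, beta = rq_by_elemental_sets(X, y, tau=0.5)
K = comb(len(X), len(X[0]))  # number of elemental sets, here C(8, 2)
print("ES:", J, "beta:", beta, "K:", K)
```

The chosen fit interpolates the p points of its ES (zero residuals there), which is the exact fit property exploited in the rest of the paper.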
In summary, the motivations for developing studentized outlier diagnostics in the RQ framework are the following:
• Very few RQ τ-level-specific outlier diagnostics with the efficacy to deal with all outlier configurations currently exist in the literature. Therefore, the conventionally accepted practice of employing a wide spectrum of diagnostics cannot be carried out in the RQ framework unless more are developed.
• The use of efficient LP algorithms lessens the potentially huge load of computing all K ESs, since only the approximately n < K RQs from the LP solutions are of interest in this study.
• Ease of implementation via OLS and the availability of a wide spectrum of cut-off values from the t distribution bring the attractiveness of OLS to practitioners.
• More single case outlier diagnostics are needed in light of the not yet well-understood opposing phenomena of outlying and high leverage behaviour in outlier-leverage points.
• Outlier-leverage points may be identified better, since the suggested studentized diagnostics incorporate some leverage (X-space) information, unlike diagnostics based entirely on residuals (Y-space information).
Motivated by this background, this paper suggests outlier diagnostics based on studentization and ERs. The rest of the paper is organized as follows: some OLS leverage statistics and residuals are described in the next section; RQ leverage statistics and residuals are discussed in the "Regression quantiles leverage statistics and residuals" section; the "Studentized residuals in the quantile regression scenario" section constructs the suggested RQ studentized residual statistics; applications are given in the "Applications" section; and conclusions are given in the last section.

Some OLS leverage statistics and residuals
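Consider the linear regression model
$$Y = \beta_0 1_n + X\beta + \varepsilon, \qquad (1)$$
where Y is an n × 1 vector of response observations, $1_n$ is an n × 1 vector of ones, X is an n × (p − 1) matrix of predictor variables, β is a (p − 1) × 1 vector of regression coefficients, ε is an n × 1 vector of errors with $\varepsilon \sim N_n(0_n, \sigma^2 I_n)$, $0_n$ is an n × 1 vector of zeros, and $I_n$ is the n × n identity matrix. Let $\tilde X = [1_n\ X]$ denote the full n × p design matrix with ith row $x_i'$, let $\hat\beta$ denote the OLS estimator of the p regression coefficients, and let $h_{ii}$ be the ith diagonal element of the hat matrix $H = \tilde X(\tilde X'\tilde X)^{-1}\tilde X'$. The ith OLS residual is given by
$$e_i = y_i - x_i'\hat\beta. \qquad (2)$$
It is well known in the literature that the analysis of (raw) residuals (2) is far less potent in flagging outliers than the analysis of their transformed versions. Four versions of transformed residuals are most frequently employed to identify outliers in the literature. In order of increasing effectiveness, these are the normalized, the standardized, the internally studentized and the externally studentized residuals. The standardized OLS residuals are given by
$$z_i = \frac{e_i}{\hat\sigma}, \qquad (3)$$
where $\hat\sigma = \sqrt{MSE}$ with MSE = SSE/(n − p) and SSE denoting the usual OLS sum of squared residuals. The internally studentized residuals are
$$r_i = \frac{e_i}{\hat\sigma\sqrt{1 - h_{ii}}}. \qquad (4)$$
Finally, the externally studentized residuals follow from substituting $\hat\sigma$ in (4) by $\hat\sigma_{(i)}$, the estimate computed with the ith observation deleted:
$$r_i^{*} = \frac{e_i}{\hat\sigma_{(i)}\sqrt{1 - h_{ii}}}, \qquad \hat\sigma_{(i)}^2 = \frac{SSE - e_i^2/(1 - h_{ii})}{n - p - 1}. \qquad (5)$$
Another version of the residuals that is often used to assess prediction is the jackknife (predicted) residuals
$$e_{(i)} = y_i - x_i'\hat\beta_{(i)} = \frac{e_i}{1 - h_{ii}}, \qquad (6)$$
where $\hat\beta_{(i)}$ is the OLS estimator with the ith observation deleted. The jackknife residuals have been found to be more effective than the ordinary OLS residuals in assessing prediction and flagging outliers (see e.g. Myers et al. 2010). The sum of squared predicted residuals gives the well-known PRESS statistic,
$$PRESS = \sum_{i=1}^{n} e_{(i)}^2 = \sum_{i=1}^{n}\left(\frac{e_i}{1 - h_{ii}}\right)^2.$$
In the next section some of the analogues of the OLS statistics discussed here are adapted to the RQ scenario.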
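To make the OLS quantities (2) to (6) and PRESS concrete, here is a minimal pure-Python sketch for the special case of simple linear regression (p = 2), where the hat-matrix diagonal has the closed form $h_{ii} = 1/n + (x_i - \bar x)^2/S_{xx}$. The function names and toy data are hypothetical. The deleted-case scale uses the standard update $SSE_{(i)} = SSE - e_i^2/(1 - h_{ii})$ rather than n separate refits.

```python
import math


def fit_line(x, y):
    """OLS intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1


def ols_residual_diagnostics(x, y):
    """Leverages, residual variants (2)-(6) and PRESS for simple regression (p = 2)."""
    n, p = len(x), 2
    b0, b1 = fit_line(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]        # raw residuals (2)
    h = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]       # hat-matrix diagonal
    sse = sum(ei ** 2 for ei in e)
    sigma = math.sqrt(sse / (n - p))                          # sqrt(MSE)
    standardized = [ei / sigma for ei in e]                   # (3)
    internal = [ei / (sigma * math.sqrt(1 - hi)) for ei, hi in zip(e, h)]  # (4)
    # Deleted-case scale via SSE_(i) = SSE - e_i^2 / (1 - h_ii)
    sigma_del = [math.sqrt((sse - ei ** 2 / (1 - hi)) / (n - p - 1))
                 for ei, hi in zip(e, h)]
    external = [ei / (si * math.sqrt(1 - hi))
                for ei, hi, si in zip(e, h, sigma_del)]       # (5)
    jackknife = [ei / (1 - hi) for ei, hi in zip(e, h)]      # (6)
    press = sum(v ** 2 for v in jackknife)
    return {"h": h, "e": e, "standardized": standardized, "internal": internal,
            "external": external, "jackknife": jackknife, "press": press}


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.2, 1.9, 3.4, 3.8, 5.1, 6.3, 6.8, 12.0]   # toy data, last point inflated
d = ols_residual_diagnostics(x, y)
print(max(range(len(x)), key=lambda i: abs(d["external"][i])))
```

The closed-form jackknife residual $e_i/(1 - h_{ii})$ equals the residual from actually refitting without observation i, which is why PRESS costs no more than one fit.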

Regression quantiles leverage statistics and residuals
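The τth RQ based on the linear model (1) is a solution to the linear programming (LP) problem
$$\min_{(b,\,u,\,v) \in \mathbb{R}^p \times \mathbb{R}^n_{+} \times \mathbb{R}^n_{+}} \ \tau 1_n' u + (1 - \tau) 1_n' v \quad \text{subject to} \quad X b + u - v = Y, \qquad (7)$$
which is equivalent to minimizing $\sum_{i=1}^n \rho_\tau(y_i - x_i'b)$ with the check function $\rho_\tau(u) = u\{\tau - I(u < 0)\}$. Here X denotes the n × p design matrix including the column of ones, with ith row $x_i'$. The basic optimal solution to this LP problem (7) has the form $\hat\beta(\tau) = X_J^{-1}Y_J$ after partitioning
$$X = \begin{pmatrix} X_J \\ X_I \end{pmatrix}, \qquad Y = \begin{pmatrix} Y_J \\ Y_I \end{pmatrix},$$
where $X_J$ is a p × p matrix and $X_I$ is an (n − p) × p matrix. Let $(X_J, Y_J)$ be a generic ES; then $K = \binom{n}{p}$ is the number of ESs. The subset J corresponds to the set of subscripts $\{h_1, \ldots, h_p\}$ such that $(x_{h_i}', y_{h_i})$, i = 1, ..., p, is the ith case of ES J. Applying OLS to an ES based on a subset J of size p of the original data results in the following vector of regression coefficient estimates:
$$\hat\beta_J = (X_J'X_J)^{-1}X_J'Y_J = X_J^{-1}Y_J, \qquad (8)$$
where $X_J$ is a square matrix assumed to be nonsingular. Since a RQ solution of (7) corresponds to an ER (8), their leverage statistics and residuals are identical.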
RQ/ER leverage statistics are the diagonal elements of the matrix
$$H_J = X(X_J'X_J)^{-1}X', \qquad h_{iJ} = x_i'(X_J'X_J)^{-1}x_i, \quad i = 1, \ldots, n.$$
For $i \in J$, $h_{iJ} = 1$. The statistic $h_{iJ}$, $i \notin J$, is referred to as the ER predicted (ERP) leverage. Note that this statistic is the jackknife analogue of the ith diagonal element $h_{ii}$ of the OLS hat matrix, since it is computed from a fit that excludes observation i. The RQ/ER residuals are given by
$$e_{iJ} = y_i - x_i'\hat\beta_J, \quad i = 1, \ldots, n,$$
with $e_{iJ} = 0$ for $i \in J$. The residuals $e_{iJ}$, $i \notin J$, which are the analogues of the jackknife (predicted) residuals (6), are referred to as elemental predicted residuals (EPRs). EPRs have variance
$$\mathrm{Var}(e_{iJ}) = \sigma^2(1 + h_{iJ}), \quad i \notin J.$$
Following from this variance, Hawkins et al. (1984) referred to $h_{iJ}$, $i \notin J$, as the residual freedom, to "convey the impression of its property of measuring the extent to which the elemental set J fails to predict $Y_i$." Consequently, $e_{iJ}/(\sigma\sqrt{1 + h_{iJ}}) \sim N(0, 1)$ for $i \notin J$.

Summing the squared EPRs gives the analogue of the PRESS statistic,
$$PRESS_J = \sum_{i \notin J} e_{iJ}^2.$$
Residual analysis based on the internal ER residuals ($i \in J$) is uninformative, since by the exact fit property these residuals are all zero; the same applies to the RQ case. However, the external ones, i.e., the ERP residuals, which are the analogues of the jackknife (leave-one-observation-out) residuals, are useful. Similarly, ERP leverage is also useful. Thus, in the next section, RQ studentized residuals are constructed using ERP residuals and ERP leverage values.
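The ER quantities above can be sketched as follows for a given ES J (for instance, the one returned by an LP solver at some level τ). This illustrative pure-Python sketch assumes $X_J$ is nonsingular and uses $(X_J'X_J)^{-1} = X_J^{-1}(X_J')^{-1}$, so that $h_{iJ} = \|z_i\|^2$ where $X_J' z_i = x_i$. The helper names, the toy data and the choice of J are hypothetical.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("X_J is singular: J is not a valid elemental set")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def transpose(A):
    return [list(col) for col in zip(*A)]


def elemental_regression(X, y, J):
    """ER fit on elemental set J: coefficients beta_J, leverages h_iJ, residuals e_iJ."""
    XJ = [X[i] for i in J]
    beta = solve(XJ, [y[i] for i in J])                       # beta_J = X_J^{-1} y_J
    XJt = transpose(XJ)
    h, e = [], []
    for xi, yi in zip(X, y):
        z = solve(XJt, xi)                                    # X_J' z = x_i
        h.append(sum(v * v for v in z))                       # h_iJ = x_i'(X_J'X_J)^{-1} x_i
        e.append(yi - sum(a * b for a, b in zip(xi, beta)))   # e_iJ
    return beta, h, e


X = [[1.0, float(v)] for v in [1, 2, 3, 4, 5, 6, 7, 8]]
y = [1.2, 1.9, 3.4, 3.8, 5.1, 6.3, 6.8, 30.0]   # toy data
J = (1, 4)                                     # hypothetical elemental set (0-based indices)
beta, h, e = elemental_regression(X, y, J)
out = [i for i in range(len(X)) if i not in J]
press_J = sum(e[i] ** 2 for i in out)                         # PRESS_J
press_prime_J = sum(e[i] ** 2 / (1 + h[i]) for i in out)      # leverage-scaled version
```

For the points in J the sketch returns $h_{iJ} = 1$ and $e_{iJ} = 0$, matching the exact fit property; only the $i \notin J$ entries carry diagnostic information.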

Studentized residuals in the quantile regression scenario
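In this section we construct a version of studentized residuals for RQs. We do this by first suggesting a scaled version of the RQ predictive residuals (EPRs),
$$\varepsilon_{iJ_\tau} = \frac{e_{iJ_\tau}}{\sqrt{1 + h_{iJ_\tau}}}, \quad i \notin J_\tau,$$
where $J_\tau$ denotes the ES corresponding to the τth RQ for τ ∈ (0, 1), since we are only interested in RQs (ESs corresponding to RQs). The statistic $\hat\sigma^2(J_\tau)$ is the scaled prediction variance with the p observations corresponding to a RQ (ER) $J_\tau$ left out, i.e.
$$\hat\sigma^2(J_\tau) = \frac{PRESS'_{J_\tau}}{n - \alpha}, \qquad (12)$$
where $PRESS'_{J_\tau} = \sum_{i \notin J_\tau} e_{iJ_\tau}^2/(1 + h_{iJ_\tau})$ and α = 2p accounts for the p parameters as well as the p ER observations left out, corresponding to $e_{iJ_\tau} = 0$ for $i \in J_\tau$. In line with the convention in the literature, the RQ externally studentized residuals, or externally studentized EPRs (SEPRs), should be based on the jackknife residual variance, i.e., with the ith observation also left out,
$$\hat\sigma^2_{(i)}(J_\tau) = \frac{PRESS'_{(i)J_\tau}}{n - \alpha - 1}, \qquad PRESS'_{(i)J_\tau} = \sum_{j \notin J_\tau,\, j \neq i} \varepsilon_{jJ_\tau}^2 .$$
This statistic, used to flag outliers, is given by
$$\upsilon_{(i)J_\tau} = \frac{\varepsilon_{iJ_\tau}}{\hat\sigma_{(i)}(J_\tau)}, \quad i \notin J_\tau. \qquad (14)$$
The internally studentized version is given by
$$\upsilon_{iJ_\tau} = \frac{\varepsilon_{iJ_\tau}}{\hat\sigma(J_\tau)}, \quad i \notin J_\tau. \qquad (15)$$
The distributions of these statistics, (14) and (15), are given by Theorems 1 and 2, from which we determine the appropriate cut-off values.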
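Given the EPRs $e_{iJ_\tau}$ and ERP leverages $h_{iJ_\tau}$ for the n − p observations outside $J_\tau$, the scale estimates and the SEPRs (14) and (15) reduce to a few lines. The sketch below is hypothetical in its function name and toy inputs, takes α = 2p as in (12), and uses $PRESS'_{(i)J_\tau} = PRESS'_{J_\tau} - \varepsilon_{iJ_\tau}^2$ so the deleted-case sums need not be recomputed.

```python
import math


def sepr(e_out, h_out, n, p):
    """Internally (15) and externally (14) studentized EPRs.

    e_out, h_out: EPRs e_iJ and ERP leverages h_iJ for the n - p observations
    outside the elemental set J_tau; n: sample size; p: number of parameters.
    """
    alpha = 2 * p
    eps = [e / math.sqrt(1 + h) for e, h in zip(e_out, h_out)]   # scaled EPRs
    press_prime = sum(v * v for v in eps)                         # PRESS'_J
    sigma = math.sqrt(press_prime / (n - alpha))                  # (12)
    internal = [v / sigma for v in eps]                           # (15)
    # PRESS'_(i)J = PRESS'_J - eps_i^2, divided by n - alpha - 1
    external = [v / math.sqrt((press_prime - v * v) / (n - alpha - 1))
                for v in eps]                                     # (14)
    return internal, external


e_out = [0.4, -0.7, 0.2, 1.1, -0.3, 6.0]   # toy EPRs (n = 8, p = 2, so n - p = 6 values)
h_out = [1.9, 0.6, 0.1, 0.4, 1.2, 2.5]     # toy ERP leverages
internal, external = sepr(e_out, h_out, n=8, p=2)
```

As in the OLS case, the external statistic is a monotone transform of the internal one, $\upsilon_{(i)J_\tau} = \upsilon_{iJ_\tau}\sqrt{(n - \alpha - 1)/(n - \alpha - \upsilon_{iJ_\tau}^2)}$, so the two rank observations identically and differ only in their reference distributions.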

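Theorem 1 Under model (1), the RQ externally studentized residuals satisfy $\upsilon_{(i)J_\tau} \sim t(n - 2p - 1)$ for $i \notin J_\tau$.
Proof Write (14) as the product of two factors,
$$\upsilon_{(i)J_\tau} = \frac{\varepsilon_{iJ_\tau}}{\sigma} \cdot \frac{\sigma}{\hat\sigma_{(i)}(J_\tau)}.$$
The second factor can be simplified as
$$\frac{\sigma}{\hat\sigma_{(i)}(J_\tau)} = \left(\frac{PRESS'_{(i)J_\tau}}{\sigma^2 (n - \alpha - 1)}\right)^{-1/2},$$
where the component under the square root sign can be expressed as
$$PRESS'_{(i)J_\tau} = \sum_{j \neq i} \frac{e_{jJ_\tau}^2}{1 + h_{jJ_\tau}} = \sum_{j \neq i} \varepsilon_{jJ_\tau}^2 = PRESS'_{J_\tau} - \varepsilon_{iJ_\tau}^2, \quad i, j \notin J_\tau.$$
Multiplying this result by the first factor, we have
$$\upsilon_{(i)J_\tau} = \frac{\varepsilon_{iJ_\tau}/\sigma}{\sqrt{PRESS'_{(i)J_\tau}/\{\sigma^2(n - \alpha - 1)\}}}.$$
Therefore, since $\varepsilon_{iJ_\tau}/\sigma = e_{iJ_\tau}/(\sigma\sqrt{1 + h_{iJ_\tau}}) \sim N(0, 1)$ and $PRESS'_{(i)J_\tau}/\sigma^2 \sim \chi^2(n - \alpha - 1)$, taking α = 2p gives $\upsilon_{(i)J_\tau} \sim t(n - 2p - 1)$.
Theorem 2 Under model (1), the RQ internally studentized residuals satisfy $\upsilon_{iJ_\tau} \sim t(n - 2p)$ for $i \notin J_\tau$.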
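Proof The proof follows from that of Theorem 1 by substituting $(n - \alpha - 1)^{-1}PRESS'_{(i)J_\tau}$ with $(n - \alpha)^{-1}PRESS'_{J_\tau}$ for the estimated EPR variance. Thus the final result becomes
$$\upsilon_{iJ_\tau} = \frac{\varepsilon_{iJ_\tau}/\sigma}{\sqrt{PRESS'_{J_\tau}/\{\sigma^2(n - \alpha)\}}},$$
since $\varepsilon_{iJ_\tau}/\sigma = e_{iJ_\tau}/(\sigma\sqrt{1 + h_{iJ_\tau}}) \sim N(0, 1)$ and $PRESS'_{J_\tau}/\sigma^2 \sim \chi^2(n - \alpha)$. Taking α = 2p, we have $\upsilon_{iJ_\tau} \sim t(n - 2p)$.
Therefore, for the externally studentized SEPRs (14), the appropriate Bonferroni critical values at an overall significance level $\alpha_0$ (not to be confused with α = 2p above) are $t(1 - \alpha_0/\{2(n - p)\};\ n - 2p - 1)$, the n − p comparisons corresponding to the observations $i \notin J_\tau$. The advantage of these critical values is that the Bonferroni method is simple and allows many comparisons to be made simultaneously while still maintaining an overall confidence coefficient. In the literature, externally studentized diagnostics have been shown to outperform their internally studentized counterparts. Therefore, we compare the outlier flagging pattern of the externally studentized SEPR $\upsilon_{(i)J_\tau}$ with that of the MAD-based version in the SAS QUANTREG procedure. Using the MAD-based version of the RQ predicted residuals, outliers are identified as
$$e_{iJ_\tau} \equiv \begin{cases} \text{non-outlier}, & \text{if } |e_{iJ_\tau}| \le k\hat\sigma_m, \\ \text{outlier}, & \text{otherwise}, \end{cases} \qquad (16)$$
where the multiplier k usually takes the values 3, 4 or 5. The scale parameter $\hat\sigma_m$ is the corrected median of absolute residuals, $\hat\sigma_m = \mathrm{median}\{|e_{iJ_\tau}|,\ 1 \le i \le n\}/\theta_0$, where $\theta_0 = \Phi^{-1}(0.75)$ is an adjustment constant for consistency with the normal distribution.
In the next section, the outlier flagging rate based on the cut-off in expression (16) is compared with that based on (14) and critical values of the t distribution, using the Hocking and Pendleton (1983) data set.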
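Both cut-off rules can be sketched in standard-library Python: the Bonferroni critical value $t(1 - \alpha_0/\{2(n - p)\};\ n - 2p - 1)$ for (14) at an overall significance level $\alpha_0$, and the MAD rule (16). Since the Python standard library has no t quantile function, the sketch assumes a simple numerical stand-in (composite Simpson integration of the t density plus bisection) for what a statistics library would provide; `statistics.NormalDist().inv_cdf(0.75)` supplies $\theta_0 = \Phi^{-1}(0.75)$. The function names and toy residuals are hypothetical, and the default k = 3 is just one of the usual choices.

```python
import math
from statistics import NormalDist, median


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def t_cdf(x, df, steps=2000):
    """CDF of Student's t via composite Simpson's rule on [0, |x|] (steps is even)."""
    if x < 0:
        return 1.0 - t_cdf(-x, df, steps)
    h = x / steps
    s = t_pdf(0.0, df) + t_pdf(x, df)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 + s * h / 3


def t_quantile(prob, df, tol=1e-8):
    """Upper quantile (prob > 0.5) of Student's t by bisection on t_cdf."""
    lo, hi = 0.0, 10.0
    while t_cdf(hi, df) < prob:   # widen the bracket if needed
        lo, hi = hi, 2 * hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bonferroni_cutoff(n, p, alpha0=0.05):
    """Bonferroni critical value for the externally studentized SEPRs (14)."""
    return t_quantile(1 - alpha0 / (2 * (n - p)), n - 2 * p - 1)


def mad_flags(residuals, k=3.0):
    """MAD rule (16): flag |e_iJ| > k * sigma_m, sigma_m = median|e_iJ| / Phi^{-1}(0.75)."""
    theta0 = NormalDist().inv_cdf(0.75)
    sigma_m = median([abs(r) for r in residuals]) / theta0
    return [abs(r) > k * sigma_m for r in residuals]


e_all = [0.0, 0.0, 0.5, -0.3, 0.2, -0.4, 0.1, 8.0]  # toy RQ residuals (zeros = ES points)
print(bonferroni_cutoff(n=24, p=4), mad_flags(e_all))
```

Note that the MAD scale uses all n residuals, including the p zeros from the ES, whereas the SEPR cut-off is applied only to the n − p observations outside $J_\tau$.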

Applications
In this section we consider the Hocking and Pendleton (1983) data set. This data set is a plausible candidate for studying the efficacy of the SEPR in flagging outliers, as it has various outlier and high leverage scenarios that range from easy to challenging in the RQ framework. These include a very high leverage observation 24, an outlier 17, and two outlier-leverage points 11 and 18 with varying degrees of high leverage. Observation 24 will almost always be included in the ES corresponding to a RQ because of the RQs' affinity for high leverage points, so it will often have a zero residual, while observation 17 will almost always be excluded from this ES and will often have a very large residual. The challenge lies with the outlier-leverage points 11 and 18, whose treatment depends on the trade-off between two antagonistic phenomena, namely, the RQs' affinity for leverage points versus their exclusion of (resistance to) outliers.
It is well known that externally studentized residual statistics generally perform better than their internally studentized counterparts, since (5) and (14) are based on $\hat\sigma_{(i)}$ and $\hat\sigma^2_{(i)}(J_\tau)$, respectively, which are both more robust to gross errors in the ith observation than $\hat\sigma^2$ and $\hat\sigma^2(J_\tau)$, on which (4) and (15) are based, respectively (Chatterjee and Hadi 1988, p. 79). Therefore the externally studentized residual criterion (14) is compared with the robust one based on the standardized MAD of residuals (16). Criterion (16) is the only comparable single case, quantile-level-specific RQ outlier diagnostic against which to validate the efficacy of (14). First, robust multivariate location and scale diagnostics computed using the minimum covariance determinant (MCD) method of Rousseeuw and Van Driessen (1999) are applied to circumvent the masking and swamping phenomena, so as to expose all the single case high leverage points and outliers. The resulting diagnostic outcome is given in Fig. 1.
Remark The ESs corresponding to RQs are the p = 4 observations (with zero residuals) in the basic optimal solution of LP problem (7), obtained using efficient linear programming algorithms.
The two outlier diagnostics do not always agree, as is the norm in any regression diagnosis using different diagnostics. Observation 24, which has the highest leverage and is non-outlying, is never flagged. The major difference to note here is the uniform flagging exhibited by (16) from τ = 0.2046 to τ = 0.8276, which changes only at very extreme τ levels. It is hard to conceive that the results below and above τ = 0.50 would be similar to this extent. This is inconsistent with the well-known behaviour of RQs, which capture the changing conditional distribution of the response variable Y given the predictors X at different quantile levels (Chamberlain 1994; Cade and Noon 2003). On the other hand, criterion (14) exhibits a dynamic pattern consistent with RQ results, as expected.

Conclusion
The studentized RQ predicted residuals (SEPRs) suggested here are useful to statistical practitioners, as they add to the few existing single case outlier diagnostics in the RQ scenario. Further, the methodology is easy to implement, as the diagnostics have cut-off values that parallel those of the OLS-based versions. Thus they offer alternatives to non-specialists who may find the robust outlier detection methodology too hard to comprehend. However, where possible, these diagnostics should be used together with other diagnostics, as recommended by Tukey (1979).