Some classes of estimators in the presence of non-response using auxiliary attribute

This paper proposes solutions to the problem of non-response in the variable of interest when information about an auxiliary attribute is available. Motivated by previous work, modified classes of estimators are suggested for estimating the population mean. Two new generalized classes of estimators are presented along with their asymptotic biases and variances. The efficiency of the suggested classes is compared with that of the usual regression estimator. Two real examples are provided to show the efficiency of the proposed approach and to compare the suggested estimators with the linear regression estimator.

suggested regression-type estimators using an auxiliary attribute in the presence of non-response. To the best of our knowledge, little attention has been given to the problem of non-response using information on auxiliary attributes.
The aim of this paper is to provide a solution to this important problem. To this end, two generalized classes are suggested for estimating the unknown population mean using supplementary information about the proportion $P$ of population units possessing the auxiliary attribute $\phi$. Taking motivation from Shabbir and Gupta (2007), Koyuncu (2012) and Singh and Solanki (2012), some modified classes are also suggested for the case of missing information.
Consider a finite population $U$ of $N$ distinct units. Let $Y$ be the variable of interest with values $y_i$, $i = 1, \ldots, N$, unknown mean $\bar{Y} = \sum_{i=1}^{N} y_i / N$ and unknown variance $S_y^2 = \sum_{i=1}^{N} (y_i - \bar{Y})^2 / (N - 1)$, and assume that non-response occurs in $Y$. Let $\phi$ be an auxiliary attribute correlated with $Y$, with values $\phi_i$, $i = 1, \ldots, N$, where $\phi_i = 1$ if the $i$th unit of the population possesses attribute $\phi$ and $\phi_i = 0$ otherwise. Let $A = \sum_{i=1}^{N} \phi_i$ and $a = \sum_{i=1}^{n} \phi_i$ be the total numbers of units possessing attribute $\phi$ in the population and in the sample, and let $P = A/N$ and $\hat{P} = a/n$ denote the corresponding proportions. We assume that $P$ is known and is used to estimate the mean $\bar{Y}$.
Hansen and Hurwitz (1946) suggested the following sub-sampling technique in the presence of non-response. A large sample $u$ of size $n$ ($n < N$) is drawn by simple random sampling without replacement (SRSWOR) to collect information on $Y$. At the first phase, $Y$ is observed for only $n_1$ of the $n$ units, and the remaining $n_2 = n - n_1$ units are treated as non-respondents. A sub-sample of size $r = n_2/k$, $k > 1$, is selected from the non-respondents, where $r$ is an integer (rounded if necessary). All $r$ selected units are assumed to respond fully on the second call. In this way the population is regarded as divided into two groups $U_1$ and $U_2$ of sizes $N_1$ and $N_2$, where $U_1$ is the group of units that would respond on the first call and $U_2$ is the group of non-respondents that would respond on the second call. The quantities $N_1$ and $N_2$ are unknown.
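Considering the above situation, Hansen and Hurwitz (1946) suggested the following estimator for the population mean:
(1) $\bar{y}^* = d_1 \bar{y}_1 + d_2 \bar{y}_{2r}$,
where $d_1 = n_1/n$ and $d_2 = n_2/n$, and $\bar{y}_1$ and $\bar{y}_{2r}$ are the sample means of the $n_1$ first-call respondents and of the $r$ sub-sampled non-respondents, respectively. As is well known, $\bar{y}^*$ is an unbiased estimator of $\bar{Y}$; its variance is denoted (2) in what follows.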
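The sub-sampling scheme and the estimator (1) can be sketched in a few lines. This is a minimal illustration rather than the author's code: the function names, the toy population and the response indicator are hypothetical, $r$ is obtained by rounding $n_2/k$, and the variance function uses the standard Hansen and Hurwitz (1946) form $(1/n - 1/N) S_y^2 + W_2 (k - 1) S_{y(2)}^2 / n$ with $W_2 = N_2/N$, which is assumed to correspond to (2).

```python
import random
from statistics import mean

def hansen_hurwitz_mean(y, responds, n, k, rng):
    """Estimate the population mean with one call-back sub-sample.

    y        : population values (hypothetical toy data)
    responds : True if the unit answers on the first call (unit is in U1)
    n, k     : first-phase sample size and inverse sampling rate (k > 1)
    """
    sample = rng.sample(range(len(y)), n)           # SRSWOR of size n
    first = [i for i in sample if responds[i]]      # n1 first-call respondents
    non = [i for i in sample if not responds[i]]    # n2 non-respondents
    n1, n2 = len(first), len(non)
    if n2 == 0:                                     # no non-response in this sample
        return mean(y[i] for i in first)
    r = max(1, round(n2 / k))                       # sub-sample size, rounded
    sub = rng.sample(non, r)                        # all r respond on the second call
    ybar1 = mean(y[i] for i in first) if n1 else 0.0
    ybar2r = mean(y[i] for i in sub)
    return (n1 / n) * ybar1 + (n2 / n) * ybar2r     # d1*ybar1 + d2*ybar2r

def hansen_hurwitz_variance(N, n, k, W2, S2_y, S2_y2):
    """Standard Hansen-Hurwitz variance (assumed to correspond to (2))."""
    return (1 / n - 1 / N) * S2_y + W2 * (k - 1) / n * S2_y2

# Hypothetical usage: 100 units, every fourth unit is a non-respondent.
rng = random.Random(0)
y = [float(v) for v in range(1, 101)]
responds = [i % 4 != 0 for i in range(100)]
estimate = hansen_hurwitz_mean(y, responds, n=40, k=2, rng=rng)
```

Setting $k$ close to 1 makes the second variance term vanish, which reflects the fact that following up every non-respondent recovers the ordinary SRSWOR variance.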

Modified classes
In this section we consider the classes of Shabbir and Gupta (2007), Koyuncu (2012) and Singh and Solanki (2012), because they are more efficient than the regression estimator. Shabbir and Gupta (2007) suggested a class of ratio estimators for the population mean $\bar{Y}$ using known information on the auxiliary attribute. Later, Singh and Solanki (2012) and Koyuncu (2012) proposed classes for the same problem using different known population parameters, such as $\rho_{pb}$ and $C_p$. Taking motivation from the work of these authors, we modify their classes in the framework of non-response on $Y$.
The regression estimator is
(3) $\bar{y}^*_{reg} = \bar{y}^* + w(P - \hat{P})$,
Darda SpringerPlus (2016) 5:1271
First modified class
Shabbir and Gupta (2007) proposed the class of estimators
(6) $\bar{y}_{SG} = \bar{y}\,[w_1 + w_2 (P - \hat{P})]\,\dfrac{P}{\hat{P}}$,
where $w_1$ and $w_2$ are suitable weights. If we assume that there is non-response on $Y$, the class (6) can be modified by replacing $\bar{y}$ with $\bar{y}^*$, which gives the class (7), denoted $\bar{y}^*_{M1}$. For easy computation of the bias (B) and the mean square error (MSE) of $\bar{y}^*_{M1}$ up to the first order of approximation, it is convenient to express (7) in terms of the $\delta$'s, which gives (8), and then to use a Taylor expansion of (8). The MSE of $\bar{y}^*_{M1}$ is then minimized with respect to $w_1$ and $w_2$.
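The regression estimator (3) and the Shabbir and Gupta (2007) form (6) translate directly into code. The sketch below assumes that the modified class under non-response is obtained by substituting the Hansen-Hurwitz mean $\bar{y}^*$ for $\bar{y}$ in (6); the function and argument names are hypothetical, and the weights are supplied by the caller rather than optimized.

```python
def regression_estimator(ybar_star, P, P_hat, w):
    """Form (3): ybar* + w (P - P_hat)."""
    return ybar_star + w * (P - P_hat)

def first_modified_class(ybar_star, P, P_hat, w1, w2):
    """Form (6) with ybar replaced by ybar* (assumed form of the modified class)."""
    if P_hat == 0:
        raise ValueError("P_hat must be non-zero for the ratio P / P_hat")
    return ybar_star * (w1 + w2 * (P - P_hat)) * P / P_hat

# Hypothetical inputs: with w1 = 1 and w2 = 0 the class is the plain ratio estimator.
ratio_estimate = first_modified_class(4.0, P=0.5, P_hat=0.25, w1=1.0, w2=0.0)
```

When the sample proportion equals the population proportion, both functions return $\bar{y}^*$ scaled only by $w_1$, which is a quick sanity check on any implementation.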

Second modified class
Koyuncu (2012) suggested a regression-cum-ratio class, (12), where $w_1$ and $w_2$ are as defined earlier and $\eta$ and $\psi$ are either real numbers or functions of known parameters associated with the auxiliary attribute, such as $C_p$, $\beta_2(\phi)$ and $\rho_{pb}$. The aim was to increase performance by using more information.
We can modify the class (12) assuming non-response on $Y$; the modified class is (13), denoted $\bar{y}^*_{M2}$. Its bias and MSE to the first order of approximation involve the quantity
$\tau = \dfrac{\eta P}{\eta P + \psi}$.
The mean square error of $\bar{y}^*_{M2}$ is minimized with respect to $w_1$ and $w_2$. To emphasize the comparison with the regression estimator (3), the minimum MSE of $\bar{y}^*_{M2}$ can be expressed as (16).
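A small sketch can make the role of $\eta$, $\psi$ and $\tau$ concrete. It assumes that the regression-cum-ratio class has the form $[w_1 \bar{y}^* + w_2 (P - \hat{P})]\,(\eta P + \psi)/(\eta \hat{P} + \psi)$; this form is an assumption made for illustration (it is consistent with the definition of $\tau$ above), and the function names are hypothetical.

```python
def tau(P, eta, psi):
    """tau = eta*P / (eta*P + psi), as defined in the text."""
    return eta * P / (eta * P + psi)

def second_modified_class(ybar_star, P, P_hat, w1, w2, eta, psi):
    """Assumed regression-cum-ratio form under non-response (illustrative only)."""
    ratio = (eta * P + psi) / (eta * P_hat + psi)
    return (w1 * ybar_star + w2 * (P - P_hat)) * ratio

# eta = 0 switches the ratio factor off (tau = 0), leaving a purely linear estimator.
linear_only = second_modified_class(3.0, P=0.5, P_hat=0.25, w1=1.0, w2=2.0, eta=0.0, psi=1.0)
```

Under this assumed form, $\eta = 1$, $\psi = 0$ gives $\tau = 1$ and the ratio factor $P/\hat{P}$, while $\eta = 0$ gives $\tau = 0$ and no ratio adjustment at all.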

Third modified class
Singh and Solanki (2012) proposed the class (17), where $\gamma$ is a real number, $\delta$ is an integer taking the values $+1$ and $-1$ for designing the estimators, and $(w_1, w_2, \eta, \psi)$ are as defined before. Note that the Shabbir and Gupta (2007) class is a member of this class. Assuming non-response on $Y$, we can modify the class (17) to obtain the third modified class (18), denoted $\bar{y}^*_{M3}$.

Suggested classes
In this section we introduce two general classes of estimators for the population mean, assuming (as in the previous section) that non-response occurs in the study variable and that information on the auxiliary attribute is known. The first suggested class is a generalization of the third modified class, defined in (18), while the second class is motivated by Diana et al. (2011).

First class
The first suggested class is
(22) $\bar{y}_{S1} = \bar{y}^* (w_1 + w_2 u)\, g(u)$,
where $u = (P - \hat{P})$, $w_1$ and $w_2$ are constants to be chosen properly and $g$ is a generic function that satisfies the following mild conditions:
• $g$ is continuous and bounded in a neighborhood of zero.
• $g$ is three times differentiable with continuous and bounded derivatives.
Expanding $g(u)$ in a Taylor series up to order $o_p(u^2)$ gives the expression (23) for the class, where $g(0)$ is a constant term and $g'(0)$ and $g''(0)$ are the first and second derivatives of $g(u)$ at zero. For the sake of simplicity, a shorthand notation is used for these quantities.
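For reference, the second-order Taylor expansion used in this step is the standard one. Substituting it into (22) and collecting powers of $u$ gives the expression below; it is a reconstruction from (22) and the stated conditions on $g$, not a quotation of the original display.

```latex
g(u) = g(0) + u\,g'(0) + \frac{u^{2}}{2}\,g''(0) + o_p(u^{2}),
\\[4pt]
\bar{y}_{S1} \approx \bar{y}^{*}
\left[
  w_1\,g(0)
  + \bigl(w_1\,g'(0) + w_2\,g(0)\bigr)\,u
  + \Bigl(\tfrac{1}{2}\,w_1\,g''(0) + w_2\,g'(0)\Bigr)\,u^{2}
\right].
```

The cubic term $\tfrac{1}{2} w_2\, g''(0)\, u^3$ is dropped because it is of smaller order than $u^2$.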

Now expressing (23) in terms of the $\delta$'s, we obtain the bias and the MSE of $\bar{y}_{S1}$ up to the first order of approximation. Minimizing the MSE (26) with respect to $w_1$ and $w_2$ gives the optimum weights and hence the minimum MSE of $\bar{y}_{S1}$.

Second class
Motivated by Diana et al. (2011), we consider the class (28), denoted $\bar{y}_{S2}$, where $u$ and $g(u)$ are as explained in the previous section. Expanding $g(u)$ again in a Taylor series, we obtain the bias and the MSE of $\bar{y}_{S2}$ to the first order of approximation. Minimizing $\mathrm{MSE}(\bar{y}_{S2})$ gives the optimum values of the constants $w_1$ and $w_2$ and the minimum MSE (32).
It is observed that the Rao (1991) class is a member of the Diana et al. (2011) class. In our case also, if $g(u)$ is taken as the identity function, the class (28) reduces to $\bar{y}^*_{S2(R)}$, the version of the Rao (1991) class that corresponds to the presence of non-response. The bias, the MSE and the optimum values of the constants $w_1$ and $w_2$ follow as before, and the minimum MSE of $\bar{y}^*_{S2(R)}$ is given by (36). From (36) it is clear that the estimator $\bar{y}^*_{S2(R)}$ always performs better than $\bar{y}^*_{reg}$ when both are used with their optimal constants.
From (16) and (36), it is easy to see that the minimum MSE of $\bar{y}^*_{M2}$ equals the minimum MSE of $\bar{y}^*_{S2(R)}$ when $\tau = 0$. One can also observe that the regression-cum-ratio estimator (13) may perform better than the regression estimator for different choices of $\tau$. Note that the MSE of $\bar{y}_{S2}$ resembles that of Diana et al. (2011); however, their work assumes complete information on the study variable $Y$ and the auxiliary variable $X$, whereas the class $\bar{y}_{S2}$ addresses the non-response problem when the population mean is estimated using information on an auxiliary attribute.
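The claim that the Rao-type member improves on the regression estimator can be checked with a short calculation. The sketch assumes that $\bar{y}^*_{S2(R)}$ has the linear form $w_1 \bar{y}^* + w_2 (P - \hat{P})$, for which the MSE is exactly quadratic in $(w_1, w_2)$; the names `V_y`, `V_p` and `C` are hypothetical labels for $\mathrm{Var}(\bar{y}^*)$, $\mathrm{Var}(\hat{P})$ and $\mathrm{Cov}(\bar{y}^*, \hat{P})$, and the input numbers are made up.

```python
def min_mse_regression(V_y, V_p, C):
    """Minimum variance of ybar* + w (P - P_hat), attained at w = C / V_p."""
    return V_y - C * C / V_p

def mse_rao_type(w1, w2, Y_bar, V_y, V_p, C):
    """Exact MSE of w1*ybar* + w2*(P - P_hat) under the assumed linear form."""
    variance = w1 ** 2 * V_y + w2 ** 2 * V_p - 2 * w1 * w2 * C
    bias = (w1 - 1) * Y_bar
    return variance + bias ** 2

def min_mse_rao_type(Y_bar, V_y, V_p, C):
    """Closed-form minimum of mse_rao_type and the weights that attain it."""
    V_reg = min_mse_regression(V_y, V_p, C)
    w1 = Y_bar ** 2 / (Y_bar ** 2 + V_reg)
    w2 = w1 * C / V_p
    return Y_bar ** 2 * V_reg / (Y_bar ** 2 + V_reg), (w1, w2)

# Hypothetical moments, chosen only to exercise the formulas.
best, (w1_opt, w2_opt) = min_mse_rao_type(Y_bar=2.0, V_y=1.0, V_p=0.5, C=0.5)
```

Because the minimum equals $\bar{Y}^2 V_{reg} / (\bar{Y}^2 + V_{reg})$, it is strictly smaller than $V_{reg}$ whenever $V_{reg} > 0$, at the price of a small bias introduced by $w_1 < 1$.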

Choice of function g
The performance of the proposed classes $\bar{y}_{S1}$ and $\bar{y}_{S2}$ depends on the choice of the function $g$. This choice is crucial and requires insight from both theoretical and practical points of view. Among the many possible choices for $g$, we consider only the ratio and exponential functions, because they have been found to be good choices from both points of view.
(i) Consider the ratio-type function suggested by Singh and Solanki (2012) with $\gamma = 1$, and expand $g(u)$ by Taylor's theorem. When $\delta = 1$, $g(u)$ is similar to the function considered by Koyuncu (2012). When $\eta = 1$ and $\psi = 0$, $g(u)$ becomes similar to the ratio function suggested by Shabbir and Gupta (2007).
(ii) Consider an exponential function and expand $g(u)$ by Taylor's theorem. The resulting members of the classes $\bar{y}_{S1}$ and $\bar{y}_{S2}$ are denoted $\bar{y}^*_{S1(2)}$ and $\bar{y}^*_{S2(2)}$, and their minimum MSEs are obtained accordingly.
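To see how a choice of $g$ plugs into the class (22), consider the sketch below. Both functional forms are assumptions made for illustration: the ratio-type function is taken as $[(\eta P + \psi)/(\eta \hat{P} + \psi)]^{\delta}$ and the exponential function as $\exp\{(P - \hat{P})/(P + \hat{P})\}$, with $u$ taken as $P - \hat{P}$. All function names are hypothetical.

```python
import math

def g_ratio(u, P, eta=1.0, psi=0.0, delta=1):
    """Hypothetical ratio-type g: ((eta*P + psi) / (eta*P_hat + psi)) ** delta."""
    P_hat = P - u                      # because u is taken as P - P_hat
    return ((eta * P + psi) / (eta * P_hat + psi)) ** delta

def g_exp(u, P):
    """Hypothetical exponential-type g: exp((P - P_hat) / (P + P_hat))."""
    P_hat = P - u
    return math.exp(u / (P + P_hat))

def first_suggested_class(ybar_star, P, P_hat, w1, w2, g):
    """Form (22): ybar* (w1 + w2 u) g(u), with u = P - P_hat (assumed order)."""
    u = P - P_hat
    return ybar_star * (w1 + w2 * u) * g(u, P)

# Hypothetical inputs: the same sample evaluated under both choices of g.
est_ratio = first_suggested_class(3.0, P=0.5, P_hat=0.25, w1=1.0, w2=0.0, g=g_ratio)
est_exp = first_suggested_class(3.0, P=0.5, P_hat=0.25, w1=1.0, w2=0.0, g=g_exp)
```

Both assumed functions satisfy $g(0) = 1$, so the estimator is left unchanged when the sample proportion matches $P$; the exponential form reacts less sharply than the ratio form to a small $\hat{P}$.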

Efficiency comparisons
In this section, the efficiency of the proposed estimators is evaluated on the basis of their minimum mean square errors, analytically where possible and otherwise numerically. It is well known that the regression estimator $\bar{y}^*_{reg}$ is always more efficient than the Hansen and Hurwitz (1946) estimator (see, for instance, (2) and (4)). For this reason we compare the efficiency of the proposed classes with that of the regression estimator. Comparing (4) with (32), after some computation, one obtains a difference that is certainly positive if $2 a_0 c_0 - 3 b_0^2 \ge 0$, in which case $\bar{y}_{S2}$ is more efficient than the regression estimator. The estimators $\bar{y}^*_{M2}$ and $\bar{y}^*_{S2(R)}$ are compared with $\bar{y}^*_{reg}$ in the same way.
Remark It can be observed from (4) and (16), and from (4) and (36), respectively, that these differences are always positive. Furthermore, it is not easy to make an analytical comparison for $\bar{y}^*_{M1}$, $\bar{y}^*_{M3}$ and $\bar{y}_{S1}$.
So in this section we make a numerical comparison of the modified and suggested classes using two population data sets considered earlier by Shabbir and Gupta (2007), Abd-Elfattah et al. (2010) and Koyuncu (2012).
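Numerical comparisons of this kind are usually reported as percent relative efficiencies (PRE). The sketch below assumes the common definition $\mathrm{PRE} = 100 \times \mathrm{MSE}(\text{reference}) / \mathrm{MSE}(\text{estimator})$ and, to stay self-contained, uses only the complete-response part of the variance, so it does not reproduce the tabulated values; the summary figures are those quoted for the villages population, and the variable names are hypothetical.

```python
def pre(mse_reference, mse_estimator):
    """Percent relative efficiency of an estimator with respect to a reference."""
    return 100.0 * mse_reference / mse_estimator

# Summary values quoted for the population of villages in circles.
N, n = 89, 23
Y_bar, C_y, rho_pb = 3.36, 0.601, 0.766

lam = 1 / n - 1 / N                      # SRSWOR finite-population factor
var_mean = lam * (Y_bar * C_y) ** 2      # Var(ybar) ignoring non-response (assumption)
var_reg = var_mean * (1 - rho_pb ** 2)   # minimum variance of the regression estimator

pre_reg = pre(var_mean, var_reg)         # efficiency gain from using the attribute
```

With non-response, both variances gain an additional term that grows with $k$, so the PRE of the regression estimator over the Hansen-Hurwitz mean is smaller than this complete-response figure.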
$y$ = number of villages in the circles. $\phi$ = a circle consisting of more than five villages. $N = 89$, $n = 23$, $\bar{Y} = 3.36$, $P = 0.124$, $C_y = 0.601$, $C_p = 2.678$, $\rho_{pb} = 0.766$, $\beta_2(\phi) = 6.612$.
From Tables 1 and 2, it is observed that the estimator $\bar{y}^*_{M2}$ with different values of $\eta$ and $\psi$ performs similarly to $\bar{y}^*_{S2(R)}$. It should be noted that the estimator $\bar{y}^*_{S2(2)}$ with the exponential function performs better than the estimators $\bar{y}^*_{M2}$ and $\bar{y}^*_{S2(1)}$ with the ratio function. After careful analysis of the performance of $\bar{y}^*_{M3}$, it is observed that different possible values of $\eta$ and $\psi$ increase the efficiency of the estimator. For $(\eta = 1, \psi = 0)$, the estimators $\bar{y}^*_{M1}$ and $\bar{y}^*_{S1(1)}$ perform similarly, as expected. The PRE of $\bar{y}^*_{S1(2)}$ is higher than those of $\bar{y}^*_{M1}$, $\bar{y}^*_{M3}$ and $\bar{y}^*_{S1(1)}$, which leads to the conclusion that the exponential function may be a better choice than the ratio function. It can be seen that as the inverse sampling rate $k$ increases, the PREs of the estimators $\bar{y}^*_{M2}$, $\bar{y}^*_{S2(R)}$, ȳ *