Some classes of estimators in the presence of nonresponse using auxiliary attribute
SpringerPlus volume 5, Article number: 1271 (2016)
Abstract
In this paper, possible solutions to the problem of nonresponse in the variable of interest are proposed for the case where information about an auxiliary attribute is available. Motivated by previous work, modified classes are suggested for estimating the population mean. Two new generalized classes of estimators are presented along with their asymptotic biases and variances. The efficiency of the suggested classes is compared with that of the usual regression estimator. Two real examples are provided to show the efficiency of the proposed approach and to compare the suggested estimators with the linear regression estimator.
Introduction and literature review
In sample surveys, using auxiliary information when estimating an unknown population parameter of the study variable increases the efficiency of the estimates. In many real scenarios, an auxiliary attribute is highly correlated with the study variable: for instance, a person’s weight and gender, the amount of milk a cow produces and its breed, crop production and seed type, or income and ownership of a house. Many authors have suggested estimators based on auxiliary attributes in simple random sampling or two-phase sampling [see Naik and Gupta (1996), Jhajj et al. (2006), Singh et al. (2008), Shabbir and Gupta (2010), Abd-Elfattah et al. (2010), Singh and Solanki (2012), Koyuncu (2012) and references cited therein]. Sometimes people refuse to reveal their weight or income, or some information is missing because respondents are unavailable at the time of the survey. These situations, in which units are unavailable or people refuse to answer, constitute the nonresponse problem. Much work has been done on nonresponse in the variable of interest under single- or two-phase sampling schemes [see Khare and Srivastava (1997), Okafor and Lee (2000), Singh and Kumar (2010), Riaz et al. (2014) and references cited therein].
Diana et al. (2012) suggested regression type estimators using an auxiliary attribute in the presence of nonresponse. To the best of our knowledge, little attention has been given to the problem of nonresponse when information on auxiliary attributes is used.
The aim of this paper is to provide a solution to this important problem. To this end, two generalized classes are suggested for estimating the unknown population mean using supplementary information about the proportion P of population units having the auxiliary attribute \(\phi\). Taking motivation from Shabbir and Gupta (2007), Koyuncu (2012) and Singh and Solanki (2012), some modified classes are also suggested for the case of missing information.
Consider a finite population U of N distinct units. Let Y be the variable of interest with values \(y_i\), \(i=1,\dots ,N\), unknown mean \(\bar{Y}=\sum _{i=1}^Ny_i/N\) and unknown variance \(S_y^2=\sum _{i=1}^N(y_i-\bar{Y})^2/(N-1)\), and assume that nonresponse occurs in Y. Let \(\phi\) be an auxiliary attribute correlated with Y, with values \(\phi _i\), \(i=1,\dots ,N\), where \(\phi _i=1\) if the ith unit of the population has attribute \(\phi\) and \(\phi _i=0\) otherwise. Let \(A=\sum _{i=1}^N{\phi _i}\) and \(a=\sum _{i=1}^n{\phi _i}\) be the total numbers of units having attribute \(\phi\) in the population and in the sample, respectively, and let \(P=A/N\) and \(\hat{P}=a/n\) denote the corresponding proportions. Assume that P is known and is used to estimate the mean \(\bar{Y}\). Hansen and Hurwitz (1946) suggested the following subsampling technique in the presence of nonresponse. A large sample u of size n \((n<N)\) is drawn by simple random sampling without replacement (SRSWOR) to collect information on Y. At the first phase, Y is observed for only \(n_1\) of the n units, and the remaining \(n_{2}=n-n_1\) units are treated as nonrespondents. A subsample of size \(r=n_2/k\), \(k>1\), is selected from the nonresponding units, where r is rounded to an integer if necessary. It is assumed that all r selected units respond fully on the second call. In this way, the population is regarded as divided into two groups \(U_1\) and \(U_2\) of sizes \(N_1\) and \(N_2\), where \(U_1\) is the group of units that would respond on the first call and \(U_2\) is the group of nonrespondents that would respond on the second call. Obviously \(N_1\) and \(N_2\) are unknown quantities.
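The subsampling scheme just described can be sketched in Python. This is an illustration under stated assumptions, not the authors' code: it uses the standard textbook form of the Hansen and Hurwitz estimator, \((n_1\bar{y}_1+n_2\bar{y}_{2r})/n\), where \(\bar{y}_1\) is the mean of the \(n_1\) first-call respondents and \(\bar{y}_{2r}\) is the mean of the r subsampled nonrespondents. The function names, the response indicator `responds` and the rounding rule used for r are hypothetical.

```python
import random
from statistics import mean


def hansen_hurwitz_mean(y_resp, y_sub, n2):
    """Textbook Hansen-Hurwitz estimate (n1*ybar_1 + n2*ybar_2r) / n.

    y_resp: values observed on the first call (n1 units)
    y_sub:  values observed on the second call (r subsampled nonrespondents)
    n2:     number of first-call nonrespondents in the sample
    """
    n1 = len(y_resp)
    n = n1 + n2
    if n == 0:
        raise ValueError("empty sample")
    part1 = n1 * mean(y_resp) if n1 else 0.0
    part2 = n2 * mean(y_sub) if n2 else 0.0
    return (part1 + part2) / n


def draw_and_estimate(y, responds, n, k, rng):
    """Draw an SRSWOR sample of size n, subsample nonrespondents, estimate the mean.

    responds[i] is True if unit i answers on the first call (an assumption
    used only to simulate membership in U_1 or U_2).
    """
    sample = rng.sample(range(len(y)), n)            # SRSWOR of size n
    first_call = [i for i in sample if responds[i]]  # the n1 respondents
    nonresp = [i for i in sample if not responds[i]] # the n2 nonrespondents
    n2 = len(nonresp)
    # r = n2 / k rounded to an integer; at least one unit if n2 > 0 (assumed rule)
    r = max(1, round(n2 / k)) if n2 else 0
    second_call = rng.sample(nonresp, r)
    return hansen_hurwitz_mean(
        [y[i] for i in first_call], [y[i] for i in second_call], n2
    )


if __name__ == "__main__":
    rng = random.Random(1)
    y = [rng.gauss(50, 10) for _ in range(89)]
    responds = [True] * 67 + [False] * 22   # last 22 units never answer first call
    print(draw_and_estimate(y, responds, n=23, k=2, rng=rng))
```

Here the mean of the r second-call values stands in for all \(n_2\) nonrespondents, which is why it carries the weight \(n_2/n\) rather than \(r/n\).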
Considering the above situations, Hansen and Hurwitz (1946) have suggested the following estimator for population mean,
where
As is well known, \(\bar{y}^*\) is an unbiased estimator of \(\bar{Y}\)
where
The variance of \(\bar{y}^*\) is given by
where
We can write (2) as
If P is known, we can first of all consider the regression estimator of \(\bar{Y}\)
where w is an unknown constant to be selected properly.
The mean square error (MSE) of \(\bar{y}^*_{\text {reg}}\) is minimum when
The minimum MSE of \(\bar{y}^*_{\text {reg}}\) is given by
where
We can write (4) as
where
Modified classes
In this section, we consider the classes of Shabbir and Gupta (2007), Koyuncu (2012) and Singh and Solanki (2012) because they are more efficient than the regression estimator. Shabbir and Gupta (2007) suggested a class of ratio estimators for the population mean \(\bar{Y}\) using known information on the auxiliary attribute. Later, Singh and Solanki (2012) and Koyuncu (2012) proposed classes on the same subject using different known population parameters such as \((\rho _{pb}, \; C_p)\). Taking motivation from the work of these authors, we modify their classes in the framework of nonresponse on Y.
First modified class
Shabbir and Gupta (2007) proposed the class of estimators
where \(w_1\) and \(w_2\) are suitable weights.
If we assume that there is nonresponse on Y, the class (6) can be modified as
For an easy computation of the bias (B) and the mean square error (MSE) of \(\bar{y}^*_{\text {M1}}\) up to first order approximation, it is convenient to express (7) in terms of \(\delta\)’s
where
and then use Taylor expansion of (8). So, remembering that
we have
and
The mean square error of \(\bar{y}^*_{\text {M1}}\) is minimum when
and
Second modified class
Koyuncu (2012) suggested a regression-cum-ratio class
where \(w_1\) and \(w_2\) are as defined earlier, and \(\eta\) and \(\psi\) are either real numbers or functions of known parameters associated with the auxiliary attribute, such as \(C_p\), \(\beta _2(\phi )\) and \(\rho _{pb}\). The aim was to increase performance by using more information.
We can modify the class (12), assuming nonresponse on Y
The bias and the MSE of \(\bar{y}^*_\text {M2}\) to the first order of approximation can be written as
and
where \(\tau =\dfrac{{\eta }P}{{\eta }P+\psi }\).
The mean square error of \(\bar{y}^*_{\text {M2}}\) is minimized for
and
To emphasize the comparison with the regression estimator (3), we can express the minimum MSE of \(\bar{y}^*_{\text {M2}}\) as
Third modified class
Singh and Solanki (2012) proposed the class
where \(\gamma\) is a real number, \(\delta\) is an integer that takes the values \(+1\) and \(-1\) for designing the estimators, and \((w_1 , \; w_2, \; \eta , \; \psi )\) are as defined before. Note that the Shabbir and Gupta (2007) class is a member of this class.
We can modify the class (17) considering incomplete information on Y
The bias and MSE of \(\bar{y}^*_{\text {M3}}\) to the first order of approximation are given by
and
where
Minimizing \(\text {MSE}(\bar{y}^*_{\text {M3}})\) gives the optimum values of the constants \(w_1\) and \(w_2\)
and
Suggested classes
In this section, we introduce two general classes of estimators for the population mean, assuming (as in the previous section) that nonresponse occurs in the study variable and that information on the auxiliary attribute is known. The first suggested class is obtained as a generalization of the third modified class, defined in (18), while the second class is motivated by Diana et al. (2011).
First class
Let
where \(u=(P-\hat{P})\), \(w_1\) and \(w_2\) are constants to be chosen suitably and g is a generic function that satisfies the following mild conditions:

g is continuous and bounded in a neighborhood of zero.

g does not depend on n, N and \((\phi _1,\dots ,\phi _N)\).

g is a three times differentiable function with continuous and bounded derivatives.
Expanding g(u) using Taylor’s series up to order \(o_p(u^2)\), the resulting expression for the class is given by
where g(0) is a constant term, \(g'(0)\) is the first derivative of g(u) at zero and \(g''(0)\) is the second derivative at zero. For the sake of simplicity, we write \(g(0)=a_0\), \(g'(0)=b_0\) and \(\dfrac{1}{2}g''(0)=c_0\).
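For reference, the second-order expansion that this notation abbreviates can be written as follows. This is the generic Taylor expansion of g about zero under the stated conditions, given here as a reminder rather than as a reproduction of the class expression itself:

$$\begin{aligned} g(u)=g(0)+g'(0)\,u+\dfrac{1}{2}g''(0)\,u^2+o_p(u^2)=a_0+b_0u+c_0u^2+o_p(u^2). \end{aligned}$$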
Now expressing (23) in terms of \(\delta\)’s
The bias and the MSE of \(\bar{y}_{\text {S1}}\) up to the first order of approximation are
and
Minimizing (26) with respect to \(w_1\) and \(w_2\), we obtain
and
where
and
Hence, with the help of above notations, one can write minimum MSE of \(\bar{y}_\text {S1}\) as follows
where
and
Second class
Motivated by Diana et al. (2011), we consider the class
where u and g(u) are as defined for the first class.
If we expand g(u) again using Taylor’s series, \(\bar{y}_{\text {S2}}\) becomes
The bias and the MSE to the first order of approximation can be written as
and
By minimizing \(\text {MSE}(\bar{y}_\text {S2})\), one can get the optimum values of the constants \(w_1\) and \(w_2\)
and
Therefore,
The Rao (1991) class is a member of the Diana et al. (2011) class. Likewise, in our case, if g(u) is taken to be the identity function, the class (28) reduces to
that is the corresponding version of Rao (1991) class when nonresponse is present.
The bias and the MSE of \(\bar{y}^*_{\text {S2(R)}}\) can be written as
and
The optimum values of constants \(w_1\) and \(w_2\) are
and
The minimum MSE of \(\bar{y}^*_{\text {S2(R)}}\) is given by
From (36) it is clear that the estimator \(\bar{y}^*_{\text {S2(R)}}\) always performs better than \(\bar{y}^*_{\text {reg}}\) when both are used with their optimum constants.
From (16) and (36), it is easy to see that the minimum MSE of \(\bar{y}^*_{\text {M2}}\) equals the minimum MSE of \(\bar{y}^*_{\text {S2(R)}}\) when \(\tau =0\). One can also observe that the regression-cum-ratio estimator (13) may perform better than the regression estimator for different choices of \(\tau\). Note that the MSE of \(\bar{y}_{\text {S2}}\) resembles the MSE of the Diana et al. (2011) class; however, their work assumes complete information on the study variable Y and the auxiliary variable X, whereas the class \(\bar{y}_{\text {S2}}\) addresses the nonresponse problem when the population mean is estimated using information on an auxiliary attribute.
Choice of function g
The performance of the proposed classes \(\bar{y}_{\text {S1}}\) and \(\bar{y}_{\text {S2}}\) depends on the choice of the function g. This choice is crucial and requires insight from both the theoretical and the practical points of view. There are many possible choices for g, but we consider only the ratio and exponential functions, because they have been found to be good choices on both counts.
 (i):

Consider a ratio type function suggested by Singh and Solanki (2012) assuming \(\gamma =1\)
$$\begin{aligned} g(u)=\left( \dfrac{{\eta }P+{\delta }{\psi }}{{\eta }P+{\delta }{\psi }-{\eta }u}\right) . \end{aligned}$$(37)
When \(\delta =1\), g(u) is similar to the function considered by Koyuncu (2012).
When \(\eta =1\) and \(\psi =0\), g(u) becomes similar to the ratio function suggested by Shabbir and Gupta (2007).
The suggested classes \(\bar{y}_{\text {S1}}\) and \(\bar{y}_{\text {S2}}\) become
and
If we consider \(\eta =1\) and \(\psi =0\), \(\bar{y}^*_{\text {S1(1)}}\) is equivalent to \(\bar{y}^*_{\text {M1}}\) and \(\bar{y}^*_{\text {S2(1)}}\) is equal to \(\bar{y}^*_{\text {M2}}\), for \(\delta =1\).
Hence, we can conclude that \((\bar{y}^*_{\text {M1}}, \; \bar{y}^*_{\text {M3}})\) belong to the class \(\bar{y}_{\text {S1}}\) and \(\bar{y}^*_{\text {M2}}\) is a member of the class \(\bar{y}_{\text {S2}}\).
 (ii):

Consider an exponential function
$$\begin{aligned} g(u)=\exp \left( \dfrac{u}{2P-u}\right) . \end{aligned}$$(38)
Then \(\bar{y}_{\text {S1}}\) and \(\bar{y}_{\text {S2}}\) can be written as
and
The minimum MSE of \(\bar{y}^{*}_{\text {S1(2)}}\) and \(\bar{y}^{*}_{\text {S2(2)}}\) are given by
and
Efficiency comparisons
In this section, the efficiency of the proposed estimators is evaluated on the basis of their minimum mean square errors, analytically where possible and, in particular, numerically. It is well known that the regression estimator \(\bar{y}^*_{\text {reg}}\) is always more efficient than the Hansen and Hurwitz (1946) estimator (for instance, see (2) and (4)). For this reason, we compare the efficiency of the proposed classes with that of the regression estimator.
From the comparison of (4) with (32), after some computation, one can get
when
This expression is certainly positive if \(\left( 2a_0c_0-3b_0^2\right) \ge 0\), in which case \(\bar{y}_{\text {S2}}\) is more efficient than the regression estimator.
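The sufficient condition above can be checked for the two choices of g considered earlier. The following Python sketch is an illustration, not part of the original analysis. It assumes \(u=P-\hat{P}\), the ratio function \(g(u)=(\eta P+\delta \psi )/(\eta P+\delta \psi -\eta u)\), the exponential function \(g(u)=\exp \{u/(2P-u)\}\), and the condition read as \(2a_0c_0-3b_0^2\ge 0\). The function names and the value P = 0.5 are hypothetical. The coefficients \(a_0=g(0)\), \(b_0=g'(0)\) and \(c_0=g''(0)/2\) are approximated by central finite differences.

```python
import math


def g_ratio(u, P, eta=1.0, psi=0.0, delta=1):
    """Ratio-type choice: (eta*P + delta*psi) / (eta*P + delta*psi - eta*u)."""
    c = eta * P + delta * psi
    return c / (c - eta * u)


def g_exp(u, P):
    """Exponential choice: exp(u / (2P - u))."""
    return math.exp(u / (2.0 * P - u))


def taylor_coeffs(g, h=1e-4):
    """Approximate a0 = g(0), b0 = g'(0), c0 = g''(0)/2 by central differences."""
    a0 = g(0.0)
    b0 = (g(h) - g(-h)) / (2.0 * h)
    c0 = (g(h) - 2.0 * a0 + g(-h)) / (2.0 * h * h)
    return a0, b0, c0


def sufficiency_gap(a0, b0, c0):
    """Quantity 2*a0*c0 - 3*b0**2; the sufficient condition asks for it to be >= 0."""
    return 2.0 * a0 * c0 - 3.0 * b0 ** 2


if __name__ == "__main__":
    P = 0.5  # hypothetical known proportion
    for name, g in [("ratio", lambda u: g_ratio(u, P)),
                    ("exponential", lambda u: g_exp(u, P))]:
        a0, b0, c0 = taylor_coeffs(g)
        print(name, a0, b0, c0, sufficiency_gap(a0, b0, c0))
```

Under these assumptions the exponential function has \(a_0=1\), \(b_0=1/(2P)\) and \(c_0=3/(8P^2)\), so the quantity is zero and the condition holds with equality. The ratio function has \(b_0=\eta /(\eta P+\delta \psi )\) and \(c_0=b_0^2\), so the quantity equals \(-b_0^2<0\). Because the condition is only sufficient, a negative value does not by itself imply a loss of efficiency.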
Now making comparison of \(\bar{y}^*_{\text {M2}}\) and \(\bar{y}^*_{\text {S2}(R)}\) with \(\bar{y}^*_{\text {reg}}\)
and
Remark
It can be observed from (4) and (16) and, from (4) and (36) respectively, that these expressions are always positive. Furthermore, it is not easy to make analytical comparison for \(\left( \bar{y}^*_{\text {M1}}, \; \bar{y}^*_{\text {M3}}, \; \bar{y}_{\text {S1}}\right)\).
So in this section, we make a numerical comparison of the modified and suggested classes using two population data sets considered earlier by Shabbir and Gupta (2007), Abd-Elfattah et al. (2010) and Koyuncu (2012).
Population I
[Source: Sukhatme and Sukhatme (1970), p. 256]
The nonresponse rate in the population is taken to be 25 percent, with the last 22 units of the population treated as nonrespondents
Population II
[Source: Sukhatme and Sukhatme (1970), p. 256]
The nonresponding units of the population are taken as the last 22 units (\(25\,\%\) of N)
The comparison is performed in terms of Percent Relative Efficiency (PRE)
where \(\bar{y}^*_{(\bullet )}=\left( \bar{y}^*_\text {M1}, \; \bar{y}^*_\text {M2}, \; \bar{y}^*_\text {M3}, \; \bar{y}^*_\text {S1(1)}, \; \bar{y}^*_\text {S1(2)}, \; \bar{y}^*_\text {S2(R)}, \; \bar{y}^*_\text {S2(1)}, \; \bar{y}^*_\text {S2(2)}\right)\).
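As a small illustration of how such a comparison can be computed, the sketch below assumes the usual definition of percent relative efficiency with the regression estimator as benchmark, that is, 100 times the ratio of the benchmark's minimum MSE to the estimator's minimum MSE. The function name and the numerical MSE values are hypothetical and are not taken from Tables 1 and 2.

```python
def percent_relative_efficiency(mse_benchmark, mse_estimator):
    """PRE = 100 * MSE(benchmark) / MSE(estimator), under the assumed definition.

    Values above 100 mean the estimator has a smaller MSE than the benchmark.
    """
    if mse_benchmark <= 0 or mse_estimator <= 0:
        raise ValueError("MSE values must be positive")
    return 100.0 * mse_benchmark / mse_estimator


if __name__ == "__main__":
    mse_reg = 2.0  # hypothetical minimum MSE of the regression estimator
    # hypothetical minimum MSEs of two competing estimators
    for label, mse in [("estimator A", 1.6), ("estimator B", 2.5)]:
        print(label, percent_relative_efficiency(mse_reg, mse))
```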
From Tables 1 and 2, it is observed that the estimator \(\bar{y}^*_{\text {M}2}\) with different values of \(\eta\) and \(\psi\) performs similarly to \(\bar{y}^*_{\text {S2}(R)}\). It should be noted that the estimator \(\bar{y}^*_{\text {S2(2)}}\) with the exponential function performs better than the estimators \(\bar{y}^*_{\text {M}2}\) and \(\bar{y}^*_{\text {S2(1)}}\) with the ratio function. A careful analysis of the performance of \(\bar{y}^*_{\text {M3}}\) shows that different possible values of \(\eta\) and \(\psi\) increase the efficiency of the estimator. For \((\eta =1, \; \psi =0)\), the estimators \(\bar{y}^*_{\text {M1}}\) and \(\bar{y}^*_{\text {S1(1)}}\) perform similarly, as expected. The PRE of \(\bar{y}^{*}_{\text {S}1(2)}\) is higher than those of \(\bar{y}^*_{\text {M1}}\), \(\bar{y}^*_{\text {M3}}\) and \(\bar{y}^{*}_{\text {S}1(1)}\), which leads to the conclusion that the exponential function may be a better choice than the ratio function. It can be seen that as the inverse sampling rate k increases, the PREs of the estimators \(\left( \bar{y}^*_{\text {M}2}, \; \bar{y}^*_{\text {S}2(R)}, \; \bar{y}^*_{\text {S}2(1)}\right)\) also increase, but the PREs of \(\left( \bar{y}^*_{\text {M}1}, \; \bar{y}^*_{\text {M}3}, \; \bar{y}^*_{\text {S}2(2)}, \; \bar{y}^*_{\text {S}1(1)}, \; \bar{y}^*_{\text {S}1(2)}\right)\) decrease.
It has been shown in Singh and Solanki (2012) that the estimator \(\bar{y}_{\text {SG}}\) with \((\eta =1, \; \psi =0)\) performs better than the estimator \(\bar{y}_{\text {SS}}\) with complete information on Y. The same behavior is observed for \(\left( \bar{y}^*_{\text {M1}}, \; \bar{y}^*_{\text {M3}}\right)\) in the case of incomplete information on Y in Tables 1 and 2. Hence, from a practical point of view, \(\bar{y}^*_{\text {M1}}\) is preferable to \(\left( \bar{y}^*_{\text {M2}}, \;\bar{y}^*_{\text {M3}}\right)\) because it shows higher efficiency while using less auxiliary information.
Conclusions
In this paper, two new generalized classes of biased estimators for the population mean have been proposed for the case where information on an auxiliary attribute is available and nonresponse occurs in the study variable. Further, three modified classes of estimators motivated by Shabbir and Gupta (2007), Koyuncu (2012) and Singh and Solanki (2012) have been considered in the presence of nonresponse. The linear regression estimator is taken as the benchmark for comparing the efficiency of the proposed classes. Our suggested classes \(\bar{y}_{\text {S1}}\) and \(\bar{y}_{\text {S2}}\) depend on the choice of the function g, for which we consider ratio and exponential functions. Numerical results are reported in Tables 1 and 2 to show the superiority of the suggested classes over the regression estimator. The main purpose of this paper is to highlight the nonresponse problem in the study variable when information on an auxiliary attribute is available for estimating the unknown population mean.
References
Abd-Elfattah AM, El-Sherpieny EA, Mohamed SM, Abdou OF (2010) Improvement in estimating the population mean in simple random sampling using information on auxiliary attribute. Appl Math Comput 215:4198–4202
Diana G, Giordan M, Perri PF (2011) An improved class of estimators for the population mean. Stat Methods Appl 20:123–140
Diana G, Riaz S, Shabbir J (2012) A general class of regression type estimators when auxiliary variable is an attribute. In: Proceedings of the 12th Islamic countries conference on statistical sciences: 19–22 Dec 2012; Qatar, vol 23, pp 277–284. ISOSS: Pakistan
Hansen MH, Hurwitz WN (1946) The problems of nonresponse in sample surveys. J Am Stat Assoc 41:517–529
Jhajj HS, Sharma MK, Grover LK (2006) A family of estimators of population mean using information on auxiliary attribute. Pak J Stat 22:43–50
Khare BB, Srivastava S (1997) Transformed ratio type estimators for the population mean in the presence of nonresponse. Commun Stat Theory Methods 26:1779–1791
Koyuncu N (2012) Efficient estimators of population mean using auxiliary attributes. Appl Math Comput 218(22):10900–10905
Naik VD, Gupta PC (1996) A note on estimation of mean with known population proportion of an auxiliary character. J Indian Soc Agric Stat 48:151–158
Okafor FC, Lee H (2000) Double sampling for ratio and regression estimation with subsampling the nonrespondents. Surv Methodol 26(2):183–188
Rao TJ (1991) On certain methods of improving ratio and regression estimators. Commun Stat Theory Methods 20:3325–3340
Riaz S, Diana G, Shabbir J (2014) Improved classes of estimators for population mean in presence of nonresponse. Pak J Stat 30(1):83–100
Shabbir J, Gupta S (2007) On estimating the finite population mean with known population proportion of an auxiliary variable. Pak J Stat 23:1–9
Shabbir J, Gupta S (2010) Estimation of the finite population mean in two phase sampling when auxiliary variables are attributes. Hacet J Math Stat 39:121–129
Singh HP, Kumar S (2010) Improved estimation of population mean under two phase sampling with subsampling the nonrespondents. J Stat Plann Inference 140:2536–2550
Singh HP, Solanki RS (2012) Improved estimation of population mean in simple random sampling using information on auxiliary attribute. Appl Math Comput 218:7798–7812
Singh R, Chauhan P, Sawan N, Smarandache F (2008) Ratio estimators in simple random sampling using information on auxiliary attribute. Pak J Stat Oper Res 4:47–53
Sukhatme PV, Sukhatme BV (1970) Sampling theory of surveys with applications. Asia Publishing House, New Delhi
Authors' contributions
Both authors have equal contribution in this work. Both authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Author information
Additional information
Saba Riaz and Md. Abud Darda contributed equally to this article
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Keywords
 Proportion
 Bias
 Mean square error
 Ratio function
 Exponential function
 Efficiency