Some classes of estimators in the presence of nonresponse using auxiliary attribute
 Saba Riaz†^{1} and
 Md. Abud Darda†^{2}
Received: 30 May 2016
Accepted: 27 July 2016
Published: 5 August 2016
Abstract
This paper proposes solutions to the problem of nonresponse in the variable of interest when information on an auxiliary attribute is available. Motivated by previous work, modified classes of estimators are suggested for estimating the population mean. Two new generalized classes of estimators are presented along with their asymptotic biases and variances. The efficiency of the suggested classes is compared with that of the usual regression estimator. Two real examples illustrate the efficiency of the proposed approach and compare the suggested estimators with the linear regression estimator.
Keywords
Introduction and literature review
In sample surveys, using auxiliary information when estimating an unknown population parameter of the study variable increases the efficiency of the estimates. In many real scenarios, some auxiliary attributes are highly correlated with the study variable: for instance, a person’s weight and gender, the amount of milk a cow produces and its breed, crop production and seed type, or income and ownership of a house. Many authors have suggested estimators based on auxiliary attributes in simple random sampling or two phase sampling [see Naik and Gupta (1996), Jhajj et al. (2006), Singh et al. (2008), Shabbir and Gupta (2010), AbdElfattah et al. (2010), Singh and Solanki (2012), Koyuncu (2012) and references cited therein]. Sometimes people refuse to reveal their weight or income, or some information is missing because respondents are unavailable at the time of the survey. Such situations, in which units are unavailable or people refuse to answer, give rise to the nonresponse problem. In the literature, much work has been done on nonresponse in the variable of interest under single or two phase sampling schemes [see Khare and Srivastava (1997), Okafor and Lee (2000), Singh and Kumar (2010), Riaz et al. (2014) and references cited therein].
Diana et al. (2012) suggested regression type estimators using an auxiliary attribute in the presence of nonresponse. To the best of our knowledge, little attention has been given to the problem of nonresponse when information on auxiliary attributes is used.
The aim of this paper is to provide a solution to this important problem. To this end, two generalized classes are suggested for estimating the unknown population mean, using supplementary information on the proportion P of population units possessing the auxiliary attribute \(\phi\). Motivated by Shabbir and Gupta (2007), Koyuncu (2012) and Singh and Solanki (2012), some modified classes are also suggested in the presence of missing information.
Consider a finite population U of N distinct units. Let Y be the variable of interest, taking values \(y_i\), \(i=1,\dots ,N\), with unknown mean \(\bar{Y}=\sum _{i=1}^Ny_i/N\) and unknown variance \(S_y^2=\sum _{i=1}^N(y_i-\bar{Y})^2/(N-1)\), and assume that nonresponse occurs in Y. Let \(\phi\) be an auxiliary attribute correlated with Y, taking values \(\phi _i\), \(i=1,\dots ,N\), where \(\phi _i=1\) if the ith unit of the population has attribute \(\phi\) and \(\phi _i=0\) otherwise. Let \(A=\sum _{i=1}^N{\phi _i}\) and \(a=\sum _{i=1}^n{\phi _i}\) be the total number of units having attribute \(\phi\) in the population and in the sample, respectively, and let \(P=\left( A/N\right)\) and \(\hat{P}=\left( a/n\right)\) denote the corresponding proportions. P is assumed to be known and is used to estimate the mean \(\bar{Y}\).
Hansen and Hurwitz (1946) suggested the following subsampling technique in the presence of nonresponse. A large sample u of size n \((n<N)\) is drawn by simple random sampling without replacement (SRSWOR) to collect information on Y. At the first phase, Y is observed for only \(n_1\) of the n units, and the remaining \(n_{2}=n-n_1\) units are treated as nonrespondents. A subsample of size \(r=n_2/k\), \(k>1\), is selected from the nonresponding units, where r is an integer or is rounded to one. All r selected units are assumed to respond fully on the second call. In this way, the population is regarded as divided into two groups \(U_1\) and \(U_2\) of sizes \(N_1\) and \(N_2\), where \(U_1\) is the group of units that would respond on the first call and \(U_2\) is the group of nonrespondents that would respond on the second call. Obviously, \(N_1\) and \(N_2\) are unknown quantities.
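The subsampling scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `subsample_size` and `hansen_hurwitz_mean` are hypothetical, and the estimator is written in its usual form \((n_1\bar{y}_1+n_2\bar{y}_r)/n\), where \(\bar{y}_1\) is the mean of the first-call respondents and \(\bar{y}_r\) the mean of the r subsampled nonrespondents.

```python
def subsample_size(n2, k):
    """Size r = n2 / k of the nonrespondent subsample, rounded to an integer.

    Assumption: at least one unit is followed up whenever n2 > 0.
    """
    if n2 == 0:
        return 0
    return max(1, round(n2 / k))


def hansen_hurwitz_mean(y_first_call, y_subsample, n2):
    """Hansen and Hurwitz (1946) mean: (n1 * ybar_1 + n2 * ybar_r) / n.

    y_first_call : values of Y for the n1 first-call respondents
    y_subsample  : values of Y for the r followed-up nonrespondents
    n2           : number of first-call nonrespondents (n2 = n - n1)
    """
    n1 = len(y_first_call)
    n = n1 + n2
    total = sum(y_first_call)  # equals n1 * ybar_1
    if n2 > 0:
        ybar_r = sum(y_subsample) / len(y_subsample)
        total += n2 * ybar_r   # each subsampled unit stands for k nonrespondents
    return total / n


# Toy data (made up for illustration): n = 7, n1 = 3, n2 = 4, k = 2, so r = 2.
r = subsample_size(4, 2)
estimate = hansen_hurwitz_mean([10, 12, 14], [20, 24], n2=4)
```

The weight \(n_2\) on the subsample mean is what keeps the estimator unbiased for \(\bar{Y}\) even though only r of the \(n_2\) nonrespondents are re-contacted.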
Modified classes
In this section, we consider the classes of Shabbir and Gupta (2007), Koyuncu (2012) and Singh and Solanki (2012), because they are more efficient than the regression estimator. Shabbir and Gupta (2007) suggested a class of ratio estimators for the population mean \(\bar{Y}\) using known information on the auxiliary attribute. Later, Singh and Solanki (2012) and Koyuncu (2012) proposed classes on the same subject using different known population parameters such as \((\rho _{pb}, \; C_p)\). Motivated by the work of these authors, we modify their classes in the framework of nonresponse on Y.
First modified class
Second modified class
Third modified class
Suggested classes
In this section, we introduce two general classes of estimators for the population mean, assuming (as in the previous section) that nonresponse occurs in the study variable and that information on the auxiliary attribute is known. The first suggested class is obtained as a generalization of the third modified class, defined in (18), while the second class is motivated by Diana et al. (2011).
First class

g is continuous and bounded in a neighborhood of zero.

g does not depend on n, N and \((\phi _1,\dots ,\phi _N)\).

g is a three times differentiable function with continuous and bounded derivatives.
Second class
From (16) and (36), it is easy to see that the minimum MSE of \(\bar{y}^*_{\text {M2}}\) equals the minimum MSE of \(\bar{y}^*_{\text {S2(R)}}\) when \(\tau =0\). One can also observe that the regression-cum-ratio estimator (13) may perform better than the regression estimator for different choices of \(\tau\). Note that the MSE of \(\bar{y}_{\text {S2}}\) resembles the MSE of Diana et al. (2011); however, their work deals with complete information on the study variable Y and the auxiliary variable X, whereas the class \(\bar{y}_{\text {S2}}\) addresses the nonresponse problem when the population mean is estimated using information on an auxiliary attribute.
Choice of function g
 (i): Consider a ratio type function suggested by Singh and Solanki (2012), assuming \(\gamma =1\):
$$\begin{aligned} g(u)=\left( \dfrac{{\eta }P+{\delta }{\psi }}{{\eta }P+{\delta }{\psi }-{\eta }u}\right) . \end{aligned}$$ (37)
When \(\eta =1\) and \(\psi =0\), g(u) becomes similar to the ratio function suggested by Shabbir and Gupta (2007).
 (ii): Consider an exponential function:
$$\begin{aligned} g(u)=\exp \left( \dfrac{u}{2P-u}\right) . \end{aligned}$$ (38)
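The two choices of g can be written down directly, as a rough sketch. It reads (37) and (38) with a minus sign before the last term of each denominator and assumes \(u=P-\hat{P}\); the definition of u is not shown in this section, so both readings are assumptions. The function names and default parameter values are hypothetical.

```python
import math


def g_ratio(u, P, eta=1.0, delta=1.0, psi=0.0):
    """Ratio type function (37): (eta*P + delta*psi) / (eta*P + delta*psi - eta*u)."""
    base = eta * P + delta * psi
    return base / (base - eta * u)


def g_exp(u, P):
    """Exponential function (38): exp(u / (2P - u))."""
    return math.exp(u / (2.0 * P - u))


# Illustrative values only: P = 0.5 known, sample proportion 0.4, so u = 0.1.
P, p_hat = 0.5, 0.4
u = P - p_hat  # assumed definition of u

ratio_value = g_ratio(u, P)  # with eta = 1, psi = 0 this reduces to P / p_hat
exp_value = g_exp(u, P)      # equals exp((P - p_hat) / (P + p_hat))
```

Under these assumptions both functions satisfy g(0) = 1, and with \(\eta =1\), \(\psi =0\) the ratio function reduces to \(P/\hat{P}\), which matches the remark about Shabbir and Gupta (2007).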
Efficiency comparisons
In this section, the efficiency of the proposed estimators is evaluated on the basis of their minimum mean square error, analytically when possible and numerically otherwise. It is well known that the regression estimator \(\bar{y}^*_{\text {reg}}\) is always more efficient than the Hansen and Hurwitz (1946) estimator (for instance, see (2) and (4)). For this reason, we compare the efficiency of the proposed classes with that of the regression estimator.
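The comparison with the Hansen and Hurwitz (1946) estimator can be made explicit with a short sketch. It relies on notation that is not shown in this section and is therefore assumed here: \(\bar{y}^*\) for the Hansen and Hurwitz estimator, \(\lambda =(1/n-1/N)\), \(\theta =W_2(k-1)/n\) with \(W_2=N_2/N\), \(S_{y(2)}^2\) for the variance of Y in the nonrespondent group \(U_2\), and \(\rho _{pb}\) for the point biserial correlation between Y and \(\phi\). It further assumes that \(\phi\) is observed on all n sampled units, so that the regression adjustment affects only the first variance component. Under those assumptions, the standard first-order results are
$$\begin{aligned} \text {Var}(\bar{y}^*)&=\lambda S_y^2+\theta S_{y(2)}^2, \\ \text {MSE}_{\min }(\bar{y}^*_{\text {reg}})&\approx \lambda S_y^2(1-\rho _{pb}^2)+\theta S_{y(2)}^2, \\ \text {Var}(\bar{y}^*)-\text {MSE}_{\min }(\bar{y}^*_{\text {reg}})&\approx \lambda S_y^2\rho _{pb}^2\ge 0. \end{aligned}$$
These expressions may differ in form from (2) and (4); they are given only to show why the regression estimator is a natural benchmark, with the gain vanishing only when \(\rho _{pb}=0\).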
Remark
It can be observed from (4) and (16), and from (4) and (36), respectively, that these expressions are always positive. Furthermore, it is not easy to make an analytical comparison for \(\left( \bar{y}^*_{\text {M1}}, \; \bar{y}^*_{\text {M3}}, \; \bar{y}_{\text {S1}}\right)\).
We therefore compare the modified and suggested classes numerically, using two population data sets considered earlier by Shabbir and Gupta (2007), AbdElfattah et al. (2010) and Koyuncu (2012).
Population I
Population II
Table 1 PRE of the estimators with respect to \(\bar{y}^*_{\text {reg}}\) for different values of k for Pop I

| Estimator | \(\gamma\) | \(\delta\) | \(\eta\) | \(\psi\) | k = 2 | k = 3 | k = 4 | k = 5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(\bar{y}^*_{\text {M}2}\) | – | – | \(C_p\) | \(\beta _2(\varphi )\) | 100.94 | 101.39 | 101.85 | 102.30 |
| | – | – | \(\beta _2(\varphi )\) | \(C_p\) | 100.95 | 101.41 | 101.87 | 102.33 |
| | – | – | 1 | \(C_p\) | 100.94 | 101.39 | 101.85 | 102.30 |
| | – | – | 1 | \(\beta _2(\varphi )\) | 100.94 | 101.39 | 101.85 | 102.30 |
| \(\bar{y}^*_{\text {S}2(R)}\) | – | – | 1 | 0 | 100.94 | 101.39 | 101.85 | 102.30 |
| \(\bar{y}^*_{\text {S}2(1)}\) | – | – | 1 | 0 | 101.22 | 101.81 | 102.40 | 102.99 |
| \(\bar{y}^{*}_{\text {S}2(2)}\) | – | – | 1 | 0 | 118.34 | 114.94 | 113.55 | 112.93 |
| \(\bar{y}^*_{\text {M3}}\) | 1 | 1 | n | \(1-n/N\) | 114.65 | 112.36 | 111.49 | 111.18 |
| | 1 | 1 | N | P | 125.51 | 120.09 | 117.82 | 116.73 |
| | 1 | 1 | N | \(k_p\) | 125.21 | 119.88 | 117.65 | 116.58 |
| \(\bar{y}^*_{\text {M1}}\) | – | – | 1 | 0 | 126.32 | 120.65 | 118.27 | 117.11 |
| \(\bar{y}^*_{\text {S}1(1)}\) | – | – | 1 | 0 | 126.32 | 120.65 | 118.27 | 117.11 |
| \(\bar{y}^{*}_{\text {S}1(2)}\) | – | – | 1 | 0 | 135.75 | 126.43 | 122.47 | 120.40 |
From Tables 1 and 2, it is observed that the estimator \(\bar{y}^*_{\text {M}2}\), for the different values of \(\eta\) and \(\psi\) considered, performs similarly to \(\bar{y}^*_{\text {S2}(R)}\). The estimator \(\bar{y}^*_{\text {S2(2)}}\), based on the exponential function, performs better than the estimators \(\bar{y}^*_{\text {M}2}\) and \(\bar{y}^*_{\text {S2(1)}}\), which are based on the ratio function. For \(\bar{y}^*_{\text {M3}}\), the efficiency depends on the choice of \(\eta\) and \(\psi\): the choices \((\eta =N, \; \psi =P)\) and \((\eta =N, \; \psi =k_p)\) give higher PREs than \((\eta =n, \; \psi =1-n/N)\). For \((\eta =1, \; \psi =0)\), the estimators \(\bar{y}^*_{\text {M1}}\) and \(\bar{y}^*_{\text {S1(1)}}\) have identical PREs, as expected. The PRE of \(\bar{y}^{*}_{\text {S}1(2)}\) is higher than those of \(\bar{y}^*_{\text {M1}}\), \(\bar{y}^*_{\text {M3}}\) and \(\bar{y}^{*}_{\text {S}1(1)}\), which suggests that the exponential function may be a better choice than the ratio function. As the inverse sampling rate k increases, the PREs of the estimators \(\left( \bar{y}^*_{\text {M}2}, \; \bar{y}^*_{\text {S}2(R)}, \; \bar{y}^*_{\text {S}2(1)}\right)\) also increase, whereas the PREs of \(\left( \bar{y}^*_{\text {M}1}, \; \bar{y}^*_{\text {M}3}, \; \bar{y}^*_{\text {S}2(2)}, \; \bar{y}^*_{\text {S}1(1)}, \; \bar{y}^*_{\text {S}1(2)}\right)\) decrease.
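The trends described above can be checked mechanically against the reported figures. The sketch below copies a subset of the Pop I PRE values for k = 2, 3, 4, 5 and classifies each row as increasing or decreasing in k. The `pre` helper uses the usual definition of percent relative efficiency, \(100\times \text {MSE}(\bar{y}^*_{\text {reg}})/\text {MSE}(\text {estimator})\), which is assumed here because the section does not state it; all names in the code are hypothetical.

```python
def pre(mse_reg, mse_est):
    """Percent relative efficiency with respect to the regression estimator."""
    return 100.0 * mse_reg / mse_est


def trend(values):
    """Classify a sequence as strictly 'increasing', 'decreasing' or 'mixed'."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    if all(d > 0 for d in diffs):
        return "increasing"
    if all(d < 0 for d in diffs):
        return "decreasing"
    return "mixed"


# PRE values for Pop I at k = 2, 3, 4, 5 (copied from the table above).
pre_pop1 = {
    "M2 (eta=C_p, psi=beta_2)": [100.94, 101.39, 101.85, 102.30],
    "S2(R)": [100.94, 101.39, 101.85, 102.30],
    "S2(1)": [101.22, 101.81, 102.40, 102.99],
    "S2(2)": [118.34, 114.94, 113.55, 112.93],
    "M3 (eta=N, psi=P)": [125.51, 120.09, 117.82, 116.73],
    "M1": [126.32, 120.65, 118.27, 117.11],
    "S1(1)": [126.32, 120.65, 118.27, 117.11],
    "S1(2)": [135.75, 126.43, 122.47, 120.40],
}

trends = {name: trend(values) for name, values in pre_pop1.items()}
best_at_each_k = [
    max(pre_pop1, key=lambda name: pre_pop1[name][j]) for j in range(4)
]
```

For these rows, `trends` marks the M2, S2(R) and S2(1) entries as increasing and the rest as decreasing, and `best_at_each_k` names S1(2) at every k, in line with the discussion above.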
Table 2 PRE of the estimators with respect to \(\bar{y}^*_{\text {reg}}\) for different values of k for Pop II

| Estimator | \(\gamma\) | \(\delta\) | \(\eta\) | \(\psi\) | k = 2 | k = 3 | k = 4 | k = 5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(\bar{y}^*_{\text {M}2}\) | – | – | \(C_p\) | \(\beta _2(\varphi )\) | 101.20 | 101.56 | 101.93 | 102.29 |
| | – | – | \(\beta _2(\varphi )\) | \(C_p\) | 101.21 | 101.58 | 101.95 | 102.32 |
| | – | – | 1 | \(C_p\) | 101.20 | 101.56 | 101.93 | 102.29 |
| | – | – | 1 | \(\beta _2(\varphi )\) | 101.20 | 101.56 | 101.93 | 102.29 |
| \(\bar{y}^*_{\text {S}2(R)}\) | – | – | 1 | 0 | 101.20 | 101.56 | 101.93 | 102.29 |
| \(\bar{y}^*_{\text {S}2(1)}\) | – | – | 1 | 0 | 101.56 | 102.03 | 102.51 | 102.98 |
| \(\bar{y}^{*}_{\text {S}2(2)}\) | – | – | 1 | 0 | 115.99 | 114.28 | 113.40 | 112.94 |
| \(\bar{y}^*_{\text {M3}}\) | 1 | 1 | n | \(1-n/N\) | 111.48 | 110.64 | 110.29 | 110.18 |
| | 1 | 1 | N | P | 118.59 | 116.52 | 115.45 | 114.88 |
| | 1 | 1 | N | \(k_p\) | 118.48 | 116.43 | 115.37 | 114.81 |
| \(\bar{y}^*_{\text {M1}}\) | – | – | 1 | 0 | 119.10 | 116.93 | 115.81 | 115.20 |
| \(\bar{y}^*_{\text {S}1(1)}\) | – | – | 1 | 0 | 119.10 | 116.93 | 115.81 | 115.20 |
| \(\bar{y}^{*}_{\text {S}1(2)}\) | – | – | 1 | 0 | 128.32 | 123.85 | 121.40 | 119.92 |
Conclusions
In this paper, two new generalized classes of biased estimators for the population mean have been proposed for the case where information on an auxiliary attribute is available and nonresponse occurs in the study variable. In addition, three modified classes of estimators motivated by Shabbir and Gupta (2007), Koyuncu (2012) and Singh and Solanki (2012) have been considered in the presence of nonresponse. The linear regression estimator is taken as the benchmark for comparing the efficiency of the proposed classes. The suggested classes \(\bar{y}_{\text {S1}}\) and \(\bar{y}_{\text {S2}}\) depend on the choice of the function g, for which we consider ratio and exponential functions. Numerical results reported in Tables 1 and 2 show the superiority of the suggested classes over the regression estimator. The main purpose of this paper is to highlight the nonresponse problem in the study variable when information on an auxiliary attribute is available for estimating the unknown population mean.
Notes
Declarations
Authors' contributions
Both authors contributed equally to this work. Both authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Authors’ Affiliations
References
 AbdElfattah AM, ElSherpieny EA, Mohamed SM, Abdou OF (2010) Improvement in estimating the population mean in simple random sampling using information on auxiliary attribute. Appl Math Comput 215:4198–4202
 Diana G, Giordan M, Perri PF (2011) An improved class of estimators for the population mean. Stat Methods Appl 20:123–140
 Diana G, Riaz S, Shabbir J (2012) A general class of regression type estimators when auxiliary variable is an attribute. In: Proceedings of the 12th Islamic countries conference on statistical sciences: 19–22 Dec 2012; Qatar, vol 23, pp 277–284. ISOSS: Pakistan
 Hansen MH, Hurwitz WN (1946) The problems of nonresponse in sample surveys. J Am Stat Assoc 41:517–529
 Jhajj HS, Sharma MK, Grover LK (2006) A family of estimators of population mean using information on auxiliary attribute. Pak J Stat 22:43–50
 Khare BB, Srivastava S (1997) Transformed ratio type estimators for the population mean in the presence of nonresponse. Commun Stat Theory Methods 26:1779–1791
 Koyuncu N (2012) Efficient estimators of population mean using auxiliary attributes. Appl Math Comput 218(22):10900–10905
 Naik VD, Gupta PC (1996) A note on estimating of mean with known population of an auxiliary character. J Indian Soc Agric Stat 48:151–158
 Okafor FC, Lee H (2000) Double sampling for ratio and regression estimation with subsampling the nonrespondents. Surv Methodol 26(2):183–188
 Rao TJ (1991) On certain methods of improving ratio and regression estimators. Commun Stat Theory Methods 20:3325–3340
 Riaz S, Diana G, Shabbir J (2014) Improved classes of estimators for population mean in presence of nonresponse. Pak J Stat 30(1):83–100
 Shabbir J, Gupta S (2007) On estimating the finite population mean with known population proportion of an auxiliary variable. Pak J Stat 23:1–9
 Shabbir J, Gupta S (2010) Estimation of the finite population mean in two phase sampling when auxiliary variables are attributes. Hacet J Math Stat 39:121–129
 Singh HP, Kumar S (2010) Improved estimation of population mean under two phase sampling with subsampling the nonrespondents. J Stat Plann Inference 140:2536–2550
 Singh HP, Solanki RS (2012) Improved estimation of population mean in simple random sampling using information on auxiliary attribute. Appl Math Comput 218:7798–7812
 Singh R, Chauhan P, Sawan N, Smarandache F (2008) Ratio estimators in simple random sampling using information on auxiliary attribute. Pak J Stat Oper Res 4:47–53
 Sukhatme PV, Sukhatme BV (1970) Sampling theory of surveys with applications. Asia Publishing House, New Delhi