Estimation of parameters of logistic regression for two-stage randomized response technique

  • Original paper
  • Computational Statistics

Abstract

When a survey concerns sensitive issues such as political orientation, sexual orientation, or income, respondents may be unwilling to reply truthfully, which leads to biased results. To protect respondents’ privacy and improve their willingness to provide true answers, Warner (J Am Stat Assoc 60:63–69, 1965) proposed the randomized response (RR) technique, in which respondents select a question by means of a random device so that their privacy is maintained. Huang (Stat Neerl 58:75–82, 2004) extended the RR design of Warner (1965) to a two-stage RR design. This method can be used to estimate not only the population proportion of persons with a sensitive characteristic but also the honest answer rate in the first stage. This work develops a covariate extension of the two-stage RR design of Huang (2004) by applying logistic regression to investigate the effects of covariates on a sensitive characteristic and on an honest response. Simulation experiments are conducted to study the finite-sample performance of the maximum likelihood estimators of the logistic regression parameters. The proposed methodology is applied to survey data on the sexuality of freshmen at Feng Chia University in Taiwan in 2016.
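As an illustration of the basic RR idea only (not of the two-stage design studied in this article), the following minimal sketch simulates the single-stage design of Warner (1965). It assumes the random device shows the statement "I have the characteristic" with a known probability `p` and its negation otherwise, and that every respondent answers truthfully; the sample size, `pi_true`, `p`, and the function name `warner_estimate` are illustrative choices, not values or names from the article.

```python
import numpy as np

def warner_estimate(answers, p):
    """Moment estimator of the sensitive proportion under Warner's design.

    answers: array of 0/1 'yes' indicators.
    p: probability that the device selects the sensitive statement (p != 0.5).
    Uses P(yes) = p * pi + (1 - p) * (1 - pi), solved for pi.
    """
    lam_hat = np.mean(answers)
    return (lam_hat - (1.0 - p)) / (2.0 * p - 1.0)

rng = np.random.default_rng(1)
n, pi_true, p = 100_000, 0.2, 0.7          # illustrative values (assumptions)

has_trait = rng.random(n) < pi_true         # latent sensitive characteristic
asked_sensitive = rng.random(n) < p         # outcome of the random device
# A truthful 'yes' means the selected statement applies to the respondent.
answers = np.where(asked_sensitive, has_trait, ~has_trait).astype(int)

pi_hat = warner_estimate(answers, p)
print(f"estimated proportion: {pi_hat:.3f}")
```

The interviewer sees only `answers`, never `asked_sensitive`, which is what protects the respondent; the price is a variance inflated by the factor `1 / (2p - 1)**2`.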

References

  • Chang HJ, Huang KC (2001) Estimation of proportion and sensitivity of a qualitative character. Metrika 53:269–280

  • Chaudhuri A (2002) Estimating sensitive proportions from randomized responses in unequal probability sampling. Calcutta Stat Assoc Bull 52:315–322

  • Chaudhuri A, Mukerjee R (1988) Randomized response: theory and techniques. Marcel Dekker, New York

  • Christofides TC (2003) A generalized randomized response technique. Metrika 57:195–200

  • Christofides TC (2005) Randomized response in stratified sampling. J Stat Plan Inference 128:303–310

  • Cruyff MJLF, Böckenholt U, van den Hout A, van der Heijden PGM (2008) Accounting for self-protective responses in randomized response data from a social security survey using the zero-inflated Poisson model. Ann Appl Stat 2:316–331

  • Foutz RV (1977) On the unique consistent solution to the likelihood equations. J Am Stat Assoc 72:147–148

  • Greenberg BG, Abul-Ela A, Simmons WR, Horvitz DG (1969) The unrelated question randomized response model: theoretical framework. J Am Stat Assoc 64:520–539

  • Groenitz H (2014) A new privacy-protecting survey design for multichotomous sensitive variables. Metrika 77:211–224

  • Hsieh SH, Lee SM, Shen PS (2009) Semiparametric analysis of randomized response data with missing covariates in logistic regression. Comput Stat Data Anal 53:2673–2692

  • Hsieh SH, Lee SM, Shen PS (2010) Logistic regression analysis of randomized response data with missing covariates. J Stat Plan Inference 140:927–940

  • Hsieh SH, Lee SM, Li CS (2020) A two-stage multilevel randomized response technique with proportional odds models and missing covariates. Sociol Methods Res. https://doi.org/10.1177/0049124120914954

  • Huang KC (2004) A survey technique for estimating the proportion and sensitivity in a dichotomous finite population. Stat Neerl 58:75–82

  • Kim JM, Warde WD (2004) A stratified Warner’s randomized response model. J Stat Plan Inference 120:155–165

  • Kim JM, Warde WD (2005) A mixed randomized response model. J Stat Plan Inference 133:211–221

  • Kim JM, Tebbs JM, An SW (2006) Extensions of Mangat’s randomized-response model. J Stat Plan Inference 136:1554–1567

  • Kuk AYC (1990) Asking sensitive questions indirectly. Biometrika 77:436–438

  • Lee SM, Peng TC, Tapsoba JDD, Hsieh SH (2017) Improved estimation methods for unrelated question randomized response techniques. Commun Stat Theory Methods 46:8101–8112

  • Maddala GS (1983) Limited-dependent and qualitative variables in econometrics. Cambridge University Press, Cambridge

  • Saha A (2004) On efficacies of Dalenius–Vitale technique with compulsory versus optional randomized responses from complex surveys. Calcutta Stat Assoc Bull 54:223–230

  • Scheers NJ, Dayton CM (1988) Covariate randomized response models. J Am Stat Assoc 83:969–974

  • Warner SL (1965) Randomized response: a survey technique for eliminating evasive answer bias. J Am Stat Assoc 60:63–69

  • Yu JW, Tian GL, Tang ML (2008) Two new models for survey sampling with sensitive characteristic: design and analysis. Metrika 67:251–263

Acknowledgements

The authors are very thankful for a reviewer’s constructive comments that improved the presentation. The research of S.M. Lee and K.H. Pho was supported by the Ministry of Science and Technology (MOST) of Taiwan, ROC (105-2118-M-035-005-MY2 and 107-2118-M-035-004-MY2).

Author information

Corresponding author

Correspondence to Chin-Shang Li.

Appendix

1.1 Proof of Lemma 1

\(\varvec{\Psi }_i(\Theta )\) in (4) can be expressed as

$$\begin{aligned} \varvec{\Psi }_i(\Theta )&=\frac{Y_{i1}}{g_{i1}(\Theta )}\left( \frac{\partial {g_{i1}}(\Theta )}{\partial \Theta }\right) + \frac{Y_{i2}}{g_{i2}(\Theta )}\left( \frac{\partial {g_{i2}}(\Theta )}{\partial \Theta }\right) \\&\quad -\frac{1-Y_{i1}-Y_{i2}}{1-g_{i1}(\Theta )-g_{i2}(\Theta )}\left( \frac{\partial {g_{i1}}(\Theta )}{\partial \Theta } + \frac{\partial {g_{i2}}(\Theta )}{\partial \Theta }\right) \\&=\frac{\left[ Y_{i1}-g_{i1}(\Theta )\right] g_{i2}(\Theta )[1-g_{i2}(\Theta )]}{g_{i1}(\Theta )g_{i2}(\Theta )[1-g_{i1}(\Theta )-g_{i2}(\Theta )]} \left( \frac{\partial {g_{i1}}(\Theta )}{\partial \Theta }\right) \nonumber \\&\quad + \frac{\left[ Y_{i2}-g_{i2}(\Theta )\right] g_{i1}(\Theta )[1-g_{i1}(\Theta )]}{g_{i1}(\Theta )g_{i2}(\Theta )[1-g_{i1}(\Theta )-g_{i2}(\Theta )]} \left( \frac{\partial g_{i2}(\Theta )}{\partial \Theta }\right) \nonumber \\&\quad + \frac{\left[ Y_{i1}-g_{i1}(\Theta )\right] g_{i1}(\Theta )g_{i2}(\Theta )}{g_{i1}(\Theta )g_{i2}(\Theta )[1-g_{i1}(\Theta )-g_{i2}(\Theta )]} \left( \frac{\partial {g_{i2}}(\Theta )}{\partial \Theta }\right) \nonumber \\&\quad + \frac{\left[ Y_{i2}-g_{i2}(\Theta )\right] g_{i1}(\Theta )g_{i2}(\Theta )}{g_{i1}(\Theta )g_{i2}(\Theta )[1-g_{i1}(\Theta )-g_{i2}(\Theta )]} \left( \frac{\partial {g_{i1}}(\Theta )}{\partial \Theta }\right) \nonumber \\&=\left( \frac{\partial {g_{i1}}(\Theta )}{\partial \Theta }, \frac{\partial {g_{i2}}(\Theta )}{\partial \Theta }\right) \frac{1}{det(\varvec{V}_{i}(\Theta ))}\\&\quad \begin{bmatrix} g_{i2}(\Theta )[1-g_{i2}(\Theta )] &{} g_{i1}(\Theta )g_{i2}(\Theta ) \\ g_{i1}(\Theta )g_{i2}(\Theta ) &{} g_{i1}(\Theta )[1-g_{i1}(\Theta )] \end{bmatrix} [\varvec{Y}_i-\varvec{g}_i(\Theta )]^T\\&=\left( \frac{\partial \varvec{g_i}(\Theta )}{\partial \Theta }\right) {\varvec{V}_i^{-1}(\Theta )}[\varvec{Y}_i-\varvec{g}_i(\Theta )]^T,\ i=1,2,\dots , n, \end{aligned}$$

where \(det(\varvec{V}_i(\Theta ))= g_{i1}(\Theta )g_{i2}(\Theta )(1-g_{i1}(\Theta )-g_{i2}(\Theta ))\) and \(\varvec{V}_i(\Theta )\) is given in (8). Hence the score function \(\varvec{U}_n(\Theta )\) can be written as

$$\begin{aligned} \varvec{U}_n(\Theta ) =\sum _{i=1}^n\varvec{\Psi }_i(\Theta ) =\sum _{i=1}^{n}\left( \frac{\partial \varvec{g}_i(\Theta )}{\partial \Theta }\right) \varvec{V}_{i}^{-1}(\Theta )[\varvec{Y}_i-\varvec{g}_i(\Theta )]^T. \end{aligned}$$
(9)
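The algebra in this proof can be checked numerically. The sketch below picks arbitrary cell probabilities and arbitrary gradient vectors (both hypothetical, chosen only for the check) and confirms that the determinant, the closed-form inverse of \(\varvec{V}_i(\Theta )\), and the matrix form of \(\varvec{\Psi }_i(\Theta )\) agree with the direct expression for each of the three possible outcomes of \((Y_{i1},Y_{i2})\).

```python
import numpy as np

rng = np.random.default_rng(0)

g1, g2 = 0.3, 0.5                 # arbitrary probabilities with g1 + g2 < 1 (assumption)
dg1 = rng.normal(size=3)          # hypothetical gradient of g_i1 with respect to Theta
dg2 = rng.normal(size=3)          # hypothetical gradient of g_i2 with respect to Theta

# Covariance matrix V_i, its determinant, and its closed-form inverse.
V = np.array([[g1 * (1 - g1), -g1 * g2],
              [-g1 * g2, g2 * (1 - g2)]])
det_V = g1 * g2 * (1 - g1 - g2)
V_inv_closed = np.array([[g2 * (1 - g2), g1 * g2],
                         [g1 * g2, g1 * (1 - g1)]]) / det_V

D = np.column_stack([dg1, dg2])   # (dg_i1/dTheta, dg_i2/dTheta), shape 3 x 2

def psi_direct(y1, y2):
    """First line of the display: derivative of the multinomial log-likelihood."""
    return (y1 / g1) * dg1 + (y2 / g2) * dg2 \
        - (1 - y1 - y2) / (1 - g1 - g2) * (dg1 + dg2)

def psi_matrix(y1, y2):
    """Last line of the display: D V^{-1} (Y - g)^T."""
    resid = np.array([y1 - g1, y2 - g2])
    return D @ V_inv_closed @ resid

outcomes = [(1, 0), (0, 1), (0, 0)]
max_gap = max(np.max(np.abs(psi_direct(*y) - psi_matrix(*y))) for y in outcomes)
print("largest discrepancy between the two forms:", max_gap)
```

Because the identity is purely algebraic, it holds for any gradients, which is why random vectors suffice here.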

1.2 Proof of Theorem 1

1.2.1 (a) Proof of consistency of \({\widehat{\Theta }}\)

By condition (C1) and the inverse function theorem of Foutz (1977), the likelihood equation \(\varvec{U}_n(\Theta )=\varvec{0}\) has a unique solution, so the ML estimator \({\widehat{\Theta }}\) is a consistent estimator of \(\Theta \).

1.2.2 (b) Proof of asymptotic normality of \(\sqrt{n}({\widehat{\Theta }}-\Theta )\)

Let \(\varvec{{\mathcal {U}}}_n(\Theta )={\frac{1}{\sqrt{n}}}\varvec{U}_n(\Theta )=\frac{1}{\sqrt{n}}\sum _{i=1}^n\varvec{\Psi }_i(\Theta )\). A Taylor expansion of \(\varvec{{\mathcal {U}}}_n({\widehat{\Theta }})\) about \(\Theta \) gives

$$\begin{aligned} \varvec{0}=\varvec{{\mathcal {U}}}_n({\widehat{\Theta }})&=\varvec{{\mathcal {U}}}_n(\Theta ) +\frac{\partial \varvec{{\mathcal {U}}}_n(\Theta )}{\partial \Theta ^T} ({\widehat{\Theta }}-\Theta )+o_p\left( \sqrt{n}[({\widehat{\Theta }}-\Theta )]^{\otimes 2}\right) \nonumber \\&=\varvec{{\mathcal {U}}}_n(\Theta )+\frac{\partial \varvec{{\mathcal {U}}}_n(\Theta )}{\sqrt{n}\partial \Theta ^T}\sqrt{n}({\widehat{\Theta }}-\Theta )+o_p(\varvec{1}), \end{aligned}$$
(10)

where

$$\begin{aligned} \frac{\partial \varvec{{\mathcal {U}}}_n(\Theta )}{\sqrt{n}\partial \Theta ^T}&=\frac{1}{n}\sum _{i=1}^{n}\frac{\partial \varvec{\Psi }_i(\Theta )}{\partial \Theta ^T}\\&=\frac{1}{n}\sum _{i=1}^{n}\frac{\partial \left[ \left( \dfrac{\partial \varvec{g}_i(\Theta )}{\partial \Theta }\right) \varvec{V}_i^{-1}(\Theta )\right] }{\partial \Theta ^T} [\varvec{Y}_i-\varvec{g}_i(\Theta )]^T\\&\quad -\frac{1}{n}\sum _{i=1}^{n}\left( \frac{\partial \varvec{g}_i(\Theta )}{\partial \Theta }\right) \varvec{V}_i^{-1}(\Theta ) \left( \frac{\partial \varvec{g}_i(\Theta )}{\partial \Theta }\right) ^T. \end{aligned}$$

Because \((Y_{i1},Y_{i2},1-Y_{i1}-Y_{i2})|\varvec{X}_i\sim {Mult}\left( 1,g_{i1}(\Theta ),g_{i2}(\Theta ),1-g_{i1}(\Theta )-g_{i2}(\Theta )\right) \), \(i=1,2,\dots \), we have

$$\begin{aligned} E\left\{ [\varvec{Y}_i-\varvec{g}_i(\Theta )]|\varvec{X}_i\right\}&=\varvec{0}, \end{aligned}$$
(11)
$$\begin{aligned} E[\varvec{Y}_i-\varvec{g}_i(\Theta )]&=E\left\{ E\left\{ [\varvec{Y}_i-\varvec{g}_i(\Theta )]|\varvec{X}_i\right\} \right\} =\varvec{0}, \nonumber \\ Var\left\{ [\varvec{Y}_i-\varvec{g}_i(\Theta )]|\varvec{X}_i\right\}&=E\left\{ [\varvec{Y}_i-\varvec{g}_i(\Theta )]^T[\varvec{Y}_i-\varvec{g}_i(\Theta )]|\varvec{X}_i\right\} \nonumber \\&=\begin{bmatrix} g_{i1}(\Theta )[1-g_{i1}(\Theta )] &{}\quad -g_{i1}(\Theta )g_{i2}(\Theta ) \\ -g_{i1}(\Theta )g_{i2}(\Theta ) &{}\quad g_{i2}(\Theta )[1-g_{i2}(\Theta )] \end{bmatrix} =\varvec{V}_i(\Theta ). \end{aligned}$$
(12)
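The moments in (11) and (12) are the standard ones for a single multinomial trial and can be confirmed by simulation. The following is a minimal Monte Carlo sketch with fixed, arbitrarily chosen probabilities (an assumption; in the model they depend on \(\varvec{X}_i\) and \(\Theta \)).

```python
import numpy as np

rng = np.random.default_rng(2)

g1, g2 = 0.25, 0.45                       # arbitrary cell probabilities (assumption)
n_draws = 200_000

# Each row is (Y1, Y2, 1 - Y1 - Y2) from one multinomial trial.
draws = rng.multinomial(1, [g1, g2, 1 - g1 - g2], size=n_draws)
Y = draws[:, :2].astype(float)

resid = Y - np.array([g1, g2])            # rows are Y_i - g_i
mean_resid = resid.mean(axis=0)           # Monte Carlo estimate of E[Y_i - g_i]
emp_cov = resid.T @ resid / n_draws       # Monte Carlo estimate of the 2 x 2 covariance

V = np.array([[g1 * (1 - g1), -g1 * g2],
              [-g1 * g2, g2 * (1 - g2)]])

print("mean residual:", mean_resid)
print("empirical covariance:\n", emp_cov)
print("theoretical V:\n", V)
```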

Because \((\varvec{Y}_i,\varvec{X}_i)\), \(i=1,2,\ldots ,n\), are independent and identically distributed and \(E[\varvec{Y}_i-\varvec{g}_i(\Theta )]=\varvec{0}\), it follows from condition (C2) and the weak law of large numbers that

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n\left\{ \frac{\partial }{\partial \Theta ^T}\left[ \left( \dfrac{\partial \varvec{g}_i(\Theta )}{\partial \Theta }\right) \varvec{V}_i^{-1}(\Theta )\right] \right\} [\varvec{Y}_i- \varvec{g}_i(\Theta )]^T\overset{p}{\longrightarrow }\varvec{0}, \end{aligned}$$

and

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n\left( \frac{\partial \varvec{g}_i(\Theta )}{\partial \Theta }\right) \varvec{V}_i^{-1}(\Theta ) \left( \frac{\partial \varvec{g}_i(\Theta )}{\partial \Theta }\right) ^T\overset{p}{\longrightarrow }E\left[ \left( \frac{\partial \varvec{g}_1(\Theta )}{\partial \Theta }\right) \varvec{V}_1^{-1}(\Theta )\left( \frac{\partial \varvec{g}_1(\Theta )}{\partial \Theta }\right) ^T\right] . \end{aligned}$$

Hence

$$\begin{aligned} -\frac{\partial \varvec{{\mathcal {U}}}_n(\Theta )}{\sqrt{n}\partial \Theta ^T} \overset{p}{\longrightarrow }E\left[ \left( \frac{\partial \varvec{g}_1(\Theta )}{\partial \Theta }\right) \varvec{V}_1^{-1}(\Theta )\left( \frac{\partial \varvec{g}_1(\Theta )}{\partial \Theta }\right) ^T\right] =\varvec{\Delta }^{-1}. \end{aligned}$$

From (10), \(\sqrt{n}({\widehat{\Theta }}-\Theta )\) can be expressed as

$$\begin{aligned} \sqrt{n}({\widehat{\Theta }}-\Theta )&=\left( -\frac{\partial \varvec{{\mathcal {U}}}_n(\Theta )}{\sqrt{n}\partial \Theta ^T}\right) ^{-1}\varvec{{\mathcal {U}}}_n(\Theta )+o_p(\varvec{1}) \nonumber \\&=\left[ \varvec{\Delta }+o_p(\varvec{1})\right] \varvec{{\mathcal {U}}}_n(\Theta )+o_p(\varvec{1}) \nonumber \\&=\varvec{\Delta }\varvec{{\mathcal {U}}}_n(\Theta )+o_p(\varvec{1}). \end{aligned}$$
(13)

Because (11) and (12) give

$$\begin{aligned} E\left[ \varvec{\Psi }_i(\Theta )\right] =E\left[ \left( \frac{\partial \varvec{g}_i(\Theta )}{\partial \Theta }\right) \varvec{V}_{i}^{-1}(\Theta )[\varvec{Y}_i-\varvec{g}_i(\Theta )]^T\right] =\varvec{0} \end{aligned}$$

and

$$\begin{aligned} Var\left[ \varvec{\Psi }_i(\Theta )\right]&=E\left[ \varvec{\Psi }_i(\Theta )\varvec{\Psi }_i^T(\Theta )\right] \\&=E\left\{ E\left[ \varvec{\Psi }_i(\Theta )\varvec{\Psi }_i^T(\Theta )|\varvec{X}_i\right] \right\} \\&=E\left[ \left( \frac{\partial \varvec{g}_i(\Theta )}{\partial \Theta }\right) \varvec{V}_i^{-1}(\Theta ) E\left\{ [\varvec{Y}_i-\varvec{g}_i(\Theta )]^T[\varvec{Y}_i-\varvec{g}_i(\Theta )]|\varvec{X}_i\right\} \right. \\&\quad \left. \varvec{V}_i^{-1}(\Theta )\left( \frac{\partial \varvec{g}_i(\Theta )}{\partial \Theta }\right) ^T\right] \\&=E\left[ \left( \frac{\partial \varvec{g}_i(\Theta )}{\partial \Theta }\right) \varvec{V}_i^{-1}(\Theta ) \left( \frac{\partial \varvec{g}_i(\Theta )}{\partial \Theta }\right) ^T\right] =\varvec{\Delta }^{-1},\ i=1,2,\dots ,n, \end{aligned}$$

it can be shown via the central limit theorem that

$$\begin{aligned} \varvec{{\mathcal {U}}}_n(\Theta ) =\frac{1}{\sqrt{n}}\sum _{i=1}^{n}\varvec{\Psi }_i(\Theta ) \overset{d}{\longrightarrow }{\mathcal {N}}\left( \varvec{0},E[\varvec{\Psi }_1(\Theta )\varvec{\Psi }_1^{T}(\Theta )]\right) . \end{aligned}$$

Therefore

$$\begin{aligned} \varvec{\Delta }\varvec{{\mathcal {U}}}_n(\Theta ) = \varvec{\Delta }\left[ \frac{1}{\sqrt{n}}\sum _{i=1}^{n}\varvec{\Psi }_i(\Theta )\right] \overset{d}{\longrightarrow }{\mathcal {N}}(\varvec{0},\varvec{\Delta }), \end{aligned}$$

and, by Slutsky’s theorem, \(\sqrt{n}({\widehat{\Theta }}-\Theta )=\varvec{\Delta }\varvec{{\mathcal {U}}}_n(\Theta ) +o_p(\varvec{1})\overset{d}{\longrightarrow }{\mathcal {N}}(\varvec{0},\varvec{\Delta })\), which completes the proof.
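To show how the score form (9) and the limit \(\varvec{\Delta }\) are used in practice, here is a minimal Fisher-scoring sketch. The article's actual \(g_{i1}\) and \(g_{i2}\) are not reproduced in this appendix, so the code substitutes a stand-in model, a baseline-category logit with one covariate; that model, the function names, and all numerical values are assumptions made for illustration only.

```python
import numpy as np

def probs(theta, x):
    """Stand-in for (g_i1, g_i2): baseline-category logit (an assumption)."""
    eta1 = theta[0] + theta[1] * x
    eta2 = theta[2] + theta[3] * x
    denom = 1.0 + np.exp(eta1) + np.exp(eta2)
    return np.column_stack([np.exp(eta1) / denom, np.exp(eta2) / denom])

def score_and_info(theta, x, Y):
    """Return U_n(theta) in the form (9) and sum_i D_i V_i^{-1} D_i^T."""
    U, info = np.zeros(4), np.zeros((4, 4))
    for gi, xi, yi in zip(probs(theta, x), x, Y):
        V = np.diag(gi) - np.outer(gi, gi)        # V_i as in (12)
        Z = np.array([[1.0, xi, 0.0, 0.0],
                      [0.0, 0.0, 1.0, xi]])       # d(eta1, eta2)/d theta, 2 x 4
        D = Z.T @ V                               # dg_i/d theta by the chain rule, 4 x 2
        V_inv = np.linalg.inv(V)
        U += D @ V_inv @ (yi - gi)
        info += D @ V_inv @ D.T
    return U, info

def fit(x, Y, n_iter=50, tol=1e-10):
    """Fisher scoring: theta <- theta + info^{-1} U_n(theta)."""
    theta = np.zeros(4)
    for _ in range(n_iter):
        U, info = score_and_info(theta, x, Y)
        step = np.linalg.solve(info, U)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta

# Simulated data under the stand-in model (illustrative values).
rng = np.random.default_rng(3)
n = 3000
theta_true = np.array([0.2, 0.8, -0.3, -0.5])
x = rng.normal(size=n)
g = probs(theta_true, x)
u = rng.random(n)
Y = np.column_stack([u < g[:, 0],
                     (u >= g[:, 0]) & (u < g[:, 0] + g[:, 1])]).astype(float)

theta_hat = fit(x, Y)
U_hat, info_hat = score_and_info(theta_hat, x, Y)
Delta_hat = n * np.linalg.inv(info_hat)           # plug-in estimate of Delta
std_err = np.sqrt(np.diag(Delta_hat) / n)         # asymptotic standard errors
print("estimate:", theta_hat)
print("std. err.:", std_err)
```

The plug-in `Delta_hat` replaces the expectation defining \(\varvec{\Delta }^{-1}\) with its sample average evaluated at the estimate, which is the usual way the theorem yields standard errors and Wald intervals.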

Cite this article

Chang, PC., Pho, KH., Lee, SM. et al. Estimation of parameters of logistic regression for two-stage randomized response technique. Comput Stat 36, 2111–2133 (2021). https://doi.org/10.1007/s00180-021-01068-5

  • DOI: https://doi.org/10.1007/s00180-021-01068-5
