Article

Estimation of the Average Kappa Coefficient of a Binary Diagnostic Test in the Presence of Partial Verification

by José Antonio Roldán-Nofuentes 1,* and Saad Bouh Regad 2
1 Department of Statistics, School of Medicine, University of Granada, 18016 Granada, Spain
2 Epidemiology and Public Health Research Unit and URMCD, School of Medicine, University of Nouakchott Alaasriya, Nouakchott BP 880, Mauritania
* Author to whom correspondence should be addressed.
Mathematics 2021, 9(14), 1694; https://doi.org/10.3390/math9141694
Submission received: 22 June 2021 / Revised: 13 July 2021 / Accepted: 14 July 2021 / Published: 19 July 2021

Abstract: The average kappa coefficient of a binary diagnostic test is a measure of the beyond-chance average agreement between the binary diagnostic test and the gold standard, and it depends on the sensitivity and specificity of the diagnostic test and on the disease prevalence. In this manuscript, the estimation of the average kappa coefficient of a diagnostic test in the presence of verification bias is studied. Confidence intervals for the average kappa coefficient are studied applying the maximum likelihood method and the method of multiple imputation by chained equations, and some rules of application are given. Simulation experiments have been carried out to study the asymptotic behaviors of the proposed intervals. The results obtained in our simulation experiments show that the multiple imputation by chained equations method provides better results than the maximum likelihood method. A function has been written in R to estimate the average kappa coefficient by applying multiple imputation. The results have been applied to the diagnosis of liver disease.

1. Introduction

A binary diagnostic test (BDT) is a medical test used to determine whether or not a patient has a certain disease. Scintigraphy for the diagnosis of liver disease is an example of a BDT. Sensitivity and specificity are the fundamental parameters to assess the effectiveness of a BDT. Sensitivity (Se) is the probability of a positive result of the BDT when the patient has the disease, and specificity (Sp) is the probability of a negative result of the BDT when the patient does not have the disease. When the losses associated with a misclassification by the BDT are considered, the effectiveness of a BDT is measured with the weighted kappa coefficient [1,2]. The weighted kappa coefficient depends on the Se and Sp of the BDT, on the disease prevalence and on a weighting index. The weighting index is a measure of the relative loss between the false positives and the false negatives; it is a value set by the clinician, and it takes a value between 0.5 and 1 when the BDT is used as a screening test and a value between 0 and 0.5 when the BDT is used as a confirmatory test. Therefore, the investigator must assign a value to the weighting index according to the utility of the BDT (screening test or confirmatory test). Roldán-Nofuentes and Olvera-Porcel [3] have defined a new measure to evaluate the effectiveness of a BDT based on the weighted kappa coefficient: the average kappa coefficient. The average kappa coefficient depends on the Se and Sp of the BDT and on the disease prevalence, but it does not depend on the weighting index; it therefore solves the problem of assigning a value to the weighting index.
In order to obtain unbiased estimators of the parameters of a BDT, it is necessary to know the disease status of each patient in a random sample. The medical test through which the disease status of a patient is known is called the gold standard (GS), and therefore the effectiveness of a BDT is assessed in relation to a GS. A biopsy for the diagnosis of liver disease is an example of a GS. The most common sampling design to evaluate the effectiveness of a BDT is cross-sectional sampling. This design consists of applying the BDT and the GS to all patients in a random sample. In this situation, the true disease status (disease present or disease absent) is known for all patients in the sample. In a cross-sectional sample there are no missing data, and therefore it corresponds to a complete data situation.
In clinical practice, it is common that when evaluating a BDT, the GS is not applied to all patients in the sample, giving rise to a problem called partial verification of the disease [4]. If the GS is an expensive medical test or a medical test that involves risks for the patient, then the GS is not applied to all the patients in the sample. In this situation, if Se and Sp are estimated without considering the patients for whom the GS is unknown, the estimators are affected by so-called verification bias [4,5]. Begg and Greenes [4] deduced the maximum likelihood estimators of Se and Sp when the missing data mechanism is missing at random (MAR). The MAR assumption holds that the selection of a patient to verify their disease status with the GS depends only on the result of the BDT. Therefore, the true disease state (disease present or disease absent) is unknown for a subset of patients; the missing information is the true disease status for this subset of patients in the sample. Harel and Zhou [6] have studied the estimation of Se and Sp of a BDT through multiple imputation, assuming the MAR assumption, and they have shown through simulation experiments that multiple imputation provides better results than the method of Begg and Greenes [4]. A review of the impact of verification bias in estimating the accuracy of a BDT (and a continuous test) can be seen in Alonzo [7]. Roldán-Nofuentes and Luna [8] have studied the estimation of the weighted kappa coefficient in the presence of partial disease verification.
In this manuscript we study the estimation of the average kappa coefficient in the presence of verification bias. The manuscript is structured as follows. In Section 2, the weighted kappa coefficient and the average kappa coefficient of a BDT are presented. In Section 3 we study the estimation of the average kappa coefficient with complete data. In Section 4 we study the estimation of the average kappa coefficient when there are missing data, applying the maximum likelihood method and the multiple imputation by chained equations method. In Section 5, simulation experiments are carried out to study the asymptotic behaviors of the confidence intervals proposed in Section 3 and Section 4. In Section 6, we present a function written in R to estimate the average kappa coefficient in the presence of missing data. In Section 7, the results obtained are applied to an example of the diagnosis of liver disease, and in Section 8 the results are discussed.

2. Weighted Kappa Coefficient and Average Kappa Coefficient

Let us consider a BDT whose performance is assessed in relation to a GS. Let L be the loss that occurs when the BDT gives a negative result for a diseased patient, and let L′ be the loss that occurs when the BDT gives a positive result for a non-diseased patient. The losses are assumed to be zero when the BDT correctly classifies a diseased or a non-diseased patient. Loss L is associated with a false negative and loss L′ with a false positive. For example, let us consider the diagnosis of liver disease using scintigraphy as the diagnostic test. If the scintigraphy is positive for a non-diseased patient (false positive), the patient will undergo a biopsy which will finally be negative. Loss L′ will be determined from the economic costs of the diagnosis, taking into account the risks, stress and anxiety caused to the patient. If the scintigraphy is negative for a diseased patient (false negative), the patient may be diagnosed later; in this situation the disease can progress or get worse, decreasing the chance of successful treatment of the disease. Loss L will be determined from these considerations. Therefore, the losses L and L′ are not only measured in economic terms but also in terms of risk, stress, anxiety, etc., and in practice it is not possible to determine their values. Next, we examine the weighted kappa coefficient and the average kappa coefficient.

2.1. Weighted Kappa Coefficient

The weighted kappa coefficient κ(c) is a measure of the beyond-chance agreement between the BDT and the GS, and it is expressed [1,2] as

\kappa(c) = \frac{p q Y}{p c (1 - Q) + q (1 - c) Q}, \quad 0 \le c \le 1,    (1)

where p is the disease prevalence, q = 1 − p, Y = Se + Sp − 1 is the Youden index [9], Q = pSe + q(1 − Sp) = P(T = 1), and c = L/(L + L′) is the weighting index. The weighted kappa coefficient can also be written as

\kappa(c) = \frac{\kappa_0 \kappa_1}{c \kappa_0 + (1 - c) \kappa_1}, \quad 0 \le c \le 1.
The value of the weighting index is set depending on the clinician's assessment of the false positives and the false negatives [1,2]. If the clinician is more concerned about the false positives, as is the case in which the BDT is used as a confirmatory test prior to the application of a risky treatment (for example, a surgical operation), then L′ > L and 0 ≤ c < 0.5. For example, if the clinician decides that a false positive is three times more important than a false negative, then L′ = 3L and c = 1/(1 + 3) = 0.25. If the clinician is more concerned about the false negatives, as is the case in which the BDT is used as a screening test, then L > L′ and 0.5 < c ≤ 1. For example, if the clinician decides that a false negative is five times more important than a false positive, then L = 5L′ and c = 5/(5 + 1) = 5/6. The value c = 0.5 is used for a simple diagnosis (false positives and false negatives have the same importance), κ(0.5) being the Cohen kappa coefficient.
The weighted kappa coefficient can be classified on the following scale of values [10]: 0–0.20, the agreement is slight; 0.21–0.40, fair; 0.41–0.60, moderate; 0.61–0.80, substantial; and 0.81–1, almost perfect. Another scale, based on levels of clinical significance, is [11]: <0.40, poor; 0.40–0.59, fair; 0.60–0.74, good; and 0.75–1, excellent. The weighted kappa coefficient has the following properties: (a) if c = 0 then κ0 = [Sp − (1 − Q)]/Q, and if c = 1 then κ1 = (Se − Q)/(1 − Q); (b) if Se = Sp = 1 then κ(c) = 1, and the agreement between the BDT and the GS is perfect; (c) if the sensitivity and the specificity are complementary (Se = 1 − Sp) then κ(c) = 0, and the BDT and the GS are independent (the BDT is random and therefore not informative); (d) the weighted kappa coefficient is a function of the index c which is increasing if Q > p, decreasing if Q < p, and equal to the Youden index if Q = p.
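As a small illustration of Equation (1), the following R sketch computes the weighted kappa coefficient from Se, Sp, p and c. The function name and the numerical values in the usage line are illustrative and are not part of the original study.

```r
# Weighted kappa coefficient of a BDT (Equation (1)); illustrative sketch.
weighted_kappa <- function(se, sp, p, c) {
  q <- 1 - p
  y <- se + sp - 1              # Youden index
  Q <- p * se + q * (1 - sp)    # P(T = 1)
  (p * q * y) / (p * c * (1 - Q) + q * (1 - c) * Q)
}

# Example: a hypothetical confirmatory use with c = 0.25
weighted_kappa(se = 0.80, sp = 0.90, p = 0.20, c = 0.25)
```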

2.2. Average Kappa Coefficient

From the weighted kappa coefficient, Roldán-Nofuentes and Olvera-Porcel [3] have defined a new measure to evaluate the performance of a BDT with respect to a GS: the average kappa coefficient. For fixed values of Se, Sp and p, the weighted kappa coefficient is a continuous function of the index c. If the clinician considers that L′ > L, and therefore 0 ≤ c < 0.5, the average kappa coefficient is [3]

\bar{\kappa}_1 = \frac{1}{0.5} \int_0^{0.5} \kappa(c)\, dc =
\begin{cases}
\dfrac{2 \kappa_0 \kappa_1}{\kappa_0 - \kappa_1} \ln\!\left(\dfrac{\kappa_0 + \kappa_1}{2 \kappa_1}\right), & p \ne Q \\[2mm]
Y, & p = Q,
\end{cases}    (2)

i.e., the average kappa coefficient κ̄1 is the average value of κ(c) when 0 ≤ c < 0.5. If the clinician considers that L > L′, and therefore 0.5 < c ≤ 1, the average kappa coefficient is [3]

\bar{\kappa}_2 = \frac{1}{0.5} \int_{0.5}^{1} \kappa(c)\, dc =
\begin{cases}
\dfrac{2 \kappa_0 \kappa_1}{\kappa_0 - \kappa_1} \ln\!\left(\dfrac{2 \kappa_0}{\kappa_0 + \kappa_1}\right), & p \ne Q \\[2mm]
Y, & p = Q,
\end{cases}    (3)

i.e., the average kappa coefficient κ̄2 is the average value of κ(c) when 0.5 < c ≤ 1, where

\kappa_0 = \frac{Sp - (1 - Q)}{Q} \quad \text{and} \quad \kappa_1 = \frac{Se - Q}{1 - Q}.
As the weighted kappa coefficient is a measure of the beyond-chance agreement between the BDT and the GS, the average kappa coefficient is a measure of the beyond-chance average agreement between the BDT and the GS, and it does not depend on the weighting index c. The values of the average kappa coefficient can be classified on the same scales [10,11] as the values of the weighted kappa coefficient. The average kappa coefficients κ̄1 and κ̄2 have the following properties [3]:
(a) If Se = Sp = 1 then κ̄1 = κ̄2 = 1, and if Se = 1 − Sp then κ̄1 = κ̄2 = 0. Therefore, 0 ≤ κ̄i ≤ 1, i = 1, 2.
(b) The coefficient κ̄1 is greater than κ̄2 if p > Q, and κ̄1 is lower than κ̄2 if Q > p.
(c) κ̄1 minimizes the expression 2∫₀^0.5 [κ(c) − x]² dc and κ̄2 minimizes the expression 2∫₀.₅^1 [κ(c) − x]² dc. Therefore, when x = κ̄1 (x = κ̄2) the first (second) expression is the variance of the weighted kappa coefficients around κ̄1 (κ̄2).
(d) For fixed values of κ0 and κ1 (or of Se, Sp and p), the weighted kappa coefficient is a continuous function of c in the interval [0, 1]. Therefore, the average kappa coefficient κ̄i coincides with the value of the weighted kappa coefficient at some weighting index c in [0, 1]. So, as κ̄i = κ(c) for some value of c, from Equation (1) and for a specific sample it is possible to calculate the value of the weighting index c associated with the estimated average kappa coefficient. Thus, the estimation of the average kappa coefficient allows us to estimate how much greater (or smaller) the loss due to the false negatives is than the loss due to the false positives.
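As a numerical illustration of Equations (2) and (3), the following R sketch evaluates the closed-form expressions and checks them against the definition of the average kappa coefficients as averages of κ(c). It reuses the weighted_kappa() function sketched in Section 2.1 (an assumption of this illustration), and the values Se = 0.7773, Sp = 0.7308 and p = 10% are those used later in the simulations of Table 3.

```r
# Average kappa coefficients (Equations (2) and (3)); illustrative sketch.
average_kappas <- function(se, sp, p) {
  q <- 1 - p
  Q <- p * se + q * (1 - sp)
  k0 <- (sp - (1 - Q)) / Q        # kappa(0)
  k1 <- (se - Q) / (1 - Q)        # kappa(1)
  if (isTRUE(all.equal(p, Q))) {  # p = Q: both averages equal the Youden index
    y <- se + sp - 1
    return(c(kbar1 = y, kbar2 = y))
  }
  c(kbar1 = 2 * k0 * k1 / (k0 - k1) * log((k0 + k1) / (2 * k1)),
    kbar2 = 2 * k0 * k1 / (k0 - k1) * log(2 * k0 / (k0 + k1)))
}

# Numerical check against the definition as an average of kappa(c)
se <- 0.7773; sp <- 0.7308; p <- 0.10
average_kappas(se, sp, p)
2 * integrate(function(c) weighted_kappa(se, sp, p, c), 0, 0.5)$value   # ~ kbar1
2 * integrate(function(c) weighted_kappa(se, sp, p, c), 0.5, 1)$value   # ~ kbar2
```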

3. Estimation with Complete Data

When the BDT and the GS are applied to all patients in a random sample of size m, the observed frequencies in Table 1 are obtained, where the variable T models the result of the BDT (T = 1 when the result is positive and T = 0 when it is negative) and the variable D models the result of the GS (D = 1 when the patient has the disease and D = 0 when the patient does not have the disease). In Table 1, each observed frequency xi (yi) is the number of diseased (non-diseased) patients with T = i, x = x1 + x0, y = y1 + y0, mi = xi + yi and m = x + y = m1 + m0, with i = 0, 1. In this situation the disease status (disease present or disease absent) of all patients is verified by applying the GS, and the design corresponds to cross-sectional sampling.
In this situation, the maximum likelihood estimator (MLE) of the weighted kappa coefficient [1,2] is
\hat{\kappa}(c) = \frac{x_1 y_0 - x_0 y_1}{m_0 x c + m_1 y (1 - c)}, \quad 0 \le c \le 1,

and the MLEs of κ0 and κ1 are

\hat{\kappa}_0 = \frac{x_1 y_0 - x_0 y_1}{m_1 y} \quad \text{and} \quad \hat{\kappa}_1 = \frac{x_1 y_0 - x_0 y_1}{m_0 x}.

Finally, the MLEs of the average kappa coefficients κ̄1 and κ̄2 are [3]

\hat{\bar{\kappa}}_1 =
\begin{cases}
\dfrac{2 (x_1 y_0 - x_0 y_1)}{m_0 x - m_1 y} \ln\!\left(\dfrac{m_1 y + m_0 x}{2 m_1 y}\right), & x_0 \ne y_1 \\[2mm]
\dfrac{x_1 y_0 - x_0 y_1}{x y}, & x_0 = y_1,
\end{cases}

and

\hat{\bar{\kappa}}_2 =
\begin{cases}
\dfrac{2 (x_1 y_0 - x_0 y_1)}{m_0 x - m_1 y} \ln\!\left(\dfrac{2 m_0 x}{m_1 y + m_0 x}\right), & x_0 \ne y_1 \\[2mm]
\dfrac{x_1 y_0 - x_0 y_1}{x y}, & x_0 = y_1,
\end{cases}
respectively.
If x0 = y1 = 0 then κ̄i cannot be estimated. If x1y0 = x0y1 then the estimate of κ̄i is equal to 0. If x1y0 < x0y1, or if x1 = 0 or y0 = 0, then Ŷ < 0 and it is necessary to interchange the results of the BDT (the positive result becomes T = 0 and the negative result becomes T = 1). A fundamental task in statistical inference is forming a confidence interval (CI) for an unknown parameter. In this context, Roldán-Nofuentes and Olvera-Porcel [3] have studied various CIs for κ̄1 and κ̄2. These CIs are approximate, and their asymptotic behaviors have been studied through simulation experiments. Following this work, two confidence intervals for κ̄1 and κ̄2 studied by Roldán-Nofuentes and Olvera-Porcel (the Wald CI and the logit CI) are summarized, and a new CI (the arcsine CI) is also presented.
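The following R sketch codes these complete-data MLEs directly from the expressions above. The function name and the 2 × 2 counts in the usage line are hypothetical.

```r
# MLEs of kbar1 and kbar2 from a complete 2x2 table (Table 1); illustrative sketch.
estimate_kbar_complete <- function(x1, x0, y1, y0) {
  x <- x1 + x0; y <- y1 + y0
  m1 <- x1 + y1; m0 <- x0 + y0
  D <- x1 * y0 - x0 * y1
  if (x0 == y1) {                      # p_hat = Q_hat: both estimators reduce to the Youden index
    yhat <- D / (x * y)
    return(c(kbar1 = yhat, kbar2 = yhat))
  }
  c(kbar1 = 2 * D / (m0 * x - m1 * y) * log((m1 * y + m0 * x) / (2 * m1 * y)),
    kbar2 = 2 * D / (m0 * x - m1 * y) * log(2 * m0 * x / (m1 * y + m0 * x)))
}

# Hypothetical complete-data counts
estimate_kbar_complete(x1 = 80, x0 = 20, y1 = 30, y0 = 170)
```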

3.1. Wald CI

Based on the asymptotic normality of the standardized estimator of κ̄i when m is large, the 100(1 − α)% Wald CI for κ̄i is [3]

\bar{\kappa}_i \in \hat{\bar{\kappa}}_i \pm z_{1-\alpha/2} \sqrt{\widehat{Var}(\hat{\bar{\kappa}}_i)}, \quad i = 1, 2,

where z_{1−α/2} is the 100(1 − α/2)th percentile of the standard normal distribution. The expressions of the estimated variances are shown in Appendix A.

3.2. Logit CI

The logit transformation of the estimator of κ̄i is closer to a normal distribution, with mean ln[κ̄i/(1 − κ̄i)]. The 100(1 − α)% CI for the logit of κ̄i is therefore

\mathrm{logit}(\hat{\bar{\kappa}}_i) \pm z_{1-\alpha/2} \sqrt{\widehat{Var}[\mathrm{logit}(\hat{\bar{\kappa}}_i)]}, \quad i = 1, 2.

Taking exponentials in this expression, the 100(1 − α)% logit CI for κ̄i is [3]

\bar{\kappa}_i \in \left( \frac{\exp\{\mathrm{logit}(\hat{\bar{\kappa}}_i) - z_{1-\alpha/2}\sqrt{\widehat{Var}[\mathrm{logit}(\hat{\bar{\kappa}}_i)]}\}}{1 + \exp\{\mathrm{logit}(\hat{\bar{\kappa}}_i) - z_{1-\alpha/2}\sqrt{\widehat{Var}[\mathrm{logit}(\hat{\bar{\kappa}}_i)]}\}},\;
\frac{\exp\{\mathrm{logit}(\hat{\bar{\kappa}}_i) + z_{1-\alpha/2}\sqrt{\widehat{Var}[\mathrm{logit}(\hat{\bar{\kappa}}_i)]}\}}{1 + \exp\{\mathrm{logit}(\hat{\bar{\kappa}}_i) + z_{1-\alpha/2}\sqrt{\widehat{Var}[\mathrm{logit}(\hat{\bar{\kappa}}_i)]}\}} \right), \quad i = 1, 2,

where the estimated variance is obtained by applying the delta method, i.e.,

\widehat{Var}[\mathrm{logit}(\hat{\bar{\kappa}}_i)] = \frac{\widehat{Var}(\hat{\bar{\kappa}}_i)}{\hat{\bar{\kappa}}_i^2 (1 - \hat{\bar{\kappa}}_i)^2}, \quad i = 1, 2.

3.3. Arcsine CI

The arcsine is a transformation that has been used to estimate parameters; see, for example, the work of Martín-Andrés and Álvarez-Hernández [12] on the estimation of a binomial proportion. A new CI for κ̄i can be obtained by applying this transformation. Based on the asymptotic normality of the arcsine transformation of the estimator of κ̄i when m is large, the 100(1 − α)% CI for arcsin(√κ̄i) is

\arcsin\!\left(\sqrt{\bar{\kappa}_i}\right) \in \arcsin\!\left(\sqrt{\hat{\bar{\kappa}}_i}\right) \pm z_{1-\alpha/2} \sqrt{\widehat{Var}\!\left[\arcsin\!\left(\sqrt{\hat{\bar{\kappa}}_i}\right)\right]}, \quad i = 1, 2,

where the variance is easily obtained by applying the delta method, i.e.,

\widehat{Var}\!\left[\arcsin\!\left(\sqrt{\hat{\bar{\kappa}}_i}\right)\right] = \frac{\widehat{Var}(\hat{\bar{\kappa}}_i)}{4 \hat{\bar{\kappa}}_i (1 - \hat{\bar{\kappa}}_i)}, \quad i = 1, 2.

Finally, undoing the transformation, the 100(1 − α)% arcsine CI for κ̄i is

\bar{\kappa}_i \in \sin^2\!\left( \arcsin\!\left(\sqrt{\hat{\bar{\kappa}}_i}\right) \pm \frac{z_{1-\alpha/2}}{2\sqrt{\hat{\bar{\kappa}}_i (1 - \hat{\bar{\kappa}}_i)}} \sqrt{\widehat{Var}(\hat{\bar{\kappa}}_i)} \right), \quad i = 1, 2.
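A compact R sketch of the three intervals, given a point estimate of κ̄i and its estimated variance, is shown below. The function name is illustrative, and the usage line borrows the MICE estimate (0.572) and standard error (0.059) reported later in Section 7; no truncation of the Wald limits to [0, 1] is applied here.

```r
# Wald, logit and arcsine CIs for an average kappa coefficient,
# given its point estimate and estimated variance; illustrative sketch.
kbar_cis <- function(kbar_hat, var_hat, conf = 0.95) {
  z  <- qnorm(1 - (1 - conf) / 2)
  se <- sqrt(var_hat)
  wald <- kbar_hat + c(-1, 1) * z * se
  lg   <- log(kbar_hat / (1 - kbar_hat))                 # logit transform
  selg <- se / (kbar_hat * (1 - kbar_hat))               # delta method
  logit_ci <- plogis(lg + c(-1, 1) * z * selg)           # back-transform
  asn  <- asin(sqrt(kbar_hat))                           # arcsine transform
  sean <- se / (2 * sqrt(kbar_hat * (1 - kbar_hat)))     # delta method
  arcsine_ci <- sin(asn + c(-1, 1) * z * sean)^2         # back-transform
  list(wald = wald, logit = logit_ci, arcsine = arcsine_ci)
}

kbar_cis(kbar_hat = 0.572, var_hat = 0.059^2)
```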

4. Estimation in the Presence of Partial Verification

The evaluation of a BDT in the presence of partial verification gives the frequencies in Table 2, where the variables T and D are the same as in Section 3, and the variable V models the verification process, i.e., V = 1 when the disease status of a patient is verified with the GS and V = 0 when it is not.
Let λij be the probability of verifying the disease status of a patient with the GS when T = i and D = j, i.e.,

\lambda_{ij} = P(V = 1 \mid T = i, D = j), \quad i, j = 0, 1.

Assuming that the missing data mechanism is missing at random (MAR) [13], then

\lambda_{ij} = \lambda_i = P(V = 1 \mid T = i), \quad i, j = 0, 1.
The MAR assumption means that the verification process depends only on the result of the BDT and not on the GS. This situation arises in two-phase studies: in the first phase, the BDT is applied to all patients in the sample; in the second phase, the GS is applied only to a subset of patients in the sample, depending only on the result of the BDT. Subject to the MAR assumption, the observed frequencies {s1, r1, u1, s0, r0, u0} are the product of a multinomial distribution whose probabilities are
\xi_i = P(V = 1, D = 1, T = i) = p \lambda_i Se^{\,i} (1 - Se)^{1-i},
\psi_i = P(V = 1, D = 0, T = i) = q \lambda_i Sp^{\,1-i} (1 - Sp)^{i},
\zeta_i = P(V = 0, T = i) = \frac{1 - \lambda_i}{\lambda_i} (\xi_i + \psi_i).
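To make the MAR verification mechanism concrete, the following R sketch generates a partially verified sample and tabulates the frequencies of Table 2. The parameter values are taken from one of the simulation scenarios of Section 5; all object names are illustrative.

```r
# Simulating a partially verified sample under the MAR assumption; illustrative sketch.
set.seed(1)
n <- 500; p <- 0.30; se <- 0.7413; sp <- 0.7441
lambda <- c(T1 = 0.70, T0 = 0.25)            # verification probabilities lambda_1, lambda_0

d     <- rbinom(n, 1, p)                                          # true disease status (GS)
t_bdt <- ifelse(d == 1, rbinom(n, 1, se), rbinom(n, 1, 1 - sp))   # BDT result
v     <- rbinom(n, 1, ifelse(t_bdt == 1, lambda["T1"], lambda["T0"]))  # depends only on T
d_obs <- ifelse(v == 1, d, NA)               # disease status is missing when not verified

# Observed frequencies of Table 2
s1 <- sum(v == 1 & d == 1 & t_bdt == 1); s0 <- sum(v == 1 & d == 1 & t_bdt == 0)
r1 <- sum(v == 1 & d == 0 & t_bdt == 1); r0 <- sum(v == 1 & d == 0 & t_bdt == 0)
u1 <- sum(v == 0 & t_bdt == 1);          u0 <- sum(v == 0 & t_bdt == 0)
```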
Next, estimation of the average kappa coefficient applying the maximum likelihood (ML) method and applying multiple imputation (MI) is studied.

4.1. Maximum Likelihood

Assuming that the missing data mechanism is MAR, the MLEs of the sensitivity, the specificity and the prevalence in the presence of partial verification are [4,5]

\hat{Se}_{pv} = \frac{s_1 n_1/(s_1 + r_1)}{s_1 n_1/(s_1 + r_1) + s_0 n_0/(s_0 + r_0)}, \quad
\hat{Sp}_{pv} = \frac{r_0 n_0/(s_0 + r_0)}{r_1 n_1/(s_1 + r_1) + r_0 n_0/(s_0 + r_0)},

and

\hat{p}_{pv} = \frac{s_1 n_1/(s_1 + r_1) + s_0 n_0/(s_0 + r_0)}{n}.
Substituting in Equations (2) and (3) the parameters with their MLEs in the presence of partial verification, the MLEs of κ̄1 and κ̄2 in the presence of partial verification are

\hat{\bar{\kappa}}_{1,pv} = \frac{2 \hat{p}_{pv} \hat{q}_{pv} \hat{Y}_{pv}}{\hat{p}_{pv} - \hat{Q}_{pv}} \ln\!\left( \frac{\hat{p}_{pv} + \hat{Q}_{pv} - 2 \hat{p}_{pv} \hat{Q}_{pv}}{2 \hat{q}_{pv} \hat{Q}_{pv}} \right)
\quad \text{and} \quad
\hat{\bar{\kappa}}_{2,pv} = \frac{2 \hat{p}_{pv} \hat{q}_{pv} \hat{Y}_{pv}}{\hat{p}_{pv} - \hat{Q}_{pv}} \ln\!\left( \frac{2 \hat{p}_{pv} (1 - \hat{Q}_{pv})}{\hat{p}_{pv} + \hat{Q}_{pv} - 2 \hat{p}_{pv} \hat{Q}_{pv}} \right)

when \hat{p}_{pv} \ne \hat{Q}_{pv}, and

\hat{\bar{\kappa}}_{1,pv} = \hat{\bar{\kappa}}_{2,pv} = \hat{Y}_{pv} = \frac{n_1 n_0 (s_1 + r_1)(s_0 + r_0)(s_1 r_0 - s_0 r_1)}{[n_1 r_1 (s_0 + r_0) + n_0 r_0 (s_1 + r_1)]\,[n_1 s_1 (s_0 + r_0) + n_0 s_0 (s_1 + r_1)]}

when \hat{p}_{pv} = \hat{Q}_{pv}, where \hat{q}_{pv} = 1 - \hat{p}_{pv}, \hat{Y}_{pv} = \hat{Se}_{pv} + \hat{Sp}_{pv} - 1 and \hat{Q}_{pv} = n_1/n. The expressions of the estimators of κ̄1 and κ̄2 are long and complicated when \hat{p}_{pv} \ne \hat{Q}_{pv}, so statistical software is needed to calculate them (see Section 6). Next, three asymptotic CIs for κ̄i in the presence of partial verification are proposed.
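The following R sketch codes these ML estimators under MAR. The function name is illustrative; the usage line applies it to the frequencies of the liver-disease example (Table 7), and the ML point estimates it returns need not coincide with the MICE estimates reported in Section 7.

```r
# ML estimators of Se, Sp, p and the average kappa coefficients under MAR; illustrative sketch.
estimate_kbar_pv_ml <- function(s1, r1, u1, s0, r0, u0) {
  n1 <- s1 + r1 + u1; n0 <- s0 + r0 + u0; n <- n1 + n0
  a  <- s1 * n1 / (s1 + r1); b  <- s0 * n0 / (s0 + r0)   # estimated diseased counts by T
  c1 <- r1 * n1 / (s1 + r1); d0 <- r0 * n0 / (s0 + r0)   # estimated non-diseased counts by T
  se <- a / (a + b); sp <- d0 / (c1 + d0); p <- (a + b) / n
  q <- 1 - p; Q <- n1 / n; y <- se + sp - 1
  if (isTRUE(all.equal(p, Q))) return(c(kbar1 = y, kbar2 = y))
  c(kbar1 = 2 * p * q * y / (p - Q) * log((p + Q - 2 * p * Q) / (2 * q * Q)),
    kbar2 = 2 * p * q * y / (p - Q) * log(2 * p * (1 - Q) / (p + Q - 2 * p * Q)))
}

# Frequencies of the liver-disease example (Table 7)
estimate_kbar_pv_ml(s1 = 231, r1 = 32, u1 = 166, s0 = 27, r0 = 54, u0 = 140)
```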

4.1.1. Wald CI

Based on the asymptotic normality of the standardized estimator of κ̄i in the presence of partial verification, the 100(1 − α)% Wald CI for κ̄i is

\bar{\kappa}_i \in \hat{\bar{\kappa}}_{i,pv} \pm z_{1-\alpha/2} \sqrt{\widehat{Var}(\hat{\bar{\kappa}}_{i,pv})}, \quad i = 1, 2.
The expressions of the estimated variances are shown in Appendix B. These expressions are long and complicated, so it is necessary to use a statistical program to calculate them (see Section 6).

4.1.2. Logit CI

The logit CI is based on the asymptotic normality of the logit transformation of the estimator of κ̄i in the presence of partial verification. The logit CI for κ̄i has a general expression similar to that obtained in Section 3.2, although the expressions of the estimators and of the variances are different. The expressions of the variances are shown in Appendix B, and it is necessary to use a statistical program to calculate them.

4.1.3. Arcsine CI

The arcsine CI is also based on the asymptotic normality of the arcsine transformation of the estimator of κ̄i in the presence of partial verification, and its general expression is similar to that given in Section 3.3; the corresponding variances are shown in Appendix B.

4.2. Multiple Imputation

Multiple imputation (MI) [14,15,16,17] is a computational method used to solve estimation problems with missing data. MI consists of constructing M complete data sets, obtained by replacing the missing data with M independent imputed sets. In each complete data set, the estimators of the parameters and their standard errors are calculated, and these are combined appropriately to calculate the global estimators, their standard errors and their confidence intervals. Harel and Zhou [6] have applied MI to estimate the sensitivity (specificity) of a BDT in the presence of partial verification and have shown that this method provides CIs with better asymptotic behavior than the CIs obtained by applying the ML method. Montero-Alonso and Roldán-Nofuentes [18] have studied the estimation of the likelihood ratios of two BDTs in the presence of partial verification using the MI by chained equations (MICE) method and have also shown that this method provides CIs with better asymptotic behavior.
In our context, from the 3 × 2 table given in Table 2, M complete 2 × 2 tables (as in Table 1) are imputed, and from each one of these M tables the estimator of κ̄i, its standard error and the CIs given in Section 3 are calculated. The M results are then combined by applying the Rubin rules [14] and, in this way, the CI for κ̄i is calculated. Regarding the imputation of the missing data, the MICE method was used. The MICE method requires the MAR assumption and can be used with different types of variables. In the problem posed in this article there are two binary random variables: the variable T and the variable D. The work by White et al. [19] explains in detail the imputation of binary variables using the MICE method. For the variable T there are no missing data, since the BDT is applied to all patients. However, the variable D is not observed in all patients and therefore this variable has missing data. Firstly, all missing values are filled in at random. The variable D is then regressed on the variable T through a logistic regression, the estimation being restricted to the individuals whose D is observed. The missing values of D are then replaced by simulated draws from the posterior predictive distribution of D. This process is called a cycle and, in order to stabilize the results, it is repeated for a determined number of cycles to obtain a set of imputed data. Applying multiple imputation, the estimator of κ̄i is the mean of the estimators obtained in the M complete data sets, and its standard error is calculated by applying the Rubin rules [14]. In the situation studied in this article, the application of MICE requires that si > 0 and ri > 0.
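The following R sketch shows one possible implementation of this scheme with the "mice" package [21]. It pools only the point estimates (their mean over the M data sets), it reuses the complete-data estimator estimate_kbar_complete() sketched in Section 3 (an assumption of this illustration), and it is not the authors' eakcpv function.

```r
# One possible MICE-based estimation of the average kappa coefficients; illustrative sketch.
library(mice)

estimate_kbar_mice <- function(dat, M = 20, cycles = 100) {
  # dat: data frame with a factor t (no missing values) and a factor d (NA when unverified)
  imp <- mice(dat, m = M, maxit = cycles, method = "logreg", printFlag = FALSE)
  est <- sapply(seq_len(M), function(k) {
    comp <- complete(imp, k)
    tab <- table(factor(comp$d, levels = c(1, 0)), factor(comp$t, levels = c(1, 0)))
    estimate_kbar_complete(x1 = tab[1, 1], x0 = tab[1, 2],
                           y1 = tab[2, 1], y0 = tab[2, 2])
  })
  rowMeans(est)   # pooled point estimates (mean over the M completed data sets)
}

# Usage with, for instance, the simulated objects of the sketch in Section 4:
# dat <- data.frame(t = factor(t_bdt), d = factor(d_obs))
# estimate_kbar_mice(dat)
```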

5. Simulation Experiments

Monte Carlo simulation experiments have been carried out to study the asymptotic behavior (coverage probability and average length) of the CIs studied in Section 4. The relative biases of the estimators of the average kappa coefficients obtained through ML and through MI have also been studied. These experiments consisted of the generation of 10,000 random samples of multinomial distributions of size n = 50, 100, 200, 500 and 1000, whose probabilities were calculated from the expressions for ξi, ψi and ζi given in Section 4.
These probabilities were calculated in the following way. With respect to the verification probabilities, we took two sets of values, {λ1 = 0.70, λ0 = 0.25} and {λ1 = 0.95, λ0 = 0.40}, which can be considered low and high verification probabilities, respectively. As values of the disease prevalence we took p = 10%, 30%, 50%, 70%, and as values of κ̄1 and κ̄2 we took 0.20, 0.40, 0.60, 0.80. Once the values of κ̄1 and κ̄2 were set, the values of κ0 and κ1 were obtained by solving, with the Newton–Raphson method, the system made up of Equations (2) and (3), only considering those solutions between 0 and 1. Once the values of κ0 and κ1 were obtained, as the prevalence p had been set previously, the values of Se and Sp were calculated by solving the system made up of the equations κ0 = [Sp − (1 − Q)]/Q and κ1 = (Se − Q)/(1 − Q), and then the probabilities of the multinomial distributions were calculated. Therefore, the samples were generated by fixing κ̄1 and κ̄2. A sketch of these two solving steps is given after this paragraph. The random samples were generated in such a way that κ̄1 and κ̄2 and their standard errors could be estimated in all of them, also verifying that each estimate of κ̄i is greater than 0 (and, in this way, all the CIs can be calculated). For example, if in a sample a frequency si or ri is equal to 0, then MICE cannot be applied; in this situation the sample was ruled out and another one was generated instead, until 10,000 samples were obtained.
The simulation experiments were carried out using the R program [20] and the "mice" library [21]. Regarding MICE, M = 20 complete data sets were imputed, performing 100 cycles. The complete data sets were generated in such a way that κ̄1 and κ̄2 (and their standard errors) could be estimated in all of them; thus, for example, if in a complete data set the estimate of κ̄i was lower than 0, that data set was discarded and another was generated in its place, and so on until obtaining 20 complete data sets. In a first phase of these experiments we considered M = 20 and M = 50 complete data sets, with 100 and 200 cycles in each case, obtaining very similar results; therefore, we finally considered M = 20 and 100 cycles to save computation time while stabilizing the results. In each generated sample we calculated, applying MICE, the three CIs (95% confidence) given in Section 3 and, applying ML, the three CIs given in Section 4.1. Finally, we calculated the coverage probabilities and the average lengths of the CIs in each scenario, as well as the relative biases of the estimators of κ̄1 and κ̄2 obtained through ML and through MICE.
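The R sketch below illustrates the two solving steps under the stated assumptions. It uses optim() as a generic substitute for the Newton–Raphson scheme described above; the function name, starting values, bounds and tolerances are arbitrary choices of this illustration.

```r
# Recovering (kappa_0, kappa_1) from fixed (kbar1, kbar2), and then (Se, Sp) from
# (kappa_0, kappa_1, p), as in the simulation design; illustrative sketch.
solve_design <- function(kbar1, kbar2, p) {
  # Step 1: kappa_0 and kappa_1 from Equations (2) and (3), least-squares formulation
  f <- function(k) {
    k0 <- k[1]; k1 <- k[2]
    if (abs(k0 - k1) < 1e-8) return(1e6)   # guard against the removable singularity k0 = k1
    e1 <- 2 * k0 * k1 / (k0 - k1) * log((k0 + k1) / (2 * k1)) - kbar1
    e2 <- 2 * k0 * k1 / (k0 - k1) * log(2 * k0 / (k0 + k1)) - kbar2
    e1^2 + e2^2
  }
  k <- optim(c(0.3, 0.5), f, method = "L-BFGS-B",
             lower = c(1e-4, 1e-4), upper = c(1 - 1e-4, 1 - 1e-4))$par
  # Step 2: Se and Sp from kappa_0 = (Sp-(1-Q))/Q and kappa_1 = (Se-Q)/(1-Q),
  # a linear system in (Se, Sp) once Q = p*Se + (1-p)*(1-Sp) is substituted
  q <- 1 - p
  A <- rbind(c(-(k[1] - 1) * p, 1 + (k[1] - 1) * q),
             c(1 - (1 - k[2]) * p, (1 - k[2]) * q))
  b <- c(1 + (k[1] - 1) * q, k[2] + (1 - k[2]) * q)
  sesp <- solve(A, b)
  c(kappa0 = k[1], kappa1 = k[2], Se = sesp[1], Sp = sesp[2])
}

# The scenario of Table 3: kbar1 = 0.2 and kbar2 = 0.4 with p = 10% should recover
# approximately Se = 0.7773 and Sp = 0.7308
solve_design(kbar1 = 0.2, kbar2 = 0.4, p = 0.10)
```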
Table 3 and Table 4 show some of the results obtained for κ̄1 = 0.2, 0.4, 0.6, 0.8, indicating in each case the values of Se, Sp and p.
From the results of these experiments we reach the following conclusions:
(a)
With respect to ML, the verification probabilities do not have a clear effect on the coverage probabilities (CPs) of the CIs. In general terms, the CPs far exceed 95% when the sample size is small (n = 50) or moderate (n = 100–200), and they fluctuate around 95% when the sample size is large (n = 500–1000). The Wald CI has a CP that fluctuates around 95% when the sample size is moderate or large. The logit CI has a higher CP than that of the Wald CI, especially when the sample size is small or moderate. The arcsine CI can have a CP below 90% when the sample size is small and κ̄1 is small (κ̄1 = 0.2), and it fluctuates around 95% when the sample size is large. In general terms, the Wald CI is the interval with the best performance when the sample size is small or moderate, while the three CIs have a very similar asymptotic behavior when the sample size is large.
(b)
With respect to MICE, the verification probabilities do not have a clear effect on the CPs of the CIs. The Wald CI has a CP that exceeds 95% when the sample size is small or moderate and the value of κ̄1 is small (κ̄1 = 0.2), fluctuating around 95% in the other situations and sample sizes. The logit CI has a CP that is slightly higher than that of the Wald CI, especially when the sample size is small or moderate. The arcsine CI has a CP closer to 95% when the sample size is small, and for the rest of the sample sizes its CP is slightly higher than that of the Wald CI.
(c)
Comparing the CIs obtained by ML with those obtained by MICE, MICE along with the Wald CI presents, in general terms, CPs that fluctuate more closely around 95% than any of the CIs obtained by ML; once MICE along with the Wald CI reaches a CP of 95%, it fluctuates very slightly around 95%. Furthermore, in general terms, MICE along with the Wald CI begins to fluctuate around 95% at a smaller sample size than the CIs obtained by ML. Regarding the average lengths, the CIs obtained by ML have an average length slightly smaller than that of the CIs obtained by applying MICE when the sample size is small or moderate, although the latter show better fluctuations around 95%; the average lengths are very similar when the sample size is large.
(d)
Regarding the comparison of the estimators obtained by ML and by MICE, the relative biases are very similar. The difference (in absolute value) is small (less than 5%) when the sample size is small, and very small (less than 1%) when the sample size is large. Therefore, ML and MICE provide estimators of κ̄1 that are, on average, very similar.
Table 5 and Table 6 show some of the results obtained for κ̄2 = 0.2, 0.4, 0.6, 0.8. In very general terms, the conclusions are very similar to those obtained for the CIs of κ̄1.

6. Function Eakcpv

We have written a function in R [20], called "eakcpv" (Estimation of the Average Kappa Coefficient in the presence of Partial Verification), to estimate the average kappa coefficient of a BDT in the presence of partial disease verification. The command to run the "eakcpv" function is "eakcpv(s1, r1, u1, s0, r0, u0, conf, imp, cycl)", where s1, r1, u1, s0, r0, u0 are the observed frequencies, "conf" is the confidence level, "imp" is the number of complete data sets and "cycl" is the number of cycles. The complete data sets are generated in such a way that κ̄1 and κ̄2 (and their standard errors) can be estimated in all of them; thus, for example, if in a complete data set the estimate of κ̄i is lower than 0, that data set is discarded and another is generated in its place, and so on until obtaining "imp" complete data sets. The function always checks that the values are valid and that the analysis can be performed (e.g., that no frequency si or ri is equal to 0, etc.). The function estimates κ̄1 and κ̄2 applying MICE, along with the Wald and arcsine CIs. The function also estimates the relative loss between the false positives and the false negatives, i.e., how much greater (or smaller) the loss associated with a false positive is than the loss associated with a false negative. The results obtained are recorded in a file called "results_eakcpv.txt" in the same folder from which the function is run. The function "eakcpv" is available as Supplementary Materials of this manuscript.

7. Example

The results obtained have been applied to the study of Drum and Christacopoulos [22] on the diagnosis of liver disease. Drum and Christacopoulos [22] have studied the diagnosis of liver disease using a hepatic scintigraphy as BDT and a biopsy as GS. In Table 7, we show the observed frequencies, where variable T models the result of the hepatic scintigraphy, variable V models the verification process and variable D models the result of the biopsy.
Running the “eakcpv” function with the command
eakcpv(231, 32, 166, 27, 54, 140, 0.95, 20, 100),
it is obtained that κ̂0 = 0.597 and κ̂1 = 0.507. With respect to κ̄1, the estimate obtained applying MICE is 0.572, its standard error is 0.059 and the 95% Wald CI for κ̄1 is (0.452, 0.691). The estimated relative loss between the false positives and the false negatives is ĉ = 0.252, so the loss associated with the false positives (L′) is 2.97 times greater than the loss associated with the false negatives (L). With respect to κ̄2, the estimate obtained applying MICE is 0.526, its standard error is 0.066 and the 95% Wald CI for κ̄2 is (0.393, 0.660). The estimated relative loss between the false positives and the false negatives is ĉ = 0.752, so the loss associated with the false negatives (L) is 3.03 times greater than the loss associated with the false positives (L′).
When the hepatic scintigraphy is to be used as a confirmatory test prior to a risky treatment (L′ > L and 0 ≤ c < 0.5), the beyond-chance average agreement between the hepatic scintigraphy and the biopsy is moderate (the MICE estimate of κ̄1 is 0.572) and, in terms of the Wald CI, it is a value between moderate and substantial (95% confidence). The estimated relative loss between the false positives and the false negatives is 0.252. As c = L/(L + L′) = (L/L′)/(1 + L/L′), it is possible to calculate which loss (L or L′) is greater: the loss associated with the false positives (L′) is 2.97 times greater than the loss associated with the false negatives (L). Therefore, if the clinician considers that L′ > L, then the beyond-chance average agreement between the hepatic scintigraphy and the biopsy is moderate, and the loss that occurs when erroneously classifying a non-diseased patient with the hepatic scintigraphy is 2.97 times greater than the loss that occurs when erroneously classifying a diseased patient.
When the hepatic scintigraphy is to be used as a screening test (L > L′ and 0.5 < c ≤ 1), the beyond-chance average agreement between the hepatic scintigraphy and the biopsy is moderate (the MICE estimate of κ̄2 is 0.526) and, in terms of the Wald CI, it is a value between fair and substantial (95% confidence). The estimated relative loss between the false positives and the false negatives is 0.752, so that the loss associated with the false negatives (L) is 3.03 times greater than the loss associated with the false positives (L′). Therefore, if the clinician considers that L > L′, then the loss that occurs when erroneously classifying a diseased patient with the hepatic scintigraphy is 3.03 times greater than the loss that occurs when erroneously classifying a non-diseased patient.

8. Discussion

The average kappa coefficient is a measure of the beyond-chance average agreement between the BDT and the GS, and it depends only on the Se and Sp of the BDT and on the disease prevalence. The average kappa coefficient solves the problem of assigning a value to the weighting index of the weighted kappa coefficient. In this manuscript we have studied the estimation of the average kappa coefficient when the gold standard is not applied to all patients in the sample, a situation known as partial verification of the disease. The estimation of the average kappa coefficient has been carried out by applying two methods: the maximum likelihood method and the MICE method. Both methods require the verification process to be MAR, i.e., the verification process must not depend on the disease status.
We have carried out simulation experiments to study the asymptotic behavior of the proposed CIs, both applying the maximum likelihood method and applying MICE. The relative biases of the two estimators (maximum likelihood and MICE) of the average kappa coefficient have also been calculated. The MICE method along with the arcsine CI has been shown to have a better coverage probability when the sample size is small, while the MICE method along with the Wald CI has been shown to have a better coverage probability when the sample size is moderate or large. Regarding the relative biases, the difference between the relative biases of the two types of estimators is small, so that both methods give rise to estimators that are, on average, very similar. Therefore, we recommend using MICE instead of the maximum likelihood method.
As in other studies [6,17], multiple imputation has proven to be a good method (and better than the maximum likelihood method) to estimate parameters of a binary diagnostic test in the presence of partial verification of the disease. In the situation studied here, the application of MICE has been carried out by generating 20 data sets. Rubin [14] recommended imputing five complete data sets in order to be able to apply multiple imputation. As our simulations have given stable values with 20 and 50 data sets, we decided, finally, to use 20.
The MICE method requires the missing data to be MAR, so if the verification process depends on the disease status then the MAR assumption does not hold and MICE cannot be applied. Therefore, it is necessary to study other methods of estimating the average kappa coefficient when the MAR assumption does not hold. The application of the method used by Kosinski and Barnhart [23] may be a solution to this problem. Future research should also focus on estimating the average kappa coefficient when covariates are observed in all patients in the sample.
Finally, we have written a function in R to estimate the average kappa coefficient in the situation studied in this manuscript, applying MICE. The function is available as Supplementary Materials.

Supplementary Materials

The following are available online at https://www.mdpi.com/article/10.3390/math9141694/s1. The function “eakcpv” is a function written in R that allows estimating the average kappa coefficient by applying the MICE method.

Author Contributions

J.A.R.-N. and S.B.R. have collaborated equally in the realization of this work. Both authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We thank the anonymous referees for their helpful comments that improved the quality of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Roldán-Nofuentes and Olvera-Porcel [3] have deduced (applying the delta method) the expressions of the estimated variances of the estimators of the average kappa coefficients when the BDT and GS are applied to all patients in a sample.
When p̂ ≠ Q̂ (which is equivalent to x0 ≠ y1), the estimated asymptotic variances of the estimators of κ̄1 and κ̄2 are [3]

\widehat{Var}(\hat{\bar{\kappa}}_1) = \frac{1}{(\hat{\kappa}_0 + \hat{\kappa}_1)^2 (\hat{\kappa}_0 - \hat{\kappa}_1)^2} \left\{ \hat{A}_1^2\, \widehat{Var}(\hat{\kappa}_0) + \hat{B}_1^2\, \widehat{Var}(\hat{\kappa}_1) + 2 \hat{A}_1 \hat{B}_1\, \widehat{Cov}(\hat{\kappa}_0, \hat{\kappa}_1) \right\}

and

\widehat{Var}(\hat{\bar{\kappa}}_2) = \frac{1}{(\hat{\kappa}_0 + \hat{\kappa}_1)^2 (\hat{\kappa}_0 - \hat{\kappa}_1)^2} \left\{ \hat{A}_2^2\, \widehat{Var}(\hat{\kappa}_0) + \hat{B}_2^2\, \widehat{Var}(\hat{\kappa}_1) + 2 \hat{A}_2 \hat{B}_2\, \widehat{Cov}(\hat{\kappa}_0, \hat{\kappa}_1) \right\},

respectively, where

\hat{A}_1 = \frac{\hat{\kappa}_1 \left[ 2\hat{\kappa}_0^2 - \hat{\bar{\kappa}}_1 (\hat{\kappa}_0 + \hat{\kappa}_1) \right]}{\hat{\kappa}_0}, \quad
\hat{B}_1 = \frac{\hat{\kappa}_0 \left[ \hat{\bar{\kappa}}_1 (\hat{\kappa}_0 + \hat{\kappa}_1) - 2\hat{\kappa}_0 \hat{\kappa}_1 \right]}{\hat{\kappa}_1},

\hat{A}_2 = \frac{\hat{\kappa}_1 \left[ 2\hat{\kappa}_0 \hat{\kappa}_1 - \hat{\bar{\kappa}}_2 (\hat{\kappa}_0 + \hat{\kappa}_1) \right]}{\hat{\kappa}_0}, \quad
\hat{B}_2 = \frac{\hat{\kappa}_0 \left[ \hat{\bar{\kappa}}_2 (\hat{\kappa}_0 + \hat{\kappa}_1) - 2\hat{\kappa}_1^2 \right]}{\hat{\kappa}_1},

\widehat{Var}(\hat{\kappa}_0) = \frac{(1 - \hat{Sp})^2 \hat{Y}^2 \widehat{Var}(\hat{p}) + \hat{p}^2 \left[ (1 - \hat{Sp})^2 \widehat{Var}(\hat{Se}) + \hat{Se}^2 \widehat{Var}(\hat{Sp}) \right]}{\hat{Q}^4},

\widehat{Var}(\hat{\kappa}_1) = \frac{(1 - \hat{Se})^2 \hat{Y}^2 \widehat{Var}(\hat{p}) + \hat{q}^2 \left[ \hat{Sp}^2 \widehat{Var}(\hat{Se}) + (1 - \hat{Se})^2 \widehat{Var}(\hat{Sp}) \right]}{(1 - \hat{Q})^4},

\widehat{Cov}(\hat{\kappa}_0, \hat{\kappa}_1) = \frac{\hat{p}\hat{q} \left[ (1 - \hat{Se}) \hat{Se}\, \widehat{Var}(\hat{Sp}) + (1 - \hat{Sp}) \hat{Sp}\, \widehat{Var}(\hat{Se}) \right] - (1 - \hat{Se})(1 - \hat{Sp}) \hat{Y}^2 \widehat{Var}(\hat{p})}{\hat{Q}^2 (1 - \hat{Q})^2},

and where \hat{Se} = x_1/x, \hat{Sp} = y_0/y, \hat{p} = x/m, \hat{q} = y/m, \hat{Y} = (x_1 y_0 - x_0 y_1)/(x y), \hat{Q} = m_1/m, \widehat{Var}(\hat{Se}) = \hat{Se}(1 - \hat{Se})/x, \widehat{Var}(\hat{Sp}) = \hat{Sp}(1 - \hat{Sp})/y and \widehat{Var}(\hat{p}) = \hat{p}\hat{q}/m.
When p̂ = Q̂ (which is equivalent to x0 = y1), the estimated asymptotic variances are

\widehat{Var}(\hat{\bar{\kappa}}_1) = \widehat{Var}(\hat{\bar{\kappa}}_2) = \widehat{Var}(\hat{Y}) = \frac{\hat{Se}(1 - \hat{Se})}{x} + \frac{\hat{Sp}(1 - \hat{Sp})}{y}.
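The following R sketch codes the complete-data variance of the estimator of κ̄1 directly from the expressions above, as reconstructed here. The function name is illustrative.

```r
# Estimated variance of the complete-data estimator of kbar1 (Appendix A); illustrative sketch.
var_kbar1_complete <- function(x1, x0, y1, y0) {
  x <- x1 + x0; y <- y1 + y0; m <- x + y
  m1 <- x1 + y1; m0 <- x0 + y0
  se <- x1 / x; sp <- y0 / y; p <- x / m; q <- y / m
  Yh <- se + sp - 1; Q <- m1 / m
  vse <- se * (1 - se) / x; vsp <- sp * (1 - sp) / y; vp <- p * q / m
  if (x0 == y1) return(vse + vsp)                 # p_hat = Q_hat: variance of the Youden index
  k0 <- (x1 * y0 - x0 * y1) / (m1 * y)
  k1 <- (x1 * y0 - x0 * y1) / (m0 * x)
  kb1 <- 2 * k0 * k1 / (k0 - k1) * log((k0 + k1) / (2 * k1))
  # Delta-method coefficients A1 and B1
  A <- k1 * (2 * k0^2 - kb1 * (k0 + k1)) / k0
  B <- k0 * (kb1 * (k0 + k1) - 2 * k0 * k1) / k1
  vk0  <- ((1 - sp)^2 * Yh^2 * vp + p^2 * ((1 - sp)^2 * vse + se^2 * vsp)) / Q^4
  vk1  <- ((1 - se)^2 * Yh^2 * vp + q^2 * (sp^2 * vse + (1 - se)^2 * vsp)) / (1 - Q)^4
  ck01 <- (p * q * ((1 - se) * se * vsp + (1 - sp) * sp * vse) -
           (1 - se) * (1 - sp) * Yh^2 * vp) / (Q^2 * (1 - Q)^2)
  (A^2 * vk0 + B^2 * vk1 + 2 * A * B * ck01) / ((k0 + k1)^2 * (k0 - k1)^2)
}
```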

Appendix B

In this appendix, the expressions of the estimated variances of the estimators of the average kappa coefficients in the presence of partial verification are obtained. Applying the delta method, the estimated asymptotic variance of the estimator of κ̄i (i = 1, 2) in the presence of partial verification is

\widehat{Var}(\hat{\bar{\kappa}}_{i,pv}) = \left(\frac{\partial \bar{\kappa}_i}{\partial Se}\right)^{\!2} \widehat{Var}(\hat{Se}_{pv}) + \left(\frac{\partial \bar{\kappa}_i}{\partial Sp}\right)^{\!2} \widehat{Var}(\hat{Sp}_{pv}) + \left(\frac{\partial \bar{\kappa}_i}{\partial p}\right)^{\!2} \widehat{Var}(\hat{p}_{pv}) + 2 \frac{\partial \bar{\kappa}_i}{\partial Se} \frac{\partial \bar{\kappa}_i}{\partial Sp} \widehat{Cov}(\hat{Se}_{pv}, \hat{Sp}_{pv}) + 2 \frac{\partial \bar{\kappa}_i}{\partial Se} \frac{\partial \bar{\kappa}_i}{\partial p} \widehat{Cov}(\hat{Se}_{pv}, \hat{p}_{pv}) + 2 \frac{\partial \bar{\kappa}_i}{\partial Sp} \frac{\partial \bar{\kappa}_i}{\partial p} \widehat{Cov}(\hat{Sp}_{pv}, \hat{p}_{pv}),

where each partial derivative is evaluated at Se = \hat{Se}_{pv}, Sp = \hat{Sp}_{pv} and p = \hat{p}_{pv}, when \hat{p}_{pv} \ne \hat{Q}_{pv}, and

\widehat{Var}(\hat{\bar{\kappa}}_{i,pv}) = \widehat{Var}(\hat{Y}_{pv}) = \hat{Sp}_{pv}^2 \widehat{Var}(\hat{Se}_{pv}) + \hat{Se}_{pv}^2 \widehat{Var}(\hat{Sp}_{pv}) + 2 \hat{Se}_{pv} \hat{Sp}_{pv} \widehat{Cov}(\hat{Se}_{pv}, \hat{Sp}_{pv})

when \hat{p}_{pv} = \hat{Q}_{pv}, where [4]

\widehat{Var}(\hat{Se}_{pv}) = \left[ \hat{Se}_{pv}(1 - \hat{Se}_{pv}) \right]^2 \left[ \frac{n}{n_1 n_0} + \frac{r_1}{s_1 (s_1 + r_1)} + \frac{r_0}{s_0 (s_0 + r_0)} \right], \quad
\widehat{Var}(\hat{Sp}_{pv}) = \left[ \hat{Sp}_{pv}(1 - \hat{Sp}_{pv}) \right]^2 \left[ \frac{n}{n_1 n_0} + \frac{s_1}{r_1 (s_1 + r_1)} + \frac{s_0}{r_0 (s_0 + r_0)} \right]

and [24]

\widehat{Cov}(\hat{Se}_{pv}, \hat{Sp}_{pv}) = \hat{Se}_{pv} \hat{Sp}_{pv} (1 - \hat{Se}_{pv})(1 - \hat{Sp}_{pv}) \left[ \frac{u_1}{n_1 (s_1 + r_1)} + \frac{u_0}{n_0 (s_0 + r_0)} \right].
The variance \widehat{Var}(\hat{p}_{pv}) and the covariances \widehat{Cov}(\hat{Se}_{pv}, \hat{p}_{pv}) and \widehat{Cov}(\hat{Sp}_{pv}, \hat{p}_{pv}) are obtained by applying the delta method. Let τ be the positive predictive value of the BDT, let υ be the negative predictive value of the BDT, let Q be the probability of a positive result of the BDT, and let ψ = (τ, υ, Q)^T. Applying the delta method, the variance–covariance matrix of \hat{\psi} is [25]

\hat{\Sigma}_{\hat{\psi}} = \mathrm{Diag}\!\left( \frac{\tau^2 (1 - \tau)^2}{s_1 (1 - \tau)^2 + r_1 \tau^2},\; \frac{\upsilon^2 (1 - \upsilon)^2}{s_0 \upsilon^2 + r_0 (1 - \upsilon)^2},\; \frac{Q^2 (1 - Q)^2}{n_1 (1 - Q)^2 + n_0 Q^2} \right).

The MLEs of the predictive values in the presence of partial verification are [26] \hat{\tau}_{pv} = s_1/(s_1 + r_1) and \hat{\upsilon}_{pv} = r_0/(s_0 + r_0), and the MLE of Q is \hat{Q}_{pv} = n_1/n. Therefore, in the presence of partial verification, the estimators of the predictive values coincide with the naïve estimators (those obtained disregarding the unverified patients) when the MAR assumption holds [26]. Let θ = (Se, Sp, p)^T be the vector whose components are the sensitivity, the specificity and the prevalence. As the sensitivity, the specificity and the prevalence can be written in terms of the predictive values and of Q as

Se = \frac{\tau \left[ \upsilon - (1 - p) \right]}{p (\tau + \upsilon - 1)}, \quad Sp = \frac{\upsilon (\tau - p)}{(1 - p)(\tau + \upsilon - 1)} \quad \text{and} \quad p = 1 - (1 - \tau) Q - (1 - Q)\, \upsilon,

the estimated variance–covariance matrix of \hat{\theta} is obtained by applying the delta method, i.e.,

\hat{\Sigma}_{\hat{\theta}} = \left( \frac{\partial \theta}{\partial \psi} \right)_{\theta = \hat{\theta}_{pv}} \hat{\Sigma}_{\hat{\psi}} \left( \frac{\partial \theta}{\partial \psi} \right)^{T}_{\theta = \hat{\theta}_{pv}}.

Carrying out the algebraic operations, it is obtained that

\widehat{Var}(\hat{p}_{pv}) = \frac{\hat{\tau}_{pv}(1 - \hat{\tau}_{pv}) \hat{Q}_{pv}^2}{s_1 + r_1} + \frac{\hat{\upsilon}_{pv}(1 - \hat{\upsilon}_{pv})(1 - \hat{Q}_{pv})^2}{s_0 + r_0} + \frac{(s_1 r_0 - s_0 r_1)^2\, \hat{Q}_{pv}(1 - \hat{Q}_{pv})}{n (s_1 + r_1)^2 (s_0 + r_0)^2},

\widehat{Cov}(\hat{Se}_{pv}, \hat{p}_{pv}) = \frac{n_1 n_0 s_1 s_0 (s_1 + r_1)(s_0 + r_0)}{\left[ n_1 s_1 (s_0 + r_0) + n_0 s_0 (s_1 + r_1) \right]^2} \left[ \frac{(1 - \hat{\tau}_{pv}) \hat{Q}_{pv}}{s_1 + r_1} - \frac{\hat{\upsilon}_{pv}(1 - \hat{Q}_{pv})}{s_0 + r_0} + \frac{s_1 r_0 - s_0 r_1}{n (s_1 + r_1)(s_0 + r_0)} \right]

and

\widehat{Cov}(\hat{Sp}_{pv}, \hat{p}_{pv}) = \frac{n_1 n_0 r_1 r_0 (s_1 + r_1)(s_0 + r_0)}{\left[ n_1 r_1 (s_0 + r_0) + n_0 r_0 (s_1 + r_1) \right]^2} \left[ \frac{\hat{\tau}_{pv} \hat{Q}_{pv}}{s_1 + r_1} - \frac{(1 - \hat{\upsilon}_{pv})(1 - \hat{Q}_{pv})}{s_0 + r_0} - \frac{s_1 r_0 - s_0 r_1}{n (s_1 + r_1)(s_0 + r_0)} \right].
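The following R sketch codes the variances and covariances of the corrected estimators directly from the expressions above, as reconstructed here. The function name is illustrative; the usage line applies it to the frequencies of Table 7.

```r
# Estimated variances/covariances of the corrected estimators (Appendix B); illustrative sketch.
vcov_pv <- function(s1, r1, u1, s0, r0, u0) {
  n1 <- s1 + r1 + u1; n0 <- s0 + r0 + u0; n <- n1 + n0
  tau <- s1 / (s1 + r1); ups <- r0 / (s0 + r0); Q <- n1 / n
  a <- s1 * n1 / (s1 + r1); b <- s0 * n0 / (s0 + r0)
  se <- a / (a + b)
  sp <- (r0 * n0 / (s0 + r0)) / (r1 * n1 / (s1 + r1) + r0 * n0 / (s0 + r0))
  vse <- (se * (1 - se))^2 * (n / (n1 * n0) + r1 / (s1 * (s1 + r1)) + r0 / (s0 * (s0 + r0)))
  vsp <- (sp * (1 - sp))^2 * (n / (n1 * n0) + s1 / (r1 * (s1 + r1)) + s0 / (r0 * (s0 + r0)))
  csesp <- se * sp * (1 - se) * (1 - sp) * (u1 / (n1 * (s1 + r1)) + u0 / (n0 * (s0 + r0)))
  vp <- tau * (1 - tau) * Q^2 / (s1 + r1) + ups * (1 - ups) * (1 - Q)^2 / (s0 + r0) +
        (s1 * r0 - s0 * r1)^2 * Q * (1 - Q) / (n * (s1 + r1)^2 * (s0 + r0)^2)
  list(var_se = vse, var_sp = vsp, cov_se_sp = csesp, var_p = vp)
}

vcov_pv(231, 32, 166, 27, 54, 140)
```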

References

  1. Kraemer, H.C. Evaluating Medical Tests. Objective and Quantitative Guidelines; Sage Publications: Newbury Park, CA, USA, 1992. [Google Scholar]
  2. Kraemer, H.C.; Periyakoil, V.S.; Noda, A. Kappa coefficients in medical research. Stat. Med. 2002, 21, 2109–2129. [Google Scholar] [CrossRef]
  3. Roldán-Nofuentes, J.A.; Olvera-Porcel, C. Average kappa coefficient: A new measure to assess a binary test considering the losses associated with an erroneous classification. J. Stat. Comput. Simul. 2015, 85, 1601–1620. [Google Scholar] [CrossRef]
  4. Begg, C.B.; Greenes, R.A. Assessment of diagnostic tests when disease verification is subject to selection bias. Biometrics 1983, 39, 207–215. [Google Scholar] [CrossRef]
  5. Zhou, X.H. Maximum likelihood estimators of sensitivity and specificity corrected for verification bias. Comm. Statist. Theory Methods 1993, 22, 3177–3198. [Google Scholar] [CrossRef]
  6. Harel, O.; Zhou, X.H. Multiple imputation for correcting verification bias. Stat. Med. 2006, 25, 3769–3786. [Google Scholar] [CrossRef] [PubMed]
  7. Alonzo, T.A. Verification bias-impact and methods for correction when assessing accuracy of diagnostic tests. REVSTAT 2014, 12, 67–83. [Google Scholar]
  8. Roldán-Nofuentes, J.A.; Luna del Castillo, J.D. Risk of error and the kappa coefficient of a binary diagnostic test in the presence of partial verification. J. Appl. Stat. 2007, 34, 887–898. [Google Scholar] [CrossRef]
  9. Youden, W.J. Index for rating diagnostic tests. Cancer 1950, 3, 32–35. [Google Scholar] [CrossRef]
  10. Landis, R.; Koch, G. The measurement of observer agreement for categorical data. Biometrics 1977, 33, 159–174. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  11. Cicchetti, D.V. The precision of reliability and validity estimates re-visited: Distinguishing between clinical and statistical significance of sample size requirements. J. Clin. Exp. Neuropsychol. 2001, 23, 695–700. [Google Scholar] [CrossRef]
  12. Martín-Andrés, A.; Álvarez-Hernández, M. Two-tailed asymptotic inferences for a proportion. J. Appl. Stat. 2014, 41, 1516–1529. [Google Scholar] [CrossRef]
  13. Rubin, D.B. Inference and missing data. Biometrika 1976, 4, 73–89. [Google Scholar] [CrossRef]
  14. Rubin, D.B. Multiple Imputation for Nonresponse in Surveys; Wiley: New York, NY, USA, 1987. [Google Scholar]
  15. Rubin, D.B. Multiple Imputation after 18+ years. J. Am. Stat. Assoc. 1996, 91, 473–489. [Google Scholar] [CrossRef]
  16. Schafer, J.L. Analysis of Incomplete Multivariate Data; Chapman and Hall: New York, NY, USA, 1997. [Google Scholar]
  17. Harel, O.; Zhou, X.H. Multiple imputation: Review of theory, implementation and software. Stat. Med. 2007, 26, 3057–3077. [Google Scholar] [CrossRef] [Green Version]
  18. Montero-Alonso, M.A.; Roldán-Nofuentes, J.A. Approximate confidence intervals for the likelihood ratios of a binary diagnostic test in the presence of partial disease verification. J. Biopharm. Stat. 2019, 29, 56–81. [Google Scholar] [CrossRef] [PubMed]
  19. White, I.R.; Royston, P.; Wood, A.M. Multiple imputation using chained equations: Issues and guidance for practice. Stat. Med. 2011, 30, 377–399. [Google Scholar] [CrossRef] [PubMed]
  20. R Core Team. A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2016; Available online: https://www.R-project.org/ (accessed on 1 June 2021).
  21. van Buuren, S.; Groothuis-Oudshoorn, K. Mice: Multivariate imputation by chained equations in R. J. Stat. Softw. 2011, 45, 3. [Google Scholar] [CrossRef] [Green Version]
  22. Drum, D.E.; Christacopoulos, J.S. Hepatic scintigraphy in clinical decision making. J. Nucl. Med. 1972, 13, 908–915. [Google Scholar]
  23. Kosinski, A.S.; Barnhart, H.X. Accounting for nonignorable verification bias in assessment of diagnostic tests. Biometrics 2003, 59, 163–171. [Google Scholar] [CrossRef]
  24. Kosinski, A.S.; Chenb, Y.; Lylesc, R.H. Sample size calculations for evaluating a diagnostic test when the gold standard is missing at random. Stat. Med. 2010, 29, 1572–1579. [Google Scholar] [CrossRef]
  25. Zhou, X.H.; Obuchowski, N.; McClish, D. Statistical Methods in Diagnostic Medicine, 2nd ed.; Wiley: New York, NY, USA, 2011. [Google Scholar]
  26. Zhou, X.H. Effect of verification bias on positive and negative predictive values. Stat. Med. 1994, 13, 1737–1745. [Google Scholar] [CrossRef] [PubMed]
Table 1. Observed frequencies in the presence of complete data (2 × 2 table).

        T = 1   T = 0   Total
D = 1   x1      x0      x
D = 0   y1      y0      y
Total   m1      m0      m
Table 2. Observed frequencies in the presence of partial verification (3 × 2 table).

                T = 1   T = 0   Total
V = 1   D = 1   s1      s0      s
        D = 0   r1      r0      r
V = 0           u1      u0      u
Total           n1      n0      n
Table 3. Coverage probabilities (CP) and average lengths (AL) of the CIs for κ̄1 = 0.2, 0.4.

κ̄1 = 0.2 (Se = 0.7773, Sp = 0.7308, p = 10%); λ1 = 0.70, λ0 = 0.25
       Maximum likelihood method                                 MICE method
n      RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL   RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL
50     −13.6   0.998/0.492  0.994/0.721  0.875/0.527     −16.7   0.998/0.481  0.999/0.831  0.946/0.577
100    −11.4   0.982/0.372  0.989/0.486  0.973/0.394     −14.3   0.982/0.372  0.996/0.649  0.979/0.437
200    −7.5    0.960/0.276  0.988/0.302  0.985/0.277     −10.6   0.954/0.293  0.996/0.413  0.993/0.315
500    −3.5    0.942/0.173  0.972/0.174  0.957/0.172     −6.1    0.948/0.187  0.978/0.199  0.965/0.189
1000   −1.8    0.948/0.121  0.963/0.122  0.956/0.121     −2.4    0.948/0.128  0.971/0.131  0.962/0.129

κ̄1 = 0.2 (Se = 0.7773, Sp = 0.7308, p = 10%); λ1 = 0.95, λ0 = 0.40
n      RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL   RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL
50     −13.5   0.988/0.425  0.993/0.602  0.946/0.460     −15.3   0.988/0.420  0.995/0.751  0.952/0.503
100    −9.2    0.960/0.322  0.984/0.380  0.979/0.330     −11.7   0.953/0.329  0.991/0.509  0.980/0.367
200    −5.6    0.953/0.232  0.988/0.239  0.981/0.230     −7.8    0.948/0.242  0.993/0.279  0.987/0.246
500    −2.7    0.951/0.146  0.962/0.146  0.954/0.145     −3.9    0.950/0.151  0.971/0.153  0.962/0.150
1000   −0.5    0.947/0.102  0.956/0.102  0.953/0.102     −1.1    0.951/0.104  0.956/0.105  0.953/0.104

κ̄1 = 0.4 (Se = 0.7413, Sp = 0.7441, p = 30%); λ1 = 0.70, λ0 = 0.25
n      RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL   RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL
50     −18.1   0.979/0.630  0.996/0.655  0.973/0.618     −20.9   0.979/0.624  0.997/0.740  0.965/0.653
100    −10.1   0.963/0.476  0.994/0.470  0.988/0.464     −13.8   0.952/0.499  0.995/0.554  0.973/0.509
200    −4.5    0.961/0.340  0.989/0.330  0.977/0.333     −6.2    0.947/0.365  0.984/0.373  0.970/0.364
500    −1.5    0.952/0.213  0.961/0.210  0.956/0.211     −2.6    0.949/0.225  0.960/0.224  0.955/0.224
1000   −1.1    0.954/0.150  0.959/0.149  0.958/0.149     −1.5    0.950/0.158  0.955/0.158  0.951/0.158

κ̄1 = 0.4 (Se = 0.7413, Sp = 0.7441, p = 30%); λ1 = 0.95, λ0 = 0.40
n      RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL   RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL
50     −15.5   0.960/0.550  0.993/0.561  0.982/0.539     −18.4   0.955/0.559  0.996/0.638  0.989/0.575
100    −8.2    0.955/0.405  0.985/0.393  0.980/0.395     −10.9   0.950/0.421  0.991/0.431  0.985/0.418
200    −3.4    0.956/0.283  0.974/0.277  0.967/0.279     −5.1    0.956/0.294  0.980/0.290  0.967/0.290
500    −1.1    0.947/0.178  0.957/0.176  0.952/0.177     −1.3    0.950/0.182  0.963/0.181  0.957/0.181
1000   −0.6    0.955/0.125  0.958/0.125  0.957/0.125     −0.7    0.951/0.128  0.958/0.128  0.955/0.128

RB: relative bias. CP: coverage probability. AL: average length.
Table 4. Coverage probabilities (CP) and average lengths (AL) of the CIs for κ̄1 = 0.6, 0.8.

κ̄1 = 0.6 (Se = 0.6816, Sp = 0.8624, p = 50%); λ1 = 0.70, λ0 = 0.25
       Maximum likelihood method                                 MICE method
n      RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL   RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL
50     −17.4   0.984/0.701  1/0.638      0.997/0.658     −20.3   0.989/0.714  1/0.694      0.977/0.692
100    −8.9    0.969/0.508  0.994/0.471  0.987/0.485     −11.5   0.963/0.539  0.993/0.514  0.981/0.521
200    −4.9    0.963/0.358  0.974/0.343  0.968/0.349     −6.3    0.955/0.384  0.973/0.371  0.964/0.376
500    −2.0    0.946/0.224  0.952/0.221  0.950/0.222     −2.7    0.950/0.238  0.956/0.235  0.954/0.237
1000   −0.6    0.953/0.157  0.954/0.156  0.954/0.156     −0.8    0.951/0.165  0.953/0.165  0.953/0.166

κ̄1 = 0.6 (Se = 0.6816, Sp = 0.8624, p = 50%); λ1 = 0.95, λ0 = 0.40
n      RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL   RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL
50     −13.5   0.973/0.600  1/0.555      0.991/0.568     −15.2   0.973/0.608  1/0.587      0.971/0.593
100    −6.7    0.958/0.420  0.980/0.398  0.967/0.407     −7.2    0.952/0.433  0.986/0.414  0.968/0.421
200    −3.1    0.960/0.293  0.968/0.285  0.962/0.289     −3.6    0.954/0.303  0.968/0.295  0.963/0.299
500    −1.5    0.954/0.184  0.958/0.182  0.956/0.183     −1.7    0.950/0.187  0.951/0.187  0.950/0.188
1000   −0.4    0.952/0.130  0.953/0.130  0.953/0.130     −0.5    0.950/0.133  0.953/0.133  0.953/0.133

κ̄1 = 0.8 (Se = 0.7969, Sp = 0.9707, p = 70%); λ1 = 0.70, λ0 = 0.25
n      RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL   RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL
50     −17.9   0.990/0.646  0.987/0.596  0.994/0.612     −20.2   0.987/0.682  0.978/0.640  0.976/0.652
100    −9.2    0.979/0.434  0.948/0.418  0.970/0.417     −10.7   0.978/0.471  0.947/0.455  0.964/0.453
200    −4.7    0.969/0.291  0.949/0.288  0.959/0.285     −5.8    0.971/0.322  0.952/0.316  0.961/0.313
500    −1.8    0.961/0.179  0.964/0.180  0.964/0.180     −2.1    0.961/0.186  0.954/0.187  0.957/0.187
1000   −0.6    0.959/0.123  0.952/0.122  0.956/0.122     −0.7    0.957/0.134  0.951/0.134  0.954/0.133

κ̄1 = 0.8 (Se = 0.7969, Sp = 0.9707, p = 70%); λ1 = 0.95, λ0 = 0.40
n      RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL   RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL
50     −13.2   0.997/0.535  0.949/0.504  0.965/0.509     −14.8   0.971/0.551  0.957/0.523  0.968/0.527
100    −6.9    0.973/0.349  0.946/0.341  0.961/0.339     −7.5    0.968/0.363  0.944/0.355  0.963/0.352
200    −3.3    0.965/0.231  0.949/0.230  0.961/0.227     −3.6    0.967/0.240  0.953/0.241  0.959/0.239
500    −1.4    0.961/0.141  0.952/0.141  0.959/0.140     −1.5    0.956/0.148  0.945/0.147  0.950/0.147
1000   −0.6    0.953/0.099  0.951/0.099  0.952/0.099     −0.6    0.950/0.102  0.945/0.102  0.946/0.102

RB: relative bias. CP: coverage probability. AL: average length.
Table 5. Coverage probabilities (CP) and average lengths (AL) of the CIs for κ̄2 = 0.2, 0.4.

κ̄2 = 0.2 (Se = 0.5904, Sp = 0.6901, p = 70%); λ1 = 0.70, λ0 = 0.25
       Maximum likelihood method                                 MICE method
n      RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL   RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL
50     −15.8   0.984/0.597  0.976/0.744  0.883/0.610     −19.4   0.981/0.616  0.978/0.795  0.947/0.662
100    −7.3    0.985/0.465  0.971/0.583  0.943/0.486     −10.7   0.971/0.490  0.976/0.663  0.972/0.529
200    −2.9    0.958/0.357  0.960/0.418  0.967/0.365     −5.1    0.953/0.381  0.967/0.500  0.967/0.401
500    −1.7    0.945/0.241  0.960/0.248  0.963/0.239     −1.9    0.949/0.260  0.963/0.281  0.962/0.261
1000   −0.7    0.949/0.172  0.968/0.173  0.955/0.171     −0.8    0.950/0.180  0.960/0.185  0.957/0.182

κ̄2 = 0.2 (Se = 0.5904, Sp = 0.6901, p = 70%); λ1 = 0.95, λ0 = 0.40
n      RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL   RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL
50     −11.1   0.979/0.511  0.970/0.662  0.907/0.530     −13.6   0.980/0.530  0.973/0.728  0.953/0.577
100    −6.1    0.971/0.395  0.968/0.485  0.960/0.409     −8.2    0.966/0.412  0.972/0.560  0.967/0.440
200    −2.3    0.955/0.299  0.961/0.326  0.974/0.301     −4.1    0.953/0.312  0.971/0.375  0.977/0.321
500    −1.1    0.937/0.196  0.961/0.197  0.951/0.194     −1.4    0.947/0.206  0.965/0.214  0.962/0.206
1000   −0.5    0.956/0.138  0.962/0.139  0.959/0.138     −0.6    0.951/0.147  0.959/0.148  0.957/0.146

κ̄2 = 0.4 (Se = 0.7773, Sp = 0.7308, p = 10%); λ1 = 0.70, λ0 = 0.25
n      RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL   RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL
50     −36.5   0.999/0.743  1/0.842      0.891/0.701     −38.8   0.999/0.683  1/0.892      0.967/0.738
100    −26.1   0.981/0.630  1/0.680      0.983/0.631     −29.3   0.963/0.598  1/0.778      0.991/0.649
200    −16.3   0.964/0.496  0.998/0.493  0.995/0.486     −19.2   0.943/0.516  0.995/0.597  0.997/0.534
500    −7.2    0.954/0.320  0.984/0.312  0.969/0.315     −9.5    0.949/0.356  0.989/0.361  0.968/0.354
1000   −3.9    0.957/0.226  0.966/0.223  0.962/0.224     −4.4    0.951/0.247  0.971/0.247  0.958/0.247

κ̄2 = 0.4 (Se = 0.7773, Sp = 0.7308, p = 10%); λ1 = 0.95, λ0 = 0.40
n      RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL   RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL
50     −31.5   0.999/0.684  0.999/0.762  0.984/0.678     −35.3   0.999/0.652  1.000/0.847  0.976/0.702
100    −19.7   0.969/0.556  0.999/0.571  0.992/0.548     −23.2   0.951/0.559  1.000/0.674  0.992/0.586
200    −10.9   0.964/0.415  0.996/0.402  0.994/0.404     −13.4   0.951/0.437  0.999/0.458  0.992/0.437
500    −5.3    0.954/0.261  0.970/0.256  0.962/0.258     −7.6    0.952/0.278  0.978/0.277  0.966/0.277
1000   −1.9    0.945/0.184  0.954/0.182  0.949/0.183     −2.3    0.951/0.193  0.961/0.191  0.959/0.193

RB: relative bias. CP: coverage probability. AL: average length.
Table 6. Coverage probabilities (CP) and average lengths (AL) of the CIs for κ̄2 = 0.6, 0.8.

κ̄2 = 0.6 (Se = 0.8864, Sp = 0.6746, p = 30%); λ1 = 0.70, λ0 = 0.25
       Maximum likelihood method                                 MICE method
n      RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL   RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL
50     −32.9   0.97/0.799   1/0.74       0.987/0.742     −35.6   0.973/0.767  1/0.794      0.968/0.762
100    −18.1   0.971/0.61   1/0.555      0.996/0.576     −21.9   0.944/0.649  0.997/0.629  0.972/0.629
200    −9.8    0.970/0.417  0.984/0.394  0.976/0.404     −12.3   0.955/0.470  0.981/0.450  0.966/0.458
500    −3.8    0.960/0.254  0.966/0.248  0.964/0.251     −4.9    0.956/0.278  0.960/0.278  0.958/0.281
1000   −2.2    0.945/0.177  0.949/0.176  0.949/0.176     −2.9    0.948/0.187  0.951/0.187  0.949/0.187

κ̄2 = 0.6 (Se = 0.8864, Sp = 0.6746, p = 30%); λ1 = 0.95, λ0 = 0.40
n      RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL   RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL
50     −22.9   0.978/0.694  1/0.633      0.996/0.652     −26.4   0.967/0.709  1/0.701      0.973/0.692
100    −12.6   0.966/0.487  0.996/0.454  0.978/0.468     −16.1   0.956/0.531  0.995/0.507  0.969/0.514
200    −6.2    0.967/0.331  0.976/0.319  0.969/0.324     −8.8    0.959/0.360  0.972/0.348  0.971/0.350
500    −2.4    0.956/0.203  0.960/0.200  0.959/0.201     −3.3    0.955/0.216  0.956/0.212  0.957/0.215
1000   −1.3    0.960/0.142  0.957/0.142  0.956/0.142     −1.5    0.956/0.150  0.957/0.150  0.957/0.150

κ̄2 = 0.8 (Se = 0.8644, Sp = 0.9817, p = 50%); λ1 = 0.70, λ0 = 0.25
n      RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL   RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL
50     −20.6   0.938/0.688  0.933/0.644  0.931/0.662     −23.8   0.935/0.711  0.933/0.672  0.942/0.695
100    −10.6   0.956/0.495  0.912/0.492  0.942/0.485     −13.1   0.949/0.531  0.922/0.525  0.941/0.519
200    −5.5    0.958/0.356  0.936/0.356  0.954/0.348     −7.2    0.951/0.392  0.942/0.391  0.954/0.382
500    −2.3    0.960/0.228  0.954/0.228  0.959/0.228     −3.0    0.953/0.235  0.946/0.234  0.950/0.233
1000   −1.1    0.953/0.158  0.956/0.158  0.953/0.157     −1.5    0.949/0.175  0.950/0.175  0.949/0.174

κ̄2 = 0.8 (Se = 0.8644, Sp = 0.9817, p = 50%); λ1 = 0.95, λ0 = 0.40
n      RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL   RB (%)  Wald CP/AL   Logit CP/AL  Arcsine CP/AL
50     −13.8   0.957/0.567  0.915/0.553  0.938/0.553     −16.1   0.965/0.59   0.924/0.537  0.943/0.573
100    −6.7    0.963/0.399  0.933/0.401  0.958/0.392     −7.9    0.952/0.422  0.933/0.423  0.945/0.418
200    −3.6    0.943/0.285  0.935/0.285  0.942/0.279     −4.4    0.947/0.301  0.937/0.302  0.943/0.296
500    −1.3    0.954/0.178  0.944/0.179  0.949/0.177     −1.6    0.950/0.190  0.945/0.191  0.946/0.188
1000   −0.7    0.949/0.126  0.949/0.126  0.947/0.126     −0.8    0.950/0.132  0.953/0.132  0.950/0.132

RB: relative bias. CP: coverage probability. AL: average length.
Table 7. Diagnosis of liver disease: observed frequencies of the study of Drum and Christacopoulos [22].

                T = 1   T = 0
V = 1   D = 1   231     27
        D = 0   32      54
V = 0           166     140
Total           429     221