Article

Confidence Intervals and Sample Size to Compare the Predictive Values of Two Diagnostic Tests

by
José Antonio Roldán-Nofuentes
1,* and
Saad Bouh Regad
2
1
Department of Statistics, School of Medicine, University of Granada, 18016 Granada, Spain
2
Epidemiology and Public Health Research Unit and URMCD, University of Nouakchott Alaasriya, BP 880 Nouakchott, Mauritania
*
Author to whom correspondence should be addressed.
Mathematics 2021, 9(13), 1462; https://doi.org/10.3390/math9131462
Submission received: 11 May 2021 / Revised: 14 June 2021 / Accepted: 18 June 2021 / Published: 22 June 2021

Abstract

A binary diagnostic test is a medical test that is applied to an individual in order to determine the presence or the absence of a certain disease and whose result can be positive or negative. A positive result indicates the presence of the disease, and a negative result indicates the absence. Positive and negative predictive values represent the accuracy of a binary diagnostic test when it is applied to a cohort of individuals, and they are measures of the clinical accuracy of the binary diagnostic test. In this manuscript, we study the comparison of the positive (negative) predictive values of two binary diagnostic tests subject to a paired design through confidence intervals. We have studied confidence intervals for the difference and for the ratio of the two positive (negative) predictive values. Simulation experiments have been carried out to study the asymptotic behavior of the confidence intervals, giving some general rules for application. We also study a method to calculate the sample size to compare the parameters using confidence intervals. We have written a program in R to solve the problems studied in this manuscript. The results have been applied to the diagnosis of colorectal cancer.

1. Introduction

A diagnostic test is a medical test that is applied to an individual in order to determine the presence of a certain disease. Binary diagnostic tests are a very common type of diagnostic test in clinical practice. A binary diagnostic test (BDT) is a diagnostic test whose result is either positive or negative: a positive result indicates the presence of the disease, and a negative result indicates its absence. Mammography for the diagnosis of breast cancer is an example of a BDT. The accuracy of a BDT is measured in terms of two fundamental parameters: sensitivity and specificity. Sensitivity (Se) is the probability that the result of the BDT is positive when the individual has the disease, and specificity (Sp) is the probability that the result of the BDT is negative when the individual does not have the disease. Therefore, Se and Sp are probabilities of a correct diagnosis, and they represent the intrinsic accuracy of the BDT, since these parameters depend on the physical, chemical, or biological properties upon which the BDT is based. Other parameters that are used to assess and compare two BDTs are the positive and negative predictive values. The positive predictive value ($\tau$) is the probability that an individual has the disease when the result of the BDT is positive, and the negative predictive value ($\upsilon$) is the probability that an individual does not have the disease when the result of the BDT is negative. Predictive values represent the accuracy of the diagnostic test when it is applied to a cohort of individuals, and they are measures of the clinical accuracy of the BDT. Predictive values depend on Se, Sp, and the disease prevalence (p), and they are easily calculated by applying Bayes' theorem, i.e.,
$$\tau = \frac{p \times Se}{p \times Se + (1-p) \times (1-Sp)} \quad \text{and} \quad \upsilon = \frac{(1-p) \times Sp}{p \times (1-Se) + (1-p) \times Sp}.$$
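As a quick numerical illustration of these formulas, the short R snippet below (our own sketch; the function name predictive_values is not part of the authors' software) computes the predictive values from the sensitivity, the specificity, and the prevalence.

```r
# Bayes' theorem for predictive values (illustrative helper, not the authors' program)
predictive_values <- function(se, sp, p) {
  q <- 1 - p
  ppv <- (p * se) / (p * se + q * (1 - sp))          # positive predictive value
  npv <- (q * sp) / (p * (1 - se) + q * sp)          # negative predictive value
  c(PPV = ppv, NPV = npv)
}

# Example: Se = 0.80, Sp = 0.90, prevalence 25%
predictive_values(se = 0.80, sp = 0.90, p = 0.25)    # PPV approx 0.727, NPV approx 0.931
```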
The accuracy of a BDT is assessed in relation to a gold standard. A gold standard (GS) is a medical test that determines without error whether or not an individual has the disease. Biopsy for the diagnosis of breast cancer is an example of GS.
On the other hand, the comparison of parameters of two BDTs is an important topic in the study of statistical methods for diagnosis in Medicine. The most frequent sample design to compare the parameters of two BDTs is the paired design, which consists of applying the two BDTs and the GS to all of the individuals in a random sample of size n. The comparison of the predictive values of two BDTs subject to a paired design has been the subject of different studies. Bennett [1,2], Leisenring et al. [3], Wang et al. [4], Kosinski [5], Tsou [6], and Takahashi and Yamamoto [7] have studied hypothesis tests to compare the two positive predictive values and the two negative predictive values independently. Roldán-Nofuentes et al. [8] studied a global hypothesis test to simultaneously compare the positive and negative predictive values of two BDTs. However, the comparison of predictive values using confidence intervals has been little studied. If the hypothesis test is significant at an α error, the confidence interval (CI) allows us to determine by how much one predictive value is greater than the other. Moskowitz and Pepe [9] proposed a Wald-type CI for the ratio of the two positive (negative) predictive values. A Wald-type CI for the difference of the two positive (negative) predictive values is easily obtained by inverting the test statistic of the hypothesis test studied by Wang et al. [4].
The objective of this manuscript is to study CIs to compare the positive (negative) predictive values of two BDTs subject to a paired design. For this purpose, we have studied CIs for the difference and for the ratio of the two positive (negative) predictive values. If a CI for the difference (ratio) does not contain the value zero (one), then we reject the equality of the two positive (negative) predictive values, and we estimate by how much one predictive value exceeds the other. The problem of calculating the sample size to compare the two positive (negative) predictive values through a CI is also studied.
This manuscript is structured in the following way. In Section 2, the existing CIs are presented and other new CIs are proposed, both for the ratio and for the difference of the positive (negative) predictive values. In Section 3, simulation experiments are carried out to study the coverage probabilities and the average lengths of the CIs. In Section 4, a method to calculate the sample size to compare the parameters through CIs is proposed. In Section 5, we present the program "cicpvbdt", a program written in R that solves the problems studied in this manuscript. In Section 6, the results are applied to an example on the diagnosis of colorectal cancer, and in Section 7, the results are discussed.

2. Confidence Intervals

Let us consider two BDTs that are assessed in relation to the same GS. Let $T_i$ be the variable that models the result of the $i$th BDT, with $i = 1, 2$: $T_i = 0$ indicates that the test result is negative, and $T_i = 1$ indicates that it is positive. Let $D$ be the random variable that models the result of the GS, so that $D = 1$ when the individual is diseased and $D = 0$ when the individual is non-diseased. Let $Se_i$ and $Sp_i$ be the sensitivity and specificity of the $i$th BDT. Table 1 shows the observed frequencies obtained subject to a paired design. The frequencies $s_{jk}$ and $r_{jk}$ arise from a multinomial distribution whose probabilities are $p_{jk} = P(D = 1, T_1 = j, T_2 = k)$ and $q_{jk} = P(D = 0, T_1 = j, T_2 = k)$, with $j, k = 0, 1$. Applying the conditional dependence model of Vacek [10], the probabilities $p_{jk}$ and $q_{jk}$ are written as
$$p_{jk} = p\left[ Se_1^{\,j}(1-Se_1)^{1-j}\, Se_2^{\,k}(1-Se_2)^{1-k} + \delta_{jk}\varepsilon_1 \right]$$
and
$$q_{jk} = q\left[ Sp_1^{\,1-j}(1-Sp_1)^{j}\, Sp_2^{\,1-k}(1-Sp_2)^{k} + \delta_{jk}\varepsilon_0 \right],$$
where $\delta_{jk} = 1$ if $j = k$ and $\delta_{jk} = -1$ if $j \neq k$, with $j, k = 0, 1$; $\varepsilon_1$ is the dependence factor between the two BDTs when $D = 1$, and $\varepsilon_0$ is the dependence factor between the two BDTs when $D = 0$. It is verified that $0 \leq \varepsilon_1 \leq Min\{Se_1(1-Se_2), Se_2(1-Se_1)\}$ and $0 \leq \varepsilon_0 \leq Min\{Sp_1(1-Sp_2), Sp_2(1-Sp_1)\}$. If $\varepsilon_1 = \varepsilon_0 = 0$, then the two BDTs are conditionally independent given the disease status. This assumption is not realistic, so in practice it is verified that $\varepsilon_1 > 0$ and/or $\varepsilon_0 > 0$. Let $\boldsymbol{\pi} = (p_{11}, p_{10}, p_{01}, p_{00}, q_{11}, q_{10}, q_{01}, q_{00})^T$ be the vector of probabilities of the multinomial distribution, $p = \sum_{i,j=0}^{1} p_{ij}$ and $q = 1 - p = \sum_{i,j=0}^{1} q_{ij}$. The maximum likelihood estimators of $p_{jk}$ and $q_{jk}$ are:
$$\hat{p}_{jk} = \frac{s_{jk}}{n} \quad \text{and} \quad \hat{q}_{jk} = \frac{r_{jk}}{n}.$$
From Equation (1), the sensitivity and specificity of each BDT are written, in terms of the predictive values and of p, as
$$Se_i = \frac{\tau_i(\upsilon_i - q)}{p Y_i} \quad \text{and} \quad Sp_i = \frac{\upsilon_i(\tau_i - p)}{q Y_i},$$
where $q = 1 - p$ and $Y_i = \tau_i + \upsilon_i - 1$. Then
$$0 \leq \varepsilon_1 \leq Min\left\{ \frac{\tau_1(\upsilon_1-q)}{pY_1}\left(1 - \frac{\tau_2(\upsilon_2-q)}{pY_2}\right),\; \frac{\tau_2(\upsilon_2-q)}{pY_2}\left(1 - \frac{\tau_1(\upsilon_1-q)}{pY_1}\right) \right\}$$
and
$$0 \leq \varepsilon_0 \leq Min\left\{ \frac{\upsilon_1(\tau_1-p)}{qY_1}\left(1 - \frac{\upsilon_2(\tau_2-p)}{qY_2}\right),\; \frac{\upsilon_2(\tau_2-p)}{qY_2}\left(1 - \frac{\upsilon_1(\tau_1-p)}{qY_1}\right) \right\}.$$
In terms of predictive values, Equations (2) and (3) are written as
$$p_{jk} = p\left[ \frac{\tau_1^{\,j}(\upsilon_1-q)^{j}(\tau_1-p)^{1-j}(1-\upsilon_1)^{1-j}}{p^{\,j}\,p^{\,1-j}\,Y_1^{\,j}\,Y_1^{\,1-j}} \times \frac{\tau_2^{\,k}(\upsilon_2-q)^{k}(\tau_2-p)^{1-k}(1-\upsilon_2)^{1-k}}{p^{\,k}\,p^{\,1-k}\,Y_2^{\,k}\,Y_2^{\,1-k}} + \delta_{jk}\varepsilon_1 \right]$$
and
$$q_{jk} = q\left[ \frac{(1-\tau_1)^{j}(\upsilon_1-q)^{j}(\tau_1-p)^{1-j}\upsilon_1^{\,1-j}}{q^{\,j}\,q^{\,1-j}\,Y_1^{\,j}\,Y_1^{\,1-j}} \times \frac{(1-\tau_2)^{k}(\upsilon_2-q)^{k}(\tau_2-p)^{1-k}\upsilon_2^{\,1-k}}{q^{\,k}\,q^{\,1-k}\,Y_2^{\,k}\,Y_2^{\,1-k}} + \delta_{jk}\varepsilon_0 \right].$$
The estimators of the sensitivities and specificities are
$$\hat{Se}_1 = \frac{s_{11}+s_{10}}{s}, \quad \hat{Se}_2 = \frac{s_{11}+s_{01}}{s}, \quad \hat{Sp}_1 = \frac{r_{01}+r_{00}}{r} \quad \text{and} \quad \hat{Sp}_2 = \frac{r_{10}+r_{00}}{r},$$
and applying the delta method, the estimators of the variances–covariances of $\hat{Se}_i$ and $\hat{Sp}_i$ are
$$\hat{Var}(\hat{Se}_i) = \frac{\hat{Se}_i(1-\hat{Se}_i)}{s}, \quad \hat{Var}(\hat{Sp}_i) = \frac{\hat{Sp}_i(1-\hat{Sp}_i)}{r}, \quad \hat{Cov}(\hat{Se}_1, \hat{Se}_2) = \frac{\hat{\varepsilon}_1}{s} \quad \text{and} \quad \hat{Cov}(\hat{Sp}_1, \hat{Sp}_2) = \frac{\hat{\varepsilon}_0}{r},$$
where $\hat{\varepsilon}_1 = \frac{n\hat{p}_{11}}{s} - \hat{Se}_1\hat{Se}_2 = \frac{s_{11}s_{00} - s_{10}s_{01}}{s^2}$ and $\hat{\varepsilon}_0 = \frac{n\hat{q}_{00}}{r} - \hat{Sp}_1\hat{Sp}_2 = \frac{r_{11}r_{00} - r_{10}r_{01}}{r^2}$, and the estimator of the disease prevalence is
$$\hat{p} = \frac{s}{n}.$$
Let $Q_i = pSe_i + q(1-Sp_i)$ be the probability that the result of the $i$th BDT is positive and let $\bar{Q}_i = 1 - Q_i = p(1-Se_i) + qSp_i$ be the probability that the result is negative. Their estimators are:
$$\hat{Q}_1 = \frac{s_{10}+s_{11}+r_{10}+r_{11}}{n} \quad \text{and} \quad \hat{Q}_2 = \frac{s_{01}+s_{11}+r_{01}+r_{11}}{n},$$
and
$$\hat{\bar{Q}}_1 = \frac{s_{01}+s_{00}+r_{01}+r_{00}}{n} \quad \text{and} \quad \hat{\bar{Q}}_2 = \frac{s_{10}+s_{00}+r_{10}+r_{00}}{n},$$
respectively. With respect to the predictive values, their estimators are:
$$\hat{\tau}_1 = \frac{s_{11}+s_{10}}{s_{11}+s_{10}+r_{11}+r_{10}}, \quad \hat{\tau}_2 = \frac{s_{11}+s_{01}}{s_{11}+s_{01}+r_{11}+r_{01}}, \quad \hat{\upsilon}_1 = \frac{r_{01}+r_{00}}{s_{01}+s_{00}+r_{01}+r_{00}}$$
and
$$\hat{\upsilon}_2 = \frac{r_{10}+r_{00}}{s_{10}+s_{00}+r_{10}+r_{00}}.$$
Applying the delta method, the estimators of the variances–covariances of $\hat{\tau}_i$ and $\hat{\upsilon}_i$ are [8]:
$$\begin{aligned}
\hat{Var}(\hat{\tau}_1) &= \frac{(s_{10}+s_{11})(r_{10}+r_{11})}{(s_{10}+s_{11}+r_{10}+r_{11})^3}, \quad
\hat{Var}(\hat{\upsilon}_1) = \frac{(s_{00}+s_{01})(r_{00}+r_{01})}{(s_{00}+s_{01}+r_{00}+r_{01})^3},\\
\hat{Var}(\hat{\tau}_2) &= \frac{(s_{01}+s_{11})(r_{01}+r_{11})}{(s_{01}+s_{11}+r_{01}+r_{11})^3}, \quad
\hat{Var}(\hat{\upsilon}_2) = \frac{(s_{00}+s_{10})(r_{00}+r_{10})}{(s_{00}+s_{10}+r_{00}+r_{10})^3},\\
\hat{Cov}(\hat{\tau}_1, \hat{\tau}_2) &= \frac{s_{11}r_{10}r_{01} + r_{11}\left[s_{01}(s_{10}+s_{11}) + s_{11}(s_{11}+s_{10}+r_{11}+r_{10}+r_{01})\right]}{(s_{01}+s_{11}+r_{01}+r_{11})^2(s_{10}+s_{11}+r_{10}+r_{11})^2}
\end{aligned}$$
and
$$\hat{Cov}(\hat{\upsilon}_1, \hat{\upsilon}_2) = \frac{s_{10}(s_{00}+s_{01})r_{00} + s_{00}\left[r_{00}^2 + r_{01}r_{10} + r_{00}(s_{00}+s_{01}+r_{10}+r_{01})\right]}{(s_{00}+s_{01}+r_{00}+r_{01})^2(s_{00}+s_{10}+r_{00}+r_{10})^2}.$$
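The estimators and variances–covariances above translate directly into code. The following R sketch (a hypothetical helper we call pv_estimates, not the authors' "cicpvbdt" program) computes them from the eight observed frequencies of Table 1.

```r
# Estimated predictive values and their variances-covariances under a paired design
pv_estimates <- function(s11, s10, s01, s00, r11, r10, r01, r00) {
  tau1 <- (s11 + s10) / (s11 + s10 + r11 + r10)
  tau2 <- (s11 + s01) / (s11 + s01 + r11 + r01)
  ups1 <- (r01 + r00) / (s01 + s00 + r01 + r00)
  ups2 <- (r10 + r00) / (s10 + s00 + r10 + r00)
  var_tau1 <- (s10 + s11) * (r10 + r11) / (s10 + s11 + r10 + r11)^3
  var_tau2 <- (s01 + s11) * (r01 + r11) / (s01 + s11 + r01 + r11)^3
  var_ups1 <- (s00 + s01) * (r00 + r01) / (s00 + s01 + r00 + r01)^3
  var_ups2 <- (s00 + s10) * (r00 + r10) / (s00 + s10 + r00 + r10)^3
  cov_tau <- (s11 * r10 * r01 +
              r11 * (s01 * (s10 + s11) + s11 * (s11 + s10 + r11 + r10 + r01))) /
             ((s01 + s11 + r01 + r11)^2 * (s10 + s11 + r10 + r11)^2)
  cov_ups <- (s10 * (s00 + s01) * r00 +
              s00 * (r00^2 + r01 * r10 + r00 * (s00 + s01 + r10 + r01))) /
             ((s00 + s01 + r00 + r01)^2 * (s00 + s10 + r00 + r10)^2)
  list(tau = c(tau1, tau2), ups = c(ups1, ups2),
       var_tau = c(var_tau1, var_tau2), var_ups = c(var_ups1, var_ups2),
       cov_tau = cov_tau, cov_ups = cov_ups)
}
```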
When two parameters are compared in Statistics, the interest lies in the difference or the ratio between them. Then, we compare the positive (negative) predictive values of two BDTs through CIs for the difference, i.e., $\delta_\tau = \tau_1 - \tau_2$ and $\delta_\upsilon = \upsilon_1 - \upsilon_2$, and for the ratio, i.e., $\rho_\tau = \tau_1/\tau_2$ and $\rho_\upsilon = \upsilon_1/\upsilon_2$.

2.1. CIs for the Difference

Three CIs for each difference δ τ and δ υ are studied: Wald CI, bias-corrected bootstrap CI, and Monte Carlo Bayesian CI.

2.1.1. Wald CI

Wang et al. [4] have studied the comparison of the predictive values of two BDTs through the weighted least squares method. The test statistics for $H_0: \tau_1 = \tau_2$ and $H_0: \upsilon_1 = \upsilon_2$ are
$$z_\tau = \frac{\hat{\delta}_\tau}{\sqrt{\hat{Var}(\hat{\delta}_\tau)}} \quad \text{and} \quad z_\upsilon = \frac{\hat{\delta}_\upsilon}{\sqrt{\hat{Var}(\hat{\delta}_\upsilon)}},$$
respectively. Both test statistics asymptotically follow a standard normal distribution, where
$$\hat{Var}(\hat{\delta}_\tau) = \hat{Var}(\hat{\tau}_1) + \hat{Var}(\hat{\tau}_2) - 2\hat{Cov}(\hat{\tau}_1, \hat{\tau}_2)$$
and
$$\hat{Var}(\hat{\delta}_\upsilon) = \hat{Var}(\hat{\upsilon}_1) + \hat{Var}(\hat{\upsilon}_2) - 2\hat{Cov}(\hat{\upsilon}_1, \hat{\upsilon}_2)$$
are the estimators of the variances of δ ^ τ and δ ^ υ , respectively. Inverting the two test statistics, the Wald CIs for δ τ and for δ υ are
$$\delta_\tau \in \hat{\delta}_\tau \pm z_{1-\alpha/2}\sqrt{\hat{Var}(\hat{\delta}_\tau)} \quad \text{and} \quad \delta_\upsilon \in \hat{\delta}_\upsilon \pm z_{1-\alpha/2}\sqrt{\hat{Var}(\hat{\delta}_\upsilon)},$$
respectively, where $z_{1-\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the standard normal distribution.
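A minimal R sketch of these Wald CIs, built on the hypothetical pv_estimates helper introduced above, could look as follows.

```r
# Wald CIs for delta_tau = tau1 - tau2 and delta_ups = ups1 - ups2
wald_ci_diff <- function(est, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  d_tau  <- est$tau[1] - est$tau[2]
  d_ups  <- est$ups[1] - est$ups[2]
  se_tau <- sqrt(est$var_tau[1] + est$var_tau[2] - 2 * est$cov_tau)
  se_ups <- sqrt(est$var_ups[1] + est$var_ups[2] - 2 * est$cov_ups)
  rbind(delta_tau = c(d_tau - z * se_tau, d_tau + z * se_tau),
        delta_ups = c(d_ups - z * se_ups, d_ups + z * se_ups))
}
```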

2.1.2. Bias-Corrected Bootstrap CI

The bias-corrected bootstrap CI is calculated from B random samples with replacement generated from the sample of n individuals. In each of the B samples, we calculate $\hat{\tau}_{1b}$, $\hat{\tau}_{2b}$, $\hat{\upsilon}_{1b}$, $\hat{\upsilon}_{2b}$, $\hat{\delta}_{\tau b} = \hat{\tau}_{1b} - \hat{\tau}_{2b}$ and $\hat{\delta}_{\upsilon b} = \hat{\upsilon}_{1b} - \hat{\upsilon}_{2b}$, with $b = 1, \ldots, B$. Then, the average differences are calculated as $\bar{\hat{\delta}}_{\tau B} = \frac{1}{B}\sum_{b=1}^{B}\hat{\delta}_{\tau b}$ and $\bar{\hat{\delta}}_{\upsilon B} = \frac{1}{B}\sum_{b=1}^{B}\hat{\delta}_{\upsilon b}$. Assuming that the bootstrap statistics $\bar{\hat{\delta}}_{\tau B}$ and $\bar{\hat{\delta}}_{\upsilon B}$ can be transformed to a normal distribution, the bias-corrected bootstrap CIs [11] are calculated in the following way. Let $A_\tau = \#(\hat{\delta}_{\tau b} < \hat{\delta}_\tau)$ be the number of bootstrap estimators $\hat{\delta}_{\tau b}$ that are lower than the maximum likelihood estimator (MLE) $\hat{\delta}_\tau$, and let $A_\upsilon = \#(\hat{\delta}_{\upsilon b} < \hat{\delta}_\upsilon)$ be the number of bootstrap estimators $\hat{\delta}_{\upsilon b}$ that are lower than the MLE $\hat{\delta}_\upsilon$. Let $\hat{z}_\tau = \Phi^{-1}(A_\tau/B)$ and $\hat{z}_\upsilon = \Phi^{-1}(A_\upsilon/B)$, where $\Phi^{-1}(\cdot)$ is the inverse of the standard normal cumulative distribution function. Let $\alpha_{1\tau} = \Phi(2\hat{z}_\tau - z_{1-\alpha/2})$, $\alpha_{2\tau} = \Phi(2\hat{z}_\tau + z_{1-\alpha/2})$, $\alpha_{1\upsilon} = \Phi(2\hat{z}_\upsilon - z_{1-\alpha/2})$, and $\alpha_{2\upsilon} = \Phi(2\hat{z}_\upsilon + z_{1-\alpha/2})$; then, the bias-corrected bootstrap CI for $\delta_\tau$ is
$$\delta_\tau \in \left(\hat{\delta}_{\tau B}(\alpha_{1\tau}),\; \hat{\delta}_{\tau B}(\alpha_{2\tau})\right)$$
and the bias-corrected bootstrap CI for δ υ is
$$\delta_\upsilon \in \left(\hat{\delta}_{\upsilon B}(\alpha_{1\upsilon}),\; \hat{\delta}_{\upsilon B}(\alpha_{2\upsilon})\right),$$
where $\hat{\delta}_{\tau B}(\gamma)$ is the $\gamma$th quantile of the distribution of the B bootstrap estimations of $\delta_\tau$, and $\hat{\delta}_{\upsilon B}(\gamma)$ is the $\gamma$th quantile of the distribution of the B bootstrap estimations of $\delta_\upsilon$.
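The following R sketch illustrates the bias-corrected bootstrap CI for $\delta_\tau$ under the assumptions stated above (B = 2000 resamples; resampling individuals with replacement is implemented here as a multinomial resampling of the eight cells; function names are ours, not the authors' program).

```r
bcb_ci_diff_tau <- function(s11, s10, s01, s00, r11, r10, r01, r00,
                            B = 2000, conf = 0.95) {
  freqs <- c(s11, s10, s01, s00, r11, r10, r01, r00)
  n <- sum(freqs)
  est   <- pv_estimates(s11, s10, s01, s00, r11, r10, r01, r00)
  d_hat <- est$tau[1] - est$tau[2]                     # MLE of delta_tau
  d_boot <- replicate(B, {
    f <- as.vector(rmultinom(1, n, freqs / n))         # one resample of the 8 cells
    e <- pv_estimates(f[1], f[2], f[3], f[4], f[5], f[6], f[7], f[8])
    e$tau[1] - e$tau[2]
  })
  z0 <- qnorm(mean(d_boot < d_hat, na.rm = TRUE))      # bias-correction constant
  z  <- qnorm(1 - (1 - conf) / 2)
  a1 <- pnorm(2 * z0 - z)
  a2 <- pnorm(2 * z0 + z)
  quantile(d_boot, probs = c(a1, a2), na.rm = TRUE)    # bias-corrected quantiles
}
```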

2.1.3. Monte Carlo Bayesian CI

The number of diseased individuals ($s$) follows a binomial distribution, i.e., $s \sim B(n, p)$. Conditioning on $D = 1$, it is verified that
$$s_{11} + s_{10} \sim B(s, Se_1) \quad \text{and} \quad s_{11} + s_{01} \sim B(s, Se_2).$$
The number of non-diseased individuals ($r$) also follows a binomial distribution, i.e., $r \sim B(n, q)$. Conditioning on $D = 0$, it is verified that
$$r_{01} + r_{00} \sim B(r, Sp_1) \quad \text{and} \quad r_{10} + r_{00} \sim B(r, Sp_2).$$
On the other hand, the estimators $\hat{Se}_i$, $\hat{Sp}_i$, and $\hat{p}$ (Equations (9) and (10)) are estimators of binomial proportions. Therefore, for these estimators, we propose conjugate beta prior distributions, i.e.,
$$\hat{Se}_i \sim Beta(\alpha_{Se_i}, \beta_{Se_i}), \quad \hat{Sp}_i \sim Beta(\alpha_{Sp_i}, \beta_{Sp_i}) \quad \text{and} \quad \hat{p} \sim Beta(\alpha_p, \beta_p).$$
Let $\mathbf{n} = (s_{11}, s_{10}, s_{01}, s, r_{11}, r_{10}, r_{01}, n - s)$ be the vector of observed frequencies, with $s_{00} = s - s_{11} - s_{10} - s_{01}$, $r = n - s$, and $r_{00} = n - s - r_{11} - r_{10} - r_{01}$. Then, the posterior distributions for the estimators of $Se_i$, $Sp_i$, and $p$ are:
$$\begin{aligned}
\hat{Se}_1 \mid \mathbf{n} &\sim Beta(s_{11} + s_{10} + \alpha_{Se_1},\; s - s_{11} - s_{10} + \beta_{Se_1}), &
\hat{Sp}_1 \mid \mathbf{n} &\sim Beta(r_{01} + r_{00} + \alpha_{Sp_1},\; n - s - r_{01} - r_{00} + \beta_{Sp_1}),\\
\hat{Se}_2 \mid \mathbf{n} &\sim Beta(s_{11} + s_{01} + \alpha_{Se_2},\; s - s_{11} - s_{01} + \beta_{Se_2}), &
\hat{Sp}_2 \mid \mathbf{n} &\sim Beta(r_{10} + r_{00} + \alpha_{Sp_2},\; n - s - r_{10} - r_{00} + \beta_{Sp_2}),\\
\hat{p} \mid \mathbf{n} &\sim Beta(s + \alpha_p,\; n - s + \beta_p).
\end{aligned}$$
The posterior distribution of the positive (negative) predictive value of each BDT, and of $\delta_\tau$ and $\delta_\upsilon$, can be approximated by applying the Monte Carlo method, a computational method that consists of generating M values from the posterior distributions (12). In the mth iteration, the values generated for $Se_i^{(m)}$, $Sp_i^{(m)}$, and $p^{(m)}$ are plugged into the equations
$$\tau_i^{(m)} = \frac{p^{(m)} \times Se_i^{(m)}}{p^{(m)} \times Se_i^{(m)} + (1 - p^{(m)}) \times (1 - Sp_i^{(m)})}$$
and
$$\upsilon_i^{(m)} = \frac{(1 - p^{(m)}) \times Sp_i^{(m)}}{p^{(m)} \times (1 - Se_i^{(m)}) + (1 - p^{(m)}) \times Sp_i^{(m)}},$$
with $i = 1, 2$, and then $\delta_\tau^{(m)} = \tau_1^{(m)} - \tau_2^{(m)}$ and $\delta_\upsilon^{(m)} = \upsilon_1^{(m)} - \upsilon_2^{(m)}$ are calculated. As estimators of $\delta_\tau$ and $\delta_\upsilon$, we calculate the averages of the M estimations of the differences, i.e., $\bar{\hat{\delta}}_{\tau Bay} = \frac{1}{M}\sum_{m=1}^{M}\delta_\tau^{(m)}$ and $\bar{\hat{\delta}}_{\upsilon Bay} = \frac{1}{M}\sum_{m=1}^{M}\delta_\upsilon^{(m)}$. Finally, based on the M values of $\delta_\tau^{(m)}$ and of $\delta_\upsilon^{(m)}$, we propose CIs based on quantiles. Therefore, the $100(1-\alpha)\%$ CI for $\delta_\tau$ is
$$\delta_\tau \in \left(q_{\tau Bay}(\alpha/2),\; q_{\tau Bay}(1 - \alpha/2)\right)$$
and the $100(1-\alpha)\%$ CI for $\delta_\upsilon$ is
$$\delta_\upsilon \in \left(q_{\upsilon Bay}(\alpha/2),\; q_{\upsilon Bay}(1 - \alpha/2)\right),$$
where $q_{\tau Bay}(\gamma)$ ($q_{\upsilon Bay}(\gamma)$) is the $\gamma$th quantile of the distribution of the M values $\delta_\tau^{(m)}$ ($\delta_\upsilon^{(m)}$).
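A minimal R sketch of the Monte Carlo Bayesian CIs for $\delta_\tau$ and $\delta_\upsilon$, assuming the non-informative Beta(1, 1) priors used later in the simulations and M = 10,000 draws (the function name is ours), is shown below.

```r
mcb_ci_diff <- function(s11, s10, s01, s00, r11, r10, r01, r00,
                        M = 10000, conf = 0.95) {
  s <- s11 + s10 + s01 + s00                 # diseased
  r <- r11 + r10 + r01 + r00                 # non-diseased
  # Posterior draws with Beta(1, 1) priors (an assumption of this sketch)
  se1 <- rbeta(M, s11 + s10 + 1, s - s11 - s10 + 1)
  se2 <- rbeta(M, s11 + s01 + 1, s - s11 - s01 + 1)
  sp1 <- rbeta(M, r01 + r00 + 1, r - r01 - r00 + 1)
  sp2 <- rbeta(M, r10 + r00 + 1, r - r10 - r00 + 1)
  p   <- rbeta(M, s + 1, r + 1)
  # Plug the draws into Bayes' theorem (Equation (1))
  tau1 <- p * se1 / (p * se1 + (1 - p) * (1 - sp1))
  tau2 <- p * se2 / (p * se2 + (1 - p) * (1 - sp2))
  ups1 <- (1 - p) * sp1 / (p * (1 - se1) + (1 - p) * sp1)
  ups2 <- (1 - p) * sp2 / (p * (1 - se2) + (1 - p) * sp2)
  alpha <- 1 - conf
  rbind(delta_tau = quantile(tau1 - tau2, c(alpha / 2, 1 - alpha / 2)),
        delta_ups = quantile(ups1 - ups2, c(alpha / 2, 1 - alpha / 2)))
}
```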

2.2. CIs for the Ratio

Five CIs for each ratio ρ τ = τ 1 / τ 2 and ρ υ = υ 1 / υ 2 are studied: Wald CI, logarithmic CI, Fieller CI, bias-corrected bootstrap CI, and Monte Carlo Bayesian CI.

2.2.1. Wald CI

Moskowitz and Pepe [9] have studied a Wald-type confidence interval for the ratio of the two positive (negative) predictive values. The $100(1-\alpha)\%$ Wald CI for $\rho_\tau$ is
$$\rho_\tau \in \hat{\rho}_\tau \pm z_{1-\alpha/2}\sqrt{\hat{Var}(\hat{\rho}_\tau)}$$
and the $100(1-\alpha)\%$ Wald CI for $\rho_\upsilon$ is
$$\rho_\upsilon \in \hat{\rho}_\upsilon \pm z_{1-\alpha/2}\sqrt{\hat{Var}(\hat{\rho}_\upsilon)},$$
where $\hat{Var}(\hat{\rho}_\tau)$ and $\hat{Var}(\hat{\rho}_\upsilon)$, obtained applying the delta method, are
$$\hat{Var}(\hat{\rho}_\tau) = \frac{\hat{\tau}_2^2\hat{Var}(\hat{\tau}_1) + \hat{\tau}_1^2\hat{Var}(\hat{\tau}_2) - 2\hat{\tau}_1\hat{\tau}_2\hat{Cov}(\hat{\tau}_1, \hat{\tau}_2)}{\hat{\tau}_2^4}$$
and
$$\hat{Var}(\hat{\rho}_\upsilon) = \frac{\hat{\upsilon}_2^2\hat{Var}(\hat{\upsilon}_1) + \hat{\upsilon}_1^2\hat{Var}(\hat{\upsilon}_2) - 2\hat{\upsilon}_1\hat{\upsilon}_2\hat{Cov}(\hat{\upsilon}_1, \hat{\upsilon}_2)}{\hat{\upsilon}_2^4}.$$
These CIs are for ρ τ = τ 1 / τ 2 and ρ υ = υ 1 / υ 2 . If we want to calculate the CI for the ratio τ 2 / τ 1 = 1 / ρ τ and for the ratio υ 2 / υ 1 = 1 / ρ υ , then we have to divide the CI for ρ τ by ρ ^ τ 2 and the CI for ρ υ by ρ ^ υ 2 . For example, if ( L τ , U τ ) is the Wald CI for τ 1 / τ 2 , then ( L τ / ρ ^ τ 2 , U τ / ρ ^ τ 2 ) is the Wald CI for τ 2 / τ 1 .

2.2.2. Logarithmic CI

Assuming the asymptotic normality of the natural logarithm of $\hat{\rho}_\tau$ and of $\hat{\rho}_\upsilon$, i.e., $\ln(\hat{\rho}_\tau) \sim N(\ln(\rho_\tau), Var[\ln(\hat{\rho}_\tau)])$ and $\ln(\hat{\rho}_\upsilon) \sim N(\ln(\rho_\upsilon), Var[\ln(\hat{\rho}_\upsilon)])$ when n is large, an asymptotic CI for $\ln(\rho_\tau)$ is
$$\ln(\hat{\rho}_\tau) \pm z_{1-\alpha/2}\sqrt{\hat{Var}[\ln(\hat{\rho}_\tau)]}$$
and an asymptotic CI for $\ln(\rho_\upsilon)$ is
$$\ln(\hat{\rho}_\upsilon) \pm z_{1-\alpha/2}\sqrt{\hat{Var}[\ln(\hat{\rho}_\upsilon)]}.$$
Taking exponentials in each of the previous expressions, the logarithmic CI for $\rho_\tau$ is
$$\rho_\tau \in \hat{\rho}_\tau \times \exp\left\{\pm z_{1-\alpha/2}\sqrt{\hat{Var}[\ln(\hat{\rho}_\tau)]}\right\}$$
and the logarithmic CI for $\rho_\upsilon$ is
$$\rho_\upsilon \in \hat{\rho}_\upsilon \times \exp\left\{\pm z_{1-\alpha/2}\sqrt{\hat{Var}[\ln(\hat{\rho}_\upsilon)]}\right\},$$
where $\hat{Var}[\ln(\hat{\rho}_\tau)]$ and $\hat{Var}[\ln(\hat{\rho}_\upsilon)]$, obtained applying the delta method, are:
$$\hat{Var}[\ln(\hat{\rho}_\tau)] = \frac{\hat{Var}(\hat{\tau}_1)}{\hat{\tau}_1^2} + \frac{\hat{Var}(\hat{\tau}_2)}{\hat{\tau}_2^2} - \frac{2\hat{Cov}(\hat{\tau}_1, \hat{\tau}_2)}{\hat{\tau}_1\hat{\tau}_2}$$
and
$$\hat{Var}[\ln(\hat{\rho}_\upsilon)] = \frac{\hat{Var}(\hat{\upsilon}_1)}{\hat{\upsilon}_1^2} + \frac{\hat{Var}(\hat{\upsilon}_2)}{\hat{\upsilon}_2^2} - \frac{2\hat{Cov}(\hat{\upsilon}_1, \hat{\upsilon}_2)}{\hat{\upsilon}_1\hat{\upsilon}_2}.$$
If we want to calculate the logarithmic CI for the ratio τ 2 / τ 1 , then the CI is obtained by calculating the inverse of each boundary of CI for ρ τ = τ 1 / τ 2 . In a similar way, the CI for υ 2 / υ 1 is calculated.
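The Wald and logarithmic CIs for the ratio can be sketched in R as follows (again using the hypothetical pv_estimates helper; shown only for $\rho_\tau$, the case of $\rho_\upsilon$ is analogous).

```r
# Wald and logarithmic CIs for rho_tau = tau1/tau2
ratio_ci_tau <- function(est, conf = 0.95) {
  z  <- qnorm(1 - (1 - conf) / 2)
  t1 <- est$tau[1]; t2 <- est$tau[2]
  v1 <- est$var_tau[1]; v2 <- est$var_tau[2]; cv <- est$cov_tau
  rho <- t1 / t2
  var_rho  <- (t2^2 * v1 + t1^2 * v2 - 2 * t1 * t2 * cv) / t2^4   # delta method
  var_lrho <- v1 / t1^2 + v2 / t2^2 - 2 * cv / (t1 * t2)          # delta method, log scale
  rbind(Wald = rho + c(-1, 1) * z * sqrt(var_rho),
        Log  = rho * exp(c(-1, 1) * z * sqrt(var_lrho)))
}
```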

2.2.3. Fieller CI

The method of Fieller [12] is a classic method used to estimate the ratio of two parameters. In order to apply this method, it is necessary to assume that $\hat{\boldsymbol{\tau}} \sim N(\boldsymbol{\tau}, \boldsymbol{\Sigma}_\tau)$ and that $\hat{\boldsymbol{\upsilon}} \sim N(\boldsymbol{\upsilon}, \boldsymbol{\Sigma}_\upsilon)$; i.e., it is necessary to assume that the estimators of the positive (negative) predictive values follow a bivariate normal distribution, where $\hat{\boldsymbol{\tau}} = (\hat{\tau}_1, \hat{\tau}_2)^T$, $\hat{\boldsymbol{\upsilon}} = (\hat{\upsilon}_1, \hat{\upsilon}_2)^T$,
$$\boldsymbol{\Sigma}_\tau = \begin{pmatrix} Var(\hat{\tau}_1) & Cov(\hat{\tau}_1, \hat{\tau}_2) \\ Cov(\hat{\tau}_1, \hat{\tau}_2) & Var(\hat{\tau}_2) \end{pmatrix} = \begin{pmatrix} \sigma_{\tau 11} & \sigma_{\tau 12} \\ \sigma_{\tau 21} & \sigma_{\tau 22} \end{pmatrix}$$
and
$$\boldsymbol{\Sigma}_\upsilon = \begin{pmatrix} Var(\hat{\upsilon}_1) & Cov(\hat{\upsilon}_1, \hat{\upsilon}_2) \\ Cov(\hat{\upsilon}_1, \hat{\upsilon}_2) & Var(\hat{\upsilon}_2) \end{pmatrix} = \begin{pmatrix} \sigma_{\upsilon 11} & \sigma_{\upsilon 12} \\ \sigma_{\upsilon 21} & \sigma_{\upsilon 22} \end{pmatrix}.$$
Applying the method of Fieller, it is verified that $\hat{\tau}_1 - \rho_\tau\hat{\tau}_2 \sim N(0, \sigma_{\tau 11} + \rho_\tau^2\sigma_{\tau 22} - 2\rho_\tau\sigma_{\tau 12})$ and that $\hat{\upsilon}_1 - \rho_\upsilon\hat{\upsilon}_2 \sim N(0, \sigma_{\upsilon 11} + \rho_\upsilon^2\sigma_{\upsilon 22} - 2\rho_\upsilon\sigma_{\upsilon 12})$ when n is large. The Fieller CI for $\rho_\tau$ is obtained by solving the inequality
$$\frac{(\hat{\tau}_1 - \rho_\tau\hat{\tau}_2)^2}{\hat{\sigma}_{\tau 11} + \rho_\tau^2\hat{\sigma}_{\tau 22} - 2\rho_\tau\hat{\sigma}_{\tau 12}} < z_{1-\alpha/2}^2,$$
and the Fieller CI for ρ υ is obtained by solving the inequality
$$\frac{(\hat{\upsilon}_1 - \rho_\upsilon\hat{\upsilon}_2)^2}{\hat{\sigma}_{\upsilon 11} + \rho_\upsilon^2\hat{\sigma}_{\upsilon 22} - 2\rho_\upsilon\hat{\sigma}_{\upsilon 12}} < z_{1-\alpha/2}^2.$$
Finally, the Fieller CI for ρ τ is
$$\rho_\tau \in \frac{\hat{\beta}_{\tau 12} \pm \sqrt{\hat{\beta}_{\tau 12}^2 - \hat{\beta}_{\tau 11}\hat{\beta}_{\tau 22}}}{\hat{\beta}_{\tau 22}},$$
where $\hat{\beta}_{\tau ij} = \hat{\tau}_i\hat{\tau}_j - \hat{\sigma}_{\tau ij}z_{1-\alpha/2}^2$ with $i, j = 1, 2$, and it is verified that $\hat{\beta}_{\tau 12} = \hat{\beta}_{\tau 21}$. This CI is valid when $\hat{\beta}_{\tau 12}^2 > \hat{\beta}_{\tau 11}\hat{\beta}_{\tau 22}$ and $\hat{\beta}_{\tau 22} > 0$. Similarly, the Fieller CI for $\rho_\upsilon$ is
$$\rho_\upsilon \in \frac{\hat{\beta}_{\upsilon 12} \pm \sqrt{\hat{\beta}_{\upsilon 12}^2 - \hat{\beta}_{\upsilon 11}\hat{\beta}_{\upsilon 22}}}{\hat{\beta}_{\upsilon 22}},$$
where $\hat{\beta}_{\upsilon ij} = \hat{\upsilon}_i\hat{\upsilon}_j - \hat{\sigma}_{\upsilon ij}z_{1-\alpha/2}^2$ with $i, j = 1, 2$, and $\hat{\beta}_{\upsilon 12} = \hat{\beta}_{\upsilon 21}$. This CI is valid when $\hat{\beta}_{\upsilon 12}^2 > \hat{\beta}_{\upsilon 11}\hat{\beta}_{\upsilon 22}$ and $\hat{\beta}_{\upsilon 22} > 0$. The Fieller CI for $\tau_2/\tau_1$ ($\upsilon_2/\upsilon_1$) is calculated by inverting the limits of the CI for $\rho_\tau$ ($\rho_\upsilon$).
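A short R sketch of the Fieller CI for $\rho_\tau$, solving the quadratic described above (hypothetical helper names; it returns NA when the validity conditions fail), is given below.

```r
fieller_ci_tau <- function(est, conf = 0.95) {
  z2 <- qnorm(1 - (1 - conf) / 2)^2
  t1 <- est$tau[1]; t2 <- est$tau[2]
  b11 <- t1 * t1 - z2 * est$var_tau[1]
  b22 <- t2 * t2 - z2 * est$var_tau[2]
  b12 <- t1 * t2 - z2 * est$cov_tau
  disc <- b12^2 - b11 * b22
  if (disc <= 0 || b22 <= 0) return(c(NA, NA))   # validity conditions
  (b12 + c(-1, 1) * sqrt(disc)) / b22
}
```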

2.2.4. Bias-Corrected Bootstrap CI

The bias-corrected bootstrap CI for $\rho_\tau$ ($\rho_\upsilon$) is obtained in a similar way to that for $\delta_\tau$ ($\delta_\upsilon$). In each sample with replacement, we calculate $\hat{\tau}_{1b}$, $\hat{\tau}_{2b}$, $\hat{\upsilon}_{1b}$, $\hat{\upsilon}_{2b}$, $\hat{\rho}_{\tau b} = \hat{\tau}_{1b}/\hat{\tau}_{2b}$, and $\hat{\rho}_{\upsilon b} = \hat{\upsilon}_{1b}/\hat{\upsilon}_{2b}$, with $b = 1, \ldots, B$. Then, based on the B ratios, we estimate the average ratios as $\bar{\hat{\rho}}_{\tau B} = \frac{1}{B}\sum_{b=1}^{B}\hat{\rho}_{\tau b}$ and $\bar{\hat{\rho}}_{\upsilon B} = \frac{1}{B}\sum_{b=1}^{B}\hat{\rho}_{\upsilon b}$. Assuming that these statistics can be transformed to a normal distribution, the bias-corrected bootstrap CI [11] for $\rho_\tau$ ($\rho_\upsilon$) is calculated in a similar way to the bias-corrected bootstrap CI for $\delta_\tau$ ($\delta_\upsilon$), considering that $A_\tau = \#(\hat{\rho}_{\tau b} < \hat{\rho}_\tau)$ and that $A_\upsilon = \#(\hat{\rho}_{\upsilon b} < \hat{\rho}_\upsilon)$. Finally, the bias-corrected bootstrap CI for $\rho_\tau$ is
$$\rho_\tau \in \left(\hat{\rho}_{\tau B}(\alpha_1),\; \hat{\rho}_{\tau B}(\alpha_2)\right),$$
where $\hat{\rho}_{\tau B}(\gamma)$ is the $\gamma$th quantile of the distribution of the B bootstrap estimations of $\rho_\tau$. Similarly, the bias-corrected bootstrap CI for $\rho_\upsilon$ is
$$\rho_\upsilon \in \left(\hat{\rho}_{\upsilon B}(\alpha_1),\; \hat{\rho}_{\upsilon B}(\alpha_2)\right),$$
where $\hat{\rho}_{\upsilon B}(\gamma)$ is the $\gamma$th quantile of the distribution of the B bootstrap estimations of $\rho_\upsilon$. The bias-corrected bootstrap CI for $\tau_2/\tau_1$ ($\upsilon_2/\upsilon_1$) is calculated by inverting the limits of the bias-corrected bootstrap CI for $\rho_\tau$ ($\rho_\upsilon$).

2.2.5. Monte Carlo Bayesian CI

The Monte Carlo Bayesian CI for $\rho_\tau$ ($\rho_\upsilon$) is obtained in a similar way to the Monte Carlo Bayesian CI for $\delta_\tau$ ($\delta_\upsilon$). Considering the same distributions (10) and (11), in the mth iteration, we calculate the ratios $\rho_\tau^{(m)} = \tau_1^{(m)}/\tau_2^{(m)}$ and $\rho_\upsilon^{(m)} = \upsilon_1^{(m)}/\upsilon_2^{(m)}$. As estimators, we calculate $\bar{\hat{\rho}}_{\tau Bay} = \frac{1}{M}\sum_{m=1}^{M}\rho_\tau^{(m)}$ and $\bar{\hat{\rho}}_{\upsilon Bay} = \frac{1}{M}\sum_{m=1}^{M}\rho_\upsilon^{(m)}$. Finally, we calculate the CIs based on quantiles, i.e.,
$$\rho_\tau \in \left(q_{\tau Bay}(\alpha/2),\; q_{\tau Bay}(1-\alpha/2)\right) \quad \text{and} \quad \rho_\upsilon \in \left(q_{\upsilon Bay}(\alpha/2),\; q_{\upsilon Bay}(1-\alpha/2)\right),$$
where $q_{\tau Bay}(\gamma)$ ($q_{\upsilon Bay}(\gamma)$) is the $\gamma$th quantile of the distribution of the M values $\rho_\tau^{(m)}$ ($\rho_\upsilon^{(m)}$). The Monte Carlo Bayesian CI for $\tau_2/\tau_1$ ($\upsilon_2/\upsilon_1$) is calculated by inverting the limits of the Monte Carlo Bayesian CI for $\rho_\tau$ ($\rho_\upsilon$).

3. Simulation Experiments

The CIs studied in Section 2 are approximate, and therefore, it is necessary to study their asymptotic behavior. For this purpose, Monte Carlo simulation experiments have been carried out to study the coverage probabilities and the average lengths of the CIs studied, considering a confidence level of 95%. These experiments have consisted of generating N = 10,000 random samples from multinomial distributions of size n = {50, 100, 200, 500, 1000}, whose probabilities have been calculated from Equations (7) and (8). The experiments have been designed from the predictive values of both BDTs. As values of the disease prevalence, we have taken p = {10%, 25%, 50%, 75%}, and as predictive values, we have taken $\tau_i, \upsilon_i = \{0.70, 0.75, \ldots, 0.90, 0.95\}$, which are realistic values in clinical practice. Next, using these values, we have calculated the maximum values of the dependence factors $\varepsilon_1$ and $\varepsilon_0$ (Equations (5) and (6)). As values of $\varepsilon_1$ and $\varepsilon_0$, we have taken intermediate and high values, i.e., 50% and 90% of the maximum value of $\varepsilon_i$, respectively. Finally, we have calculated the probabilities of the multinomial distributions using Equations (7) and (8). In each scenario, we have calculated all the CIs for each of the N random samples.
For the bias-corrected bootstrap CIs, for each one of the N random samples, B = 2000 samples with replacement have been generated, and from these B samples, the bias-corrected bootstrap CIs have been calculated.
For the Monte Carlo Bayesian CIs, we have considered a Beta(1, 1) distribution as the prior distribution for the estimators of the sensitivities, specificities, and prevalence. The Beta(1, 1) distribution is a non-informative distribution, which is flat for every possible value of the sensitivities, specificities, and prevalence; therefore, its impact on the posterior distributions is minimal. Moreover, for each of the N random samples, M = 10,000 random draws have been generated, and the Monte Carlo Bayesian CIs have been calculated from them.
In each of the N samples generated, all the CIs have been calculated. Furthermore, it has been checked whether each CI contains the value of the parameter (difference or ratio, depending on the type of CI). The coverage probability has been calculated by dividing the number of samples in which the CI contains the parameter by the total number of samples. For each CI, its length (upper limit minus the lower limit) has also been calculated, and finally, the average length of each CI has been calculated.
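The simulation scheme described above can be sketched in R as follows; the snippet estimates the coverage probability of the Wald CI for $\delta_\tau$ for one scenario, given the vector of the eight multinomial probabilities (our own illustrative code, reusing the hypothetical helpers from Section 2).

```r
coverage_wald_tau <- function(probs, delta_tau_true, n, N = 10000, conf = 0.95) {
  hits <- 0
  for (i in seq_len(N)) {
    f <- as.vector(rmultinom(1, n, probs))   # (s11, s10, s01, s00, r11, r10, r01, r00)
    est <- pv_estimates(f[1], f[2], f[3], f[4], f[5], f[6], f[7], f[8])
    ci  <- wald_ci_diff(est, conf)["delta_tau", ]
    if (!any(is.na(ci)) && ci[1] <= delta_tau_true && delta_tau_true <= ci[2])
      hits <- hits + 1
  }
  hits / N   # estimated coverage probability
}
```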

3.1. CIs for the Differences and Ratios of Positive Predictive Values

Table 2 shows some of the results obtained for the three CIs for the difference $\delta_\tau$ for four different scenarios and for intermediate values of $\varepsilon_1$ and $\varepsilon_0$. When the sample size is small (n = 50) or moderate (n = 100), the CIs for $\delta_\tau$ have a coverage probability close to 1. For the difference $\delta_\tau$, in very general terms, the Wald CI is the interval whose coverage probability fluctuates closest to 95%, especially when n is moderate or large (n ≥ 200). The bias-corrected bootstrap CI behaves very similarly to the Wald CI, especially when the sample size is large. In general terms, the Monte Carlo Bayesian CI has a coverage probability greater than that of the other two intervals, even when the coverage probability of the other two intervals fluctuates around 95%.
Regarding the CIs for the ratio $\rho_\tau$, Table 3 shows the results obtained for the same scenarios as in Table 2. When the sample size is small (n = 50), the five CIs for $\rho_\tau$ have a coverage probability close to 1. In general terms, there is no important difference between the coverage probabilities and the average lengths of the Wald, logarithmic, and Fieller CIs, especially when n ≥ 100. When the sample size is small, the logarithmic CI and the Fieller CI have an average length slightly greater than that of the Wald CI. The bias-corrected bootstrap CI behaves very similarly to the Wald, logarithmic, and Fieller CIs, especially when the sample size is large. In general terms, the Monte Carlo Bayesian CI has a coverage probability greater than that of the other four intervals.

3.2. CIs for the Differences and Ratios of Negative Predictive Values

Table 4 shows the results for the three CIs for the difference $\delta_\upsilon$ for the same scenarios as in Table 2 and Table 3. In general terms, when the sample size is small, the three CIs have a coverage probability close to 1, although in some situations the bias-corrected bootstrap CI may have a coverage probability well below 95%. In general terms, the Monte Carlo Bayesian CI has a coverage probability that almost always fluctuates above 95%. For the difference $\delta_\upsilon$, the Wald CI is the interval whose coverage probability fluctuates closest to 95%, especially when the sample size is moderate or large.
Regarding the CIs for the ratio $\rho_\upsilon$, Table 5 shows the results obtained for the same scenarios as in Table 4. When the sample size is small (n = 50), the five CIs for $\rho_\upsilon$ either fail or have a coverage probability close to 1. In general terms, the conclusions for the bias-corrected bootstrap CI and for the Monte Carlo Bayesian CI are very similar to those obtained for the corresponding intervals for the difference $\delta_\upsilon$. With respect to the other intervals, there is no important difference between the coverage probabilities and the average lengths of the Wald, logarithmic, and Fieller CIs, especially when n ≥ 100. When the sample size is small, the logarithmic CI and the Fieller CI have an average length slightly greater than that of the Wald CI.
Similar conclusions are obtained when ε 1 and ε 0 take high values. Therefore, the dependency factors ε 1 and ε 0 do not have an important effect on the behavior of the CIs for the difference (ratio) of the two negative predictive values.
As a conclusion, the following general rules of application can be given depending on the sample size, since the sample size is the only parameter controlled by the researcher: (a) apply the Wald CI for the difference of the positive (negative) predictive values whatever the sample size; (b) apply the Wald CI for the ratio of the two positive (negative) predictive values when the sample size is small, and apply the Wald CI, the logarithmic CI, the Fieller CI, or the bias-corrected bootstrap CI when the sample size is moderate or high.
Once some general rules of application have been established, which is better: a CI for the difference or a CI for the ratio? Simulation experiments have shown that the Wald CIs for the difference and the Wald CIs for the ratio have very similar coverage probabilities. Furthermore, the Wald CI for the difference has a coverage probability very similar to that of the Fieller CI when the sample size is large. The Wald CIs for the difference are obtained by inverting the Wald test statistics of the tests $H_0: \tau_1 - \tau_2 = 0$ and $H_0: \upsilon_1 - \upsilon_2 = 0$, and the Wald CIs for the ratio are obtained by inverting the Wald test statistics of the tests $H_0: \tau_1/\tau_2 = 1$ and $H_0: \upsilon_1/\upsilon_2 = 1$. Wang et al. [4] have shown through simulation experiments that the hypothesis tests $H_0: \tau_1 - \tau_2 = 0$ and $H_0: \upsilon_1 - \upsilon_2 = 0$ have better asymptotic behavior than the tests $H_0: \tau_1/\tau_2 = 1$ and $H_0: \upsilon_1/\upsilon_2 = 1$. Furthermore, Wang et al. recommend the difference-based approach as more straightforward and more understandable for researchers. Therefore, we recommend using a CI for the difference instead of a CI for the ratio.

4. Sample Size

The calculation of the sample size to compare parameters is of great interest in Statistics. Next, we propose a procedure to determine the sample size necessary to estimate the difference between the two positive (negative) predictive values with a precision $\phi_\tau$ ($\phi_\upsilon$) and a confidence of $100(1-\alpha)\%$. This procedure is based on the Wald CI for the difference $\delta_\tau$ ($\delta_\upsilon$), since in general terms this is the interval with the best asymptotic behavior. The procedure requires a pilot sample (or another study) from which to estimate the predictive values and their differences. If the pilot sample is not small and the Wald CI for the difference $\delta_\tau$ ($\delta_\upsilon$) contains the value 0, then the null hypothesis of equality of the predictive values is not rejected, and it does not make sense to calculate the sample size. However, if the sample is small, it may be necessary to calculate the sample size, since the Wald CI will be very wide and may contain the value 0 even if the predictive values are different. Let us consider that $\tau_1 \neq \tau_2$ ($\upsilon_1 \neq \upsilon_2$) and therefore $\delta_\tau \neq 0$ ($\delta_\upsilon \neq 0$), and let $\phi_\tau$ and $\phi_\upsilon$ be the precisions set by the researcher ($\phi$ must be small if the researcher wants high precision). Based on the asymptotic normality of $\hat{\delta}_\tau = \hat{\tau}_1 - \hat{\tau}_2$ and of $\hat{\delta}_\upsilon = \hat{\upsilon}_1 - \hat{\upsilon}_2$, it is verified that
$$\hat{\delta}_\tau \in \delta_\tau \pm z_{1-\alpha/2}\sqrt{Var(\hat{\delta}_\tau)} \quad \text{and} \quad \hat{\delta}_\upsilon \in \delta_\upsilon \pm z_{1-\alpha/2}\sqrt{Var(\hat{\delta}_\upsilon)},$$
i.e., the estimator $\hat{\delta}_\tau$ ($\hat{\delta}_\upsilon$) falls in this interval with a probability of $100(1-\alpha)\%$.
For positive predictive values, the method is as follows. Setting a precision ϕ τ , the sample size is calculated from the equation
$$\phi_\tau = z_{1-\alpha/2}\sqrt{Var(\hat{\delta}_\tau)},$$
where the variance is
$$Var(\hat{\delta}_\tau) = \frac{pqQ_2\tau_1\bar{\tau}_1 + pqQ_1\tau_2\bar{\tau}_2 - 2\left(pq^2\tau_1\tau_2\varepsilon_0 + p^2q\bar{\tau}_1\bar{\tau}_2\varepsilon_1 + \tau_1\tau_2\bar{\tau}_1\bar{\tau}_2Q_1Q_2\right)}{npqQ_1Q_2}.$$
The proof can be seen in Appendix A. This variance depends on the positive predictive values ($\tau_i$), on the disease prevalence ($p$), on the probability of a positive result of each test ($Q_i$), on the dependence factors ($\varepsilon_i$), and on the sample size $n$. Substituting in Equation (13) the parameters with their estimators and solving for $n$, the sample size to estimate the difference $\delta_\tau$ with precision $\phi_\tau$ and confidence $100(1-\alpha)\%$ is
$$n_\tau = \frac{z_{1-\alpha/2}^2}{\phi_\tau^2} \times \frac{\hat{p}\hat{q}\hat{\tau}_1\hat{\bar{\tau}}_1\hat{Q}_2 + \hat{p}\hat{q}\hat{\tau}_2\hat{\bar{\tau}}_2\hat{Q}_1 - 2\left(\hat{p}\hat{q}^2\hat{\tau}_1\hat{\tau}_2\hat{\varepsilon}_0 + \hat{p}^2\hat{q}\hat{\bar{\tau}}_1\hat{\bar{\tau}}_2\hat{\varepsilon}_1 + \hat{\tau}_1\hat{\tau}_2\hat{\bar{\tau}}_1\hat{\bar{\tau}}_2\hat{Q}_1\hat{Q}_2\right)}{\hat{p}\hat{q}\hat{Q}_1\hat{Q}_2}.$$
Once the equation for the sample size is obtained, the method to calculate the sample size consists of the following steps:
(1) Take a pilot sample of size $n'_\tau$ and, from this sample, calculate $\hat{\tau}_i$, $\hat{\upsilon}_i$, $\hat{\varepsilon}_i$, $\hat{p}$, $\hat{Q}_i$ and the Wald CI for the difference $\delta_\tau$. If the Wald CI has the precision $\phi_\tau$, then the precision has been reached with the pilot sample and the process ends; in this situation, the difference $\delta_\tau$ has been estimated with precision $\phi_\tau$ and confidence $100(1-\alpha)\%$. Otherwise, go to the next step.
(2) From the estimates obtained with the pilot sample, calculate the sample size $n_\tau$ applying Equation (15).
(3) Take the sample of size $n_\tau$ ($n_\tau - n'_\tau$ individuals are added to the initial pilot sample) and, from this sample, calculate all the estimators and the Wald CI for the difference $\delta_\tau$. If the Wald CI has the precision $\phi_\tau$, then the process ends (the precision has been reached with the new sample). If the Wald CI does not have the precision $\phi_\tau$, then this sample is considered as a pilot sample; go to step 1.
This method to calculate the sample size n is an iterative method, which depends on the initial pilot sample and therefore does not guarantee that the difference between the positive predictive values will be estimated with the precision ϕ τ .
The sample size to estimate the difference $\delta_\upsilon$ is calculated in a similar way. In this case,
$$Var(\hat{\delta}_\upsilon) = \frac{pq\upsilon_1\bar{\upsilon}_1\bar{Q}_2 + pq\upsilon_2\bar{\upsilon}_2\bar{Q}_1 - 2\left(pq^2\bar{\upsilon}_1\bar{\upsilon}_2\varepsilon_0 + p^2q\upsilon_1\upsilon_2\varepsilon_1 + \upsilon_1\upsilon_2\bar{\upsilon}_1\bar{\upsilon}_2\bar{Q}_1\bar{Q}_2\right)}{npq\bar{Q}_1\bar{Q}_2}$$
and the sample size $n_\upsilon$ to estimate the difference $\delta_\upsilon$ with precision $\phi_\upsilon$ and confidence $100(1-\alpha)\%$ is
$$n_\upsilon = \frac{z_{1-\alpha/2}^2}{\phi_\upsilon^2} \times \frac{\hat{p}\hat{q}\hat{\upsilon}_1\hat{\bar{\upsilon}}_1\hat{\bar{Q}}_2 + \hat{p}\hat{q}\hat{\upsilon}_2\hat{\bar{\upsilon}}_2\hat{\bar{Q}}_1 - 2\left(\hat{p}\hat{q}^2\hat{\bar{\upsilon}}_1\hat{\bar{\upsilon}}_2\hat{\varepsilon}_0 + \hat{p}^2\hat{q}\hat{\upsilon}_1\hat{\upsilon}_2\hat{\varepsilon}_1 + \hat{\upsilon}_1\hat{\upsilon}_2\hat{\bar{\upsilon}}_1\hat{\bar{\upsilon}}_2\hat{\bar{Q}}_1\hat{\bar{Q}}_2\right)}{\hat{p}\hat{q}\hat{\bar{Q}}_1\hat{\bar{Q}}_2}.$$
If the researcher wants to estimate $\delta_\tau$ with precision $\phi_\tau$ and also wants to estimate $\delta_\upsilon$ with precision $\phi_\upsilon$, at the same confidence level, then the final sample size is $n = Max(n_\tau, n_\upsilon)$. Using the larger of the two sample sizes guarantees that the CI for the difference of the two positive predictive values and the CI for the difference of the two negative predictive values both verify the precision set for each of them.
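Under the same assumptions, the two sample-size formulas can be sketched in R as follows (hypothetical functions whose inputs are pilot-sample estimates; we apply ceiling to round up to a whole number of individuals, a choice the manuscript does not state explicitly).

```r
# Sample size for the difference of positive predictive values
sample_size_tau <- function(tau1, tau2, Q1, Q2, p, eps1, eps0, phi, conf = 0.95) {
  z2 <- qnorm(1 - (1 - conf) / 2)^2
  q  <- 1 - p
  num <- p * q * tau1 * (1 - tau1) * Q2 + p * q * tau2 * (1 - tau2) * Q1 -
    2 * (p * q^2 * tau1 * tau2 * eps0 + p^2 * q * (1 - tau1) * (1 - tau2) * eps1 +
         tau1 * tau2 * (1 - tau1) * (1 - tau2) * Q1 * Q2)
  ceiling(z2 / phi^2 * num / (p * q * Q1 * Q2))
}

# Sample size for the difference of negative predictive values
sample_size_ups <- function(ups1, ups2, Qb1, Qb2, p, eps1, eps0, phi, conf = 0.95) {
  z2 <- qnorm(1 - (1 - conf) / 2)^2
  q  <- 1 - p
  num <- p * q * ups1 * (1 - ups1) * Qb2 + p * q * ups2 * (1 - ups2) * Qb1 -
    2 * (p * q^2 * (1 - ups1) * (1 - ups2) * eps0 + p^2 * q * ups1 * ups2 * eps1 +
         ups1 * ups2 * (1 - ups1) * (1 - ups2) * Qb1 * Qb2)
  ceiling(z2 / phi^2 * num / (p * q * Qb1 * Qb2))
}
```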
The method for calculating the sample size depends on the values of the estimators obtained from the pilot sample. As the values of the estimators depend on each sample (and therefore vary from one sample to another), it is necessary to study how the values of the estimators affect the calculation of the sample size. Therefore, we have carried out simulation experiments to study the effect of the values of the estimators on the calculation of the sample size. These simulation experiments consisted of the following steps:
(1) Calculate the sample size $n_\tau$ ($n_\upsilon$) from Equations (14) and (16), using the values of the parameters in the scenarios considered (Table 2 and Table 4). Therefore, these equations have been applied using the values of the parameters instead of the values of the estimators.
(2) Generate N = 10,000 multinomial random samples of size $n_\tau$ ($n_\upsilon$) whose probabilities have been calculated from Equations (7) and (8), using the parameters of the scenarios considered and intermediate (50%) and high (90%) values of $\varepsilon_i$. From each of the N random samples, all the estimators ($\hat{\tau}_i$, $\hat{\upsilon}_i$, $\hat{\varepsilon}_i$, $\hat{p}$ and $\hat{Q}_i$) have been calculated, and then the sample size $n_\tau$ ($n_\upsilon$) has been calculated applying Equation (14) (Equation (16)).
(3) In each scenario considered, the average sample size and the relative bias have been calculated.
Table 6 shows the results obtained for different precision values (2.5% and 5%, values that can be considered as high precision) and $1 - \alpha = 0.95$. The relative biases are very small, so the equations for the sample sizes provide robust values, and the pilot sample therefore has little effect on the calculation of the sample size.

5. Program cicpvbdt

We have written a program in R [13] to solve the problems raised in this manuscript. The program is called “cicpvbdt” (confidence intervals to compare the predictive values of binary diagnostic tests), and it calculates all of the CIs and the sample sizes. The program is run with the command “cicpvbdt(s11, s10, s01, s00, r11, r10, r01, r00, ϕτ, ϕυ)”, where the last two arguments are the precisions set for the differences. By default, the confidence level is 95%. The program does not calculate the sample sizes when ϕτ = 0 and ϕυ = 0, and only calculates the sample sizes when ϕτ > 0 and/or ϕυ > 0; in this last situation, the program checks whether the set precision has been reached. The program checks that all input values are valid (e.g., that there are no negative observed frequencies) and that all the parameters and their variances–covariances can be estimated. For the bias-corrected bootstrap CIs, 2000 samples with replacement are generated, and for the Monte Carlo Bayesian CIs, 10,000 random samples are generated. The results obtained are saved in a file called “results_cicpvbt.txt” in the folder from which the program is run. The program is available as Supplementary Material of this manuscript.

6. Example

The results obtained have been applied to a study on the diagnosis of colorectal cancer using two diagnostic tests: Fecal Immunochemical Testing (FIT) and Fecal Occult Blood Testing (FOBT). The GS for the diagnosis of colorectal cancer is the biopsy. Table 7 shows the observed frequencies obtained when the two BDTs and the GS were applied to a sample of 168 adult men suspected of having colorectal cancer. Running the program “cicpvbdt” with the command “cicpvbdt(68, 18, 1, 13, 4, 1, 2, 61, 0, 0)”, all the results shown in Table 7 are obtained.
The estimated positive predictive values of FIT and of FOBT are 94.5% and 92.0%, and the estimated negative predictive values are 81.8% and 66.7%, respectively. Using the recommendations given in Section 3, the 95% Wald CI for the difference between the two positive predictive values contains the value zero, and therefore (with α = 5 % ), the equality of the two positive predictive values is not rejected.
Regarding the negative predictive values, the 95% Wald CI does not contain the value zero, and therefore, we reject the equality of the two negative predictive values: the negative predictive value of FIT is significantly greater than that of FOBT. With a confidence of 95%, the negative predictive value of FIT is between 8.1% and 22.2% greater than the negative predictive value of FOBT. The same conclusions are obtained using the other CIs.
To illustrate the method for calculating the sample size, suppose that the clinician is interested in the sample size necessary to estimate the difference between the two negative predictive values with a precision $\phi_\upsilon = 0.05$ and $1 - \alpha = 0.95$. The 95% Wald CI for $\delta_\upsilon = \upsilon_1 - \upsilon_2$ is (0.081, 0.222), whose precision is 0.0705 (= (0.222 − 0.081)/2). Since $\phi_\upsilon = 0.05 < 0.0705$, the desired precision has not been reached with the sample of 168 individuals, and therefore, the sample size must be calculated. Using the sample of 168 patients as a pilot sample and executing the command “cicpvbdt(68, 18, 1, 13, 4, 1, 2, 61, 0, 0.05)”, it is obtained that $n_\upsilon = 338$: a sample of 338 patients is necessary to estimate the difference between the two negative predictive values with a precision $\phi_\upsilon = 0.05$ and a confidence of 95%. Thus, another 170 new patients must be added to the sample of 168 patients, and the two BDTs and the biopsy should be applied to the new patients. Finally, it is necessary to recalculate the CIs with the sample of 338 patients and to check that the set precision is verified.
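Using the hypothetical helpers sketched in the previous sections, the example can be reproduced approximately as follows (the frequencies are those of Table 7; the computed sample size should be about 338).

```r
# Table 7 frequencies: s11, s10, s01, s00, r11, r10, r01, r00
est <- pv_estimates(68, 18, 1, 13, 4, 1, 2, 61)
round(wald_ci_diff(est), 3)                  # delta_ups CI approx (0.081, 0.222)

n    <- 168
Qb1  <- (1 + 13 + 2 + 61) / n                # estimate of P(T1 = 0) = 77/168
Qb2  <- (18 + 13 + 1 + 61) / n               # estimate of P(T2 = 0) = 93/168
eps1 <- (68 * 13 - 18 * 1) / 100^2           # dependence factor, diseased
eps0 <- (4 * 61 - 1 * 2) / 68^2              # dependence factor, non-diseased
sample_size_ups(est$ups[1], est$ups[2], Qb1, Qb2, p = 100 / 168,
                eps1 = eps1, eps0 = eps0, phi = 0.05)   # about 338
```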

7. Discussion

Comparison of the predictive values of two medical tests is a topic of interest in biostatistics. There are several articles that have studied hypothesis tests to solve these problems; however, the comparison of predictive values through confidence intervals has been little studied. In this manuscript, we have studied confidence intervals for the difference and for the ratio of the positive (negative) predictive values of two diagnostic tests under a paired design. We have carried out simulation experiments to study the asymptotic behaviors of the CIs, and we have given general rules of application. These rules are based on the sample size, since this is the only parameter that is set by the researcher, and also on the practical interpretation of the CIs. As a general conclusion, we recommend using the Wald interval for the difference of the two positive (negative) predictive values.
We have also proposed a method, based on the Wald CI for the difference, to calculate the sample size to estimate the difference between the two positive (negative) predictive values with a determined precision and confidence. This method starts from an initial pilot sample, and then the sample size is calculated from the estimators obtained with the initial sample. This method depends on the estimators of the pilot sample, so we have carried out simulation experiments to study the effect of the pilot sample on the sample size. The results obtained in these experiments have shown that the pilot sample does not have any important effect on the calculation of the sample size, and that therefore, the method has practical validity.

Supplementary Materials

The following are available online at https://www.mdpi.com/article/10.3390/math9131462/s1.

Author Contributions

The two authors have collaborated equally in the realization of this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We thank the Academic Editor and the anonymous referees for their helpful comments that improved the quality of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Let $\boldsymbol{\pi} = (p_{11}, p_{10}, p_{01}, p_{00}, q_{11}, q_{10}, q_{01}, q_{00})^T$ be the vector of probabilities of the multinomial distribution; then, the variance–covariance matrix of $\hat{\boldsymbol{\pi}}$ is $\boldsymbol{\Sigma}_{\hat{\pi}} = \{\mathrm{diag}(\boldsymbol{\pi}) - \boldsymbol{\pi}\boldsymbol{\pi}^T\}/n$. In terms of $\boldsymbol{\pi}$, the predictive values are written as
$$\tau_1 = \frac{p_{10}+p_{11}}{Q_1}, \quad \tau_2 = \frac{p_{01}+p_{11}}{Q_2}, \quad \upsilon_1 = \frac{q_{00}+q_{01}}{\bar{Q}_1} \quad \text{and} \quad \upsilon_2 = \frac{q_{00}+q_{10}}{\bar{Q}_2},$$
where
$$\begin{aligned}
Q_1 &= P(T_1 = 1) = p_{10} + p_{11} + q_{10} + q_{11} = pSe_1 + q(1 - Sp_1),\\
\bar{Q}_1 &= 1 - Q_1 = P(T_1 = 0) = p_{00} + p_{01} + q_{00} + q_{01} = p(1 - Se_1) + qSp_1,\\
Q_2 &= P(T_2 = 1) = p_{01} + p_{11} + q_{01} + q_{11} = pSe_2 + q(1 - Sp_2)
\end{aligned}$$
and
$$\bar{Q}_2 = 1 - Q_2 = P(T_2 = 0) = p_{00} + p_{10} + q_{00} + q_{10} = p(1 - Se_2) + qSp_2.$$
Let $\boldsymbol{\omega} = (\tau_1, \tau_2, \upsilon_1, \upsilon_2)^T$ be the vector of predictive values; then, applying the delta method, the matrix of asymptotic variances–covariances of $\hat{\boldsymbol{\omega}}$ is
$$\boldsymbol{\Sigma}_{\hat{\omega}} = \left(\frac{\partial \boldsymbol{\omega}}{\partial \boldsymbol{\pi}}\right) \boldsymbol{\Sigma}_{\hat{\pi}} \left(\frac{\partial \boldsymbol{\omega}}{\partial \boldsymbol{\pi}}\right)^T.$$
Performing the algebraic operations, it is obtained that
$$\begin{aligned}
Var(\hat{\tau}_1) &= \frac{(p_{10}+p_{11})(q_{10}+q_{11})}{nQ_1^3} = \frac{\tau_1\bar{\tau}_1}{nQ_1}, \quad
Var(\hat{\tau}_2) = \frac{(p_{01}+p_{11})(q_{01}+q_{11})}{nQ_2^3} = \frac{\tau_2\bar{\tau}_2}{nQ_2},\\
Var(\hat{\upsilon}_1) &= \frac{(q_{01}+q_{00})(p_{01}+p_{00})}{n\bar{Q}_1^3} = \frac{\upsilon_1\bar{\upsilon}_1}{n\bar{Q}_1}, \quad
Var(\hat{\upsilon}_2) = \frac{(q_{00}+q_{10})(p_{00}+p_{10})}{n\bar{Q}_2^3} = \frac{\upsilon_2\bar{\upsilon}_2}{n\bar{Q}_2},\\
Cov(\hat{\tau}_1, \hat{\tau}_2) &= \frac{pq^2\tau_1\tau_2\varepsilon_0 + p^2q\bar{\tau}_1\bar{\tau}_2\varepsilon_1 + \tau_1\tau_2\bar{\tau}_1\bar{\tau}_2Q_1Q_2}{npqQ_1Q_2}
\end{aligned}$$
and
$$Cov(\hat{\upsilon}_1, \hat{\upsilon}_2) = \frac{pq^2\bar{\upsilon}_1\bar{\upsilon}_2\varepsilon_0 + p^2q\upsilon_1\upsilon_2\varepsilon_1 + \upsilon_1\upsilon_2\bar{\upsilon}_1\bar{\upsilon}_2\bar{Q}_1\bar{Q}_2}{npq\bar{Q}_1\bar{Q}_2},$$
where $\bar{\tau}_i = 1 - \tau_i$ and $\bar{\upsilon}_i = 1 - \upsilon_i$, with $i = 1, 2$. The estimated variances–covariances are obtained by substituting the parameters with their estimators. Equations (15) and (16) are obtained by substituting into the equations
$$Var(\hat{\delta}_\tau) = Var(\hat{\tau}_1) + Var(\hat{\tau}_2) - 2Cov(\hat{\tau}_1, \hat{\tau}_2)$$
and
$$Var(\hat{\delta}_\upsilon) = Var(\hat{\upsilon}_1) + Var(\hat{\upsilon}_2) - 2Cov(\hat{\upsilon}_1, \hat{\upsilon}_2)$$
the variances–covariances by their corresponding expressions obtained previously.

References

  1. Bennett, B.M. On comparison of sensitivity, specificity and predictive value of a number of diagnostic procedures. Biometrics 1972, 28, 793–800.
  2. Bennett, B.M. On tests for equality of predictive values for t diagnostic procedures. Stat. Med. 1985, 4, 535–539.
  3. Leisenring, W.; Alonzo, T.; Pepe, M.S. Comparisons of predictive values of binary medical diagnostic tests for paired designs. Biometrics 2000, 56, 345–351.
  4. Wang, W.; Davis, C.S.; Soong, S.J. Comparison of predictive values of two diagnostic tests from the same sample of subjects using weighted least squares. Stat. Med. 2006, 25, 2215–2229.
  5. Kosinski, A.S. A weighted generalized score statistic for comparison of predictive values of diagnostic tests. Stat. Med. 2013, 32, 964–977.
  6. Tsou, T.S. A new likelihood approach to inference about predictive values of diagnostic tests in paired designs. Stat. Methods Med. Res. 2018, 27, 541–548.
  7. Takahashi, K.; Yamamoto, K. An exact test for comparing two predictive values in small-size clinical trials. Pharm. Stat. 2020, 19, 31–43.
  8. Roldán-Nofuentes, J.A.; Luna del Castillo, J.D.; Montero-Alonso, M.A. Global hypothesis test to simultaneously compare the predictive values of two binary diagnostic tests. Comput. Stat. Data Anal. 2012, 56, 1161–1173.
  9. Moskowitz, C.S.; Pepe, M.S. Comparing the predictive values of diagnostic tests: Sample size and analysis for paired study designs. Clin. Trials 2006, 3, 272–279.
  10. Vacek, P.M. The effect of conditional dependence on the evaluation of diagnostic tests. Biometrics 1985, 41, 959–968.
  11. Efron, B.; Tibshirani, R.J. An Introduction to the Bootstrap; Chapman and Hall: Binghamton, NY, USA, 1993.
  12. Fieller, E.C. The biological standardization of insulin. J. R. Stat. Soc. 1940, 7, 1–64.
  13. R Core Team. R: A Language and Environment for Statistical Computing; R Core Team: Vienna, Austria, 2016. Available online: https://www.R-project.org/ (accessed on 11 June 2021).
Table 1. Observed frequencies subject to a paired design.

|                      | T1 = 1, T2 = 1 | T1 = 1, T2 = 0 | T1 = 0, T2 = 1 | T1 = 0, T2 = 0 | Total |
|----------------------|----------------|----------------|----------------|----------------|-------|
| Diseased (D = 1)     | s11            | s10            | s01            | s00            | s     |
| Non-diseased (D = 0) | r11            | r10            | r01            | r00            | r     |
| Total                | s11 + r11      | s10 + r10      | s01 + r01      | s00 + r00      | n     |
Table 2. Asymptotic behaviors of the CIs for the difference of the two positive predictive values.

Scenario: τ1 = 0.75, τ2 = 0.75, υ1 = 0.95, υ2 = 0.95, δτ = 0; ε1 = 0.124, ε0 = 0.010, p = 0.10

| n    | Wald CP | Wald AL | BCB CP | BCB AL | MCB CP | MCB AL |
|------|---------|---------|--------|--------|--------|--------|
| 50   | 1       | 0.749   | 1      | 0.743  | 1      | 0.787  |
| 100  | 1       | 0.538   | 1      | 0.527  | 1      | 0.683  |
| 200  | 0.988   | 0.409   | 1      | 0.397  | 1      | 0.454  |
| 500  | 0.953   | 0.261   | 0.990  | 0.254  | 0.998  | 0.363  |
| 1000 | 0.944   | 0.185   | 0.957  | 0.186  | 0.993  | 0.259  |

Scenario: τ1 = 0.90, τ2 = 0.85, υ1 = 0.95, υ2 = 0.90, δτ = 0.05; ε1 = 0.021, ε0 = 0.044, p = 0.50

| n    | Wald CP | Wald AL | BCB CP | BCB AL | MCB CP | MCB AL |
|------|---------|---------|--------|--------|--------|--------|
| 50   | 0.982   | 0.251   | 1      | 0.242  | 0.999  | 0.326  |
| 100  | 0.951   | 0.174   | 0.966  | 0.182  | 0.993  | 0.223  |
| 200  | 0.954   | 0.122   | 0.952  | 0.126  | 0.991  | 0.156  |
| 500  | 0.941   | 0.077   | 0.933  | 0.077  | 0.981  | 0.099  |
| 1000 | 0.952   | 0.055   | 0.948  | 0.055  | 0.988  | 0.070  |

Scenario: τ1 = 0.85, τ2 = 0.75, υ1 = 0.95, υ2 = 0.85, δτ = 0.10; ε1 = 0.037, ε0 = 0.024, p = 0.25

| n    | Wald CP | Wald AL | BCB CP | BCB AL | MCB CP | MCB AL |
|------|---------|---------|--------|--------|--------|--------|
| 50   | 0.998   | 0.513   | 1      | 0.499  | 1      | 0.602  |
| 100  | 0.981   | 0.354   | 1      | 0.343  | 0.999  | 0.445  |
| 200  | 0.941   | 0.250   | 0.987  | 0.243  | 0.991  | 0.318  |
| 500  | 0.956   | 0.158   | 0.959  | 0.159  | 0.989  | 0.204  |
| 1000 | 0.953   | 0.112   | 0.954  | 0.113  | 0.989  | 0.145  |

CP: coverage probability. AL: average length. Wald: Wald CI. BCB: bias-corrected bootstrap CI. MCB: Monte Carlo Bayesian CI.
Table 3. Asymptotic behaviors of the CIs for the ratio of the two positive predictive values.

Scenario: τ1 = 0.75, τ2 = 0.75, υ1 = 0.95, υ2 = 0.95, ρτ = 1; ε1 = 0.124, ε0 = 0.010, p = 0.10

| n    | Wald CP | Wald AL | Log CP | Log AL | Fieller CP | Fieller AL | BCB CP | BCB AL | MCB CP | MCB AL |
|------|---------|---------|--------|--------|------------|------------|--------|--------|--------|--------|
| 50   | 1       | 1.326   | 1      | 1.450  | 1          | 2.046      | 1      | 1.348  | 1      | 2.183  |
| 100  | 0.999   | 0.966   | 1      | 1.018  | 1          | 1.311      | 1      | 0.973  | 1      | 1.534  |
| 200  | 0.989   | 0.630   | 0.992  | 0.643  | 0.994      | 0.652      | 1      | 0.648  | 1      | 0.978  |
| 500  | 0.962   | 0.359   | 0.966  | 0.361  | 0.953      | 0.369      | 0.988  | 0.364  | 0.998  | 0.535  |
| 1000 | 0.956   | 0.250   | 0.952  | 0.251  | 0.944      | 0.253      | 0.956  | 0.256  | 0.993  | 0.363  |

Scenario: τ1 = 0.90, τ2 = 0.85, υ1 = 0.95, υ2 = 0.90, ρτ = 1.06; ε1 = 0.021, ε0 = 0.044, p = 0.50

| n    | Wald CP | Wald AL | Log CP | Log AL | Fieller CP | Fieller AL | BCB CP | BCB AL | MCB CP | MCB AL |
|------|---------|---------|--------|--------|------------|------------|--------|--------|--------|--------|
| 50   | 0.986   | 0.326   | 0.994  | 0.328  | 0.993      | 0.334      | 1      | 0.347  | 1      | 0.448  |
| 100  | 0.949   | 0.216   | 0.952  | 0.216  | 0.950      | 0.218      | 0.957  | 0.232  | 0.994  | 0.288  |
| 200  | 0.952   | 0.151   | 0.955  | 0.151  | 0.954      | 0.152      | 0.953  | 0.157  | 0.990  | 0.196  |
| 500  | 0.941   | 0.095   | 0.941  | 0.095  | 0.940      | 0.095      | 0.935  | 0.095  | 0.982  | 0.122  |
| 1000 | 0.950   | 0.067   | 0.951  | 0.067  | 0.949      | 0.067      | 0.946  | 0.067  | 0.987  | 0.086  |

Scenario: τ1 = 0.85, τ2 = 0.75, υ1 = 0.95, υ2 = 0.85, ρτ = 1.13; ε1 = 0.037, ε0 = 0.024, p = 0.25

| n    | Wald CP | Wald AL | Log CP | Log AL | Fieller CP | Fieller AL | BCB CP | BCB AL | MCB CP | MCB AL |
|------|---------|---------|--------|--------|------------|------------|--------|--------|--------|--------|
| 50   | 0.997   | 0.950   | 1      | 0.958  | 0.998      | 0.961      | 1      | 0.992  | 1      | 1.407  |
| 100  | 0.972   | 0.591   | 0.983  | 0.596  | 0.979      | 0.636      | 1      | 0.689  | 0.999  | 0.841  |
| 200  | 0.941   | 0.390   | 0.943  | 0.392  | 0.940      | 0.396      | 0.988  | 0.398  | 0.989  | 0.528  |
| 500  | 0.950   | 0.241   | 0.954  | 0.241  | 0.957      | 0.242      | 0.960  | 0.244  | 0.989  | 0.314  |
| 1000 | 0.951   | 0.169   | 0.953  | 0.170  | 0.950      | 0.171      | 0.953  | 0.171  | 0.988  | 0.218  |

CP: coverage probability. AL: average length. Wald: Wald CI. Log: logarithmic CI. Fieller: Fieller CI. BCB: bias-corrected bootstrap CI. MCB: Monte Carlo Bayesian CI.
Table 4. Asymptotic behaviors of the CIs for the difference of the two negative predictive values.

Scenario: τ1 = 0.75, τ2 = 0.75, υ1 = 0.95, υ2 = 0.95, δυ = 0; ε1 = 0.124, ε0 = 0.010, p = 0.10

| n    | Wald CP | Wald AL | BCB CP | BCB AL | MCB CP | MCB AL |
|------|---------|---------|--------|--------|--------|--------|
| 50   | 1       | 0.127   | 1      | 0.119  | 1      | 0.154  |
| 100  | 0.999   | 0.072   | 1      | 0.069  | 0.999  | 0.095  |
| 200  | 0.989   | 0.046   | 1      | 0.042  | 0.999  | 0.063  |
| 500  | 0.949   | 0.028   | 0.968  | 0.028  | 0.994  | 0.040  |
| 1000 | 0.946   | 0.020   | 0.943  | 0.020  | 0.993  | 0.028  |

Scenario: τ1 = 0.90, τ2 = 0.85, υ1 = 0.95, υ2 = 0.90, δυ = 0.05; ε1 = 0.021, ε0 = 0.044, p = 0.50

| n    | Wald CP | Wald AL | BCB CP | BCB AL | MCB CP | MCB AL |
|------|---------|---------|--------|--------|--------|--------|
| 50   | 0.999   | 0.264   | 1      | 0.276  | 1      | 0.344  |
| 100  | 0.966   | 0.169   | 0.952  | 0.170  | 0.995  | 0.222  |
| 200  | 0.950   | 0.115   | 0.932  | 0.119  | 0.989  | 0.147  |
| 500  | 0.949   | 0.073   | 0.948  | 0.073  | 0.983  | 0.090  |
| 1000 | 0.952   | 0.052   | 0.953  | 0.052  | 0.984  | 0.063  |

Scenario: τ1 = 0.85, τ2 = 0.75, υ1 = 0.95, υ2 = 0.85, δυ = 0.10; ε1 = 0.037, ε0 = 0.024, p = 0.25

| n    | Wald CP | Wald AL | BCB CP | BCB AL | MCB CP | MCB AL |
|------|---------|---------|--------|--------|--------|--------|
| 50   | 0.936   | 0.207   | 0.720  | 0.182  | 0.948  | 0.218  |
| 100  | 0.938   | 0.142   | 0.874  | 0.133  | 0.960  | 0.151  |
| 200  | 0.948   | 0.099   | 0.937  | 0.096  | 0.967  | 0.107  |
| 500  | 0.957   | 0.062   | 0.961  | 0.062  | 0.975  | 0.068  |
| 1000 | 0.946   | 0.044   | 0.947  | 0.044  | 0.964  | 0.048  |

CP: coverage probability. AL: average length. Wald: Wald CI. BCB: bias-corrected bootstrap CI. MCB: Monte Carlo Bayesian CI.
Table 5. Asymptotic behaviors of the CIs for the ratio of the two negative predictive values.

Scenario: τ1 = 0.75, τ2 = 0.75, υ1 = 0.95, υ2 = 0.95, ρυ = 1; ε1 = 0.124, ε0 = 0.010, p = 0.10

| n    | Wald CP | Wald AL | Log CP | Log AL | Fieller CP | Fieller AL | BCB CP | BCB AL | MCB CP | MCB AL |
|------|---------|---------|--------|--------|------------|------------|--------|--------|--------|--------|
| 50   | 1       | 0.144   | 1      | 0.144  | 1          | 0.145      | 1      | 0.128  | 1.000  | 0.173  |
| 100  | 0.999   | 0.076   | 1      | 0.079  | 0.999      | 0.080      | 1      | 0.074  | 0.999  | 0.103  |
| 200  | 0.988   | 0.046   | 0.991  | 0.047  | 0.992      | 0.048      | 1      | 0.045  | 0.999  | 0.068  |
| 500  | 0.950   | 0.030   | 0.950  | 0.030  | 0.949      | 0.030      | 0.971  | 0.029  | 0.994  | 0.042  |
| 1000 | 0.948   | 0.021   | 0.947  | 0.021  | 0.946      | 0.021      | 0.946  | 0.021  | 0.993  | 0.030  |

Scenario: τ1 = 0.90, τ2 = 0.85, υ1 = 0.95, υ2 = 0.90, ρυ = 1.06; ε1 = 0.021, ε0 = 0.044, p = 0.50

| n    | Wald CP | Wald AL | Log CP | Log AL | Fieller CP | Fieller AL | BCB CP | BCB AL | MCB CP | MCB AL |
|------|---------|---------|--------|--------|------------|------------|--------|--------|--------|--------|
| 50   | 1       | 0.324   | 1      | 0.326  | 0.999      | 0.332      | 1      | 0.349  | 1      | 0.462  |
| 100  | 0.954   | 0.201   | 0.964  | 0.201  | 0.962      | 0.202      | 0.964  | 0.219  | 0.995  | 0.274  |
| 200  | 0.945   | 0.134   | 0.946  | 0.134  | 0.947      | 0.135      | 0.921  | 0.138  | 0.989  | 0.175  |
| 500  | 0.946   | 0.085   | 0.945  | 0.085  | 0.946      | 0.085      | 0.947  | 0.085  | 0.982  | 0.105  |
| 1000 | 0.954   | 0.060   | 0.952  | 0.060  | 0.952      | 0.060      | 0.950  | 0.060  | 0.983  | 0.074  |

Scenario: τ1 = 0.85, τ2 = 0.75, υ1 = 0.95, υ2 = 0.85, ρυ = 1.12; ε1 = 0.037, ε0 = 0.024, p = 0.25

| n    | Wald CP | Wald AL | Log CP | Log AL | Fieller CP | Fieller AL | BCB CP | BCB AL | MCB CP | MCB AL |
|------|---------|---------|--------|--------|------------|------------|--------|--------|--------|--------|
| 50   | 0.936   | 0.271   | 0.934  | 0.272  | 0.933      | 0.275      | 0.724  | 0.283  | 0.933  | 0.291  |
| 100  | 0.939   | 0.184   | 0.936  | 0.184  | 0.935      | 0.185      | 0.849  | 0.191  | 0.961  | 0.199  |
| 200  | 0.945   | 0.129   | 0.947  | 0.129  | 0.945      | 0.129      | 0.923  | 0.125  | 0.963  | 0.140  |
| 500  | 0.957   | 0.081   | 0.958  | 0.081  | 0.958      | 0.081      | 0.959  | 0.081  | 0.974  | 0.088  |
| 1000 | 0.949   | 0.057   | 0.950  | 0.057  | 0.950      | 0.057      | 0.948  | 0.057  | 0.964  | 0.062  |

CP: coverage probability. AL: average length. Wald: Wald CI. Log: logarithmic CI. Fieller: Fieller CI. BCB: bias-corrected bootstrap CI. MCB: Monte Carlo Bayesian CI.
Table 6. Sample size for estimating the difference between the positive (negative) predictive values.

Positive predictive values

Scenario: τ1 = 0.90, τ2 = 0.85, υ1 = 0.95, υ2 = 0.90, δτ = 0.05; ε1 = 0.021, ε0 = 0.044, p = 0.50

|                     | φτ = 0.025 | φτ = 0.05 |
|---------------------|------------|-----------|
| Sample size         | 1203       | 301       |
| Average sample size | 1204       | 302       |
| Relative bias (%)   | 0.17       | 0.33      |

Scenario: τ1 = 0.85, τ2 = 0.75, υ1 = 0.95, υ2 = 0.85, δτ = 0.10; ε1 = 0.037, ε0 = 0.024, p = 0.25

|                     | φτ = 0.025 | φτ = 0.05 |
|---------------------|------------|-----------|
| Sample size         | 5048       | 1262      |
| Average sample size | 5054       | 1267      |
| Relative bias (%)   | 0.12       | 0.40      |

Negative predictive values

Scenario: τ1 = 0.90, τ2 = 0.85, υ1 = 0.95, υ2 = 0.90, δυ = 0.05; ε1 = 0.021, ε0 = 0.044, p = 0.50

|                     | φυ = 0.025 | φυ = 0.05 |
|---------------------|------------|-----------|
| Sample size         | 1079       | 270       |
| Average sample size | 1080       | 272       |
| Relative bias (%)   | 0.09       | 0.74      |

Scenario: τ1 = 0.85, τ2 = 0.75, υ1 = 0.95, υ2 = 0.85, δυ = 0.10; ε1 = 0.037, ε0 = 0.024, p = 0.25

|                     | φυ = 0.025 | φυ = 0.05 |
|---------------------|------------|-----------|
| Sample size         | 782        | 196       |
| Average sample size | 783        | 198       |
| Relative bias (%)   | 0.13       | 1.02      |
Table 7. Observed frequencies and CIs.

Observed frequencies

| Biopsy | FIT +, FOBT + | FIT +, FOBT − | FIT −, FOBT + | FIT −, FOBT − | Total |
|--------|---------------|---------------|---------------|---------------|-------|
| Cancer | 68            | 18            | 1             | 13            | 100   |
| Normal | 4             | 1             | 2             | 61            | 68    |
| Total  | 72            | 19            | 3             | 74            | 168   |

Results

|      | Positive predictive value | Negative predictive value |
|------|---------------------------|---------------------------|
| FIT  | 0.945 ± 0.024             | 0.818 ± 0.044             |
| FOBT | 0.920 ± 0.031             | 0.667 ± 0.049             |

| p     | ε1    | ε0    | Q1    | Q2    |
|-------|-------|-------|-------|-------|
| 0.595 | 0.087 | 0.052 | 0.542 | 0.446 |

CIs for δτ = τ1 − τ2

| Wald            | BCB             | MCB             |
|-----------------|-----------------|-----------------|
| (−0.016, 0.066) | (−0.013, 0.073) | (−0.045, 0.105) |

CIs for ρτ = τ1/τ2

| Wald           | Log            | Fieller        | BCB            | MCB            |
|----------------|----------------|----------------|----------------|----------------|
| (0.981, 1.073) | (0.982, 1.074) | (0.983, 1.076) | (0.985, 1.084) | (0.952, 1.124) |

CIs for δυ = υ1 − υ2

| Wald           | BCB            | MCB            |
|----------------|----------------|----------------|
| (0.081, 0.222) | (0.089, 0.231) | (0.049, 0.248) |

CIs for ρυ = υ1/υ2

| Wald           | Log            | Fieller        | BCB            | MCB            |
|----------------|----------------|----------------|----------------|----------------|
| (1.101, 1.353) | (1.108, 1.350) | (1.112, 1.368) | (1.121, 1.382) | (1.069, 1.420) |

Wald: Wald CI. Log: logarithmic CI. Fieller: Fieller CI. BCB: bias-corrected bootstrap CI. MCB: Monte Carlo Bayesian CI.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
