Efficient and exact tests of the risk ratio in a correlated 2×2 table with structural zero

https://doi.org/10.1016/j.csda.2006.12.035

Abstract

For a correlated 2×2 table where the (01) cell is empty by design, the parameter of interest is typically the ratio of the probability of secondary response conditional on primary response to the probability of primary response, known as the risk ratio. It is common to test whether or not the risk ratio equals one. One method of obtaining an exact P-value is to maximise the tail probability of the test statistic over the nuisance parameter. It is argued that better results are obtained by first replacing the nuisance parameter with its profile estimate when calculating the exact significance, and then maximising; the result is termed an E+M P-value. We consider four standard approximate test statistics, with and without the common correction of adding 1/2 to each count. From a complete enumeration of the distributions of these P-values (for sample sizes 50 and 100), we recommend E+M P-values based on the uncorrected Wald statistic for testing the greater-than alternative, and on the corrected Wald statistic on the log scale for testing the less-than alternative. A good compromise statistic for both kinds of alternative is the likelihood ratio statistic.

Introduction

A binary response is measured on each individual in a sample of size n. For reasons of design, only those who give a certain response on the first occasion are measured a second time. Such designs arise, for instance, in both the treatment of and testing for disease; see Johnson and May (1995). An often-quoted example is Toyota et al. (1999), who studied the detection rates of a screening test for tuberculosis. Those who test negative on the first occasion are tested a second time 1–3 weeks later, whereas those who test positive on the first occasion do not need to be retested. It is suspected that application of the first test, even if negative, makes infected individuals more sensitive to subsequent tests. This booster phenomenon can be measured by the extent to which the probability of a negative response decreases from the first to the second occasion, given a negative first response.

Another example, which we will study in some detail, was given by Agresti (1990, p. 45) (Table 1). A sample of 156 calves was tested for pneumonia during the first 60 days of life, and a total of T = 93 were positive. Of these 93 calves with primary infection, n₁₁ = 30 suffered a secondary infection in the following two weeks. There was interest in comparing the rate of primary infection, estimated to be 93/156 = 59.6%, with the rate of secondary infection, estimated to be 30/93 = 32.3%. The ratio of these two probabilities, known as the risk ratio (RR), is the factor by which the chance of infection changes after a first infection. Here it is estimated to be 30×156/93² = 0.54. An RR less than 1.0 suggests that primary infection has an immunising effect. Agresti defines a χ²-type statistic for testing whether RR = 1, and for this example its value is 19.7. The signed version of this statistic is −4.44, and the approximate one-sided P-value is 4.6×10⁻⁶. Certainly, the evidence for a protective effect of first infection seems overwhelming.
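The point estimate and the χ² value quoted above can be reproduced with a few lines of Python. This sketch rests on one assumption that the text does not state. It takes Agresti's statistic to be the Pearson goodness-of-fit X² for the null model p₁₁ = ψ², p₁₀ = ψ(1 − ψ), p₀₀ = 1 − ψ, with ψ replaced by its null maximum likelihood estimate. Under that assumption the code recovers X² ≈ 19.7 and a signed value of about −4.44.

```python
import math
from statistics import NormalDist

# Calves data quoted above (Agresti, 1990): n calves, T primary infections,
# n11 secondary infections; the other two cells follow by subtraction.
n, T, n11 = 156, 93, 30
n10 = T - n11   # primary infection only
n00 = n - T     # no primary infection (second response structurally negative)

# Estimated risk ratio: (n11 / T) / (T / n) = n11 * n / T**2
rr_hat = n11 * n / T**2

# Null (RR = 1) MLE of psi under p11 = psi^2, p10 = psi(1 - psi), p00 = 1 - psi
psi0 = (2 * n11 + n10) / (2 * n11 + 2 * n10 + n00)

# Assumption: Agresti's statistic is the Pearson X^2 of this fitted null model
observed = (n11, n10, n00)
expected = (n * psi0**2, n * psi0 * (1 - psi0), n * (1 - psi0))
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Signed root: negative when the estimated RR is below 1
z = math.copysign(math.sqrt(x2), rr_hat - 1)
# Approximate P-value for the "less than" alternative (normal approximation)
p_one_sided = NormalDist().cdf(z)

print(f"RR = {rr_hat:.3f}, X2 = {x2:.2f}, z = {z:.2f}, P = {p_one_sided:.2g}")
```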

Lui (1998, 2000) studied confidence intervals for the ratio and the difference of the response probabilities, respectively. This work was further developed by Tang and Tang (2002) for the ratio and by Tang and Tang (2003) for the difference. Lloyd and Moldovan (2007a, 2007b) have recently applied the exact method of Buehler (1957) to confidence limits for the RR. There has been less work on the testing problem, although confidence intervals can of course be inverted to define two-sided tests.

This paper is motivated by several considerations. First, in this problem it is computationally quite feasible to calculate a P-value with exact statistical properties, by maximising over the nuisance parameter. Within the frequentist paradigm of inference, the worst possible parameter values must be accounted for if the statistical properties are to be guaranteed. While this may seem conservative, maximisation is the most efficient way of achieving this guarantee. Tests that are not maximised over the nuisance parameter are either systematically conservative or explicitly violate their stated properties, as explained in Lloyd (2005). Second, standard asymptotic tests have statistical properties that are far from ideal, even for large samples, so the issue of exactness is a practical one. For instance, in Agresti's example above, the exact P-value obtained by maximising over the nuisance parameter is 0.00361. This value is still small, but it corresponds to an equivalent Z-statistic of −2.91 rather than −4.44. Such behaviour is not at all uncommon. Third, such behaviour can be largely eliminated by replacing the nuisance parameter with a null estimate and then maximising, as described in Section 3. This yields a P-value that is less sensitive to the nuisance parameter, so its maximised version tends to be smaller, and this will be seen to translate into superior power for guaranteed size. Lastly, we examine the performance when RR > θ₀ and RR < θ₀ separately and find quite different behaviour.
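The maximisation step and the E+M construction described above can be sketched by complete enumeration of the sample space for a small n. This is a minimal sketch under stated assumptions, not the authors' implementation. The test statistic is taken to be the signed square root of the Pearson X² for the null model θ = 1. That is one plausible score-type choice, not necessarily any of the paper's four statistics. The supremum over ψ is approximated on an equally spaced grid, and the helper names (`support`, `pmf`, `psi0_hat`, `signed_pearson`, `lower_tail`, `exact_p_values`) are hypothetical. Small values of the statistic are treated as extreme, matching the "less than" alternative of the calves example.

```python
import math


def support(n):
    """All tables (n11, n10, n00) with n11 + n10 + n00 = n."""
    return [(a, b, n - a - b) for a in range(n + 1) for b in range(n + 1 - a)]


def pmf(tables, psi, theta0=1.0):
    """Multinomial probabilities with p11 = theta0*psi^2, p10 = psi - p11, p00 = 1 - psi."""
    probs = (theta0 * psi * psi, psi - theta0 * psi * psi, 1.0 - psi)
    out = []
    for table in tables:
        if any(k > 0 and p <= 0.0 for k, p in zip(table, probs)):
            out.append(0.0)  # table impossible at this psi
            continue
        logp = math.lgamma(sum(table) + 1) + sum(
            k * math.log(p) - math.lgamma(k + 1) for k, p in zip(table, probs) if k > 0)
        out.append(math.exp(logp))
    return out


def psi0_hat(table):
    """Null (theta = 1) MLE of psi: (2 n11 + n10) / (2 n11 + 2 n10 + n00)."""
    n11, n10, n00 = table
    return (2 * n11 + n10) / (2 * n11 + 2 * n10 + n00)


def signed_pearson(table):
    """Assumed statistic: signed root of Pearson X^2 for the null model theta = 1."""
    n11, n10, n00 = table
    n, t, psi = sum(table), n11 + n10, psi0_hat(table)
    if not 0.0 < psi < 1.0:
        return 0.0  # T = 0 or n11 = n: treat as carrying no evidence either way
    expected = (n * psi * psi, n * psi * (1 - psi), n * (1 - psi))
    x2 = sum((o - e) ** 2 / e for o, e in zip(table, expected))
    return math.copysign(math.sqrt(x2), n * n11 - t * t)  # sign of RR_hat - 1


def lower_tail(stat, weights):
    """For every table y, the probability that stat(Y) <= stat(y)."""
    order = sorted(range(len(stat)), key=stat.__getitem__)
    out, cum, i = [0.0] * len(stat), 0.0, 0
    while i < len(order):
        j = i
        while j < len(order) and stat[order[j]] == stat[order[i]]:
            cum += weights[order[j]]
            j += 1
        for k in order[i:j]:
            out[k] = cum  # tied tables share one tail probability
        i = j
    return out


def exact_p_values(n, grid_size=99):
    """E, M and E+M P-values for every table of size n (sup over psi taken on a grid)."""
    tables = support(n)
    z = [signed_pearson(t) for t in tables]
    grid = [(k + 1) / (grid_size + 1) for k in range(grid_size)]
    # E: exact tail probability with psi replaced by the table's own null estimate
    cache, p_e = {}, []
    for i, t in enumerate(tables):
        psi = psi0_hat(t)
        if psi not in cache:
            cache[psi] = lower_tail(z, pmf(tables, psi))
        p_e.append(cache[psi][i])
    # M: maximise the tail of z; E+M: maximise the tail of p_e itself
    grid_pmfs = [pmf(tables, psi) for psi in grid]
    p_m = [max(col) for col in zip(*(lower_tail(z, w) for w in grid_pmfs))]
    p_em = [max(col) for col in zip(*(lower_tail(p_e, w) for w in grid_pmfs))]
    return tables, p_e, p_m, p_em


tables, p_e, p_m, p_em = exact_p_values(20)
i = tables.index((2, 10, 8))  # hypothetical table with n = 20
print(f"E = {p_e[i]:.4f}, M = {p_m[i]:.4f}, E+M = {p_em[i]:.4f}")
```

Because the supremum is taken over a finite grid, the M and E+M values are guaranteed valid only at the grid points, and a finer grid narrows that gap. For the "greater than" alternative, negate the statistic before computing tails.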


Model notation and approximate test statistics

The possible responses of an individual are {11, 10, 00}, where 00 denotes a negative response on occasion 1, in which case the second response is negative by convention. Let nᵢⱼ be the number of individuals with response ij and pᵢⱼ the probability of this response. The (01) cell is empty by design, so n₀₁ = 0. The probability of a positive response on the first occasion is ψ = p₁₁ + p₁₀. The probability of a second positive response given a first positive response is p₁₁/ψ. The ratio of these two probabilities is the risk ratio θ = p₁₁/ψ².
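Writing θ for the risk ratio, the definitions above give p₁₁/ψ = θψ. The three cell probabilities and the likelihood can therefore be expressed in terms of (θ, ψ). The following is a worked restatement derived from those definitions, with T = n₁₁ + n₁₀ the number of primary responses; it is not notation quoted from the paper.

```latex
\begin{aligned}
p_{11} &= \theta\psi^{2}, \qquad p_{10} = \psi - \theta\psi^{2}, \qquad p_{00} = 1 - \psi,
  \qquad \theta \ge 0,\ \theta\psi \le 1,\\
L(\theta,\psi) &\propto (\theta\psi^{2})^{n_{11}}\,(\psi - \theta\psi^{2})^{n_{10}}\,(1-\psi)^{n_{00}},\\
\hat\psi &= \frac{T}{n}, \qquad \hat\theta = \frac{n_{11}\,n}{T^{2}},\\
L(1,\psi) &\propto \psi^{2n_{11}+n_{10}}\,(1-\psi)^{n_{10}+n_{00}}
  \;\Longrightarrow\;
  \hat\psi_{0} = \frac{2n_{11}+n_{10}}{2n_{11}+2n_{10}+n_{00}} = \frac{n_{11}+T}{n+T}.
\end{aligned}
```

For the calves data (n₁₁ = 30, T = 93, n = 156) this gives θ̂ = 30×156/93² ≈ 0.54 and ψ̂₀ = 123/249 ≈ 0.494.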

Exact tests and P-values

Tang and Tang (2002) have shown numerically that confidence intervals based on W₁ and W₂ can have poor coverage properties even for moderate sample sizes. This leads to poor performance of the implied two-sided tests, at least for some null values. In this section we give a brief overview of methods for constructing so-called exact tests from a given, possibly approximate, test statistic. We have data Y and parameter (θ, ψ), and want to test the null hypothesis θ = θ₀ against either one or…

Numerical study

We have described four basic test statistics, W₁, W₂, S and G, and their modified versions (with 1/2 added to each count). Each of these eight statistics generates four P-values, namely the approximate P-value based on the asymptotic normal distribution, the M P-value, the E P-value and the E+M P-value. Only the M and E+M P-values can be guaranteed to be valid. For all possible data sets with n = 50 and n = 100, all 32 P-values for testing the null hypothesis θ = 1 versus θ ≠ 1 were computed. This allows a full investigation of the performance of the…
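For the "less than" alternative, the four P-values listed above can be written as follows. The statistic Z is assumed to be one for which small values are extreme, and ψ̂₀(y) denotes the null (profile) maximum likelihood estimate of ψ. This is our restatement of the definitions given in words in the text, not a formula quoted from the paper.

```latex
\begin{aligned}
p_{A}(y) &= \Phi\{z(y)\}, &
p_{E}(y) &= \Pr_{\theta_0,\hat\psi_0(y)}\{Z(Y) \le z(y)\},\\
p_{M}(y) &= \sup_{\psi}\Pr_{\theta_0,\psi}\{Z(Y) \le z(y)\}, &
p_{E+M}(y) &= \sup_{\psi}\Pr_{\theta_0,\psi}\{p_{E}(Y) \le p_{E}(y)\},
\end{aligned}
\qquad
\text{valid if } \sup_{\psi}\Pr_{\theta_0,\psi}\{p(Y) \le \alpha\} \le \alpha \ \text{for all } \alpha \in (0,1).
```

The M and E+M P-values satisfy the validity condition by construction. Each is at least the exact tail probability of its generating statistic at every ψ, and a tail probability evaluated at the true parameter is stochastically no smaller than a uniform variable. The approximate and E P-values carry no such guarantee.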

Discussion

Another way of comparing the secondary and primary probabilities of success is through their simple difference rather than their ratio. Approximate confidence intervals, which generate two-sided tests, are given by Lui (2000) and Tang and Tang (2003). When only a proportion of individuals have a structural zero, inference has been studied by Tang and Tang (2004). Lloyd (2005) considered some other generating statistics, including one based on the conditional distribution of X given T.

References (19)

  • Agresti, A., 1990. Categorical data analysis, first ed. Wiley,...
  • Basu, D., 1977. On the elimination of nuisance parameters. J. Amer. Statist. Assoc.
  • Berger, R.L., et al., 1994. P values maximised over a confidence set for the nuisance parameter. J. Amer. Statist. Assoc.
  • Berger, R.L., et al., 2003. Exact unconditional tests for a 2×2 matched pairs design. Statist. Methods Med. Res.
  • Bickel, P.J., et al., 1977. Mathematical Statistics.
  • Buehler, R.J., 1957. Confidence intervals for the product of two binomial parameters. J. Amer. Statist. Assoc.
  • Fang, X.Z., et al., 2003. EM algorithm and its application to testing hypotheses. Sci. China A.
  • Johnson, W.D., et al., 1995. Combining 2×2 tables that contain structural zero. Statist. Med.
  • Lloyd, C.J., 2005. E+M P-values. Austral. NZ. J. Statist., submitted for publication and available as Working Paper...
There are more references available in the full text version of this article.

