Abstract
RNA sequencing (RNA-seq) is widely used to study gene expression changes associated with treatments or biological conditions. Many popular methods for detecting differential expression (DE) from RNA-seq data use generalized linear models (GLMs) fitted to the read counts across independent replicate samples for each gene. This article shows that the standard formula for the residual degrees of freedom (d.f.) in a linear model is overstated when the model contains fitted values that are exactly zero. Such fitted values occur whenever all the counts in a treatment group are zero as well as in more complex models such as those involving paired comparisons. This misspecification results in underestimation of the genewise variances and loss of type I error control. This article proposes a formula for the reduced residual d.f. that restores error control in simulated RNA-seq data and improves detection of DE genes in a real data analysis. The new approach is implemented in the quasi-likelihood framework of the edgeR software package. The results of this article also apply to RNA-seq analyses that apply linear models to log-transformed counts, such as those in the limma software package, and more generally to any count-based GLM where exactly zero fitted values are possible.
1 Introduction
Transcriptional profiling with RNA sequencing (RNA-seq) is widely used to study gene expression profiles associated with a particular biological condition. A common aim of RNA-seq studies is to detect differentially expressed (DE) genes between two or more conditions. To do so, the number of sequencing reads mapped to the exons of each gene is counted to quantify the expression of that gene (Liao et al., 2014). This is performed for multiple replicate libraries in each condition, where each replicate is prepared from an independent biological sample. Statistical analyses can then be performed with methods like edgeR (Robinson et al., 2010) to identify genes with significant differences in the read counts between conditions (Anders et al., 2013). These putative DE genes are the basis for further investigation into the mechanisms driving the biological difference of interest.
The differential expression analysis must be able to handle discrete count data with extra-Poisson variability between biological replicates. To this end, the counts for each gene can be fitted to generalized linear models (GLMs) based on the negative binomial (NB) distribution (McCarthy et al., 2012). Overdispersion in the counts between replicates is modelled with the NB dispersion parameter. Further sophistication can be added with the quasi-likelihood (QL) framework (Lund et al., 2012), which introduces an additional QL dispersion parameter to model estimation uncertainty. The mean-variance relationship of the count ygi for gene g in library i can then be written as
where μgi is the mean, ϕg is the NB dispersion and
Fitting a GLM to RNA-seq data involves estimation of real-valued model coefficients from discrete non-negative counts. Consider an RNA-seq data set with n libraries. Assume that a stable estimate of the common or trended NB dispersion has been obtained by using an EB approach across all genes (McCarthy et al., 2012). For each gene, a GLM is fitted to the n counts using the estimated NB dispersion and a design matrix X that has p independent coefficients. The residual degrees of freedom (d.f.) is canonically defined as
This represents the number of counts that need to be known in order to derive all n counts, given the estimates of the p coefficients from the fitted model. The total residual deviance of the fitted model for gene g is denoted as Dg, and is distributed as
where
Thus, correct calculation of the residual d.f. is necessary for QL dispersion estimation in the EB framework described by Lund et al. (2012).
The appropriate residual d.f. may not be obvious when zeroes are present in the set of counts for a gene. Any library that has a fitted value of zero must also have a count of zero. This will not contribute any d.f. to the fit, because no additional observations need to be known to identify the count as zero. As such, the true residual d.f. will be lower than the canonical value d. Dg will also be lower than expected, as the unit deviance will be zero for libraries where the fitted value and count are identical. Thus, the above expression for
This paper demonstrates that the use of the canonical residual d.f. can result in underestimation of the QL dispersion and loss of type I error control in the presence of zero counts. A corrected definition of the residual d.f. is given for genes that have fitted values of zero for some libraries. The performance of this solution compares favourably to the canonical defininition on both simulated and real datasets.
2 Correctly specifying the residual d.f.
2.1 Definition of the reduced residual d.f.
Let
The value of dg can be determined by identifying and ignoring the libraries in Zg. In practice, this is done by removing all libraries with fitted values below some arbitrarily small value like 10−4. This ensures that libraries in Zg are not overlooked due to numerical imprecision of the model fit. The effective number of libraries for gene g becomes
where the column rank of Xg is simply the number of independent coefficients in Xg. This can be obtained by performing a QR decomposition on Xg and counting the number of non-zero diagonal elements in the resulting R matrix. The above expression for dg is appropriate as it explicitly ignores the libraries that do not contribute any d.f. to the model fit. This avoids any overestimation of the residual d.f. that might occur with the canonical definition. Of course, dg = d if Zg is empty as the rank of Xg is equal to p.
2.2 Overview of the canonical QL framework
Lund et al. (2012) assume that
which effectively squeezes
The QL F-test is used to test a pre-specified null hypothesis by comparing nested designs. One or more columns are chosen and removed from X to obtain the null design matrix X0. The moderated F-statistic uses the shrunken dispersion and is defined as
where t is the difference in the number of coefficients between designs and LRg is the likelihood ratio, i.e. the difference in the total residual deviances between GLMs fitted with X0 and X. Under the null hypothesis, Fg will be F-distributed on t and d + d0 d.f.’s. This can be used to compute a p-value to reject or accept the null for each gene. In practice, a lower bound for the p-value is defined by using the likelihood ratio test after fitting a Poisson GLM to the counts for each gene. This reflects the fact that RNA-seq data should exhibit at least Poisson-level variance due to sequencing noise (Marioni et al., 2008).
2.3 Redefined statistics with the true residual d.f.
The canonical definitions are only correct for gene g when d = dg. If these two values are not equal, all occurrences of d should be replaced with dg. For simplicity, assume that all libraries i ∈ Zg have true means close to zero. This is reasonable in most replicated designs, where a fitted value of zero from a large true mean would require zero counts for all replicate observations (which is unlikely). The assumption means that the libraries in Zg do not contribute to any sampled instance of Dg, i.e. the unit deviance from each library will always be zero. Thus,
This expression ensures that
while the moderated F-statistic is redefined as
This is F-distributed on t and dg + d0(2) d.f.’s, and can be used to compute a p-value as previously described.
3 Assessing performance on simulated data
3.1 The canonical method underestimates the dispersion
Consider a one-way layout with two replicate libraries in each of four conditions A1, A2, B1 and B2. NB-distributed counts for 10,000 genes were generated using a dispersion of 0.05 and a mean of μc for each library in condition c. For the first 5000 genes,
The expected value of the QL dispersion in this simulation is
3.2 Assessing type I error control
3.2.1 Details of the simulation design
Consider a one-way layout with two replicates in each of four conditions for 10,000 genes, similar to that described in Section 3.1. A set K was defined containing 5 to 100% of all genes. For half of all genes in K, zero counts were added by setting
The null hypothesis of
3.2.2 The canonical method fails to control type I error
In all tested scenarios, the observed error rate for the reduced residual d.f. is closer to the threshold than that for the canonical d.f. (Table 1). This is consistent with the correctness of the reduced d.f. and its associated statistics when zero counts are present.
Prop. (%) | Method | Type I error threshold (%) | ||
---|---|---|---|---|
0.1 | 1 | 10 | ||
5 | Canonical | 0.0615 (0.0071) | 0.9070 (0.0203) | 10.2590 (0.0690) |
Reduced | 0.0970 (0.0098) | 0.9750 (0.0232) | 9.8505 (0.0764) | |
20 | Canonical | 0.0720 (0.0060) | 1.1585 (0.0251) | 12.4570 (0.0765) |
Reduced | 0.1120 (0.0072) | 1.0055 (0.0212) | 10.0155 (0.0591) | |
50 | Canonical | 0.2245 (0.0099) | 2.5935 (0.0373) | 17.6345 (0.0931) |
Reduced | 0.0975 (0.0064) | 0.9650 (0.0183) | 9.7920 (0.0769) | |
100 | Canonical | 1.2255 (0.0325) | 7.0730 (0.0609) | 28.0380 (0.1127) |
Reduced | 0.1200 (0.0115) | 1.0105 (0.0379) | 10.0040 (0.0979) |
Each simulation scenario has a different proportion of genes with zero counts. For each error rate, the mean was computed over 20 simulation iterations and is shown as a percentage (standard error in brackets).
Loss of type I error control for the canonical method is observed in some scenarios. This is attributable to underestimation of the QL dispersion, which inflates the F-statistic and decreases the p-value for each gene in K. Moreover, this error is propagated to all genes through EB shrinkage. Smaller values of
Prop. (%) | Scaling factor | Prior d.f. | ||
---|---|---|---|---|
d0 | ||||
5 | 0.88 (0.00) | 0.95 (0.00) | 15.2 (0.3) | 30.1 (1.1) |
20 | 0.70 (0.00) | 0.94 (0.00) | 6.2 (0.1) | 28.8 (1.2) |
50 | 0.45 (0.00) | 0.94 (0.00) | 3.1 (0.0) | 27.2 (2.0) |
100 | 0.25 (0.00) | 0.94 (0.00) | 2.5 (0.0) | 33.1 (6.1) |
The estimated scaling factor and prior d.f. were computed using the canonical and reduced definitions for the residual d.f. All values represent the mean over 20 simulation iterations. Standard errors are also shown in brackets.
The canonical method is also conservative in some scenarios. This is driven by an increase in the variability of the dispersions when zero counts are present. Recall that the QL dispersions for g ∈ K are consistently underestimated. This results in a population of dispersion estimates that is distinct from those for g ∉ K, inflating the apparent variability of
The overall effect of using the canonical d.f. on the type I error rate is not easy to predict. The outcome depends on whether the conservativeness from the reduced d0 can offset the liberalness from the underestimation of
3.3 Care is required when the reduced residual d.f. is zero
Some additional care is required when dealing with genes where dg = 0. These genes provide no information on the variability of the counts and have undefined
To demonstrate, consider a one-way layout containing three conditions A, B1 and B2 for 10,000 genes. Condition A contains two replicates whereas B1 and B2 contain one replicate each. For the first 50% of genes,
3.4 Effect on log-transformed counts in linear models
3.4.1 Overview
An alternative approach to analyzing count data involves fitting a linear model to log-CPM values (Law et al., 2014). Consider an example data set where all libraries have 106 reads. Any zeroes will be converted into a lower bound for the log-CPM, e.g.
To demonstrate, NB-distributed counts for a four-condition design were simulated as described in Section 3.1. The voom function was applied to log-transform the counts and to compute precision weights from a fitted mean-variance trend. A linear model was fitted to the log-CPM values with the precision weights using limma v3.30.9, and an EB strategy was applied to robustly shrink the variances (Phipson et al., 2016). The mean variance estimate across all genes was recorded. The F-test was applied to compute p-values against the null hypothesis, i.e.
The mean sample variance across all genes was estimated as 0.065 and 0.132 using the canonical and reduced definitions, respectively. This fold difference is consistent with the differences in the residual d.f.’s, i.e. d = 4 and dg = 2. Type I error control was subsequently lost with the canonical method (Figure 1), consistent with the liberalness observed in simulations for NB-based GLMs. Uniformity of the p-values was restored by removing the libraries in the offending conditions. This ensured that the reduced residual d.f. was used during modelling. In practice, removal of offending libraries is complicated by the presence of variable library sizes, resulting in a variable lower bound that is difficult to define for any given library. For simplicity, such cases will not be considered here.
4 Effect of d.f. specification on real data
4.1 Overview
To determine the relevance of the simulation results, analyses with the canonical and reduced residual d.f.’s were compared on a real RNA-seq dataset. Data was generated from a Pax5 knock-out experiment in pro-B cells by Drs. Rhys Allan and Steve Nutt in the Molecular Immunology division of the Walter and Eliza Hall Institute of Medical Research. This dataset contains two replicate libraries for the knock-out (KO) condition and one replicate for the wild-type (WT) condition.
4.2 Processing steps for real RNA-seq data
Paired-end sequencing was performed by the Beijing Genomics Institute on an Illumina HiSeq 2000. Each library consisted of approximately 9 million pairs of 90 bp reads. Reads were aligned to the mm10 build of the mouse genome using subread v1.4.6 (Liao et al., 2013) in paired-end mode with unique mapping and tie splitting by Hamming distance. Read pairs were summarized into gene counts using the featureCounts function (Liao et al., 2014) in the Rsubread package v1.18.1. The number of fragments mapped to exons was counted for each gene in the NCBI mouse build 38 annotation. Note that each read pair corresponds to a single cDNA fragment and is counted no more than once. Reads with MAPQ scores below 10 were ignored to avoid non-uniquely/poorly mapped reads. Approximately 64% of read pairs in each library were counted into genes.
Genes were filtered to remove those with a count sum across all libraries below 10. This removes lowly expressed genes that are unlikely to be DE and is roughly independent of the p-values when library sizes are similar. A differential expression analysis was performed using edgeR to compare the WT and KO conditions. Briefly, a trended NB dispersion was estimated and used for GLM fitting. Offsets were defined from the log-transformed effective library sizes, after using TMM normalization (Robinson and Oshlack, 2010) to remove composition biases. Raw QL dispersions were estimated with the canonical residual d.f., as previously described. A second mean-dependent trend was fitted to the raw QL estimates, and EB shrinkage was performed towards this trend (Lund et al., 2012). Trend fitting is necessary here to empirically model non-NB mean-variance relationships, but was not required for the simulations where counts were exactly NB-distributed for simplicity. Finally, the QL F-test was used to compute a p-value for each gene.
Genes were considered to be significantly DE if the false discovery rate (FDR) was <5% after applying the Benjamini and Hochberg method to all p-values. This was repeated using the reduced residual d.f. and its associated statistics.
4.3 Differences between definitions are present in real data
The QL framework was applied with the reduced and canonical residual d.f.’s to identify DE genes between WT and KO cells. For the canonical analysis, a cluster of low dispersion estimates was observed (Figure 2A). These correspond to 139 genes that have zero counts in the two KO samples. Here, there are no residual d.f. so the deviance is always zero. As d = 1,
Design | Method | Type I error threshold (%) | ||
---|---|---|---|---|
0.1 | 1 | 10 | ||
One-way | Canonical (in K) | 0.648 (0.019) | 4.249 (0.062) | 21.683 (0.114) |
Canonical (not in K) | 0.085 (0.008) | 1.202 (0.028) | 12.302 (0.110) | |
Reduced (in K) | 0.122 (0.010) | 0.953 (0.022) | 9.450 (0.088) | |
Reduced (not in K) | 0.096 (0.009) | 0.945 (0.029) | 9.958 (0.098) | |
Paired | Canonical (in K) | 1.998 (0.045) | 9.715 (0.107) | 31.449 (0.199) |
Canonical (not in K) | 0.095 (0.007) | 1.313 (0.023) | 12.659 (0.080) | |
Reduced (in K) | 0.130 (0.012) | 0.970 (0.041) | 9.390 (0.084) | |
Reduced (not in K) | 0.114 (0.009) | 0.938 (0.024) | 9.437 (0.094) |
Simulations were performed using a one-way layout or a paired-samples design. Separate error rates are shown for genes in and outside K. Each value represents the mean error rate across 20 simulation iterations, with the standard error shown in brackets.
Hypothesis testing was also performed to identify DE genes using the reduced and canonical methods. At a FDR of 5%, the number of DE genes increased from 1926 to 2809 when dg was used instead of d. Reduced detection with the canonical d.f. is consistent with the conservativeness observed in Table 1, where the drop in d0 outweighs the liberalness caused by the underestimation of
5 Differences are recapitulated in realistic simulations
The previous simulations were intentionally simplified to highlight the effects of misspecifying the residual d.f. To demonstrate that such effects were still present in realistic scenarios, additional simulations were performed based on the Pax5 data. For each gene g, the trended NB dispersion ϕg and the average count were estimated from the Pax5 counts using edgeR as described above. This recapitulates the distribution of abundances and the mean-dispersion relationship present in real data. A new value for the dispersion was obtained by sampling from
Two experimental designs were considered in this simulation – a one-way layout containing four groups (A1, A2, B1, B2) of two replicates each, and a paired-samples design with four pairs containing one sample from each of two groups (A1, A2). To introduce fitted values of zero to the simulation, half of all genes were allocated into the set K. For each gene in K, counts were set to zero for all samples in B1 and B2 in the one-way layout, or for all samples in two pairs in the paired-samples design. The QL framework was then applied to test the null hypothesis of equal expression in A1 and A2. This motivates the choice of samples to set to zero, as it ensures that the null hypothesis is still true for all genes. The observed type I error rate was estimated as the proportion of genes with p-values below the specified threshold. This was done separately for genes in and outside of K, to determine the effect of misspecified residual d.f. on each class of genes. The large size of K ensures that stable estimates of the observed type I error rates can be obtained.
The use of the reduced d.f. in the QL framework controlled the observed type I error rate close to or below the specified threshold in all simulation scenarios (Table 3). This is consistent with the proper specification of the residual d.f. in the presence of fitted values of zero. In contrast, use of the canonical d.f. yielded liberal results for all genes in K. This is attributable to underestimation of the QL dispersion when the residual d.f. is overstated. More subtle loss of type I error control was also observed for genes outside of K at some thresholds, caused by distortion of the EB statistics when the QL dispersions in K are incorrectly estimated. These results indicate that the advantages provided by the reduced d.f. are still present in realistic scenarios.
The loss of residual d.f. in the paired-samples design warrants some further discussion. In the case where both observations in a pair are zero, the corresponding fitted values would obviously be zero (i.e. the coefficient for the blocking term for this pair in a log-link model would approach negative infinity). Neither observation would provide any residual d.f., which is not considered by the canonical definition. Another example involves pairs of control/treatment samples where the treatment sample has a count of zero in all pairs. In this scenario, the coefficient for the treatment effect would approach negative infinity, such that all treatment samples would have fitted values of zero. However, the coefficients for the pair-specific blocking terms would still be free to vary to fit the control sample in each pair. This means that there are no residual d.f. for dispersion estimation, regardless of the number of pairs. Both situations are handled properly by the reduced definition of the residual d.f., as demonstrated above.
6 Discussion
This article has shown that the standard formulation for the residual d.f. in a linear model is not correct when NB GLMs are used to model RNA-seq read count data and fitted values of zero are present. The incorrect formulation will result in underestimation of the QL dispersion and potential loss of type I error control. Such problems can be avoided by using a refined definition for the residual d.f. that accounts for the presence of zeroes. The new reduced d.f. formula restores error control in the QL framework for simulated count data. Similar behaviour is observed in linear models for log-CPMs and in a DE analysis of a real RNA-seq dataset. While this work focuses on RNA-seq data, similar conclusions can be drawn for analyses of read counts from other genomic technologies such as ChIP-seq (Lun and Smyth, 2016) or Hi-C (Lun and Smyth, 2015).
The results presented here have much wider implications for the application of GLMs in fields other than genomics and genetics. The reduced d.f. formula derived here will be a more appropriate definition of the residual d.f. for any GLM when the fitted values are non-negative but exactly zero fitted values are possible. This scenario can arise, for example, when conducting goodness of fit tests for Poisson or multinomial GLMs. The reduced d.f. will be appropriate for any count-based GLM when a quasi-dispersion parameter or parameters in the variance function need to be estimated (Wedderburn, 1974). Exactly zero fitted values can also arise when using Tweedie GLMs to model insurance claims (Smyth and Jørgensen, 2002) or rainfall data (Dunn and Smyth, 2005).
The reduced d.f. method described in this article has been implemented in the glmQLFit function of the edgeR package, available from the open-source Bioconductor project.
Acknowledgement
The authors would like to thank Dr. Yunshun Chen for assistance with incorporating the method into edgeR.
Funding: National Health and Medical Research Council (Program Grant 1054618 to G.K.S., Fellowship 1058892 to G.K.S.); Victorian State Government Operational Infrastructure Support; Department of Education and Training, Australian Government NHMRC IRIIS.
Conflict of interest: None declared.
References
Anders, S., D. J. McCarthy, Y. Chen, M. Okoniewski, G. K. Smyth, W. Huber and M. D. Robinson (2013): “Count-based differential expression analysis of RNA sequencing data using R and Bioconductor,” Nat. Protoc., 8, 1765–1786.10.1038/nprot.2013.099Search in Google Scholar PubMed
Becker-Herman, S., F. Lantner and I. Shachar (2002): “Id2 negatively regulates B cell differentiation in the spleen,” J. Immunol., 168, 5507–5513.10.4049/jimmunol.168.11.5507Search in Google Scholar PubMed
Chu, Y., J. C. Vahl, D. Kumar, K. Heger, A. Bertossi, E. Wojtowicz, V. Soberon, D. Schenten, B. Mack, M. Reutelshofer, R. Beyaert, K. Amann, G. van Loo and M. Schmidt-Supprian (2011): “B cells lacking the tumor suppressor TNFAIP3/A20 display impaired differentiation and hyperactivation and cause inflammation and autoimmunity in aged mice,” Blood, 117, 2227–2236.10.1182/blood-2010-09-306019Search in Google Scholar PubMed
Dunn, P. and G. Smyth (2005): “Series evaluation of Tweedie exponential dispersion model densities,” Stat Comput., 15, 267–280.10.1007/s11222-005-4070-ySearch in Google Scholar
Law, C. W., Y. Chen, W. Shi and G. K. Smyth (2014): “Voom: precision weights unlock linear model analysis tools for RNA-seq read counts,” Genome Biol., 15, R29.10.1186/gb-2014-15-2-r29Search in Google Scholar PubMed PubMed Central
Liao, Y., G. K. Smyth and W. Shi (2013): “The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote,” Nucleic Acids Res., 41, e108.10.1093/nar/gkt214Search in Google Scholar PubMed PubMed Central
Liao, Y., G. K. Smyth and W. Shi (2014): “featureCounts: an efficient general purpose program for assigning sequence reads to genomic features,” Bioinformatics, 30, 923–930.10.1093/bioinformatics/btt656Search in Google Scholar PubMed
Lun, A. T. L. and G. K. Smyth (2015): “diffHic: a bioconductor package to detect differential genomic interactions in Hi-C data,” BMC Bioinformatics, 16, 258.10.1186/s12859-015-0683-0Search in Google Scholar PubMed PubMed Central
Lun, A. T. L. and G. K. Smyth (2016): “csaw: a bioconductor package for differential binding analysis of ChIP-seq data using sliding windows,” Nucleic Acids Res., 44, e45.10.1093/nar/gkv1191Search in Google Scholar PubMed PubMed Central
Lund, S. P., D. Nettleton, D. J. McCarthy and G. K. Smyth (2012): “Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates,” Stat. Appl. Genet. Mol. Biol., 11, Article Number 8.10.1515/1544-6115.1826Search in Google Scholar PubMed
Marioni, J. C., C. E. Mason, S. M. Mane, M. Stephens and Y. Gilad (2008): “RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays,” Genome Res., 18, 1509–1517.10.1101/gr.079558.108Search in Google Scholar
McCarthy, D. J., Y. Chen and G. K. Smyth (2012): “Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation,” Nucleic Acids Res., 40, 4288–4297.10.1093/nar/gks042Search in Google Scholar
Peng, S. L., A. J. Gerth, A. M. Ranger and L. H. Glimcher (2001): “NFATc1 and NFATc2 together control both T and B cell activation and differentiation,” Immunity, 14, 13–20.10.1016/S1074-7613(01)00085-1Search in Google Scholar
Phipson, B., S. Lee, I. J. Majewski, W. S. Alexander and G. K. Smyth (2016): “Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression,” Ann. Appl. Stat., 10, 946–963.10.1214/16-AOAS920Search in Google Scholar PubMed PubMed Central
Robinson, M. D. and A. Oshlack (2010): “A scaling normalization method for differential expression analysis of RNA-seq data,” Genome Biol., 11, R25.10.1186/gb-2010-11-3-r25Search in Google Scholar PubMed PubMed Central
Robinson, M. D., D. J. McCarthy and G. K. Smyth (2010): “edgeR: a Bioconductor package for differential expression analysis of digital gene expression data,” Bioinformatics, 26, 139–140.10.1093/bioinformatics/btp616Search in Google Scholar PubMed PubMed Central
Smyth, G. K. (2004): “Linear models and empirical Bayes methods for assessing differential expression in microarray experiments,” Stat. Appl. Genet. Mol. Biol., 3, Article Number 3.10.2202/1544-6115.1027Search in Google Scholar PubMed
Smyth, G. K. and B. Jørgensen (2002): “Fitting Tweedie’s compound Poisson model to insurance claims data: dispersion modelling,” Astin Bull., 32, 143–157.10.2143/AST.32.1.1020Search in Google Scholar
Wedderburn, R. W. M. (1974): “Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method,” Biometrika, 61, 439–447.10.1093/biomet/61.3.439Search in Google Scholar
©2017 Walter de Gruyter GmbH, Berlin/Boston