Subjective p Intervals
Researchers Underestimate the Variability of p Values Over Replication
Abstract
Suppose you obtain p = .02 in an experiment, then replicate the experiment with new samples. What p value might you obtain, and what interval has an 80% chance of including that replication p? Under conservative assumptions the answer is, perhaps surprisingly, (.0003, .30). The authors report three email surveys that asked authors of articles published in leading journals in psychology, medicine, or statistics to estimate such intervals. The overall response rate was low (7%), but responses from 360 researchers gave intervals with, on average, only a 40% to 50% chance of including replication p, rather than the target 80%. Results were similar for all three disciplines. Respondents generally found the task unfamiliar and difficult. Intervals varied greatly across respondents, but almost all were too short. This widespread, and often severe, underestimation of the variability of p may help to explain why researchers place too much interpretive weight on single p values.
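The interval quoted for p = .02 can be reproduced with a short normal-theory calculation. What follows is a minimal sketch, not necessarily the authors' exact method. It assumes that the replication z statistic is normally distributed around the observed z with SD 1 (that is, the observed effect is treated as the population effect) and that all p values are two-tailed. It uses only Python's standard-library `statistics.NormalDist`. The helper names `p_to_z`, `z_to_p`, `p_interval`, and `coverage` are hypothetical. The last helper estimates the chance that a given interval, such as one a respondent might offer, includes replication p.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, N(0, 1)


def p_to_z(p):
    """Convert a two-tailed p value to the corresponding |z|."""
    return STD_NORMAL.inv_cdf(1 - p / 2)


def z_to_p(z):
    """Convert a z statistic to a two-tailed p value."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def p_interval(p_obs, level=0.80, sd=1.0):
    """Central `level` interval for replication p (hypothetical helper).

    Assumption: replication z ~ N(z_obs, sd**2). sd=1 treats the observed
    effect as the population effect; a larger sd widens the interval.
    """
    z_obs = p_to_z(p_obs)
    half_width = STD_NORMAL.inv_cdf(0.5 + level / 2) * sd
    z_low, z_high = z_obs - half_width, z_obs + half_width
    if z_low <= 0:
        # The z interval straddles zero, so the z-to-p mapping is not monotone.
        raise ValueError("z interval includes 0; this sketch does not handle it")
    # A larger replication z gives a smaller p, so the bounds swap.
    return z_to_p(z_high), z_to_p(z_low)


def coverage(p_obs, lo, hi, sd=1.0):
    """Chance that replication p falls in (lo, hi) under the same assumption."""
    rep = NormalDist(p_to_z(p_obs), sd)
    a, b = p_to_z(hi), p_to_z(lo)  # p in (lo, hi)  <=>  |z| in (a, b)
    # Count replications on both sides of zero, because p is two-tailed.
    return (rep.cdf(b) - rep.cdf(a)) + (rep.cdf(-a) - rep.cdf(-b))


if __name__ == "__main__":
    lo, hi = p_interval(0.02)
    print(f"({lo:.4f}, {hi:.2f})")  # (0.0003, 0.30)
    # A hypothetical respondent's interval, scored by its chance of capture:
    print(round(coverage(0.02, 0.01, 0.05), 2))
```

With sd=1 the sketch gives (.0003, .30), matching the interval above. Passing a larger sd, which would also allow for sampling error in the original study, gives a wider interval. The illustrative interval (.01, .05) has a chance of including replication p far below the target 80%.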