Original Article

Subjective p Intervals

Researchers Underestimate the Variability of p Values Over Replication

Jerry Lai

School of Psychological Science, La Trobe University, Victoria, Australia

Fiona Fidler

School of Psychological Science, La Trobe University, Victoria, Australia

Geoff Cumming

School of Psychological Science, La Trobe University, Victoria, Australia

Published Online: January 01, 2012. https://doi.org/10.1027/1614-2241/a000037

Abstract

Suppose you obtain p = .02 in an experiment, then replicate the experiment with new samples. What p value might you obtain, and what interval has an 80% chance of including that replication p? Under conservative assumptions the answer is, perhaps surprisingly, (.0003, .30). The authors report three email surveys that asked authors of articles published in leading journals in psychology, medicine, or statistics to estimate such intervals. The overall response rate (7%) was low, but responses from 360 researchers gave intervals with, on average, a 40% to 50% chance of including replication p, rather than the target 80%. Results were similar for all three disciplines. Respondents generally found the task unfamiliar and difficult. There was great variability across respondents, but almost all of them gave intervals that were too short. This widespread, and often severe, underestimation of the variability of p may help to explain why researchers place too much interpretive weight on single p values.
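The interval quoted in the abstract can be approximated with a short calculation. The sketch below is an illustration, not the authors' stated procedure: it assumes that the initial p is two-tailed, that the replication z statistic is normally distributed around the observed z with a standard deviation of 1 (that is, the true effect is taken to equal the observed effect), and that the replication p is also two-tailed. The function name `replication_p_interval` and its parameters are hypothetical. Under these assumptions the result is close to (.0003, .30).

```python
from statistics import NormalDist


def replication_p_interval(p_obt, coverage=0.80, sd=1.0):
    """Approximate interval for a replication p value.

    Assumptions (not taken from the article): p_obt is two-tailed, and the
    replication z is distributed as Normal(z_obt, sd). With sd=1.0 the true
    effect is assumed to equal the observed effect.
    """
    std = NormalDist()  # standard normal distribution

    # z statistic corresponding to the initial two-tailed p value
    z_obt = std.inv_cdf(1 - p_obt / 2)

    # half-width of the central `coverage` interval for the replication z
    half_width = std.inv_cdf(0.5 + coverage / 2) * sd
    z_low, z_high = z_obt - half_width, z_obt + half_width

    # Two-tailed p is only monotonic in z for z > 0, so this sketch
    # refuses cases where the z interval crosses zero.
    if z_low <= 0:
        raise ValueError("z interval crosses zero; sketch does not apply")

    def two_tailed_p(z):
        return 2 * (1 - std.cdf(z))

    # a larger z gives a smaller p, so the bounds swap
    return two_tailed_p(z_high), two_tailed_p(z_low)


low, high = replication_p_interval(0.02)
print(f"{low:.4f}, {high:.2f}")  # prints "0.0003, 0.30"
```

Widening the assumed spread of the replication z (a larger `sd`) makes the interval longer still, so the figures here can be read as a lower bound on how much p may vary.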

Volume 8, Issue 2, August 2012

ISSN: 1614-1881, eISSN: 1614-2241

History

Accepted: January 6, 2011
