
On-line changepoint detection and parameter estimation with application to genomic data

Statistics and Computing

Abstract

An efficient on-line changepoint detection algorithm for an important class of Bayesian product partition models was recently proposed by Fearnhead and Liu (J. R. Stat. Soc. B 69, 589–605, 2007). However, a severe limitation of this algorithm is that it requires knowledge of the static parameters of the model to infer the number of changepoints and their locations. We propose an extension of this algorithm that jointly estimates these static parameters on-line using a recursive maximum likelihood estimation strategy. This particle-filter-type algorithm has a computational complexity that scales linearly in both the number of data points and the number of particles. We demonstrate our methodology on one synthetic and two real-world datasets from RNA transcript analysis. On simulated data, our approach outperforms standard techniques used in this context and hence has the potential to detect novel RNA transcripts.
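
As a rough illustration of the setting described above, the following sketch runs an exact on-line changepoint filter with known static parameters, which is the case the abstract identifies as a limitation. It is not the authors' algorithm: it omits the particle resampling and the recursive maximum likelihood step. The Gaussian segment model, the constant changepoint probability `hazard`, and all function and parameter names are assumptions made for illustration only.

```python
import math


def log_predictive(y, n, s, mu0, tau2, sigma2):
    """Log density of y given n earlier points (with sum s) in the same segment.

    Assumed model: y ~ N(mean, sigma2) within a segment, mean ~ N(mu0, tau2).
    """
    post_var = 1.0 / (1.0 / tau2 + n / sigma2)
    post_mean = post_var * (mu0 / tau2 + s / sigma2)
    var = post_var + sigma2
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - post_mean) ** 2 / var)


def online_changepoint_filter(data, hazard=0.02, mu0=0.0, tau2=25.0, sigma2=1.0):
    """Yield, after each observation, {current segment length: posterior probability}.

    The static parameters (hazard, mu0, tau2, sigma2) are treated as known.
    """
    runs = []  # each hypothesis: (log weight, points in segment, sum of points)
    for t, y in enumerate(data):
        # Hypothesis "a new segment starts at y"; the first point always starts one.
        log_prior = 0.0 if t == 0 else math.log(hazard)
        new_runs = [(log_prior + log_predictive(y, 0, 0.0, mu0, tau2, sigma2), 1, y)]
        # Hypotheses "the current segment continues through y".
        for logw, n, s in runs:
            logw_new = (logw + math.log1p(-hazard)
                        + log_predictive(y, n, s, mu0, tau2, sigma2))
            new_runs.append((logw_new, n + 1, s + y))
        # Normalise in log space so the weights form a posterior distribution.
        top = max(lw for lw, _, _ in new_runs)
        log_norm = top + math.log(sum(math.exp(lw - top) for lw, _, _ in new_runs))
        runs = [(lw - log_norm, n, s) for lw, n, s in new_runs]
        yield {n: math.exp(lw) for lw, n, _ in runs}


# Toy usage: 20 points at level 0 followed by 10 points at level 5.
data = [0.0] * 20 + [5.0] * 10
posterior = None
for posterior in online_changepoint_filter(data):
    pass
print(max(posterior, key=posterior.get))  # most probable current segment length
```

Without resampling, this sketch tracks one more hypothesis per observation, so the cost per step grows with time. Capping the number of hypotheses at a fixed number of particles is what would keep the per-step cost constant, in line with the linear scaling stated in the abstract.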

References

  • Andrieu, C., Doucet, A., Tadic, V.: On-line parameter estimation in general state-space models. In: Proc. 44th IEEE Conference on Decision and Control (2005)

  • Azzalini, A., Capitanio, A.: Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. J. R. Stat. Soc. B 65, 367–389 (2003)

  • Barry, D., Hartigan, J.: Product partition models for change point problems. Ann. Stat. 20, 260–279 (1992)

  • Benveniste, A., Metivier, M., Priouret, P.: Adaptive Algorithms and Stochastic Approximations. Springer, Berlin (1990)

  • Bertone, P., Stolc, V., Royce, T.E., Rozowsky, J.S., Urban, A.E., Zhu, X., Rinn, J.L., Tongprasit, W., Samanta, M., Weissman, S., Gerstein, M.B., Snyder, M.: Global identification of human transcribed sequences with genome tiling arrays. Science 306(5705), 2242–2246 (2004)

  • Bolstad, B.M., Irizarry, R.A., Astrand, M., Speed, T.P.: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19(2), 185–193 (2003)

  • Carlin, B., Gelfand, A., Smith, A.: Hierarchical Bayesian analysis of changepoint problems. Appl. Stat. 41, 389–405 (1992)

  • Cheng, J., Kapranov, P., Drenkow, J., Dike, S., Brubaker, S., Patel, S., Long, J., Stern, D., Tammana, H., Helt, G., Sementchenko, V., Piccolboni, A., Bekiranov, S., Bailey, D.K., Ganesh, M., Ghosh, S., Bell, I., Gerhard, D.S., Gingeras, T.R.: Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 308(5725), 1149–1154 (2005)

  • Chib, S.: Estimation and comparison of multiple change-point models. J. Econom. 86, 221–241 (1998)

  • Chopin, N.: Dynamic detection of change points in long time series. Ann. Inst. Stat. Math. 59, 349–366 (2007)

  • Colella, S., Yau, C., Taylor, J., Mirza, G., Butler, H., Clouston, P., Bassett, A., Seller, A., Holmes, C., Ragoussis, J.: QuantiSNP: an objective Bayes hidden-Markov model to detect and accurately map copy number variation using SNP genotyping data. Nucleic Acids Res. 35, 2013–2025 (2007)

  • David, L., Huber, W., Granovskaia, M., Toedling, J., Palm, C.J., Bofkin, L., Jones, T., Davis, R.W., Steinmetz, L.M.: A high-resolution map of transcription in the yeast genome. Proc. Natl. Acad. Sci. USA 103(14), 5320–5325 (2006)

  • De Iorio, M., de Silva, E., Stumpf, M.: Recombination hotspots as a point process. Philos. Trans. R. Soc. B 360, 1597–1603 (2005)

  • Do, K., Muller, P., Tang, F.: A Bayesian mixture model for differential gene expression. J. R. Stat. Soc. C 54, 627–644 (2005)

  • Efron, B.: Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. J. Am. Stat. Assoc. 99(465), 96–104 (2004)

  • Fearnhead, P.: MCMC, sufficient statistics and particle filters. J. Comput. Graph. Stat. 11, 848–862 (2002)

  • Fearnhead, P.: Exact Bayesian curve fitting and signal segmentation. IEEE Trans. Signal Process. 53, 2160–2166 (2005)

  • Fearnhead, P.: Exact and efficient Bayesian inference for multiple changepoint problems. Stat. Comput. 16, 203–213 (2006)

  • Fearnhead, P., Liu, Z.: On-line inference for multiple changepoint problems. J. R. Stat. Soc. B 69, 589–605 (2007)

  • Gentleman, R.C., Carey, V.J., Bates, D.M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J., Hornik, K., Hothorn, T., Huber, W., Iacus, S., Irizarry, R.A., Leisch, F., Li, C., Maechler, M., Rossini, A.J., Sawitzki, G., Smith, C., Smyth, G., Tierney, L., Yang, J.Y.H., Zhang, J.: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5(10), R80 (2004)

  • Gottardo, R., Pannucci, J.A., Kuske, C.R., Brettin, T.S.: Statistical analysis of microarray data: a Bayesian approach. Biostatistics 4(4), 597–620 (2003)

  • Gottardo, R., Raftery, A.E., Yeung, K.Y., Bumgarner, R.E.: Bayesian robust inference for differential gene expression in microarrays with multiple samples. Biometrics 62(1), 10–18 (2006)

  • Huber, W., Toedling, J., Steinmetz, L.M.: Transcript mapping with high-density oligonucleotide tiling arrays. Bioinformatics 22(16), 1963–1970 (2006)

  • Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U., Speed, T.P.: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4(2), 249–264 (2003)

  • Johnson, T., Elashoff, R., Harkema, S.: A Bayesian changepoint analysis of electromyographic data: detecting muscle activation patterns and associated applications. Biostatistics 4, 143–164 (2003)

  • Kapranov, P., Cawley, S.E., Drenkow, J., Bekiranov, S., Strausberg, R.L., Fodor, S.P.A., Gingeras, T.R.: Large-scale transcriptional activity in chromosomes 21 and 22. Science 296(5569), 916–919 (2002)

  • Kass, R., Raftery, A.: Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995)

  • Kendziorski, C.M., Newton, M.A., Lan, H., Gould, M.N.: On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Stat. Med. 22(24), 3899–3914 (2003)

  • Lockhart, D.J., Dong, H., Byrne, M.C., Follettie, M.T., Gallo, M.V., Chee, M.S., Mittmann, M., Wang, C., Kobayashi, M., Horton, H., Brown, E.L.: Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol. 14(13), 1675–1680 (1996)

  • Newton, M.A., Kendziorski, C.M., Richmond, C.S., Blattner, F.R., Tsui, K.W.: On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J. Comput. Biol. 8(1), 37–52 (2001)

  • Poyiadjis, G., Doucet, A., Singh, S.S.: Particle approximations of the score and observed information matrix in state space models with application to parameter estimation. Biometrika 98(1), 65–80 (2011)

  • Stephens, D.: Bayesian retrospective multiple-changepoint identification. Appl. Stat. 43, 159–178 (1994)

  • Xuan, X., Murphy, K.: Modeling changing dependency structure in multivariate time series. In: International Conference on Machine Learning (2007)

Author information

Corresponding author

Correspondence to François Caron.

About this article

Cite this article

Caron, F., Doucet, A. & Gottardo, R. On-line changepoint detection and parameter estimation with application to genomic data. Stat Comput 22, 579–595 (2012). https://doi.org/10.1007/s11222-011-9248-x

  • DOI: https://doi.org/10.1007/s11222-011-9248-x
