Abstract
An efficient on-line changepoint detection algorithm for an important class of Bayesian product partition models has recently been proposed by Fearnhead and Liu (J. R. Stat. Soc. B 69, 589–605, 2007). However, a severe limitation of this algorithm is that it requires knowledge of the static parameters of the model in order to infer the number of changepoints and their locations. We propose here an extension of this algorithm that jointly estimates these static parameters on-line using a recursive maximum likelihood estimation strategy. This particle-filter-type algorithm has a computational complexity that scales linearly in both the number of data points and the number of particles. We demonstrate our methodology on one synthetic and two real-world datasets from RNA transcript analysis. On simulated data, we show that our approach outperforms standard techniques used in this context and hence has the potential to detect novel RNA transcripts.
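To make the on-line filtering step concrete, the following Python sketch implements a recursion of the kind the abstract attributes to Fearnhead and Liu, tracking the posterior over the run length since the most recent changepoint, with the static parameters assumed known. Every model detail is an assumption made for illustration: Gaussian observations with known noise variance sigma2, segment means drawn independently from N(mu0, tau2), a constant changepoint probability hazard (geometric segment lengths), and the hypothetical names online_changepoints, Particle and log_predictive. The particle cap is enforced by simply keeping the max_particles largest weights, a cruder substitute for the resampling scheme of Fearnhead and Liu, and the recursive maximum likelihood update of the static parameters is not included.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Particle:
    log_w: float  # normalised log posterior weight of this run length
    n: int        # number of observations in the current segment
    s: float      # sum of the observations in the current segment


def logsumexp(xs: List[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_predictive(y: float, n: int, s: float,
                   mu0: float, tau2: float, sigma2: float) -> float:
    """log N(y; m, v) under the conjugate posterior of the segment mean (assumed model)."""
    prec = 1.0 / tau2 + n / sigma2
    m = (mu0 / tau2 + s / sigma2) / prec
    v = 1.0 / prec + sigma2
    return -0.5 * (math.log(2.0 * math.pi * v) + (y - m) ** 2 / v)


def online_changepoints(ys: Sequence[float], hazard: float = 0.01,
                        mu0: float = 0.0, tau2: float = 10.0, sigma2: float = 1.0,
                        max_particles: int = 100) -> Tuple[List[float], float]:
    """Hypothetical sketch; requires 0 < hazard < 1.

    Returns P(changepoint at t | y_1..t) for each t and the accumulated log-likelihood.
    """
    particles: List[Particle] = []
    cp_probs: List[float] = []
    log_lik = 0.0
    for y in ys:
        if not particles:
            # First observation: a segment necessarily starts here.
            cands = [Particle(0.0, 0, 0.0)]
        else:
            # No changepoint: every run length grows by one, with prior prob 1 - hazard.
            cands = [Particle(p.log_w + math.log1p(-hazard), p.n, p.s)
                     for p in particles]
            # Changepoint: a fraction hazard of the total mass opens a new empty segment.
            total = logsumexp([p.log_w for p in particles])
            cands.append(Particle(total + math.log(hazard), 0, 0.0))
        # Weight each candidate by its one-step predictive density, then absorb y.
        upd = [Particle(c.log_w + log_predictive(y, c.n, c.s, mu0, tau2, sigma2),
                        c.n + 1, c.s + y) for c in cands]
        log_norm = logsumexp([p.log_w for p in upd])
        log_lik += log_norm  # log p(y_t | y_1..t-1); approximate once pruning kicks in
        for p in upd:
            p.log_w -= log_norm
        # Simple deterministic pruning (not Fearnhead and Liu's resampling scheme).
        if len(upd) > max_particles:
            upd.sort(key=lambda p: p.log_w, reverse=True)
            upd = upd[:max_particles]
            renorm = logsumexp([p.log_w for p in upd])
            for p in upd:
                p.log_w -= renorm
        particles = upd
        # Run length 0: the current segment started at this observation.
        cp_probs.append(sum(math.exp(p.log_w) for p in particles if p.n == 1))
    return cp_probs, log_lik
```

For example, online_changepoints([0.0] * 30 + [5.0] * 30) should give a changepoint probability close to 1 at index 30, where the level shifts, and small values at interior points of the two constant stretches. The accumulated log-likelihood is the quantity a recursive maximum likelihood scheme of the type described in the abstract would work with: it would update the static parameters at each step along an estimate of the gradient of log p(y_t | y_1, ..., y_{t-1}), which this sketch does not attempt.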
References
Andrieu, C., Doucet, A., Tadic, V.: On-line parameter estimation in general state-space models. In: Proc. 44th IEEE Conference on Decision and Control (2005)
Azzalini, A., Capitanio, A.: Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. J. R. Stat. Soc. B 65, 367–389 (2003)
Barry, D., Hartigan, J.: Product partition models for change point problems. Ann. Stat. 20, 260–279 (1992)
Benveniste, A., Metivier, M., Priouret, P.: Adaptive Algorithms and Stochastic Approximations. Springer, Berlin (1990)
Bertone, P., Stolc, V., Royce, T.E., Rozowsky, J.S., Urban, A.E., Zhu, X., Rinn, J.L., Tongprasit, W., Samanta, M., Weissman, S., Gerstein, M.B., Snyder, M.: Global identification of human transcribed sequences with genome tiling arrays. Science 306(5705), 2242–2246 (2004)
Bolstad, B.M., Irizarry, R.A., Astrand, M., Speed, T.P.: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19(2), 185–193 (2003)
Carlin, B., Gelfand, A., Smith, A.: Hierarchical Bayesian analysis of changepoint problems. Appl. Stat. 41, 389–405 (1992)
Cheng, J., Kapranov, P., Drenkow, J., Dike, S., Brubaker, S., Patel, S., Long, J., Stern, D., Tammana, H., Helt, G., Sementchenko, V., Piccolboni, A., Bekiranov, S., Bailey, D.K., Ganesh, M., Ghosh, S., Bell, I., Gerhard, D.S., Gingeras, T.R.: Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 308(5725), 1149–1154 (2005)
Chib, S.: Estimation and comparison of multiple change-point models. J. Econom. 86, 221–241 (1998)
Chopin, N.: Dynamic detection of change points in long time series. Ann. Inst. Stat. Math. 59, 349–366 (2007)
Colella, S., Yau, C., Taylor, J., Mirza, G., Butler, H., Clouston, P., Bassett, A., Seller, A., Holmes, C., Ragoussis, J.: QuantiSNP: an objective Bayes hidden-Markov model to detect and accurately map copy number variation using SNP genotyping data. Nucleic Acids Res. 35, 2013–2025 (2007)
David, L., Huber, W., Granovskaia, M., Toedling, J., Palm, C.J., Bofkin, L., Jones, T., Davis, R.W., Steinmetz, L.M.: A high-resolution map of transcription in the yeast genome. Proc. Natl. Acad. Sci. USA 103(14), 5320–5325 (2006)
De Iorio, M., de Silva, E., Stumpf, M.: Recombination hotspots as a point process. Philos. Trans. R. Soc. B 360, 1597–1603 (2005)
Do, K., Müller, P., Tang, F.: A Bayesian mixture model for differential gene expression. J. R. Stat. Soc. C 54, 627–644 (2005)
Efron, B.: Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. J. Am. Stat. Assoc. 99(465), 96–104 (2004)
Fearnhead, P.: MCMC, sufficient statistics and particle filters. J. Comput. Graph. Stat. 11, 848–862 (2002)
Fearnhead, P.: Exact Bayesian curve fitting and signal segmentation. IEEE Trans. Signal Process. 53, 2160–2166 (2005)
Fearnhead, P.: Exact and efficient Bayesian inference for multiple changepoint problems. Stat. Comput. 16, 203–213 (2006)
Fearnhead, P., Liu, Z.: On-line inference for multiple changepoint problems. J. R. Stat. Soc. B 69, 589–605 (2007)
Gentleman, R.C., Carey, V.J., Bates, D.M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J., Hornik, K., Hothorn, T., Huber, W., Iacus, S., Irizarry, R.A., Leisch, F., Li, C., Maechler, M., Rossini, A.J., Sawitzki, G., Smith, C., Smyth, G., Tierney, L., Yang, J.Y.H., Zhang, J.: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5(10), R80 (2004)
Gottardo, R., Pannucci, J.A., Kuske, C.R., Brettin, T.S.: Statistical analysis of microarray data: a Bayesian approach. Biostatistics 4(4), 597–620 (2003)
Gottardo, R., Raftery, A.E., Yeung, K.Y., Bumgarner, R.E.: Bayesian robust inference for differential gene expression in microarrays with multiple samples. Biometrics 62(1), 10–18 (2006)
Huber, W., Toedling, J., Steinmetz, L.M.: Transcript mapping with high-density oligonucleotide tiling arrays. Bioinformatics 22(16), 1963–1970 (2006)
Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U., Speed, T.P.: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4(2), 249–264 (2003)
Johnson, T., Elashoff, R., Harkema, S.: A Bayesian changepoint analysis of electromyographic data: detecting muscle activation patterns and associated applications. Biostatistics 4, 143–164 (2003)
Kapranov, P., Cawley, S.E., Drenkow, J., Bekiranov, S., Strausberg, R.L., Fodor, S.P.A., Gingeras, T.R.: Large-scale transcriptional activity in chromosomes 21 and 22. Science 296(5569), 916–919 (2002)
Kass, R., Raftery, A.: Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995)
Kendziorski, C.M., Newton, M.A., Lan, H., Gould, M.N.: On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Stat. Med. 22(24), 3899–3914 (2003)
Lockhart, D.J., Dong, H., Byrne, M.C., Follettie, M.T., Gallo, M.V., Chee, M.S., Mittmann, M., Wang, C., Kobayashi, M., Horton, H., Brown, E.L.: Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol. 14(13), 1675–1680 (1996)
Newton, M.A., Kendziorski, C.M., Richmond, C.S., Blattner, F.R., Tsui, K.W.: On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J. Comput. Biol. 8(1), 37–52 (2001)
Poyiadjis, G., Doucet, A., Singh, S.S.: Particle approximations of the score and observed information matrix in state space models with application to parameter estimation. Biometrika 98(1), 65–80 (2011)
Stephens, D.: Bayesian retrospective multiple-changepoint identification. Appl. Stat. 43, 159–178 (1994)
Xuan, X., Murphy, K.: Modeling changing dependency structure in multivariate time series. In: International Conference on Machine Learning (2007)
About this article
Cite this article
Caron, F., Doucet, A. & Gottardo, R. On-line changepoint detection and parameter estimation with application to genomic data. Stat Comput 22, 579–595 (2012). https://doi.org/10.1007/s11222-011-9248-x
DOI: https://doi.org/10.1007/s11222-011-9248-x