ABSTRACT
We present algorithms for time-series gene expression analysis that permit the principled estimation of unobserved time-points, clustering, and dataset alignment. Each expression profile is modeled as a cubic spline (a piecewise polynomial) estimated from the observed data, so that every time point influences the overall smooth expression curve. We constrain the spline coefficients of genes in the same class to be similar, so that these genes share similar expression patterns, while still allowing for gene-specific parameters. We show that our method reconstructs unobserved time-points with 10-15% less error than the previous best methods. Our clustering algorithm operates directly on the continuous representations of gene expression profiles, and we demonstrate that this is particularly effective when applied to non-uniformly sampled data. Our continuous alignment algorithm also avoids difficulties encountered by discrete approaches. In particular, our method controls the number of degrees of freedom of the warp through the specification of parameterized functions, which helps to avoid overfitting. We demonstrate that our algorithm produces stable low-error alignments on real expression data, and we further show a specific application to yeast knockout data that produces biologically meaningful results.
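To make the idea of a continuous representation concrete, the following is a minimal sketch of estimating an unobserved time point from a non-uniformly sampled profile. It is not the method of the paper: it fits a plain natural cubic spline that interpolates a single profile, with no class-level constraint on the coefficients and no noise model. The function name natural_cubic_spline and the sample values are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def natural_cubic_spline(ts, ys):
    """Return a callable evaluating the natural cubic spline through (ts, ys).

    ts must be strictly increasing; the spacing may be non-uniform.
    Assumption: plain interpolation of one profile, not the class-constrained
    spline estimation described in the abstract.
    """
    n = len(ts)
    if n < 3 or len(ys) != n:
        raise ValueError("need at least three points and matching lengths")
    h = [ts[i + 1] - ts[i] for i in range(n - 1)]
    if any(step <= 0 for step in h):
        raise ValueError("time points must be strictly increasing")

    # Tridiagonal system for the interior second derivatives m[1..n-2];
    # the natural boundary condition fixes m[0] = m[n-1] = 0.
    diag = [2.0 * (h[k] + h[k + 1]) for k in range(n - 2)]
    rhs = [
        6.0 * ((ys[k + 2] - ys[k + 1]) / h[k + 1] - (ys[k + 1] - ys[k]) / h[k])
        for k in range(n - 2)
    ]

    # Thomas algorithm: forward elimination, then back substitution.
    for k in range(1, n - 2):
        w = h[k] / diag[k - 1]
        diag[k] -= w * h[k]
        rhs[k] -= w * rhs[k - 1]
    inner = [0.0] * (n - 2)
    inner[-1] = rhs[-1] / diag[-1]
    for k in range(n - 4, -1, -1):
        inner[k] = (rhs[k] - h[k + 1] * inner[k + 1]) / diag[k]
    m = [0.0] + inner + [0.0]

    def evaluate(t):
        # Locate the interval [ts[i], ts[i+1]] containing t; values outside
        # the sampled range are extrapolated from the nearest end segment.
        i = min(max(bisect_right(ts, t) - 1, 0), n - 2)
        a, b = ts[i + 1] - t, t - ts[i]
        return (
            m[i] * a ** 3 / (6.0 * h[i])
            + m[i + 1] * b ** 3 / (6.0 * h[i])
            + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
            + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b
        )

    return evaluate


if __name__ == "__main__":
    # Hypothetical, non-uniformly sampled profile (minutes, log ratio).
    times = [0.0, 10.0, 20.0, 40.0, 70.0, 90.0, 120.0]
    values = [0.1, 0.8, 1.3, 0.9, -0.4, -0.7, 0.2]

    # Hide the 40-minute measurement and estimate it from the other points.
    hidden = 3
    curve = natural_cubic_spline(
        times[:hidden] + times[hidden + 1:],
        values[:hidden] + values[hidden + 1:],
    )
    estimate = curve(times[hidden])
    print(f"observed {values[hidden]:.3f}, estimated {estimate:.3f}")
```

Hiding one measurement and comparing it with the spline estimate mirrors, in a simplified form, how reconstruction error for unobserved time points can be measured. Because the curve is defined for every t, the same callable can also be sampled on a common grid before comparing profiles that were measured at different times.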
Index Terms
- A new approach to analyzing gene expression time series data