research-article

Patterns from multiresolution 0-1 data

Authors:
Prem Raj Adhikari, Aalto University School of Science and Technology, Aalto
Jaakko Hollmén, Aalto University School of Science and Technology, Aalto
UP '10: Proceedings of the ACM SIGKDD Workshop on Useful Patterns, July 2010, pages 8–16
https://doi.org/10.1145/1816112.1816115

Published: 25 July 2010

ABSTRACT

Biological systems are complex, and biological data are often available at different resolutions. Computational algorithms, however, are often designed to work with data at only one specific resolution, so the data must be upsampled or downsampled before they can be fed to the algorithm. Moreover, high-resolution data contain a significant amount of noise, which produces an explosion of redundant patterns in the data, such as maximal frequent itemsets, closed frequent itemsets and non-derivable itemsets. This problem can be alleviated by downsampling the data, provided that the information lost during downsampling is insignificant. Furthermore, comparing the results of an algorithm on data at different resolutions can produce interesting results, and such experiments help determine the appropriate resolution of the data for computational methods. In this paper, three methods of downsampling are proposed and implemented, experiments are performed at different resolutions, the suitability of the proposed methods is validated, and the results are compared. Mixture models are trained on the data, and analysis of the results shows that the proposed methods produce plausible results: the significant patterns in the data are retained at the lower resolution. The proposed methods can be used extensively in the integration of databases.
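The abstract does not spell out the three downsampling methods, so what follows is only a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation. It assumes each low-resolution feature covers a fixed block of high-resolution features (the `groups` list), uses two illustrative aggregation rules (logical OR and majority vote) that are not claimed to be the paper's methods, upsamples by copying each parent value to its children, and fits a mixture of multivariate Bernoulli distributions with a plain EM loop. All names (`downsample`, `or_rule`, `majority_rule`, `upsample`, `fit_bernoulli_mixture`) are hypothetical.

```python
"""Hypothetical sketch: multiresolution 0-1 data and a Bernoulli mixture.

Assumptions (not taken from the paper): each low-resolution feature covers a
block of high-resolution features listed in `groups`, and the aggregation
rules below (logical OR, majority vote) are illustrative only.
"""
import math
import random
from typing import Callable, List, Sequence

Row = List[int]


def downsample(row: Sequence[int], groups: Sequence[Sequence[int]],
               rule: Callable[[List[int]], int]) -> Row:
    """Map one high-resolution 0-1 vector to a lower resolution."""
    return [rule([row[i] for i in g]) for g in groups]


def or_rule(bits: List[int]) -> int:
    # Parent feature is 1 if any child feature is 1.
    return int(any(bits))


def majority_rule(bits: List[int]) -> int:
    # Parent feature is 1 if at least half of the children are 1 (ties -> 1).
    return int(2 * sum(bits) >= len(bits))


def upsample(row: Sequence[int], groups: Sequence[Sequence[int]],
             n_high: int) -> Row:
    """Copy each parent value to all of its child positions."""
    out = [0] * n_high
    for value, g in zip(row, groups):
        for i in g:
            out[i] = value
    return out


def fit_bernoulli_mixture(data: List[Row], k: int, iters: int = 50,
                          seed: int = 0, eps: float = 1e-6):
    """Plain EM for a mixture of k multivariate Bernoulli distributions."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    weights = [1.0 / k] * k
    probs = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(k)]
    for _ in range(iters):
        # E-step: responsibilities, computed in log space for stability.
        resp = []
        for x in data:
            logs = [
                math.log(weights[j])
                + sum(math.log(p if xi else 1.0 - p)
                      for xi, p in zip(x, probs[j]))
                for j in range(k)
            ]
            m = max(logs)
            w = [math.exp(v - m) for v in logs]
            s = sum(w)
            resp.append([v / s for v in w])
        # M-step: mixing weights (kept positive, summing to 1) and
        # per-component Bernoulli parameters (clamped away from 0 and 1).
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = (nj + eps) / (n + k * eps)
            for t in range(d):
                p = sum(r[j] * x[t] for r, x in zip(resp, data)) / max(nj, eps)
                probs[j][t] = min(max(p, eps), 1.0 - eps)
    return weights, probs
```

For example, with `groups = [[0, 1, 2], [3, 4], [5, 6, 7]]` and `high = [0, 1, 0, 1, 1, 0, 0, 0]`, `downsample(high, groups, or_rule)` gives `[1, 1, 0]` while `downsample(high, groups, majority_rule)` gives `[0, 1, 0]`, which shows how the choice of rule decides how much of a sparse high-resolution signal survives. Comparing mixtures fitted to the high- and low-resolution versions of the same data is one way to check whether the dominant patterns are retained.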

Published in
UP '10: Proceedings of the ACM SIGKDD Workshop on Useful Patterns
July 2010
82 pages
ISBN: 9781450302166
DOI: 10.1145/1816112
General Chairs: Bart Goethals, Nikolaj Tatti and Jilles Vreeken (Universiteit Antwerpen)
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
binary data
downsampling
mixture models
multiple resolutions
upsampling