DOI: 10.1145/1816112.1816115
research-article

Patterns from multiresolution 0-1 data

Published: 25 July 2010

ABSTRACT

Biological systems are complex, and biological data are often available in different resolutions. Computational algorithms are often designed to work with only one specific resolution of data, so upsampling or downsampling is necessary before the data can be fed to the algorithm. Moreover, high-resolution data contain a significant amount of noise, which produces an explosion of redundant patterns such as maximal frequent itemsets, closed frequent itemsets and non-derivable itemsets. Downsampling the data can solve this problem, provided that the information lost during sampling is insignificant. Furthermore, comparing the results of an algorithm on data in different resolutions can produce interesting results, and experiments in different resolutions help in determining a suitable resolution both for the data and for the computational methods applied to it. In this paper, three methods of downsampling are proposed and implemented, experiments are performed on different resolutions, the suitability of the proposed methods is validated, and the results are compared. Mixture models are trained on the data and the results are analyzed. The proposed methods produce plausible results, showing that the significant patterns in the data are retained in the lower resolution. The proposed methods can be used extensively in the integration of databases.
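
The abstract does not name the three downsampling methods, so the following is only a minimal sketch of what downsampling 0-1 data can look like, not the paper's own procedure. It assumes each fine-resolution column maps to exactly one coarse-resolution column; the function name `downsample`, the `groups` mapping, and the two rules ("or" and "majority") are illustrative assumptions.

```python
from collections import defaultdict


def downsample(rows, groups, rule="or"):
    """Collapse fine-resolution 0-1 columns into coarse-resolution columns.

    rows   -- list of 0-1 lists, one list per sample (fine resolution)
    groups -- for each fine column, the index of the coarse column it maps to
    rule   -- "or": coarse bit is 1 if any of its fine bits is 1
              "majority": coarse bit is 1 if at least half of its fine bits are 1
    Both rules are hypothetical illustrations, not the methods of the paper.
    """
    if rule not in ("or", "majority"):
        raise ValueError("unknown rule: %r" % rule)

    # Collect the fine column indices that belong to each coarse column.
    members = defaultdict(list)
    for fine_idx, coarse_idx in enumerate(groups):
        members[coarse_idx].append(fine_idx)
    n_coarse = max(groups) + 1

    result = []
    for row in rows:
        coarse_row = []
        for c in range(n_coarse):
            bits = [row[i] for i in members[c]]
            if rule == "or":
                coarse_row.append(1 if any(bits) else 0)
            else:
                # A coarse column with no fine members stays 0.
                coarse_row.append(1 if bits and 2 * sum(bits) >= len(bits) else 0)
        result.append(coarse_row)
    return result


# Two samples with six fine-resolution columns, mapped onto two coarse columns.
fine = [[1, 0, 0, 0, 1, 1],
        [0, 0, 0, 0, 0, 1]]
mapping = [0, 0, 0, 1, 1, 1]
coarse_or = downsample(fine, mapping, rule="or")
coarse_majority = downsample(fine, mapping, rule="majority")
```

Under these assumptions the "or" rule keeps every fine-resolution 1 visible at the coarse level, while the "majority" rule suppresses isolated 1s, which is one way to trade information loss against noise reduction.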


  • Published in

    UP '10: Proceedings of the ACM SIGKDD Workshop on Useful Patterns
    July 2010
    82 pages
    ISBN: 9781450302166
    DOI: 10.1145/1816112

    Copyright © 2010 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States
