
Classification of Symbol Sequences over Their Frequency Dictionaries: Towards the Connection between Structure and Natural Taxonomy

Open Systems & Information Dynamics

Abstract

Classifications of bacterial 16S RNA sequences built over real and transformed frequency dictionaries have been studied. Two sequences are considered close when their frequency dictionaries are close in the Euclidean metric. A procedure for transforming a dictionary is proposed that brings out some features of the information pattern of a symbol sequence. A comparative study of classifications built over real frequency dictionaries versus transformed ones has been carried out. A correlation was found between the information patterns of nucleotide sequences and the taxonomy of the organisms bearing them. The sites with high information value are found to be the main factors that distinguish the classes in a classification. The classification of nucleotide sequences built over real frequency dictionaries of thickness 3 shows the best correlation with the genus of the bacteria. As a rule, the set of sequences of the same genus falls entirely into one class, and exceptions are rare. A hierarchical classification yields one or two taxonomic groups at each level of classification. An unexpectedly frequent or unexpectedly rare occurrence of some sites within a sequence accounts for the basic difference between the structure patterns of the resulting classes; the number of such sites is not too large. Further investigation is needed to compare the sites revealed here with those determined by other methods.
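The notion of closeness used in the abstract can be illustrated with a minimal sketch. It assumes that a frequency dictionary of thickness q is the set of relative frequencies of all overlapping words of length q in a linear sequence; the abstract does not spell out this definition, and the dictionary transformation it mentions is not reproduced here. The function names and the toy sequences are hypothetical.

```python
from collections import Counter
from math import sqrt


def frequency_dictionary(seq, q=3):
    """Relative frequencies of all overlapping words of length q in seq.

    Assumption: the sequence is treated as linear (no ring closure),
    so a sequence of length n yields n - q + 1 words.
    """
    total = len(seq) - q + 1
    if total <= 0:
        raise ValueError("sequence is shorter than the word length q")
    counts = Counter(seq[i:i + q] for i in range(total))
    return {word: n / total for word, n in counts.items()}


def euclidean_distance(d1, d2):
    """Euclidean distance between two frequency dictionaries.

    A word missing from a dictionary is taken to have frequency 0.
    """
    words = set(d1) | set(d2)
    return sqrt(sum((d1.get(w, 0.0) - d2.get(w, 0.0)) ** 2 for w in words))


# Toy sequences for illustration only, not real 16S RNA data.
first = frequency_dictionary("ACGUACGU", q=3)
second = frequency_dictionary("AAAA", q=3)
print(euclidean_distance(first, second))
```

Under these assumptions, a classification would group sequences whose dictionaries lie at small mutual distance; the grouping procedure itself is not described in the abstract and is left out of the sketch.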




About this article

Cite this article

Gorban, A.N., Popova, T.G. & Sadovsky, M.G. Classification of Symbol Sequences over Their Frequency Dictionaries: Towards the Connection between Structure and Natural Taxonomy. Open Systems & Information Dynamics 7, 1–17 (2000). https://doi.org/10.1023/A:1009652616706


  • DOI: https://doi.org/10.1023/A:1009652616706
