Document Clustering with Evolutionary Systems through Straight-Line Programs “slp”

Abstract

In this paper, we present a clustering method based on evolutionary algorithms within the paradigm of linear genetic programming, using Straight-Line Programs (slp). An slp is a data structure that is useful for representing collections of documents. It can be seen as a linear representation of programs, as well as a representation in the form of a graph. It has been used as a theoretical model in Computer Algebra, and our purpose is to reuse it in a completely different context: the grouping of library collections through evolutionary algorithms. We show its efficiency with experimental data obtained from traditional library collections.
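
To make the data structure mentioned above more concrete, the following is a minimal sketch of a generic straight-line program: a finite list of instructions in which each operand is either an input terminal or the result of an earlier instruction. This is why the same object reads both as a linear program and as a directed acyclic graph. The names `Instruction` and `evaluate_slp`, the arithmetic function set, and the numeric terminals are illustrative assumptions; the abstract does not specify how the paper encodes documents or clusters, so this is not the authors' implementation.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# An operand is either the name of an input terminal (str)
# or the index of an earlier instruction (int).
Operand = Union[str, int]


@dataclass(frozen=True)
class Instruction:
    """One line of the program: u_i = op(left, right). Hypothetical name."""
    op: Callable[[float, float], float]
    left: Operand
    right: Operand


def _resolve(operand: Operand, position: int,
             values: List[float], inputs: Dict[str, float]) -> float:
    """Look up an operand; integer operands must point to an earlier line."""
    if isinstance(operand, int):
        if not 0 <= operand < position:
            raise ValueError(
                f"instruction {position} refers to non-earlier result {operand}"
            )
        return values[operand]
    return inputs[operand]


def evaluate_slp(program: List[Instruction], inputs: Dict[str, float]) -> float:
    """Evaluate the instructions in order and return the last result."""
    if not program:
        raise ValueError("empty program")
    values: List[float] = []
    for i, ins in enumerate(program):
        left = _resolve(ins.left, i, values, inputs)
        right = _resolve(ins.right, i, values, inputs)
        values.append(ins.op(left, right))
    return values[-1]


# u0 = x + y;  u1 = u0 * u0;  u2 = u1 + x
program = [
    Instruction(operator.add, "x", "y"),
    Instruction(operator.mul, 0, 0),   # reuses u0 twice: a shared graph node
    Instruction(operator.add, 1, "x"),
]
result = evaluate_slp(program, {"x": 2.0, "y": 3.0})
```

Because intermediate results such as `u0` can be referenced several times, a straight-line program can share subcomputations that a tree representation would have to duplicate.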

Share and Cite:

J. Sequera, J. del Castillo Diez and L. Sotos, "Document Clustering with Evolutionary Systems through Straight-Line Programs “slp”," Journal of Intelligent Learning Systems and Applications, Vol. 4 No. 4, 2012, pp. 303-318. doi: 10.4236/jilsa.2012.44032.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.