DOI: 10.1145/170791.170891

Trigrams as index element in full text retrieval: observations and experimental results

Published: 1 March 1993

ABSTRACT

A trigram is a three-element sequence of characters. In this paper we demonstrate the effectiveness of a trigram-based index for morphologically based retrievals from a full-text document retrieval system. Retrieved documents are considered relevant if they contain exact matches for each of the query terms. Using this definition of relevance, we consistently achieve a recall rate of 100%. In the experiments described here, we used sets of 100 ANDed three-term queries, and the average precision per set varied from 47% to 87%. We propose a method for increasing the average precision to 100%. Using overlapping trigrams extracted from the Brown Corpus [KUCE67] and a character set of 45 elements, we found a horizontal asymptote near 11,000 for the number of entries in a trigram-based index. Finally, we show that a trigram-based system provides a reasonable alternative to a word-based one and is superior to it in retrievals of word fragments.
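
As an illustration of the idea summarized above, the following minimal Python sketch builds an index of overlapping trigrams and answers ANDed queries against it. It is not the authors' implementation: the function names, the toy documents, and the lowercase normalization are assumptions made for this example, and the paper's 45-element character set is not reproduced. The final verification step (scanning each candidate for an exact match) is one plausible way to raise precision to 100%; the abstract does not say which method the paper proposes.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of overlapping three-character sequences in text."""
    text = text.lower()  # assumed normalization, not the paper's character set
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(docs):
    """Map each trigram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for tri in trigrams(text):
            index[tri].add(doc_id)
    return index


def candidates(index, docs, terms):
    """Return documents that contain every trigram of every ANDed term.

    Terms shorter than three characters contribute no trigrams and
    therefore do not narrow the result in this sketch.
    """
    result = set(docs)
    for term in terms:
        for tri in trigrams(term):
            result &= index.get(tri, set())
    return result


def verified(docs, hits, terms):
    """Keep only candidates whose text contains each term exactly."""
    return {d for d in hits
            if all(t.lower() in docs[d].lower() for t in terms)}


# Hypothetical toy collection.
docs = {
    1: "ripe banana bread",
    2: "nana bandana",   # has 'ban', 'ana', 'nan' but not 'banana'
    3: "apple pie",
}
index = build_index(docs)

hits = candidates(index, docs, ["banana"])
print(sorted(hits))                               # [1, 2]
print(sorted(verified(docs, hits, ["banana"])))   # [1]
print(sorted(candidates(index, docs, ["nan"])))   # [1, 2]
```

Every document that contains a term necessarily contains all of the term's trigrams, which is why the candidate set never misses a relevant document (100% recall). Document 2 shows how a false match arises: the trigrams of "banana" all occur in it, but not adjacently. The last query shows a word fragment ("nan") being retrieved directly, something a word-based index cannot do without extra machinery. For scale, a 45-character set allows at most 45^3 = 91,125 distinct trigrams, so an index that levels off near 11,000 entries uses roughly 12% of that space.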

References

  1. ADAM92. Adams, Elizabeth, "A Study of Trigrams and Their Feasibility as Index Terms in a Full Text Information Retrieval System", D.Sc. dissertation, George Washington University, 1992.
  2. COML90. Comlekoglu, Fatih, "Optimizing a Text Retrieval System Utilizing N-gram Indexing", D.Sc. dissertation, George Washington University, 1990.
  3. D'AM85. D'Amore, Raymond J., and Mah, Clinton P., "One-Time Complete Indexing of Text: Theory and Practice", Research and Development in Information Retrieval: Eighth Annual International ACM SIGIR Conference, pp. 155-164, Montreal, Quebec, Canada, 1985.
  4. FOX90. Fox, Christopher, "A Stop List for General Text", SIGIR Forum, Vol. 24, Nos. 1-2, pp. 19-35, Fall 89/Winter 90.
  5. JONE84. Jones, Kevin P., and Bell, Colin L.M., "The Automatic Extraction of Words from Texts Especially for Input into Information Retrieval Systems Based on Inverted Files", pp. 410-419, in Research and Development in Information Retrieval, van Rijsbergen, C.J., ed., Cambridge University Press, 1984.
  6. KRAC81. Kracsony, Peter, Kowalski, Gerald, and Meltzer, Arnold, "Comparative Analysis of Hardware Versus Software Text Search", Chapter 17, in Information Retrieval Research, Oddy, R.N., Robertson, S.E., van Rijsbergen, C.J., and Williams, P.W., eds., Butterworths, London, 1981.
  7. KUCE67. Kucera, Henry, and Francis, W. Nelson, Computational Analysis of Present-Day American English, Brown University Press, Providence, Rhode Island, 1967.
  8. LAPI91. Lapir, G.M., "Associative Technique for Database Access", Report of the Institute for System Studies, USSR Academy of Sciences, 1991 (in Russian).
  9. LESL51. Leslie, Louis, 20,000 Words Spelled, Divided, and Accented for the Use of Students, Authors and Proofreaders, Third Edition, Gregg Publishing Division of McGraw-Hill Book Company, Inc., New York, 1951.
  10. MELT87. Meltzer, Arnold C., and Kowalski, Gerald, "Text Searching Using an Inversion Database Consisting of Trigrams", IEEE Proceedings of Second International Conference on Computers and Applications, pp. 65-69, 1987.
  11. SALT83. Salton, Gerard, and McGill, Michael J., Introduction to Modern Information Retrieval, McGraw Hill, New York, 1983.
  12. WILK89. Wilkinson, Leland, SYSTAT: The System for Statistics, SYSTAT, Inc., Evanston, IL, 1989.
  13. WILL79. Willett, Peter, "Document Retrieval Experiments Using Indexing Vocabularies of Varying Size - 2. Hashing, Truncation, Digram and Trigram Encoding of Index Terms", Journal of Documentation, Vol. 35, No. 4, pp. 296-305, December 1979.
  14. WISN87. Wisniewski, Janusz L., "Effective Text Compression with Simultaneous Digram and Trigram Encoding", Journal of Information Science: Principles & Practice, Vol. 13, No. 3, pp. 159-164, 1987.
  15. YOCH85. Yochum, Julian A., "A High-Speed Text Scanning Algorithm Utilizing Least Frequent Trigraphs", IEEE Proceedings New Directions in Computing Symposium, Trondheim, Norway, pp. 114-121, 1985.

          • Published in

            CSC '93: Proceedings of the 1993 ACM conference on Computer science
            March 1993
            543 pages
            ISBN: 0897915585
            DOI: 10.1145/170791

            Copyright © 1993 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States
