Abstract
WEIRD is an automatic document retrieval system designed and implemented at Syracuse University, which attempts to advance the art of computerized retrieval from word-matching to judging conceptual similarity. WEIRD uses a vector space model to represent the relations among terms and documents. Items in the space are located according to their "meaning", which is their proximity to all other items in the data base as measured by co-occurrence frequencies. This is done without manipulating large matrices. The dimensions of the space are not used to define relations; items are defined solely by their position relative to the other items. Retrieval is determined by Euclidean distance from the plotted query. In the first section of the paper the basic characteristics of WEIRD are described. Second, the results of a preliminary evaluation are reported. Alternatives for further development of WEIRD are then considered.
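The retrieval step described in the abstract, ranking items by Euclidean distance from the plotted query, can be illustrated with a minimal sketch. This is not WEIRD's implementation: the function names `euclidean` and `retrieve`, the two-dimensional coordinates, and the cutoff `k` are all assumptions made for illustration, and the abstract does not say how item positions are derived from co-occurrence frequencies.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query, documents, k=3):
    """Return the ids of the k documents nearest the query, closest first."""
    ranked = sorted(documents, key=lambda doc_id: euclidean(query, documents[doc_id]))
    return ranked[:k]


# Hypothetical coordinates. In WEIRD the positions would come from
# co-occurrence frequencies; here they are simply made up.
documents = {
    "doc_a": (0.0, 0.0),
    "doc_b": (3.0, 4.0),
    "doc_c": (1.0, 1.0),
}
query = (0.9, 1.2)

print(retrieve(query, documents, k=2))
```

Because only distances between points are compared, the ranking does not depend on how the axes are oriented, which is consistent with the abstract's remark that the dimensions themselves are not used to define relations.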
WEIRD: An approach to concept-based information retrieval
SIGIR '78: Proceedings of the 1st annual international ACM SIGIR conference on Information storage and retrieval