Phonetic confusion matrix based spoken document retrieval

Authors:
Savitha Srinivasan

IBM Almaden Research Center, 650 Harry Road, San Jose, CA

Dragutin Petkovic

IBM Almaden Research Center, 650 Harry Road, San Jose, CA


SIGIR '00: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, July 2000, Pages 81–87, https://doi.org/10.1145/345508.345552

Published: 01 July 2000

ABSTRACT

Combined word-based and phonetic indexes have been used to improve the performance of spoken document retrieval systems, primarily by addressing the out-of-vocabulary retrieval problem. However, a known problem with phonetic recognition is its limited accuracy in comparison with word-level recognition. We propose a novel method for phonetic retrieval in the CueVideo system based on the probabilistic formulation of term weighting using phone confusion data in a Bayesian framework. We evaluate this method of spoken document retrieval against word-based retrieval for the search levels identified in a realistic video-based distributed learning setting. Using our test data, we achieved an average recall of 0.88 with an average precision of 0.69 for retrieval of out-of-vocabulary words on phonetic transcripts with 35% word error rate. For in-vocabulary words, we achieved a 17% improvement in recall over word-based retrieval with a 17% loss in precision for word error rates ranging from 35% to 65%.
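
The abstract does not give the actual term-weighting formula, so the following is only a minimal sketch of the general idea of using phone confusion data in a Bayesian way. It is not the authors' formulation. The confusion counts, the phone symbols, the `posterior`, `window_score` and `search` helpers, the flooring constant and the threshold are all hypothetical choices made for illustration. Each recognized phone in a transcript window is scored by the posterior probability that the query phone was actually spoken, and windows whose average log-posterior clears a threshold are returned as hits.

```python
import math

# Hypothetical confusion counts: CONFUSION[spoken][recognized] = count.
# Real counts would come from aligning recognizer output with reference phones.
CONFUSION = {
    "p": {"p": 80, "b": 15, "t": 5},
    "b": {"b": 70, "p": 25, "d": 5},
    "t": {"t": 85, "d": 10, "p": 5},
    "d": {"d": 75, "t": 20, "b": 5},
    "ae": {"ae": 90, "eh": 10},
    "eh": {"eh": 85, "ae": 15},
}
PHONES = sorted(CONFUSION)


def posterior(spoken, recognized, floor=1e-4):
    """P(spoken | recognized) by Bayes' rule.

    With priors P(spoken) taken from the row totals, the posterior reduces to
    count(spoken, recognized) divided by the column total for `recognized`.
    The result is floored (an assumption) so that log() is always defined.
    """
    column_total = sum(CONFUSION[s].get(recognized, 0) for s in PHONES)
    if column_total == 0:
        return floor
    joint = CONFUSION.get(spoken, {}).get(recognized, 0)
    return max(joint / column_total, floor)


def window_score(query, window):
    """Average log-posterior that `window` is a recognition of `query`."""
    total = sum(math.log(posterior(q, r)) for q, r in zip(query, window))
    return total / len(query)


def search(query, transcript, threshold=math.log(0.2)):
    """Slide the query over the phone transcript; return (start, score) hits."""
    n = len(query)
    if n == 0:
        return []
    hits = []
    for start in range(len(transcript) - n + 1):
        score = window_score(query, transcript[start:start + n])
        if score >= threshold:
            hits.append((start, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy phonetic transcript; the query "b ae t" never appears verbatim in it.
transcript = ["d", "eh", "p", "ae", "t", "ae", "d"]
for start, score in search(["b", "ae", "t"], transcript):
    print(start, round(score, 3))
```

In this toy setup the query is matched against a window where "b" was recognized as "p", which is the kind of recall gain (at some cost in precision) that the abstract reports. Substitution errors are the only error type handled here; insertions and deletions would need an alignment step that this sketch leaves out.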

Index Terms

Phonetic confusion matrix based spoken document retrieval
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Speech recognition
2. Mathematics of computing
  1. Probability and statistics

Published in
SIGIR '00: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
July 2000
396 pages
ISBN:1581132263
DOI:10.1145/345508
Chairmen:
Emmanuel Yannakoudakis, Athens Univ. of Economics and Business, Greece
Nicholas J. Belkin, Rutgers Univ.
Mun-Kew Leong, Kent Ridge Digital Labs
Peter Ingwersen, Royal School of Library and Information Science
Copyright © 2000 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 792 of 3,983 submissions, 20%
Article Metrics
- Total Citations: 53
- Total Downloads: 293
- Downloads (Last 12 months): 89
- Downloads (Last 6 weeks): 17