ISCA Archive Interspeech 2007

Speech-based annotation and retrieval of digital photographs

Timothy J. Hazen, Brennan Sherry, Mark Adler

In this paper we describe the development of a speech-based annotation and retrieval system for digital photographs. The system uses a client/server architecture that allows photographs to be captured and annotated on lightweight clients, such as mobile camera phones, and then processed, indexed, and stored on networked servers. For speech-based retrieval, we have developed a mixed-grammar recognition approach in which the speech recognition system constructs a single finite-state network that combines context-free grammars, used to recognize and parse query carrier phrases and metadata phrases, with an unconstrained statistical n-gram model, used to recognize free-form search terms. We present experiments demonstrating successful retrieval of photographs using purely speech-based annotation and retrieval.
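
As a rough illustration of the division of labor the abstract describes, the sketch below splits a query into a carrier phrase, free-form search terms, and a metadata phrase. It is not the authors' implementation: it works on text that has already been recognized, whereas the paper's system builds the combination into the recognizer's finite-state network. The phrase lists, the regular expression, and the `parse_query` function are hypothetical and chosen only for the example.

```python
import re

# Hypothetical carrier phrases; the paper's actual grammars are not listed here.
CARRIER_PHRASES = ["show me photos of", "find pictures of", "search for"]

# Hypothetical metadata phrase: a trailing date expression such as "from last month".
METADATA_PATTERN = re.compile(
    r"\s*\b(?:taken|from) (?P<when>last (?:week|month|year)|today|yesterday)$"
)


def parse_query(text):
    """Split a query into carrier phrase, free-form terms, and date metadata."""
    query = text.lower().strip()

    # Constrained part: match one of the fixed carrier phrases at the start.
    carrier = next((p for p in CARRIER_PHRASES if query.startswith(p + " ")), None)
    rest = query[len(carrier):].strip() if carrier else query

    # Constrained part: peel a metadata phrase off the end, if one is present.
    when = None
    match = METADATA_PATTERN.search(rest)
    if match:
        when = match.group("when")
        rest = rest[:match.start()].strip()

    # Unconstrained part: whatever remains is treated as free-form search terms.
    return {"carrier": carrier, "terms": rest.split(), "when": when}
```

For example, `parse_query("Show me photos of my dog at the beach from last month")` yields the carrier phrase `show me photos of`, the terms `my dog at the beach`, and the date value `last month`. In a recognizer, the free-form slot would be scored by the n-gram model rather than accepted verbatim.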


doi: 10.21437/Interspeech.2007-584

Cite as: Hazen, T.J., Sherry, B., Adler, M. (2007) Speech-based annotation and retrieval of digital photographs. Proc. Interspeech 2007, 2165-2168, doi: 10.21437/Interspeech.2007-584

@inproceedings{hazen07c_interspeech,
  author={Timothy J. Hazen and Brennan Sherry and Mark Adler},
  title={{Speech-based annotation and retrieval of digital photographs}},
  year=2007,
  booktitle={Proc. Interspeech 2007},
  pages={2165--2168},
  doi={10.21437/Interspeech.2007-584}
}