Abstract
The objective of this study is to automatically extract annotated sign data from broadcast news recordings for the hearing impaired. These recordings are an excellent source for automatically generating annotated data: in news for the hearing impaired, the speaker signs with her hands as she talks, and corresponding sliding text is superimposed on the video. The video of the signer can be segmented with the help of either the speech alone or both the speech and the sliding text, yielding segmented and annotated sign videos. We call this application Signiary, and aim to use it as a sign dictionary in which users enter a word as text and retrieve videos of the corresponding sign. The application can also be used to automatically create annotated sign databases for training recognizers.
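To illustrate the dictionary idea described above, the following is a minimal sketch of a word-to-sign-segment lookup. It assumes a hypothetical data model (word-level time alignments obtained from speech and sliding-text recognition, paired with video identifiers); it is not the authors' implementation, whose alignment and retrieval components are described in the full article.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    video_id: str   # source news recording
    start: float    # segment start time (seconds)
    end: float      # segment end time (seconds)

def build_index(alignments):
    """Build a word -> sign-video-segment index.

    alignments: iterable of (word, video_id, start, end) tuples,
    e.g. produced by aligning the recognized speech to the video.
    """
    index = {}
    for word, video_id, start, end in alignments:
        index.setdefault(word.lower(), []).append(Segment(video_id, start, end))
    return index

def retrieve(index, query):
    """Return all sign-video segments aligned to the query word."""
    return index.get(query.lower(), [])

# Example with made-up alignments (Turkish words "haber" = news, "okul" = school)
alignments = [
    ("haber", "news_001", 12.3, 13.1),
    ("haber", "news_007", 4.0, 4.9),
    ("okul", "news_001", 20.5, 21.4),
]
index = build_index(alignments)
print([(s.video_id, s.start) for s in retrieve(index, "haber")])
# → [('news_001', 12.3), ('news_007', 4.0)]
```

A query thus returns every aligned occurrence of the word across the broadcast archive, so the user can browse multiple renditions of the same sign.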
Aran, O., Ari, I., Akarun, L. et al. Speech and sliding text aided sign retrieval from hearing impaired sign news videos. J Multimodal User Interfaces 2, 117–131 (2008). https://doi.org/10.1007/s12193-008-0007-z