Abstract
Speech recognition is an area with a considerable literature, but there is little discussion of the topic within the computer science algorithms literature. Many computer scientists, however, are interested in the computational problems of speech recognition. This paper presents the field of speech recognition and describes some of its major open problems from an algorithmic viewpoint. Our goal is to stimulate the interest of algorithm designers and experimenters to investigate the algorithmic problems of effective automatic speech recognition.