Algorithmic aspects in speech recognition: an introduction

Published: 01 January 1997

Abstract

Speech recognition is an area with a considerable literature, but there is little discussion of the topic within the computer science algorithms literature. Many computer scientists, however, are interested in the computational problems of speech recognition. This paper presents the field of speech recognition and describes some of its major open problems from an algorithmic viewpoint. Our goal is to stimulate the interest of algorithm designers and experimenters to investigate the algorithmic problems of effective automatic speech recognition.

    • Published in

      ACM Journal of Experimental Algorithmics  Volume 2, Issue
      1997
      150 pages
      ISSN:1084-6654
      EISSN:1084-6654
      DOI:10.1145/264216

      Copyright © 1997 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • article
