Algorithmic aspects in speech recognition: an introduction

Authors:
Adam L. Buchsbaum, AT&T Labs, Florham Park, NJ
Raffaele Giancarlo, Univ. di Palermo, Palermo, Italy

ACM Journal of Experimental Algorithmics, Volume 2, pp. 1–es. https://doi.org/10.1145/264216.264219

Published: 01 January 1997

Abstract

Speech recognition is an area with a considerable literature, but there is little discussion of the topic within the computer science algorithms literature. Many computer scientists, however, are interested in the computational problems of speech recognition. This paper presents the field of speech recognition and describes some of its major open problems from an algorithmic viewpoint. Our goal is to stimulate the interest of algorithm designers and experimenters to investigate the algorithmic problems of effective automatic speech recognition.

Supplemental Material

Available for Download

vol2nbr1.ps (835 KB)

vol2nbr1.tex.tar (1.6 MB)


Index Terms

Algorithmic aspects in speech recognition: an introduction
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Speech recognition

Published in
ACM Journal of Experimental Algorithmics Volume 2, Issue
1997
150 pages
ISSN:1084-6654
EISSN:1084-6654
DOI:10.1145/264216
Editor:
Bernard M. E. Moret
Univ. of New Mexico, Albuquerque
Copyright © 1997 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 January 1997
Author Tags
automata theory
graph searching
Qualifiers
- article