research-article

Modularity in inductively-learned word pronunciation systems

Authors:
Antal van den Bosch, Tilburg University, Tilburg, The Netherlands
Ton Weijters, Eindhoven University of Technology, Eindhoven, The Netherlands
Walter Daelemans, Tilburg University, Tilburg, The Netherlands

NeMLaP3/CoNLL '98: Proceedings of the Joint Conferences on New Methods in Language Processing and Computational Natural Language Learning, January 1998, pages 185–194

Published: 11 January 1998

ABSTRACT

In leading morpho-phonological theories and state-of-the-art text-to-speech systems, it is assumed that word pronunciation cannot be learned or performed without intermediate analyses at several abstraction levels (e.g., morphological, graphemic, phonemic, syllabic, and stress levels). We challenge this assumption for the case of English word pronunciation. Using IGTree, an inductive-learning decision-tree algorithm, we train and test three word-pronunciation systems in which the number of abstraction levels (implemented as sequenced modules) is reduced from five, via three, to one. The latter system, which maps letter strings directly to phonemes with stress markers, yields significantly better generalisation accuracies than the two multi-module systems. Analyses of the empirical results indicate that the positive utility effects of sequencing modules are outweighed by cascading errors passed on between modules.
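The abstract names IGTree without describing it. The following is a simplified, hypothetical Python sketch of the general IGTree idea as it is commonly described: features are ranked once by information gain over the whole training set, the tree tests them in that fixed order, every node stores its most frequent class as a default, expansion stops once a node is unambiguous, and classification returns the default of the deepest matching node. It imitates the single-module setup by mapping letters directly to phoneme-plus-stress labels. The letter-window encoding (one letter of context on each side), the toy lexicon, and labels such as `ae1` are illustrative assumptions, not the paper's data or feature set.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(instances, labels, feature):
    """Reduction in class entropy obtained by partitioning on one feature."""
    n = len(labels)
    partitions = {}
    for x, y in zip(instances, labels):
        partitions.setdefault(x[feature], []).append(y)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def windows(word, size=3, pad="_"):
    """Hypothetical encoding: one fixed-width letter window per letter."""
    padded = pad * size + word + pad * size
    return [tuple(padded[i:i + 2 * size + 1]) for i in range(len(word))]


def build_igtree(instances, labels, order, depth=0):
    """Build a tree that tests features in one global, gain-ranked order."""
    node = {"default": Counter(labels).most_common(1)[0][0], "children": {}}
    # Stop when the node is unambiguous or every feature has been tested.
    if len(set(labels)) == 1 or depth == len(order):
        return node
    feature = order[depth]
    groups = {}
    for x, y in zip(instances, labels):
        xs, ys = groups.setdefault(x[feature], ([], []))
        xs.append(x)
        ys.append(y)
    for value, (xs, ys) in groups.items():
        node["children"][value] = build_igtree(xs, ys, order, depth + 1)
    return node


def classify(tree, order, x):
    """Follow matching arcs; return the default class of the last node reached."""
    node = tree
    for feature in order:
        child = node["children"].get(x[feature])
        if child is None:
            break
        node = child
    return node["default"]


# Hypothetical toy lexicon: one phoneme(+stress) label per letter.
LEXICON = {
    "cat": ["k", "ae1", "t"],
    "cot": ["k", "aa1", "t"],
    "city": ["s", "ih1", "t", "iy0"],
}
X, y = [], []
for word, phonemes in LEXICON.items():
    X.extend(windows(word, 1))
    y.extend(phonemes)

# Rank features once, by information gain over the whole training set.
order = sorted(range(len(X[0])), key=lambda f: information_gain(X, y, f), reverse=True)
tree = build_igtree(X, y, order)
print([classify(tree, order, w) for w in windows("cit", 1)])  # ['s', 'ih1', 't']
```

Because every node carries a default class, the unseen window `("i", "t", "_")` still receives a label from the deepest node it matches. This simplified version omits refinements such as pruning leaves whose default equals their parent's.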

Published in
NeMLaP3/CoNLL '98: Proceedings of the Joint Conferences on New Methods in Language Processing and Computational Natural Language Learning
January 1998
332 pages
ISBN: 0725806346
Conference Chair: David M. W. Powers, Flinders University, Australia
Publisher: Association for Computational Linguistics, United States