ABSTRACT
In leading morpho-phonological theories and state-of-the-art text-to-speech systems it is assumed that word pronunciation cannot be learned or performed without intermediate analyses at several abstraction levels (e.g., morphological, graphemic, phonemic, syllabic, and stress levels). We challenge this assumption for the case of English word pronunciation. Using IGTree, an inductive-learning decision-tree algorithm, we train and test three word-pronunciation systems in which the number of abstraction levels (implemented as sequenced modules) is reduced from five, via three, to one. The last of these systems, which classifies letter strings directly as mapping to phonemes with stress markers, yields significantly better generalisation accuracies than the two multi-module systems. Analyses of empirical results indicate that positive utility effects of sequencing modules are outweighed by cascading errors passed on between modules.
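As an illustration of the single-module idea, the sketch below maps each letter, seen in a fixed-width window of neighbouring letters, directly to a combined phoneme-plus-stress class. It is a minimal sketch under stated assumptions and not the system described above: a plain lookup table stands in for the IGTree decision tree, and the window width, the padding symbol, the label format (a phoneme symbol followed by a stress digit, with "-" for a silent letter), and the toy training words are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each word is aligned letter by letter with one
# combined class (phoneme symbol + stress digit; "-" marks a silent letter).
TOY_DATA = [
    ("book", ["b0", "u1", "-0", "k0"]),
    ("boot", ["b0", "u1", "-0", "t0"]),
]


def windows(word, width=3, pad="_"):
    """Return one letter window per letter: `width` letters of context on each side."""
    padded = pad * width + word + pad * width
    return [padded[i:i + 2 * width + 1] for i in range(len(word))]


def train(aligned_words, width=3):
    """Count how often each class occurs for each letter window."""
    table = defaultdict(Counter)
    for word, labels in aligned_words:
        for win, label in zip(windows(word, width), labels):
            table[win][label] += 1
    return table


def classify(table, word, width=3, default="?"):
    """Assign each letter the most frequent class seen for its window.

    Windows never seen in training get `default`; a real learner such as a
    decision tree would instead generalise from partial window matches.
    """
    result = []
    for win in windows(word, width):
        counts = table.get(win)
        result.append(counts.most_common(1)[0][0] if counts else default)
    return result


if __name__ == "__main__":
    table = train(TOY_DATA, width=1)
    print(classify(table, "book", width=1))
```

Because one classifier emits the phoneme and the stress marker together, no intermediate output is handed to a later module, so there is no earlier stage whose errors can cascade.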