The Role of Evaluation in the Development of Spoken Language Systems

International Journal of Speech Technology

Abstract

In this article, several criteria and paradigms are described to measure the performance of spoken language systems developed in the framework of national and international research projects. These evaluations are carried out in the domain of spontaneous human-human interaction as supported by machine translation systems. They are also applied in the domain of spontaneous human-machine interaction typically used in information retrieval applications. Some evaluation paradigms are discussed in more detail. It is also shown that official performance tests and site-specific evaluation criteria are complementary in use.

About this article

Cite this article

Minker, W. The Role of Evaluation in the Development of Spoken Language Systems. International Journal of Speech Technology 3, 5–14 (1999). https://doi.org/10.1023/A:1009660908880

  • DOI: https://doi.org/10.1023/A:1009660908880
