The NXT-format Switchboard Corpus: a rich resource for investigating the syntax, semantics, pragmatics and prosody of dialogue

Language Resources and Evaluation

Abstract

This paper describes a recently completed common resource for the study of spoken discourse, the NXT-format Switchboard Corpus. Switchboard is a long-standing corpus of telephone conversations (Godfrey et al. in SWITCHBOARD: Telephone speech corpus for research and development. In Proceedings of ICASSP-92, pp. 517–520, 1992). We have brought together transcriptions with existing annotations for syntax, disfluency, speech acts, animacy, information status, coreference, and prosody; along with substantial new annotations of focus/contrast, more prosody, syllables and phones. The combined corpus uses the format of the NITE XML Toolkit, which allows these annotations to be browsed and searched as a coherent set (Carletta et al. in Lang Resour Eval J 39(4):313–334, 2005). The resulting corpus is a rich resource for the investigation of the linguistic features of dialogue and how they interact. As well as describing the corpus itself, we discuss our approach to overcoming issues involved in such a data integration project, relevant to both users of the corpus and others in the language resource community undertaking similar projects.

References

  • Aylett, M. P., & Turk, A. (2004). The smooth signal redundancy hypothesis: A functional explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech. Language and Speech, 47(1):31–56.

  • Badino, L., & Clark, R. A. (2008). Automatic labeling of contrastive word pairs from spontaneous spoken English. In IEEE/ACL Workshop on Spoken Language Technology, Goa, India.

  • Bard, E., Anderson, A., Sotillo, C., Aylett, M., Doherty-Sneddon, G., & Newlands, A. (2000). Controlling the intelligibility of referring expressions in dialogue. Journal of Memory and Language, 42(1), 1–22.

  • Beckman, M., & Hirschberg, J. (1999). The ToBI annotation conventions. http://www.ling.ohio-state.edu/~tobi/ame_tobi/annotation_conventions.html. Accessed 9 June 2006.

  • Bell, A., Jurafsky, D., Fosler-Lussier, E., Girand, C., Gregory, M., & Gildea, D. (2003). Effects of disfluencies, predictability, and utterance position on word form variation in English conversation. Journal of the Acoustical Society of America, 113(2), 1001–1024.

  • Bird, S., & Liberman, M. (2001). A formal framework for linguistic annotation. Speech Communication, 33(1–2), 23–60.

  • Boersma, P., & Weenink, D. (2006). Praat: doing phonetics by computer. http://www.praat.org. Accessed 9 June 2006.

  • Brants, S., Dipper, S., Hansen, S., Lezius, W., & Smith, G. (2002). The TIGER Treebank. In Proceedings of the workshop on Treebanks and linguistic theories, Sozopol.

  • Brenier, J., & Calhoun, S. (2006). Switchboard prosody annotation scheme. Internal Publication, Stanford University and University of Edinburgh: http://groups.inf.ed.ac.uk/switchboard/prosody_annotation.pdf. Accessed 15 January 2008.

  • Brenier, J., Nenkova, A., Kothari, A., Whitton, L., Beaver, D., & Jurafsky, D. (2006). The (non)utility of linguistic features for predicting prominence in spontaneous speech. In Proceedings of IEEE/ACL 2006 workshop on spoken language technology, Aruba.

  • Bresnan, J., Cueni, A., Nikitina, T., & Baayen, R. H. (2007). Predicting the dative alternation. In G. Bouma, I. Kraemer, & J. Zwarts (Eds.), Cognitive foundations of interpretation (pp. 69–94). Amsterdam: Royal Netherlands Academy of Arts and Sciences.

  • Buráňová, E., Hajičová, E., & Sgall, P. (2000). Tagging of very large corpora: Topic-focus articulation. In Proceedings of COLING conference (pp. 278–284), Saarbrücken, Germany.

  • Calhoun, S. (2005). Annotation scheme for discourse relations in Paraphrase Corpus. Internal Publication, University of Edinburgh: http://groups.inf.ed.ac.uk/switchboard/kontrast_guidelines.pdf. Accessed 15 January 2008.

  • Calhoun, S. (2006). Information structure and the prosodic structure of English: A probabilistic relationship. PhD thesis, University of Edinburgh.

  • Calhoun, S. (2007). Predicting focus through prominence structure. In Proceedings of interspeech. Antwerp, Belgium.

  • Calhoun, S. (2009). What makes a word contrastive: Prosodic, semantic and pragmatic perspectives. In D. Barth-Weingarten, N. Dehé, & A. Wichmann (Eds.), Where prosody meets pragmatics: Research at the interface, Vol. 8 of Studies in pragmatics (pp. 53–78). Emerald, Bingley.

  • Calhoun, S. (2010). How does informativeness affect prosodic prominence? Language and Cognitive Processes. Special Issue on Prosody (to appear).

  • Carletta, J. (1996). Assessing agreement on classification tasks: The kappa statistic. Computational Linguistics, 22(2), 249–254.

  • Carletta, J., Ashby, S., Bourban, S., Flynn, M., Guillemot, M., Hain, T., Kadlec, J., Karaiskos, V., Kraaij, W., Kronenthal, M., Lathoud, G., Lincoln, M., Lisowska, A., McCowan, M., Post, W., Reidsma, D., & Wellner, P. (2006). The AMI Meeting Corpus: A pre-announcement. In S. Renals & S. Bengio (Eds.), Machine learning for multimodal interaction: Second international workshop, Vol. 3869 of Lecture notes in computer science. Springer.

  • Carletta, J., Dingare, S., Nissim, M., & Nikitina, T. (2004). Using the NITE XML toolkit on the Switchboard Corpus to study syntactic choice: A case study. In Proceedings of LREC2004, Lisbon, Portugal.

  • Carletta, J., Evert, S., Heid, U., & Kilgour, J. (2005). The NITE XML Toolkit: Data model and query language. Language Resources and Evaluation Journal, 39(4), 313–334.

  • Crystal, D. (1969). Prosodic systems and intonation in English. Cambridge, UK: Cambridge University Press.

  • Deshmukh, N., Ganapathiraju, A., Gleeson, A., Hamaker, J., & Picone, J. (1998). Resegmentation of Switchboard. In Proceedings of ICSLP (pp. 1543–1546), Sydney, Australia.

  • Dubey, A., Sturt, P., & Keller, F. (2005). Parallelism in coordination as an instance of syntactic priming: Evidence from corpus-based modeling. In HLT/EMNLP, Vancouver, Canada.

  • Fisher, W. M. (1997). tsylb: NIST Syllabification Software. http://www.nist.gov/speech/tool. Accessed 9 October 2005.

  • Godfrey, J., Holliman, E., & McDaniel, J. (1992). SWITCHBOARD: Telephone speech corpus for research and development. In Proceedings of ICASSP-92 (pp. 517–520).

  • Godfrey, J. J., & Holliman, E. (1997). Switchboard-1 Release 2. Linguistic Data Consortium, Philadelphia. Catalog #LDC97S62.

  • Graff, D., & Bird, S. (2000). Many uses, many annotations for large speech corpora: Switchboard and TDT as case studies. In LREC, Athens, Greece.

  • Greenberg, S., Ellis, D., & Hollenback, J. (1996). Insights into spoken language gleaned from phonetic transcription of the Switchboard Corpus. In The fourth international conference on spoken language processing (pp. S24–S27), Philadelphia, PA.

  • Halliday, M. (1968). Notes on transitivity and theme in English: Part 3. Journal of Linguistics, 4, 179–215.

  • Harkins, D. (2003). Switchboard resegmentation project. http://www.cavs.msstate.edu/hse/ies/projects/switchboard. Accessed 1 February 2005.

  • Hedberg, N., & Sosa, J. M. (2001). The prosodic structure of topic and focus in spontaneous English dialogue. In Topic & focus: A workshop on intonation and meaning. University of California, Santa Barbara, July 2001. LSA Summer Institute.

  • Jaeger, T. F., & Wasow, T. (2005). Processing as a source of accessibility effects on variation. In Proceedings of the 31st Berkeley Linguistics Society.

  • Johnson, K. (2004). Massive reduction in conversational American English. In K. Yoneyama & K. Maekawa (Eds.), Spontaneous speech: Data and analysis. Proceedings of the 1st session of the 10th international symposium (pp. 29–54), Tokyo, Japan, 2004. The National Institute for Japanese Language.

  • Jurafsky, D., Bates, R., Coccaro, N., Martin, R., Meteer, M., Ries, K., Shriberg, E., Stolcke, A., Taylor, P., & Ess-Dykema, C. V. (1998). Switchboard discourse language modeling project report. Center for Speech and Language Processing, Johns Hopkins University, Baltimore, MD, 1998. Research Note No. 30.

  • Jurafsky, D., Shriberg, E., & Biasca, D. (1997). Switchboard SWBD-DAMSL Labeling Project Coder’s Manual, Draft 13. Technical Report 97-02, University of Colorado Institute of Cognitive Science.

  • Ladd, D. R. (2008). Intonational phonology (2nd edn.). Cambridge, UK: Cambridge University Press.

  • Laprun, C., Fiscus, J. G., Garofolo, J., & Pajot, S. (2002). A practical introduction to ATLAS. In Proceedings of LREC, Las Palmas, Spain.

  • Liberman, M. (1975). The intonational system of English. PhD thesis, MIT Linguistics, Cambridge, MA.

  • Marcus, M., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19, 313–330.

  • Marcus, M. P., Santorini, B., Marcinkiewicz, M. A., & Taylor, A. (1999). Treebank-3. Linguistic Data Consortium (LDC). Catalog #LDC99T42.

  • Meteer, M., & Taylor, A. (1995). Disfluency annotation stylebook for the Switchboard Corpus. Ms., Department of Computer and Information Science, University of Pennsylvania, http://www.cis.upenn.edu/pub/treebank/swbd/doc/DFL-book.ps. Accessed 30 September 2003.

  • Michaelis, L. A., & Francis, H. S. (2004). Lexical subjects and the conflation strategy. In N. Hedberg & R. Zacharski (Eds.), Topics in the grammar-pragmatics interface: Papers in honor of Jeanette K. Gundel (pp. 19–48), Benjamins.

  • Müller, C., & Strube, M. (2006). Multi-level annotation of linguistic data with MMAX2. In S. Braun, K. Kohn, & J. Mukherjee (Eds.), Corpus technology and language pedagogy: New resources, new tools, new methods, English Corpus Linguistics (Vol. 3, pp. 197–214), Peter Lang.

  • Nakatani, C., Hirschberg, J., & Grosz, B. (1995). Discourse structure in spoken language: Studies on speech corpora. In Working notes of the AAAI spring symposium on empirical methods in discourse interpretation and generation (pp. 106–112), Stanford, CA.

  • Nenkova, A., Brenier, J., Kothari, A., Calhoun, S., Whitton, L., Beaver, D., & Jurafsky, D. (2007). To memorize or to predict: Prominence labeling in conversational speech. In NAACL human language technology conference, Rochester, NY.

  • Nenkova, A., & Jurafsky, D. (2007). Automatic detection of contrastive elements in spontaneous speech. In IEEE workshop on automatic speech recognition and understanding (ASRU), Kyoto, Japan.

  • Nissim, M. (2006). Learning information status of discourse entities. In Proceedings of the empirical methods in natural language processing conference, Sydney, Australia.

  • Nissim, M., Dingare, S., Carletta, J., & Steedman, M. (2004). An annotation scheme for information status in dialogue. In Fourth language resources and evaluation conference, Lisbon, Portugal.

  • Ostendorf, M., Shafran, I., Shattuck-Hufnagel, S., Carmichael, L., & Byrne, W. (2001). A prosodically labeled database of spontaneous speech. In Proceedings of the ISCA workshop on prosody in speech recognition and understanding (pp. 119–121), Red Bank, NJ.

  • Pellom, B. (2001). SONIC: The University of Colorado continuous speech recognizer. Technical Report TR-CSLR-2001-01, University of Colorado at Boulder.

  • Pierrehumbert, J., & Hirschberg, J. (1990). The meaning of intonational contours in the interpretation of discourse. In P. Cohen, J. Morgan, & M. Pollack (Eds.), Intentions in communication (pp. 271–311). MIT Press, Cambridge, MA.

  • Pitrelli, J., Beckman, M., & Hirschberg, J. (1994). Evaluation of prosodic transcription labelling reliability in the ToBI framework. In Proceedings of the third international conference on spoken language processing (Vol. 2, pp. 123–126).

  • Prince, E. (1992). The ZPG letter: Subjects, definiteness, and information-status. In S. Thompson & W. Mann (Eds.), Discourse description: Diverse analyses of a fund raising text (pp. 295–325). Philadelphia/Amsterdam: John Benjamins.

  • Reitter, D. (2008). Context effects in language production: Models of syntactic priming in dialogue corpora. PhD thesis, University of Edinburgh.

  • Reitter, D., Moore, J. D., & Keller, F. (2006). Priming of syntactic rules in task-oriented dialogue and spontaneous conversation. In Proceedings of the conference of the cognitive science society (pp. 685–690), Vancouver, Canada.

  • Rooth, M. (1992). A theory of focus interpretation. Natural Language Semantics, 1, 75–116.

  • Selkirk, E. (1995). Sentence prosody: Intonation, stress and phrasing. In J. Goldsmith (Ed.), The handbook of phonological theory (pp. 550–569). Cambridge, MA & Oxford: Blackwell.

  • Shriberg, E. (1994). Preliminaries to a theory of speech disfluencies. PhD thesis, University of California at Berkeley.

  • Shriberg, E., Taylor, P., Bates, R., Stolcke, A., Ries, K., Jurafsky, D., Coccaro, N., Martin, R., Meteer, M., & Ess-Dykema, C. (1998). Can prosody aid the automatic classification of dialog acts in conversational speech? Language and Speech, 41(3–4), 439–487.

  • Siegel, S., & Castellan, N. J. (1988). Nonparametric statistics for the behavioral sciences (2nd edn.). McGraw-Hill.

  • Sridhar, V. K. R., Nenkova, A., Narayanan, S., & Jurafsky, D. (2008). Detecting prominence in conversational speech: Pitch accent, givenness and focus. In Speech prosody, Campinas, Brazil.

  • Steedman, M. (2000). Information structure and the syntax-phonology interface. Linguistic Inquiry, 31(4), 649–689.

  • Taylor, P. (2000). Analysis and synthesis of intonation using the Tilt model. Journal of the Acoustical Society of America, 107, 1697–1714.

  • Taylor, A., Marcus, M., & Santorini, B. (2003). The Penn Treebank: An overview.

  • Terken, J., & Hirschberg, J. (1994). Deaccentuation of words representing ‘given’ information: Effects of persistence of grammatical role and surface position. Language and Speech, 37, 125–145.

  • Vallduví, E., & Vilkuna, M. (1998). On rheme and kontrast. Syntax and Semantics, 29, 79–108.

  • Weide, R. (1998). The Carnegie Mellon Pronouncing Dictionary [cmudict.0.6]. Carnegie Mellon University: http://www.speech.cs.cmu.edu/cgi-bin/cmudict. Accessed 9 October 2005.

  • Yoon, T.-J., Chavarría, S., Cole, J., & Hasegawa-Johnson, M. (2004). Intertranscriber reliability of prosodic labeling on telephone conversation using ToBI. In Proceedings of ICSLP, Jeju, Korea.

  • Zaenen, A., Carletta, J., Garretson, G., Bresnan, J., Koontz-Garboden, A., Nikitina, T., O’Connor, M., & Wasow, T. (2004). Animacy encoding in English: Why and how. In B. Webber & D. Byron (Eds.), ACL 2004 workshop on discourse annotation (pp. 118–125).

  • Zhang, T., Hasegawa-Johnson, M., & Levinson, S. (2006). Extraction of pragmatic and semantic salience from spontaneous spoken English. Speech Communication, 48, 437–462.

Acknowledgements

This work was supported by Scottish Enterprise through the Edinburgh-Stanford Link, and via EU IST Cognitive Systems IP FP6-2004-IST-4-27657 “Paco-Plus” to Mark Steedman. Thanks to Bob Ladd, Florian Jaeger, Jonathan Kilgour, Colin Matheson and Shipra Dingare for useful discussions, advice and technical help in the development of the corpus and annotation standards; and to Joanna Keating, Joseph Arko and Hannele Nicholson for their hard work in annotating. Thanks also to the creators of existing Switchboard annotations who kindly agreed to include them in the corpus, including Joseph Picone, Malvina Nissim, Annie Zaenen, Joan Bresnan, Mari Ostendorf and their respective colleagues. Finally, thank you to the Linguistic Data Consortium for agreeing to release the corpus under a ShareAlike licence through their website, and for their work in finalising the corpus data and permissions for release.

Author information

Corresponding author

Correspondence to Sasha Calhoun.

About this article

Cite this article

Calhoun, S., Carletta, J., Brenier, J.M. et al. The NXT-format Switchboard Corpus: a rich resource for investigating the syntax, semantics, pragmatics and prosody of dialogue. Lang Resources & Evaluation 44, 387–419 (2010). https://doi.org/10.1007/s10579-010-9120-1

  • DOI: https://doi.org/10.1007/s10579-010-9120-1
