Abstract
Keyword analysis is used across sub-disciplines of applied linguistics, from genre analysis to critically oriented studies, for purposes that range from producing a general characterization of a genre to identifying text-specific ideological issues. This study compares the use of log-likelihood (LL), a probability statistic, and odds ratio (OR), an effect size statistic, for keyword identification and argues that the two methods produce different keywords, each suited to research with different purposes. Through two case studies (a keyword analysis of advance fee scams against the British National Corpus, and one of research articles in applied linguistics against research articles from other academic disciplines), we show that both LL and OR keywords concern the aboutness of the corpus but differ in their specificity and in how pervasive they are throughout the corpus. LL highlights words that are relatively common in general use, which serve genre-oriented purposes, whereas OR highlights more specialized words, which serve critically oriented purposes. Methodological and practical contributions to keyword analysis are discussed.
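To make the contrast between the two statistics concrete, the sketch below computes both for a single word from its frequency in a target corpus and in a reference corpus. It is an illustration only, not the procedure used in the study: the function names and arguments are hypothetical, the LL formula is the two-cell form commonly used for corpus comparison, and the 0.5 continuity correction applied when a cell is zero is an assumption made here so that OR stays finite.

```python
import math


def log_likelihood(a, b, c, d):
    """Log-likelihood (LL) keyness for one word.

    a, b: frequency of the word in the target and reference corpus.
    c, d: total token counts of the target and reference corpus.
    Uses the two-cell form: 2 * sum(observed * ln(observed / expected)).
    """
    # Expected frequencies if the word were equally likely in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # a zero count contributes nothing (0 * ln 0 := 0)
            ll += observed * math.log(observed / expected)
    return 2 * ll


def odds_ratio(a, b, c, d, correction=0.5):
    """Odds ratio (OR) keyness for one word.

    Odds of the word in the target corpus divided by its odds in the
    reference corpus. ASSUMPTION: if any cell of the 2x2 table is zero,
    `correction` is added to every cell to avoid division by zero.
    """
    cells = [a, c - a, b, d - b]
    if any(x == 0 for x in cells):
        cells = [x + correction for x in cells]
    target_hits, target_rest, ref_hits, ref_rest = cells
    return (target_hits / target_rest) / (ref_hits / ref_rest)
```

One property of these formulas is in line with the contrast drawn above: if every count is multiplied by the same factor, LL is multiplied by that factor while the uncorrected OR is unchanged. LL therefore tends to reward words with high raw frequency, whereas OR reflects only how strongly a word is skewed towards the target corpus.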
© 2018 Walter de Gruyter GmbH, Berlin/Boston