The word as a unit of internal predictability

John Mansfield

doi:10.1515/ling-2020-0118

Published by De Gruyter Mouton October 1, 2021

From the journal Linguistics

https://doi.org/10.1515/ling-2020-0118

Abstract

A long-standing problem in linguistics is how to define word. Recent research has focused on the incompatibility of diverse definitions, and the challenge of finding a definition that is crosslinguistically applicable. In this study I take a different approach, asking whether one structure is more word-like than another based on the concepts of predictability and information. I hypothesize that word constructions tend to be more “internally predictable” than phrase constructions, where internal predictability is the degree to which the entropy of one constructional element is reduced by mutual information with another element. I illustrate the method with case studies of complex verbs in German and Murrinhpatha, comparing verbs with selectionally restricted elements against those built from free elements. I propose that this method identifies an important mathematical property of many word-like structures, though I do not expect that it will solve all the problems of wordhood.

Keywords: complex verbs; entropy; information theory; morphology; wordhood

Corresponding author: John Mansfield, School of Languages and Linguistics, University of Melbourne, Babel Building, Parkville, VIC 3011, Australia, E-mail: john.mansfield@unimelb.edu.au

Funding source: Australian Research Council

Award Identifier / Grant number: DE180100872

Acknowledgments

This paper has benefited from discussion with Christian Döhler, Charles Kemp, William Lane, Frank Mollica, Nicholas Lester, Rachel Nordlinger, and Adam Tallman, as well as audiences at the University of Zurich Centre for Linguistics and the University of Melbourne Computational Cognitive Science lab. Further improvements were made thanks to the comments of two anonymous reviewers.

Research funding: The research was funded by the Australian Research Council, grant number DE180100872.

Appendix: Sample size effects on entropy and internal predictability

Entropy estimates can be highly inaccurate with small samples, and this is an important issue for any corpus study of lexical items, which by Zipf’s law include many rare items. In this study the sample size effect is mitigated in two ways. Firstly, I use the Chao-Shen entropy estimation method (Gotelli and Chao 2013), which corrects for small samples. Secondly, in the presentation of individual lexemes’ predictability measurements (Sections 6.1 and 7.1), I exclude lexemes with fewer than 10 corpus tokens.
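The contrast between the two estimators can be illustrated with a minimal sketch. This is not the code used in the study: it assumes entropy is measured in bits (log base 2) and follows the standard Chao-Shen formulation, in which observed proportions are scaled by an estimate of sample coverage and each term is divided by its probability of having been observed at all. The function names and the toy counts are hypothetical.

```python
import math


def empirical_entropy(counts):
    """Plug-in (maximum likelihood) entropy in bits from a list of type counts."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in counts)


def chao_shen_entropy(counts):
    """Chao-Shen entropy estimate in bits from a list of type counts."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    if n == 0:
        return 0.0
    singletons = sum(1 for c in counts if c == 1)
    if singletons == n:
        # Avoid a coverage estimate of zero when every type is a singleton.
        singletons = n - 1
    coverage = 1 - singletons / n
    total = 0.0
    for c in counts:
        p_adj = coverage * c / n           # coverage-adjusted probability
        p_seen = 1 - (1 - p_adj) ** n      # chance the type appears in n draws
        total -= p_adj * math.log2(p_adj) / p_seen
    return total


# Hypothetical particle counts for one verb stem: one frequent particle
# and two particles seen only once each.
toy_counts = [3, 1, 1]
plug_in = empirical_entropy(toy_counts)
corrected = chao_shen_entropy(toy_counts)
```

With singletons present, the corrected estimate comes out higher than the plug-in estimate, reflecting probability mass reserved for unseen types. When a stem occurs with only one particle, both estimates are zero.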

In this Appendix I illustrate some effects of sample size on the estimation of complex verb entropy. I focus here on the German corpus data, which yielded a larger sample of 77,946 complex verb tokens. Comparing entropy estimates for smaller subsets of this data gives us some insight into the accuracy of the smaller Murrinhpatha sample, which consists of 6,041 complex verb tokens.

First, I show the effect of different sample sizes on estimating the preverb entropy of individual verb stems. This is done by drawing repeated independent samples from the full dataset. Figure A1 shows particle entropy estimates for lexical stems appearing in the phrase construction, using the three stems illustrated in Table 6 of the main paper: arbeiten, streichen, werfen. Entropy estimates are on the y-axis, and sample size is on the x-axis (which is on a square-root scale). Chao-Shen entropy estimates are shown as heavier dots, and empirical entropy estimates as lighter crosses. For both methods, estimates have a high degree of variance with smaller samples, and gradually converge as the sample size increases. The Chao-Shen method both over- and under-estimates entropy in small samples, but importantly, estimates tend to cluster around the central value converged upon in larger samples. Empirical entropy estimates, on the other hand, systematically under-estimate entropy at smaller sample sizes.

Figure A1: Particle entropy estimates for three German verb stems, using different corpus sample sizes.
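The downward bias of empirical entropy at small sample sizes can be reproduced with a small simulation. The sketch below is illustrative only: the population is a made-up Zipf-like distribution over 20 hypothetical particle types rather than the German corpus data, samples are assumed to be drawn without replacement, and the sample sizes and replicate count are arbitrary choices.

```python
import math
import random
from collections import Counter


def plug_in_entropy(tokens):
    """Empirical (plug-in) entropy in bits of a list of tokens."""
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in Counter(tokens).values())


def mean_estimate(population, size, replicates, rng):
    """Average plug-in entropy over repeated random subsamples of a given size."""
    estimates = [
        plug_in_entropy(rng.sample(population, size)) for _ in range(replicates)
    ]
    return sum(estimates) / replicates


# Toy population (an assumption): type at rank r occurs round(300 / r) times.
population = [
    f"particle_{rank}" for rank in range(1, 21) for _ in range(round(300 / rank))
]

rng = random.Random(0)  # fixed seed so the run is repeatable
full = plug_in_entropy(population)
means = {size: mean_estimate(population, size, 200, rng) for size in (10, 50, 200)}

for size, value in means.items():
    print(f"n = {size:>3}: mean empirical entropy = {value:.2f} bits")
print(f"full population: {full:.2f} bits")
```

The mean estimate rises towards the full-population value as the sample grows, which is the pattern described for the empirical estimates in Figure A1.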

In my presentation of preverb predictability among individual verb stems (Sections 6.1 and 7.1), a minimal token threshold of 10 was selected to mitigate estimation inaccuracy, while also including as many verb stems as possible. As shown in Figure A1, Chao-Shen estimates with only 10 tokens can be somewhat inaccurate, though estimates cluster towards the true value. The three stems shown here are among those with higher token counts (between 50 and 300), but as is typical with Zipfian lexical distributions, many stems have far fewer tokens. Therefore, the preverb entropy estimates shown for individual verb stems in the main paper will have variable accuracy, according to token count, with N ≥ 10 set as a floor to avoid the most egregious errors.

Figure A2 shows prefix entropy estimates for three verb stems in the word-type construction. All have very low prefix entropy. At smaller sample sizes the figure shows some massive over-estimates, which occur when a small sample happens to include one of the rare prefix combinations. However, the vast majority of small-sample estimates are in fact zero, i.e., quite accurate. Over-plotting of points obscures the predominance of accurate estimates, but regression lines (dashed for Chao-Shen, solid for empirical) have been added to show the overall accuracy. The stem reichen only ever occurs with the prefix er- in our sample, and therefore all estimates are zero.

Figure A2: Prefix entropy estimates for three German verb stems, using different corpus sample sizes.

In the overall measures of construction type internal predictability (IP) (Figure 5 in the main article), all verb stems are included irrespective of token frequency. This gives a more complete picture of predictability in the construction type, since rare lexemes are an intrinsic part of corpus distributions. Importantly, IP is a weighted average across verb stems, and therefore considers token frequency (i.e., verb stem probability), in a way that is not evident in the individual lexeme figures. Highly frequent stems, with more accurate entropy estimates, have a greater influence on IP. Low-frequency stems, with less reliable entropy estimates, each have a very small influence on IP.
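The frequency weighting can be made explicit with a short sketch. It rests on an assumed reading of the definition given in the abstract: IP is taken to be the proportion of preverb entropy removed by knowing the verb stem, that is, (H(preverb) − H(preverb | stem)) / H(preverb), with the conditional entropy computed as a stem-probability-weighted average of per-stem entropies. For brevity the sketch uses plug-in entropy rather than the Chao-Shen estimator used in the study, and the function names and toy tokens are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(counts):
    """Plug-in entropy in bits from an iterable of counts."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts)


def internal_predictability(pairs):
    """Proportional reduction in preverb entropy given the stem.

    pairs: list of (stem, preverb) tokens for one construction type.
    """
    pairs = list(pairs)
    total = len(pairs)
    preverb_counts = Counter(preverb for _, preverb in pairs)
    per_stem = defaultdict(Counter)
    for stem, preverb in pairs:
        per_stem[stem][preverb] += 1

    h_preverb = entropy(preverb_counts.values())
    if h_preverb == 0:
        raise ValueError("preverb entropy is zero, so the ratio is undefined")

    # Each stem's entropy is weighted by that stem's share of all tokens.
    h_conditional = sum(
        (sum(c.values()) / total) * entropy(c.values()) for c in per_stem.values()
    )
    return (h_preverb - h_conditional) / h_preverb


# Toy data: a frequent stem with a fixed preverb, a rare stem with a variable one.
tokens = [("stem_a", "x")] * 8 + [("stem_b", "x"), ("stem_b", "y")]
ip = internal_predictability(tokens)
```

In the toy data the rare stem has maximal preverb entropy, but because it accounts for only a fifth of the tokens it contributes only a fifth of its entropy to the conditional term.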

Finally, it is worth considering the effect of the total sample size on IP, especially since Murrinhpatha provided a much smaller sample. Figure A3 shows IP measures for differently sized independent samples of the German complex verb dataset. Again, both Chao-Shen and empirical estimates are shown. Chao-Shen estimates (dots) converge to a stable value by around 10,000 complex verb tokens. Empirical estimates (crosses) overestimate IP, especially in the more unpredictable phrase construction. Given that 6,041 tokens were available for Murrinhpatha complex verbs, and assuming that the laws of sample size would apply similarly to Murrinhpatha as to German, we can see that Chao-Shen estimates for Murrinhpatha are likely to be accurate within a few percentage points.

Figure A3: Internal predictability estimates for German complex verb construction types, using different corpus sample sizes.

References

Aikhenvald, Alexandra. 2006. Serial verbs constructions in a typological perspective. In Alexandra Y. Aikhenvald, Robert M. W. Dixon, Eric Adell, Natalia Bermúdez & Gladys Camacho (eds.), Serial verb constructions: A cross-linguistic typology, 1–68. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780199279159.003.0001.

Attneave, Fred. 1959. Applications of information theory to psychology: A summary of basic concepts, methods and results. New York: Holt Rinehart & Winston.

Baayen, Harald. 1993. On frequency, transparency and productivity. In Geert Booij & Jaap van Marle (eds.), Yearbook of morphology 1992, 181–208. Dordrecht: Springer Netherlands. https://doi.org/10.1007/978-94-017-3710-4_7.

Baayen, R. Harald. 2010. Demythologizing the word frequency effect: A discriminative learning perspective. The Mental Lexicon 5(3). 436–461. https://doi.org/10.1075/ml.5.3.10baa.

Bannard, Colin & Danielle Matthews. 2008. Stored word sequences in language learning: The effect of familiarity on children’s repetition of four-word combinations. Psychological Science 19(3). 241–248. https://doi.org/10.1111/j.1467-9280.2008.02075.x.

Barðdal, Jóhanna. 2008. Productivity: Evidence from case and argument structure in Icelandic. Amsterdam & Philadelphia: John Benjamins. https://doi.org/10.1075/cal.8.

Bauer, Laurie. 2017. Compounds and compounding. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781108235679.

Belica, Cyril, Marc Kupietz, Harald Lüngen, Rainer Perkuhn & Anna Schächtele. 2014. DeReWo – Corpus-based lemma and word form lists. Leibniz Institute for the German Language. https://www1.ids-mannheim.de/s/corpus-linguistics/projects/methods-of-analysis/corpus-based-lemma-and-word-form-lists.html?L=1 (accessed 30 April 2020).

Bickel, Balthasar, Goma Banjade, Martin Gaenszle, Elena Lieven, Netra Prasad Paudyal, Ichichha Purna Rai, Manoj Rai, Novel Kishore Rai & Sabine Stoll. 2007. Free prefix ordering in Chintang. Language 83(1). 43–73. https://doi.org/10.1353/lan.2007.0002.

Bickel, Balthasar, Kristine A. Hildebrandt & Rene Schiering. 2009. The distribution of phonological word domains: A probabilistic typology. In Janet Grijzenhout (ed.), Phonological domains: Universals and deviations, 47–78. Berlin & New York: Mouton de Gruyter. https://doi.org/10.1515/9783110219234.1.47.

Bickel, Balthasar & Fernando Zúñiga. 2017. The “word” in polysynthetic languages: Phonological and syntactic challenges. In Michael Fortescue, Marianne Mithun & Nicholas Evans (eds.), The Oxford handbook of polysynthesis, 158–185. Oxford: Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199683208.013.52.

Biskup, Petr, Michael Putnam & Laura Catharine Smith. 2011. German particle and prefix verbs at the syntax phonology interface. Leuvense Bijdragen 97. 106–135.

Blevins, James P. 2016. Word and paradigm morphology. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199593545.001.0001.

Bloomfield, Leonard. 1933. Language. New York: Henry Holt.

Blumenthal-Dramé, Alice. 2012. Entrenchment in usage-based theories: What corpus data do and do not reveal about the mind. Berlin & Boston: De Gruyter Mouton. https://doi.org/10.1515/9783110294002.

Blythe, Joe. 2009. Doing referring in Murriny Patha conversation. Sydney: University of Sydney dissertation.

Booij, Geert & Ans van Kemenade. 2003. Preverbs: An introduction. In Geert Booij & Jaap van Marle (eds.), Yearbook of morphology 2003, 1–11. Dordrecht: Springer Netherlands. https://doi.org/10.1007/978-1-4020-1513-7_1.

Boyd, Jeremy K. & Adele Goldberg. 2011. Learning what not to say: The role of statistical preemption and categorization in “a”-adjective production. Language 81(1). 1–29. https://doi.org/10.1353/lan.2011.0012.

Brent, Michael R. 1999. An efficient, probabilistically sound algorithm for segmentation and word discovery. Machine Learning 34(1). 71–105. https://doi.org/10.1023/a:1007541817488.

Bresnan, Joan & Sam A. Mchombo. 1995. The lexical integrity principle: Evidence from Bantu. Natural Language & Linguistic Theory 13(2). 181–254. https://doi.org/10.1007/bf00992782.

Brinton, Laurel J. & Elizabeth Closs Traugott. 2005. Lexicalization and language change. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511615962.

Bruening, Benjamin. 2018. The lexicalist hypothesis: Both wrong and superfluous. Language 94(1). 1–42. https://doi.org/10.1353/lan.2018.0000.

Bybee, Joan L. 2006. From usage to grammar: The mind’s response to repetition. Language 82. 711–733. https://doi.org/10.1353/lan.2006.0186.

Christiansen, Morten H. & Nick Chater. 2015. The now-or-never bottleneck: A fundamental constraint on language. Behavioral and Brain Sciences 39. 1–52. https://doi.org/10.1017/S0140525X1500031X.

Coupé, Christophe, Yoon Mi Oh, Dan Dediu & François Pellegrino. 2019. Different languages, similar encoding efficiency: Comparable information rates across the human communicative niche. Science Advances 5(9). eaaw2594. https://doi.org/10.1126/sciadv.aaw2594.

Cover, Thomas A. & Joy A. Thomas. 2002. Elements of information theory, 2nd edn. London: Wiley. https://doi.org/10.1002/0471200611.

Croft, William. 2001. Radical construction grammar: Syntactic theory in typological perspective. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780198299554.001.0001.

Culbertson, Jennifer, Marieke Schouwstra & Simon Kirby. 2020. From the world to word order: Deriving biases in noun phrase order from statistical properties of the world. Language 96(3). https://doi.org/10.1353/lan.2020.0045.

Di Sciullo, Anna-Maria & Edwin Williams. 1987. On the definition of word. Cambridge, MA: MIT Press.

Divjak, Dagmar. 2019. Frequency in language: Memory, attention and learning. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781316084410.

Dixon, R. M. W. & Alexandra Y. Aikhenvald. 2002. Word: A typological framework. In R. M. W. Dixon & Alexandra Y. Aikhenvald (eds.), Word: A cross-linguistic typology, 1–41. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511486241.002.

Dodd, Bill, Christine Eckhard-Black, John Klapper & Ruth Whittle. 2003. Modern German grammar: A practical guide, 2nd edn. London: Routledge.

Eisenberg, Peter. 2013. Grundriss der deutschen Grammatik, Band 1: Das Wort. Stuttgart: J. B. Metzler. https://doi.org/10.1007/978-3-476-00743-8_1.

Ellis, Nick C. & Fernando Ferreira-Junior. 2009. Construction learning as a function of frequency, frequency distribution, and function. The Modern Language Journal 93(3). 370–385. https://doi.org/10.1111/j.1540-4781.2009.00896.x.

Futrell, Richard, Peng Qian, Edward Gibson, Evelina Fedorenko & Idan Blank. 2019. Syntactic dependencies correspond to word pairs with high mutual information. In Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019), 3–13. Paris: Association for Computational Linguistics. Available at: https://aclanthology.org/W19-7700.pdf. https://doi.org/10.18653/v1/W19-7703.

Geertzen, Jeroen, James P. Blevins & Petar Milin. 2016. Informativeness of linguistic unit boundaries. Italian Journal of Linguistics 28(1). 25–48.

Gibson, Edward, Richard Futrell, Steven T. Piantadosi, Isabelle Dautriche, Kyle Mahowald, Leon Bergen & Roger Levy. 2019. How efficiency shapes human language. Trends in Cognitive Sciences 23(5). 389–407. https://doi.org/10.1016/j.tics.2019.02.003.

van Gijn, Rik & Fernando Zúñiga. 2014. Word and the Americanist perspective. Morphology 24(3). 135–160. https://doi.org/10.1007/s11525-014-9242-z.

Goddard, Cliff. 1985. A grammar of Yankunytjatjara. Alice Springs: Institute for Aboriginal Development.

Gotelli, Nicholas J. & Anne Chao. 2013. Measuring and estimating species richness, species diversity, and biotic similarity from sampling data. In Encyclopedia of biodiversity, 195–211. Cambridge, MA: Academic Press. https://doi.org/10.1016/B978-0-12-384719-5.00424-X.

Gries, Stefan Th. 2013. 50-something years of work on collocations. International Journal of Corpus Linguistics 18(1). 137–166. https://doi.org/10.1075/ijcl.18.1.09gri.

Hafer, Margaret A. & Stephen F. Weiss. 1974. Word segmentation by letter successor varieties. Information Storage and Retrieval 10(11). 371–385. https://doi.org/10.1016/0020-0271(74)90044-8.

Harris, Zellig S. 1955. From phoneme to morpheme. Language 31(2). 190–222. https://doi.org/10.2307/411036.

ten Hacken, Pius. 2017. Compounding in morphology. In Mark Aronoff (ed.), Oxford research encyclopedia of linguistics. Oxford: Oxford University Press. https://doi.org/10.1093/acrefore/9780199384655.013.251.

Haspelmath, Martin. 2011. The indeterminacy of word segmentation and the nature of morphology and syntax. Folia Linguistica 45(1). 31–80. https://doi.org/10.1515/flin.2011.002.

Haspelmath, Martin. 2015. Defining vs diagnosing linguistic categories: A case study of clitic phenomena. In Joanna Blaszczak, Dorota Klimek-Jankowska & Krzysztof Migdalski (eds.), How categorical are categories: New approaches to the old questions of noun, verb, and adjective, 273–304. Berlin & Boston: De Gruyter Mouton. https://doi.org/10.1515/9781614514510-009.

Haspelmath, Martin. 2020. The morph as a minimal linguistic form. Morphology 30(2). 117–134. https://doi.org/10.1007/s11525-020-09355-5.

Hay, Jennifer. 2002. From speech perception to morphology: Affix ordering revisited. Language 78(3). 527–555. https://doi.org/10.1353/lan.2002.0159.

Hillert, Dieter & Farrell Ackerman. 2002. Accessing and parsing phrasal predicates. In Nicole Dehé, Ray Jackendoff, Andrew McIntyre & Silke Urban (eds.), Verb-particle explorations. Berlin & Boston: De Gruyter Mouton. https://doi.org/10.1515/9783110902341.289.

Kilgarriff, Adam. 2005. Language is never, ever, ever, random. Corpus Linguistics and Linguistic Theory 1(2). 263–276. https://doi.org/10.1515/cllt.2005.1.2.263.

Langacker, Ronald W. 2017. Entrenchment in cognitive grammar. In Hans-Jörg Schmid (ed.), Entrenchment and the psychology of language learning: How we reorganize and adapt linguistic knowledge (Language and the Human Lifespan), 39–56. Berlin & Boston: De Gruyter Mouton. https://doi.org/10.1037/15969-003.

Los, Bettelou, Corrien Blom, Geert Booij, Marion Elenbaas & Ans van Kemenade. 2012. Morphosyntactic change: A comparative study of particles and prefixes. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511998447.

Mansfield, John Basil. 2015. Morphotactic variation, prosodic domains and the changing structure of the Murrinhpatha verb. Asia-Pacific Language Variation 1(2). 162–188. https://doi.org/10.1075/aplv.1.2.03man.

Mansfield, John Basil. 2016. Intersecting formatives and inflectional predictability: How do speakers and learners predict the correct form of Murrinhpatha verbs? Word Structure 9(2). 183–214. https://doi.org/10.3366/word.2016.0093.

Mansfield, John Basil. 2019. Murrinhpatha morphology and phonology. Berlin & Boston: De Gruyter Mouton. https://doi.org/10.1515/9781501503306.

Mansfield, John Basil, Joe Blythe, Rachel Nordlinger & Chester Street. 2020. Murrinhpatha morpho-corpus. Available at: langwidj.org/Murrinhpatha-morpho-corpus.

van Marle, Jaap. 2002. Dutch separable compound verbs: Words rather than phrases? In Nicole Dehé, Ray Jackendoff, Andrew McIntyre & Silke Urban (eds.), Verb-particle explorations, 211–232. Berlin & New York: Mouton de Gruyter. https://doi.org/10.1515/9783110902341.211.

Matthews, Danielle & Colin Bannard. 2010. Children’s production of unfamiliar word sequences is predicted by positional variability and latent classes in a large sample of child-directed speech. Cognitive Science 34(3). 465–488. https://doi.org/10.1111/j.1551-6709.2009.01091.x.

McDonald, Scott A. & Richard C. Shillcock. 2001. Rethinking the word frequency effect: The neglected role of distributional information in lexical processing. Language and Speech 44(Pt 3). 295–323. https://doi.org/10.1177/00238309010440030101.

McGregor, William. 2002. Verb classification in Australian languages. Berlin & New York: Mouton de Gruyter. https://doi.org/10.1515/9783110870879.

Menzel, Wolfgang. 2019. The Hamburg dependency treebank. http://hdl.handle.net/11022/0000-0000-7FC7-2 (accessed March 2020).

Mithun, Marianne. 2020. Where is morphological complexity? In Francesco Gardani & Peter M. Arkadiev (eds.), Morphological complexity, 306–328. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780198861287.003.0012.

Montemurro, Marcelo A. & Damián H. Zanette. 2011. Universal entropy of word ordering across linguistic families. PLoS ONE 6(5). https://doi.org/10.1371/journal.pone.0019875.

Mugdan, Joachim. 1994. Morphological units. In Ronald E. Asher (ed.), The encyclopedia of language and linguistics, 2543–2553. Oxford: Pergamon Press.

Müller, Stefan. 2002. Syntax or morphology: German particle verbs revisited. In Nicole Dehé, Ray Jackendoff, Andrew McIntyre & Silke Urban (eds.), Verb-particle explorations, 119–140. Berlin & New York: Mouton de Gruyter. https://doi.org/10.1515/9783110902341.119.

Nordlinger, Rachel. 2015. Inflection in Murrinh-Patha. In Matthew Baerman (ed.), The Oxford handbook of inflection, 491–519. Oxford: Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199591428.013.21.

Nordlinger, Rachel. 2017. The languages of the Daly River region (Northern Australia). In Michael Fortescue, Marianne Mithun & Nicholas Evans (eds.), Oxford handbook of polysynthesis, 782–807. Oxford: Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199683208.013.44.

O’Donnell, Timothy J. 2015. Productivity and reuse in language: A theory of linguistic computation and storage. Cambridge, MA: MIT Press. https://doi.org/10.7551/mitpress/9780262028844.001.0001.

Packard, Jerome L. 2000. The morphology of Chinese: A linguistic and cognitive approach. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511486821.

Pellegrino, François, Christophe Coupé & Egidio Marsico. 2011. A cross-language perspective on speech information rate. Language 87(3). 539–558. https://doi.org/10.1353/lan.2011.0057.

Plag, Ingo & R. Harald Baayen. 2009. Suffix ordering and morphological processing. Language 85(1). 109–152. https://doi.org/10.1353/lan.0.0087.

Ramscar, Michael & Robert F. Port. 2016. How spoken languages work in the absence of an inventory of discrete units. Language Sciences 53. 58–74. https://doi.org/10.1016/j.langsci.2015.08.002.

Rice, Sally, Gary Libben & Bruce Derwing. 2002. Morphological representation in an endangered, polysynthetic language. Brain and Language 81(1–3). 473–486. https://doi.org/10.1006/brln.2001.2540.

Russell, Kevin. 1999. The “word” in two polysynthetic languages. In Ursula Kleinhenz & T. Alan Hall (eds.), Studies on the phonological word, 203–221. Amsterdam & Philadelphia: John Benjamins. https://doi.org/10.1075/cilt.174.08rus.

Saenger, Paul. 1997. Space between words: The origins of silent reading. Stanford, CA: Stanford University Press. https://doi.org/10.1515/9781503619081.

Sapir, Edward. 1921. Language: An introduction to the study of speech. New York: Harcourt, Brace.

Schmid, Hans-Jörg & Helmut Küchenhoff. 2013. Collostructional analysis and other ways of measuring lexicogrammatical attraction: Theoretical premises, practical problems and cognitive underpinnings. Cognitive Linguistics 24(3). 531–577. https://doi.org/10.1515/cog-2013-0018.

Schultze-Berndt, Eva. 2003. Preverbs as an open word class in Northern Australian languages: Synchronic and diachronic correlates. In Geert Booij & Jaap van Marle (eds.), Yearbook of morphology 2003, 145–177. Dordrecht: Kluwer. https://doi.org/10.1007/978-1-4020-1513-7_7.

Shannon, Claude E. 1948. A mathematical theory of communication. Bell System Technical Journal 27(3). 379–423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x.

Shannon, Claude E. 1951. Prediction and entropy of printed English. Bell System Technical Journal 30. 50–64. https://doi.org/10.1002/j.1538-7305.1951.tb01366.x.

Sosa, Anna Vogel & James MacFarlane. 2002. Evidence for frequency-based constituents in the mental lexicon: Collocations involving the word of. Brain and Language 83(2). 227–236. https://doi.org/10.1016/s0093-934x(02)00032-9.

Spencer, Andrew & Ana R. Luis. 2012. Clitics: An introduction. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9781139033763.

Street, Chester. 1987. An introduction to the language and culture of the Murrinh-Patha. Darwin: Summer Institute of Linguistics.

Street, Chester. 2012. Murrinhpatha to English dictionary. Wadeye Literacy Production Centre.

Tallman, Adam J. R. 2020. Beyond grammatical and phonological words. Language and Linguistics Compass 14(2). e12364. https://doi.org/10.1111/lnc3.12364.

Tallman, Adam J. R. 2021. Constituency and coincidence in Chácobo (Pano). Studies in Language 45(2). 321–383. https://doi.org/10.1075/sl.19025.tal.

Tallman, Adam J., Dennis Wylie, E. Adell, N. Bermudez, G. Camacho, Patience Epps & Anthony Woodbury. 2018. Constituency and the morphology‐syntax divide in the languages of the Americas: Towards a distributional typology. Paper presented at the 21st Annual Workshop on American Indigenous Languages. UCSB, Santa Barbara, 20–21 April.

Tersis, Nicole. 2009. Lexical polysynthesis: Should we treat lexical bases and their affixes as a continuum? In Marc-Antoine Mahieu & Nicole Tersis (eds.), Variations on polysynthesis: The Eskaleut languages, 51–64. Amsterdam & Philadelphia: John Benjamins. https://doi.org/10.1075/tsl.86.04lex.

Walsh, Michael. 1976. The Murinypata language of north-west Australia. Canberra: Australian National University dissertation.

Widmer, Manuel, Sandra Auderset, Johanna Nichols, Paul Widmer & Balthasar Bickel. 2017. NP recursion over time: Evidence from Indo-European. Language 93(4). 799–826. https://doi.org/10.1353/lan.2017.0058.

Williams, Edwin. 2007. Dumping lexicalism. In Gillian Ramchand & Charles Reiss (eds.), The Oxford handbook of linguistic interfaces, 353–381. Oxford: Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199247455.013.0012.

Wittgenstein, Ludwig. 1953. Philosophical investigations, 3rd edn., [trans. G. E. M. Anscombe]. Oxford: Blackwell.

Wray, Alison. 2002. Formulaic language and the lexicon. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511519772.

Wray, Alison. 2015. Why are we so sure we know what a word is? In John R. Taylor (ed.), The Oxford handbook of the word, 725–750. Oxford: Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199641604.013.032.

Yang, Charles. 2005. On productivity. Linguistic Variation Yearbook 5. 265–302. https://doi.org/10.1075/livy.5.09yan.

Zwicky, Arnold M. & Geoffrey K. Pullum. 1983. Cliticization vs. inflection: English N’T. Language 59(3). 502–513. https://doi.org/10.2307/413900.

Received: 2020-06-12

Accepted: 2020-12-01

Published Online: 2021-10-01

Published in Print: 2021-11-25
