The words in children’s language learning environments are strongly predictive of cognitive development and school achievement. But how do we measure language environments and do so at the scale of the many words that children hear day-in and day-out? The quantity and quality of words in a child’s input is typically measured in terms of the total amount of talk and the lexical diversity in that talk. There is disagreement in the literature about whether amount or diversity is the more critical measure of the input. Here we analyze the properties of a large corpus (6.5 million words) of speech to children and simulate learning environments that differ in amount of talk per unit time, lexical diversity, and the contexts of talk. The central conclusion is that what researchers need to theoretically understand, measure, and change is not the total number of words, or the diversity of words, but the function that relates total words to the diversity of words, and how that function changes across different contexts of talk.
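As a rough illustration of "the function that relates total words to the diversity of words", the sketch below tracks how many distinct words have been seen after each successive token. The function name `type_token_curve`, the lowercasing step, and the toy sentence are assumptions made for illustration; they are not taken from the study's corpus or method.

```python
def type_token_curve(tokens):
    """Return the cumulative count of distinct words after each token.

    Index i of the result is the number of word types seen in tokens[0..i],
    so the list traces diversity as a function of total words heard.
    """
    seen = set()
    curve = []
    for tok in tokens:
        seen.add(tok.lower())  # assumption: case-insensitive word types
        curve.append(len(seen))
    return curve


# Toy input (hypothetical, not from the corpus described above).
sample = "the dog saw the ball and the dog ran".split()
curve = type_token_curve(sample)
print(list(zip(range(1, len(sample) + 1), curve)))
```

Comparing such curves across contexts of talk (for example, mealtime versus book reading) is one way to see how the relation between amount and diversity can differ even when total word counts match.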
Humans may retrieve words from memory by exploring and exploiting in linguistic "space" similar to how non-human animals forage for resources in physical space. This has been studied using the verbal fluency test (VFT), in which participants generate words belonging to a semantic or phonetic category in a limited time. The foraging in mind model proposes that individuals performing VFT monitor their response production rate as they search through and deplete a local patch (subcategory) of items in memory and then switch to a new patch in another part of semantic or phonetic space. An alternative model holds that participants use a random walk process, and switches are merely epiphenomenal long steps reflecting the tail of the random walk step size distribution. This study tests these competing theories by examining whether there is distinct neural activity during exploring between ("switching") versus exploiting within ("clustering") related response groupings (foraging), or no neural differences between search phases (random walk). Thirty participants performed category and letter VFT during functional magnetic resonance imaging. Responses were categorized as cluster or switch events based on computational metrics of similarity and participant evaluations. Findings provide neural evidence of a cognitive foraging process, with greater hippocampal and cerebellar activation during switching compared to clustering, even while controlling for greater semantic and phonetic distance and response times. Furthermore, these regions exhibited ramping activity leading up to switch events. These results clarify how neural switch processes may guide memory searches in a manner akin to foraging in patchy spatial environments.
When concepts are retrieved from memory, this process occurs within a rich search space where multiple sources of information interact with each other. Although the mapping from wordform to meaning is generally considered to be arbitrary, there is recent evidence to suggest that form and meaning may be correlated in natural language, and semantic and phonological cues may interact during retrieval. However, whether phonology interacts with meaning-related information in deeper semantic retrieval tasks, and whether this interaction has broader implications for how we conceptualize semantic retrieval remains relatively understudied. We examined these questions within the framework of the semantic fluency task, where individuals were asked to retrieve as many exemplars as they could from a given category (e.g., animals) within a fixed period of time. Responses were more phonologically similar during later stages of retrieval, and greater phonological similarity across responses was associated with greater number of items produced. We formulated a nested set of optimal foraging models to evaluate the combined influence of semantic and phonological information on retrieval likelihood. Model comparisons revealed that a model that combined frequency, semantic, and phonological information locally to make within-category transitions but relied on only frequency as a global cue to make between-category transitions produced the best explanation of the behavioral data.
The field of cognitive aging has seen considerable advances in describing the linguistic and semantic changes that happen during the adult life span to uncover the structure of the mental lexicon (i.e., the mental repository of lexical and conceptual representations). Nevertheless, there is still debate concerning the sources of these changes, including the role of environmental exposure and several cognitive mechanisms associated with learning, representation, and retrieval of information. We review the current status of research in this field and outline a framework that promises to assess the contribution of both ecological and psychological aspects to the aging lexicon.
Cognitive Aging and the Mental Lexicon
There is consensus in the cognitive sciences that human development extends well beyond childhood and adolescence, and there has been remarkable empirical progress in the field of cognitive aging in past decades [1]. Nevertheless, the role of environmental and cognitive factors in age-related changes in the structure and processing of lexical and semantic representations (see Glossary) is still under debate. For example, age-related memory decline is commonly attributed to a decline in cognitive abilities [2,3], yet some researchers have proposed that massive exposure to language over the course of one's life leads to knowledge gains that may contribute to, if not fully account for, age-related memory deficits [4][5][6]. We argue that to resolve such debates we require an interdisciplinary approach that captures how information exposure across adulthood may change the way that we acquire, represent, and recall information.
We summarize recent developments in the field (Figure 1, Table 1) and propose a conceptual framework (Figure 2, Key Figure) and associated research agenda that argues for combining ecological analyses, formal modeling, and large-scale empirical studies to shed light on the contents, structure, and neural basis of the aging mental lexicon in both health and disease.
Mental Lexicon: Aging and Cognitive Performance
The mental lexicon can be thought of as a repository of lexical and conceptual representations, composed of organized networks of semantic, phonological, orthographic, morphological, and other types of information [7]. The cognitive sciences have provided considerable knowledge about the computational (Box 1; [8][9][10][11]) and neural basis (Box 2; [12,13]) of lexical and semantic cognition, and there has been considerable interest in how such aspects of cognition change across adulthood and aging [14,15].
Past work on the aging lexicon emphasized the amount of information acquired across the life span (e.g., vocabulary gains across adulthood; [15]); however, new evaluations using graph-based approaches suggest that both quantity and structural aspects of representations differ between individuals [16] and change across the life span [17][18][19]. Such insights were gathered, for example, from a large-scale analysis of free association data from thousands of individuals [17], rangi...
The field of psycholinguistics has recently questioned the primacy of word frequency (WF) in influencing word recognition and production, focusing on the importance of a word’s contextual diversity (CD). WF is operationalized by counting the number of occurrences of a word in a corpus, while a word’s CD is a count of the number of contexts that a word occurs in, with repetitions in a context being ignored. Numerous studies have converged on the conclusion that CD is a better predictor of word recognition latency and accuracy than frequency (see Jones, Johns, & Dye, 2017 for a review). These findings support a cognitive mechanism based on the principle of likely need over the principle of repetition in lexical organization. In the current study, we trained the semantic distinctiveness model of Johns (2021) on communication patterns in social media platforms consisting of over 55 billion word tokens and examined the ability of theoretically distinct models to explain word recognition latency and accuracy data from over 250,000 participants from the Brysbaert et al. (2019) norms, consisting of approximately 57,000 words across six age bands ranging from ages 10 to 60. There was a clear quantitative trend across the age bands, with a shift from a social environment-based attention mechanism in the “younger” models to a clear dominance of a discourse-based attention mechanism as models “aged.” This pattern suggests that there is a dynamical interaction between the cognitive mechanisms of lexical organization and environmental information across aging.
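The two counts defined above can be contrasted in a few lines. This is a minimal sketch that assumes a corpus represented as a list of contexts, each a list of word tokens; the function names and the toy corpus are hypothetical and do not reproduce the semantic distinctiveness model or the social media data used in the study.

```python
from collections import Counter


def word_frequency(contexts):
    """WF: total number of occurrences of each word across all contexts."""
    wf = Counter()
    for ctx in contexts:
        wf.update(ctx)
    return wf


def contextual_diversity(contexts):
    """CD: number of contexts containing each word, ignoring repeats within a context."""
    cd = Counter()
    for ctx in contexts:
        cd.update(set(ctx))  # set() drops within-context repetitions
    return cd


# Toy corpus (hypothetical): three contexts of tokenized text.
contexts = [
    ["dog", "dog", "bone"],
    ["dog", "park"],
    ["bone", "bone", "bone"],
]
wf = word_frequency(contexts)
cd = contextual_diversity(contexts)
print(wf["bone"], cd["bone"])  # 4 occurrences, but only 2 contexts
```

In this toy corpus "bone" has a higher WF than "dog" (4 versus 3) while the two words have the same CD (2), which is the kind of dissociation the WF versus CD debate turns on.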
Measures of contextual diversity seek to replace word frequency by counting the number of contexts in which a word occurs rather than the raw number of occurrences (Adelman, Brown, & Quesada, 2006). It has repeatedly been shown that contextual diversity measures outperform word frequency on word recognition datasets (Adelman & Brown, 2008; Brysbaert & New, 2009). Recently, Hollis (2020) has questioned the importance of contextual diversity by demonstrating that when other variables of contextual occurrences are controlled for, diversity accounts for relatively small amounts of unique variance over word frequency. However, the analysis of Hollis (2020) did not take into account the semantic content of the contexts that words occur in. Johns, Dye, and Jones (2020) and Johns (2021) have recently shown that defining linguistic contexts at larger, and more ecologically valid, levels leads to contextual diversity measures that provide very large improvements over word frequency, especially when implemented with principles from the Semantic Distinctiveness Model of Jones, Johns, and Recchia (2012). Across a series of simulations, we demonstrate that the advantages of contextual diversity measures depend on the use of semantic representations of words to determine the uniqueness of contextual occurrences, where unique contextual occurrences contribute more to a word’s lexical strength than redundant contextual occurrences.
Analyzing data from the verbal fluency task (e.g., “name all the animals you can in a minute”) is of interest to both memory researchers and clinicians due to its broader implications for memory search and retrieval. Recent work has proposed several computational models to examine nuanced differences in search behavior, which can provide insights into the mechanisms underlying memory search. A prominent account of memory search within the fluency task was proposed by Hills, Jones, and Todd (2012), where mental search is modeled after how animals forage for food in physical space. Despite the broad potential utility of these models to scientists and clinicians, there is currently no open-source program to apply and compare existing foraging models or clustering algorithms without extensive, often redundant programming. To remove this barrier to studying search patterns in the fluency task, we created forager, a Python package (https://github.com/thelexiconlab/forager) and web interface (https://forager.research.bowdoin.edu/). forager provides multiple automated methods to designate clusters and switches within a fluency list, implements a novel set of computational models that can examine the influence of multiple lexical sources (semantic, phonological, and frequency) on memory search using semantic embeddings, and also enables researchers to evaluate relative model performance at the individual and group level. The package and web interface cater to users with various levels of programming experience. In this work, we introduce forager’s basic functionality and use cases that demonstrate its utility with pre-existing behavioral and clinical data sets of the semantic fluency task.
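To make the idea of designating clusters and switches concrete, here is a minimal sketch of one simple rule: mark a switch whenever the cosine similarity between consecutive responses falls below a fixed threshold. It deliberately does not call forager's API; the function names, the threshold of 0.5, and the two-dimensional toy vectors are all assumptions for illustration, and the package's own switch methods may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def designate_switches(items, vectors, threshold=0.5):
    """Label each response as 'cluster' or 'switch' relative to the previous one.

    The first item has no predecessor and is labeled 'start'.
    The threshold is a hypothetical value, not one taken from the source.
    """
    labels = ["start"]
    for prev, cur in zip(items, items[1:]):
        sim = cosine(vectors[prev], vectors[cur])
        labels.append("switch" if sim < threshold else "cluster")
    return labels


# Toy embeddings (hypothetical): first axis ~ "pet", second axis ~ "sea animal".
toy_vectors = {
    "cat": (1.0, 0.1),
    "dog": (0.9, 0.2),
    "shark": (0.1, 1.0),
    "whale": (0.2, 0.9),
}
fluency_list = ["cat", "dog", "shark", "whale"]
labels = designate_switches(fluency_list, toy_vectors)
print(list(zip(fluency_list, labels)))
```

With real data the vectors would come from a semantic embedding model, and the choice of rule and threshold changes which transitions count as switches, which is one reason comparing several designation methods is useful.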
Failing to come up with a word or name is a fairly common experience that is exacerbated in older adulthood and among populations with language impairments, and yet the mechanisms underlying lexical retrieval remain fairly understudied. In this work, we introduce and evaluate a series of nested computational models of lexical retrieval that combine semantic representations derived from a distributional semantic model with a process model to account for behavioral performance in a primed lexical retrieval task. The models were tested on a behavioral data set where participants attempted to retrieve answers to descriptions of low-frequency words and were provided a semantically and/or phonologically related prime word before the retrieval attempt. Model comparisons indicated that a model that emphasized semantic activations from the description and phonological activations from the prime word best accounted for the overall data. Additionally, incorrect responses and metacognitive judgments indicating that participants had other words in mind were associated with models that instead emphasized semantic activations from the prime word. Taken together, these results identify the locus of lexical retrieval failures and offer the opportunity to investigate broader questions about semantic memory retrieval.