2018
DOI: 10.5334/joc.50

Predicting Lexical Norms: A Comparison between a Word Association Model and Text-Based Word Co-occurrence Models

Abstract: In two studies we compare a distributional semantic model derived from word co-occurrences and a word association based model in their ability to predict properties that affect lexical processing. We focus on age of acquisition, concreteness, and three affective variables, namely valence, arousal, and dominance, since all these variables have been shown to be fundamental in word meaning. In both studies we use a model based on data obtained in a continued free word association task to predict these variables. …

Cited by 35 publications (36 citation statements)
References 55 publications
“…The specialization method involved with the popular Paragram-SL999 vectors (Wieting et al., 2015) may have actually degraded performance in the present setting. As we observe similar results in the following sections, we save interpretation of the relative performance of specialized and unspecialized text-based vectors for the General Discussion. Finally, considering the best similarity functions for a given representation, vectors based on free association norms presented a small advantage over vectors based on text, an effect consistent with previous work comparing these two sources of representation when modeling semantic judgments (De Deyne et al., 2015; Vankrunkelsven et al., 2018). Even more striking was that Spearman correlation, when combined with SWOW-RW-SVD vectors, achieved the best overall performance (r = .61), .04 points better than the next best representation-function pair (SWOW-RW with Pearson correlation), and .08 points better than the best (text-based vector, function) pair.…”
supporting
confidence: 82%
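The comparison in this citation statement hinges on using Pearson versus Spearman correlation as the similarity function between two word vectors. A minimal sketch of that difference, using made-up 5-dimensional vectors (the values are illustrative only, not taken from SWOW-RW or any text model):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical 5-dimensional vectors for two words (illustrative values only).
vec_dog = np.array([0.9, 0.1, 0.4, 0.7, 0.2])
vec_cat = np.array([0.8, 0.2, 0.5, 0.6, 0.1])

# Pearson: linear agreement between the two vectors' components.
pearson_sim = pearsonr(vec_dog, vec_cat)[0]

# Spearman: agreement between the *ranks* of the components, which is
# insensitive to monotone distortions of individual dimension values.
spearman_sim = spearmanr(vec_dog, vec_cat)[0]

print(round(pearson_sim, 3), round(spearman_sim, 3))
```

On these toy vectors Pearson is slightly higher than Spearman; the point of the cited result is simply that the two functions can rank word pairs differently, and that the rank-based one paired best with SWOW-RW-SVD vectors.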
“…Pearson correlations between these predictions and actual judgments were substantially higher (advantages of ~r = .1 to r = .26) than correlations between actual judgments and predictions obtained from cosine similarity between vectors from a model based on word-word co-occurrences in syntactic dependencies (e.g., noun-verb dependencies like "We need some more coffee"). Similarly, Vankrunkelsven, Verheyen, Storms, & De Deyne (2018) found that affective word properties (e.g., valence and arousal) of English and Dutch words were better predicted from k-Nearest Neighbors regression of such a PPMI-transformed cue-response matrix than they were from k-NN regression of a similar syntactic-dependency-based text model. One of the goals of the present work, therefore, is to examine whether representations from free association norms make better predictions of similarity judgments than do various representations based on lexical co-occurrence statistics in text, including several not considered by De Deyne et al. (2015) or Vankrunkelsven et al. (2018).…”
Section: Representations
confidence: 86%
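The pipeline this citation statement describes — PPMI-transform a cue-by-response count matrix, then run k-NN regression over cue similarities to predict an affective norm — can be sketched on toy data. All counts and valence norms below are invented for illustration; the actual models operate on association matrices with thousands of cues:

```python
import numpy as np

def ppmi(counts):
    """Positive pointwise mutual information of a cue-by-response count
    matrix: PPMI[i, j] = max(0, log(p(i, j) / (p(i) * p(j))))."""
    p = counts / counts.sum()
    p_cue = p.sum(axis=1, keepdims=True)
    p_resp = p.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore"):
        pmi = np.log(p / (p_cue * p_resp))
    return np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

# Toy cue-by-response counts: rows = cues happy, joy, sad, grief, misery;
# columns = four response words. All numbers are invented for illustration.
counts = np.array([[5, 2, 0, 0],   # happy
                   [4, 3, 0, 0],   # joy
                   [0, 0, 5, 2],   # sad
                   [0, 0, 4, 3],   # grief
                   [0, 0, 3, 4]],  # misery (valence to be predicted)
                  dtype=float)
train_valence = np.array([8.0, 7.5, 2.0, 2.5])  # norms for the first 4 cues

X = ppmi(counts)
target, train = X[-1], X[:-1]

# k-NN regression (k=2): predict the held-out cue's valence as the mean
# norm of its two most similar training cues under cosine similarity.
sims = train @ target / (np.linalg.norm(train, axis=1) * np.linalg.norm(target))
k_nearest = np.argsort(sims)[-2:]
prediction = train_valence[k_nearest].mean()
print(prediction)  # 2.25 — the mean of the norms for "sad" and "grief"
```

The PPMI step keeps only cue-response pairs that co-occur more often than chance, so "misery" ends up similar to "sad" and "grief" rather than "happy" and "joy", and the k-NN estimate lands at the negative end of the scale.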
“…Word association data display assortativity for valence, arousal, and dominance: cues of a particular affective quality tend to elicit responses with a similar affective quality (Pollio, 1964; Staats & Staats, 1959; Van Rensbergen, Storms, & De Deyne, 2015b). Accurate predictions of words' standings on all three affective dimensions can also be obtained from word association data (Vankrunkelsven, Verheyen, Storms, & De Deyne, 2018; Van Rensbergen, De Deyne, & Storms, 2015a). Therefore, word association data have the potential to uncover the extent to which there are systematic relationships between the manner in which words are organized in the mental lexicon and the words' affective dimensions, which have been claimed to be an integral part of the stored word meaning (Osgood, Suci, & Tannenbaum, 1957; Samsonovich & Ascoli, 2010).…”
Section: Application
confidence: 99%
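The assortativity finding above — cues of a particular affective quality eliciting similarly valenced responses — amounts to a simple empirical check: correlate each cue's valence norm with the mean valence of the responses it elicits. A toy version (all norms and association lists are invented for illustration):

```python
import numpy as np

# Hypothetical valence norms (1 = very negative, 9 = very positive).
valence = {"party": 7.8, "fun": 8.1, "friend": 7.6,
           "war": 1.8, "death": 1.5, "fear": 2.3}

# Toy free-association data: each cue with a few elicited responses.
associations = {"party": ["fun", "friend"],
                "fun":   ["party", "friend"],
                "war":   ["death", "fear"],
                "fear":  ["death", "war"]}

# Assortativity for valence: correlate each cue's valence with the mean
# valence of its responses. A high r means affect is preserved along
# association links.
cue_val = np.array([valence[c] for c in associations])
resp_val = np.array([np.mean([valence[r] for r in rs])
                     for rs in associations.values()])
r = np.corrcoef(cue_val, resp_val)[0, 1]
print(round(r, 3))
```

On this constructed example the correlation is close to 1, because positive cues only elicit positive responses and negative cues only negative ones; real association norms show the same tendency in attenuated form.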
“…Still, taking affective information into account might not suffice to capture the representation of intangible abstracta. In this regard, recent multimodal models suggest that supplementing affective information with information related to the statistical distribution of concepts in language (i.e., distributional models of semantic representation; Landauer & Dumais, 1997) drastically improves prediction of human affective judgments (Bestgen & Vincze, 2012; Recchia & Louwerse, 2015; Vankrunkelsven et al., 2018). More importantly, recent work by Lenci et al. (2018) reveals a strong link between distributional statistics and emotion: intangible representations have more affective content and tend to co-occur with contexts with higher emotive value.…”
Section: Prospection Does Not Imply Predictive Processing
confidence: 99%