C.Psyd meetings consist of paper presentations, project workshopping, invited speakers, brainstorming sessions, etc. The format frequently changes between terms. If you’d like to join C.Psyd meetings, please send me an email.

Invited Speakers


  • Grusha Prasad (Colgate University)
    Generating and testing *quantitative* predictions of human language processing
    Decades of psycholinguistic research have focused on evaluating theories of human sentence processing by testing qualitative behavioral predictions. The advent of broad-coverage computational models of human sentence processing has made it possible to move beyond these qualitative predictions and derive quantitative predictions from theories. In the first part of this talk, I discuss the importance of large-scale datasets for testing such quantitative predictions, and present data from one such large-scale dataset: the SAP Benchmark (Huang et al., 2023). These data suggest that word predictability alone, as estimated from neural network language models trained on uncurated datasets from the internet, cannot explain syntactic disambiguation difficulty. New modeling work suggests that this conclusion holds for models trained on datasets curated to be more developmentally plausible. In the second part of this talk, I discuss the factors that can impact our empirical estimates of processing difficulty. Focusing on the web-based platform used for data collection, I present data demonstrating that the quantitative, and in some cases qualitative, pattern of results differs between participants recruited via Prolific vs. Amazon’s Mechanical Turk. This suggests that web-based platform choice could be an important variable to include in meta-analyses and a factor to consider in future experimental design.

  • Kanishka Misra (UT Austin)
    Analyzing Robust Conceptual Knowledge and Property Inheritance in Language Models
    A characteristic feature of human semantic cognition is the ability to not only store and retrieve the properties of concepts observed through experience, but to also facilitate the inheritance of properties (e.g., can breathe) from superordinate concepts (animal) to their subordinates (dog)—i.e. demonstrate property inheritance. In this work, I will present COMPS, a collection of minimal pair sentences that jointly tests pre-trained language models (PLMs) on their ability to attribute properties to concepts and their ability to demonstrate property inheritance behavior. Analyses of 31 different PLMs on COMPS reveal that they can easily distinguish between concepts on the basis of a property when they are trivially different but find it relatively difficult when concepts are related on the basis of explicit knowledge representations. Further analyses find that PLMs can show behaviors suggesting successful property inheritance in simple contexts, but fail in the presence of distracting information, which decreases the performance of many models, sometimes even below chance. This lack of robustness in demonstrating simple reasoning raises important questions about PLMs’ capacity to make correct inferences even when they appear to possess the prerequisite knowledge. I will also discuss preliminary results on a potential few-shot setting of this task, complemented with additional controls for positional heuristics.

  • Nan-Jiang Jiang (The Ohio State University)
    Understanding and Predicting Human Label Variation in Natural Language Inference through Explanations
    Human label variation (Plank, 2022), or annotation disagreement, exists in many natural language processing (NLP) tasks. To be robust and trusted, NLP models need to identify such variation and be able to explain it. To this end, we created the first ecologically valid explanation dataset with diverse reasoning, LiveNLI. LiveNLI contains annotators’ highlights and free-text explanations for the label(s) of their choice for 122 English Natural Language Inference items, each with at least 10 annotations. We used its explanations for chain-of-thought prompting, and found that there is still room for improvement in GPT-3’s ability to predict label distributions with in-context learning.

  • Julia Mendelsohn (University of Michigan)
    Computational analysis of nuanced political rhetoric
    When discussing politics, people often use subtle linguistic strategies to influence how their audience thinks about issues, which can then impact public opinion and policy. For example, anti-immigration activists may describe immigrants with dehumanizing flood metaphors, frame immigration as a threat to native-born citizens’ jobs, or even use coded expressions to covertly connect migration with antisemitic conspiracy theories. In this presentation, I will focus on the latter two strategies: framing and dogwhistle communication. I will discuss how we (1) draw from multiple social science disciplines to develop typologies and curate data resources, (2) build and evaluate NLP models for detecting these strategies, (3) computationally analyze how these strategies are used in political discussions across several domains, and (4) assess the implications of such nuanced rhetoric for both people and language technology systems.


  • Isabel Papadimitriou (Stanford University)
    What can we learn about language from language models?
    The development of successful language models has provided us with an exciting test bed: we have learners that can learn language from data, and we can watch them do it. In this talk I’ll go over two sets of experiments that examine language representation and learning in language models, and discuss what we can learn from them. Firstly, I’ll go over experiments that use transformers to approach a cognitive question: what are the inductive biases that help a learner in acquiring human language from data? To do this, we set up an experimental test bed where we create learners with different structural biases (nesting parentheses, crossing dependencies, regular patterns) and test their ability to transfer this knowledge to English. Secondly, I’ll look at subjecthood (the property of being the subject or the object of a sentence) in language model embedding spaces. By probing the embedding space, we show how a discrete feature like subjecthood can be encoded in a continuous space, affected but not fully determined by prototype effects, and also how these properties come into play with a feature being universally shared among many languages. Insofar as computational models of cognition act as hypothesis generators for inspiring and guiding our research into understanding human language, language models are a very exciting tool to work with and understand.

  • Cory Shain (MIT)
    Incremental Story Comprehension in the Human Brain
    A major goal of psycholinguistics is to understand the mental algorithms that enable human language comprehension. But the component processes are challenging to isolate experimentally. As a result, researchers are increasingly complementing controlled experiments by evaluating word-by-word predictors from computational cognitive models against data from humans processing naturalistic language input. In this talk, I will present two such studies evaluating theoretical predictions of language processing demand against brain activity during naturalistic story listening in a large fMRI cohort (n=78). These studies address three core outstanding questions: (1) Does sentence comprehension involve rich syntactic structure building? (2) Is structural integration (in working memory) dissociable from prediction? (3) Are the neural mechanisms that support prediction and working memory for language shared with other domains of cognition? Based on results from rigorous model comparisons, I will argue that the answers are respectively yes, yes, and no: the evidence supports rich syntactic structure processing during passive story listening, robust dissociation of structural integration from predictive processing, and a primarily language-selective neural locus for these effects, with no evidence that domain-general neural circuits are recruited for either prediction or structural integration.

  • Rachael Elizabeth Weissler (University of Oregon)
    Leveraging African American English Knowledge: Cognition and Multidialectal Processing
    In order to better understand how American listeners cognitively interact with Black and White voices, I engage theories of language variation and social cognition from the sociolinguistic and psycholinguistic perspectives. Previous research has shown that this kind of hierarchical treatment of language varieties leads to negative perceptions of non-standard languages, which in turn makes them stigmatized, and ultimately perpetuates dialect discrimination. This kind of discrimination results in the mistreatment of users of non-standard varieties, which negatively affects the way those speakers can move through the U.S. context (Rickford 1999, Eckert and Rickford 2001, Schilling 2004, Rickford and King 2016). This research investigates how listeners alter their linguistic expectations when hearing Standardized American English (SdAE) and African American English (AAE) through two Electroencephalography (EEG) experiments. I ask whether listeners have specific knowledge of the dialect that is not their own, or whether listeners more generally reduce expectations across the board when listening to a dialect or variant that they themselves do not speak. Experimental sentences were constructed to reflect a variant that is grammatical in SdAE, a variant that is grammatical uniquely in AAE, and a variant that is ungrammatical in all varieties of English. Experiment 1 includes stimuli from a single Black man employing both SdAE and AAE speech. Experiment 2 includes the same AAE stimuli from that speaker, as well as stimuli from one SdAE speaker, both male from the Midwest. The results reflect a nuanced combination of both perspectives: Listeners show differential processing depending on the guise used by the Black man, and also show processing results in alignment with SdAE grammar in Experiment 2 only. These studies indicate that a speaker’s identity and language variety may both be taken into account during processing. They also indicate the potential that listeners aren’t interacting with or processing SdAE when they interact with it coming from an African American person; though standard features may be evoked, the speaker still “Sounds Black.” Through analysis of American Englishes, this work contributes to further understanding of how social information interfaces with online processing, and expectations that may be formed depending on the perceived identity of a voice. Future work seeks to ask how listeners with varied linguistic knowledge of AAE specifically process this syntactic variation, and also to disentangle grammatical processing of AAE from perceiving “Sounding Black.”


  • Ariel James (Macalester College)
    Capturing and explaining individual differences in language processing: Triumphs and Challenges
    Large inter-individual variability in experimental effects can be an exciting discovery, a nuisance to explain, or some combination of the two. In this talk, I will discuss my approach to the study of individual differences in language processing. As an illustration, I will provide an overview of a study that I completed with my colleagues (James et al., 2018) in which we replicated three major syntactic phenomena in the psycholinguistic literature: use of verb distributional statistics, difficulty of object- versus subject-extracted relative clauses, and resolution of relative clause attachment ambiguities. We examined whether any individual differences in these phenomena could be predicted by language experience or more general cognitive abilities. We found correlations between individual differences and offline, but not online, syntactic phenomena. I will discuss these findings in the context of Cronbach’s “two disciplines” problem of combining experimental and correlational approaches, and describe my in-progress work to explore these problems and hopefully find some solutions.

  • Allyson Ettinger (University of Chicago)
    “Understanding” and prediction: Controlled examinations of meaning sensitivity in pre-trained models
    In recent years, NLP has made what appears to be incredible progress, with performance even surpassing human performance on some benchmarks. How should we interpret these advances? Have these models achieved language “understanding”? Operating on the premise that “understanding” will necessarily involve the capacity to extract and deploy meaning information, in this talk I will discuss a series of projects leveraging targeted tests to examine NLP models’ ability to capture meaning in a systematic fashion. I will first discuss work probing model representations for compositional meaning, with a particular focus on assessing compositional information beyond encoding of lexical properties. I’ll then explore models’ ability to extract and deploy meaning information during word prediction, applying tests inspired by psycholinguistic methods to examine the types of information that models encode and use from input text. In all cases, these investigations apply tests that prioritize control of unwanted cues, so as to target the desired meaning capabilities with greater precision. The results of these studies suggest that although models show a good deal of sensitivity to word-level information, and to a number of semantic and syntactic distinctions, they show little sign of capturing higher-level compositional meaning, of handling logical impacts of meaning components like negation, or of retaining access to detailed representations of information conveyed in prior context. I will discuss potential implications of these findings with respect to the goals of achieving “understanding” with the currently dominant pre-training paradigms.

  • Sidharth Ranjan (IIT Delhi)
    Expectation Adaptation Effects in Hindi Preverbal Constituent Ordering
    In this study, we investigate the extent to which adapting a neural language model’s expectation using preceding context (viz., lexical items or syntactic structures) influences preverbal constituent ordering in Hindi, a predominantly SOV language with flexible word order. Prior work has shown that Hindi optimizes for surprisal (Agrawal et al. 2017, Ranjan et al. 2019). Nevertheless, the effects of priming (Bock 1986, Chang et al. 2012, Tooley and Traxler 2010) have been under-explored in Hindi. In recent work, van Schijndel and Linzen (2018) showed that adaptive LSTM language models (LMs) significantly improved the ability to predict human reading times. Furthermore, Prasad et al. (2019) demonstrated that neural LMs model abstract properties of sentences such that learned representations can be organized in a linguistically interpretable manner.
    First, we set up a framework to generate grammatical variants corresponding to sentences in the Hindi-Urdu Treebank (HUTB) corpus of written text (Bhatt et al. 2009) by permuting their preverbal constituents. Subsequently, we deployed a logistic regression model to predict HUTB reference sentences (amidst variants expressing the same idea) using various cognitively motivated features, viz., dependency length (Gibson 1998; 2000), surprisal (Hale 2001, Levy 2008), and adaptive surprisal (van Schijndel and Linzen 2018). Corroborating the findings in the literature, our results provide evidence for the role of adaptation effects in determining word-order preferences in Hindi. We further show that adapting LSTM LMs using abstract representations learned from the newswire domain helped to correctly classify the corpus reference sentence amidst competing variants. Further linguistic analyses revealed that the adaptive LSTM LM performed well not just overall, but also across different constructions, viz., active and passive voice sentences, conjunct verbs, and non-canonical sentences. The efficacy of adapted surprisal for Hindi syntactic choice indicates the primacy of accessibility-based considerations over memory costs. Overall, we demonstrate that adaptation captures not only stylistic patterns and syntactic structures but also discourse effects.

  • Najoung Kim (Johns Hopkins University)
    Compositional linguistic generalization in contemporary neural models of language
    In this talk, I will present two evaluation methods for testing compositionality in neural models of language exploiting systematic gaps between training and evaluation sets. The first is a sentence-to-logical form interpretation task in which the evaluation set consists of examples that require novel compositions of familiar syntactic structures or familiar lexical items and syntactic structures (e.g., if a model can assign a correct interpretation to The hedgehog saw the cat, can it also interpret The cat saw the hedgehog?). The second evaluation targets pretrained language models, testing whether exposure to novel lexical items in contexts that unambiguously signal their grammatical category membership facilitates grammatical category-based inferences in different contexts that do not have any lexical overlap with the exposure contexts (e.g., having seen I love the blick and The cats dax, does the model prefer blick over dax in A ___ was dancing?). In light of the partial success of the models tested (first task: LSTM and Transformer, second task: BERT-large), I will discuss several ideas for future work.

  • Aline Villavicencio (University of Sheffield)
    Probing for idiomaticity in vector space models
    Contextualised word representation models have been successfully used for capturing different word usages, and they may be an attractive alternative for representing idiomaticity in language. In this paper, we propose probing measures to assess if some of the expected linguistic properties of noun compounds, especially those related to idiomatic meanings, and their dependence on context and sensitivity to lexical choice, are readily available in some standard and widely used representations. For that, we constructed the Noun Compound Senses Dataset, which contains noun compounds and their paraphrases, in context neutral and context informative naturalistic sentences, in two languages: English and Portuguese. Results obtained using four types of probing measures with models like ELMo, BERT and some of its variants, indicate that idiomaticity is not yet accurately represented by contextualised models. Paper accepted for EACL 2021.

  • Robert Hawkins (Princeton University)
    Coordinating on meaning in communication
    Languages are powerful solutions to coordination problems: they provide stable, shared expectations about how the words we say correspond to the beliefs and intentions in our heads. However, in a non-stationary environment with new things to talk about and new partners to talk with, linguistic knowledge must be flexible: old words acquire new ad hoc or partner-specific meanings on the fly. In this talk, I’ll share some recent work investigating the cognitive mechanisms that support this balance between stability and flexibility in human communication, which motivates the development of more adaptive, interactive language models in NLP. First, I’ll introduce a computational framework re-casting communication as a hierarchical meta-learning problem: community-level conventions and norms provide stable priors for communication, while rapid learning within each interaction allows for partner- and context-specific common ground. I’ll evaluate this model using a new corpus of natural-language communication from a task in which participants are grouped in small communities and take turns referring to ambiguous tangram objects, and I’ll describe how we scaled up this framework to neural architectures that can be deployed in real-time interactions with human partners. Taken together, this line of work aims to build a computational foundation for a more dynamic and socially-aware view of linguistic meaning in communication.

  • Vera Demberg (Saarland University)
    Investigating individual differences in discourse comprehension through crowd-sourcing annotation
    Disagreements between annotators in discourse relation annotation are a commonly observed problem in discourse bank creation, and subsequent inconsistencies in annotation may negatively affect discourse relation classification results. In my talk, I will present our recent work on crowd-sourcing discourse relation annotations. I will present our data collection methodology, and argue that crowd-sourcing discourse annotations can help us to better understand whether discrepancies in interpretation should continue to be considered “random noise” or whether these discrepancies are systematic. I will then proceed to discuss our studies on individual differences in discourse relation interpretation, with specific focus on the interpretation of specification and instantiation relations, as well as predictive processing of list signal cues. We find that differences in interpretation are related to individual biases, which can in turn be related to depth of processing and to linguistic experience.


  • Stefan Frank (Radboud University)
    Neural models of bilingual sentence processing
    A bilingual’s two grammars do not form independent systems but interact with each other, as is clear from phenomena such as syntactic transfer, cross-linguistic structural priming, and code-switching. I will present neural network models of bilingual sentence processing that show how (some of) these phenomena can emerge from mere exposure to two languages, that is, without requiring cognitive mechanisms that are specific to bilingualism.

  • Aurelie Herbelot (University of Trento)
    Modelling the acquisition of linguistic competences from small data
    There is currently much optimism in the field of Natural Language Processing (NLP): some basic linguistic tasks are considered ‘solved’, while others have tremendously benefited from the introduction of novel neural architectures. However, the data, training regimes and system architectures required to obtain top performance are often unrealistic from the point of view of human cognition. It is therefore questionable whether current NLP systems can ever earn the name of ‘models’ of language learning. In this talk, we will subject well-known algorithms to one specific constraint on human acquisition: limited input. The first part of the talk will focus on RNN architectures and analyse their level of grammatical competence when trained over 3 million tokens from child-directed language. The second part will investigate the issue of semantic competence, looking at the behaviour of word embedding systems with respect to three aspects of meaning: lexical knowledge, reference, distributional properties. We will conclude that NLP systems can actually adapt well to small data, but that their success may be highly dependent on the nature of the data they receive, as well as the underlying representations they learn from. (Work with Ludovica Pannitto)

  • Maayan Keshev (Tel-Aviv University)
    Noisy is better than rare: Evidence from processing ambiguous relative clauses in Hebrew
    During sentence processing, readers may utilize their top-down knowledge to overcome possible noise in their input. Thus, the interpretation of improbable strings could be pulled towards a likely near-neighbour. In the current study, I demonstrate this kind of rational noisy-channel inference in the processing of Hebrew relative clauses which are ambiguous between SR and OR readings. I suggest that readers may be willing to compromise agreement information in order to construct an SR, depending on the prior probability of the OR structure. Thus, a corrupted SR (with mismatching verbal agreement) is preferred over a grammatical OR with a rare word order. Yet, readers opt for the OR parse if it is not extremely rare (though presumably less frequent than the SR structure).