The following speakers have graciously agreed to give keynotes at EMNLP 2022.
Speaker: Mona Diab
Towards a Responsible NLP: Walking the walk
In a world of racing to top leaderboards, win shared tasks, and build the largest LM, are we losing our soul as a scientific enterprise? Do we need to re-orient and re-pivot NLP? If so, what is needed to make this happen? Can we chart together a program where we ensure that science is the pivotal ingredient in CL/NLP? Could Responsible NLP be an avenue that could lead us back towards that goal? In this talk, in the spirit of EMpirical NLP, I will explore some “practical” ideas around framing a Responsible NLP vision, hoping to achieve a higher scientific standard for our field. I will address issues ranging from “how” we conduct our research to “what” we work on and produce, using tenets from a responsible mindset perspective. I will pose more questions than answers. This is a call to action, an invitation to start a real global community conversation, hopefully engaging all stakeholders: academia, industry, government, and civil society.
Mona Diab is the Lead Responsible AI Research Scientist with Meta. She is also a full Professor of Computer Science at the George Washington University (on leave), where she directs the CARE4Lang NLP Lab. Before joining Meta, she led the Lex Conversational AI project within Amazon AWS AI. Her current focus is on Responsible AI and how to operationalize it for NLP technologies. Her interests span building robust technologies for low-resource scenarios, with a special interest in Arabic technologies, (mis)information propagation, computational socio-pragmatics, computational psycholinguistics, NLG evaluation metrics, language modeling, and resource creation. Mona has served the community in several capacities: she was elected President of SIGLEX and SIGSemitic, and she currently serves as the elected VP of ACL SIGDAT, the board supporting EMNLP conferences. She has delivered tutorials and organized numerous workshops and panels around Arabic processing, Responsible NLP, Code Switching, and more. In 2005 she cofounded CADIM (Consortium on Arabic Dialect Modeling, previously known as the Columbia University Arabic Dialects Modeling Group), which served as a world-renowned reference point on Arabic Language Technologies. Moreover, she helped establish two research trends in NLP, namely computational approaches to Code Switching and Semantic Textual Similarity. She is also a founding member of the *SEM conference, one of the top-tier conferences in NLP. Mona has published more than 250 peer-reviewed articles.
Affiliation: Meta Responsible AI
Speaker: Neil Cohn
The multimodal language faculty and the visual languages of comics
Contrary to notions of language as an amodal system, natural human communication is multimodal and combines speech, gestures, writing, and pictures. To account for this, recent work has proposed that our vocal, bodily, and graphic modalities persist in parallel in a multimodal language faculty, and that both unimodal and multimodal expressions arise out of emergent states of a shared architecture. Such a model carries different expectations for the ways in which modalities may be similar to or different from each other, and for how they may interact. I will highlight these properties specifically for our graphic modality, which I argue can manifest in full visual languages when it displays both a systematic lexicon and a complex grammar. I will use analysis of a corpus of several hundred annotated comics to show distinctive patterns that suggest they are drawn in different visual languages. Yet I will also show that consistent “universal” linguistic principles persist across this structural diversity. Finally, I will argue that a multimodal language faculty requires us to change our conception of linguistic relativity, and I will show how subtle structures of spoken languages permeate across to visual languages. Altogether, this work argues for a multimodal basis of linguistic structure and heralds a reconsideration of what constitutes the language system.
Neil Cohn is an American cognitive scientist best known for his research on the overlap in structure and cognition between language and graphic communication such as comics and emoji. He is the author of 80+ academic papers, 4 academic books, and 2 graphic novels. He received his PhD in cognitive psychology from Tufts University and is currently an associate professor in the Department of Communication and Cognition at Tilburg University in the Netherlands. His work can be found online at
Affiliation: Tilburg University, Department of Communication and Cognition
Speaker: Gary Marcus
Towards a Foundation for AGI
Large pretrained language models like GPT-3 and PaLM have generated enormous enthusiasm and are capable of producing remarkably fluent language. But they have also been criticized on many grounds, and described as “stochastic parrots.” Are they adequate as a basis for artificial general intelligence (AGI), and if not, what would a better foundation for general intelligence look like?
Gary Marcus is a leading voice in artificial intelligence. He is a scientist, best-selling author, and serial entrepreneur (founder of Robust.AI and of Geometric.AI, which was acquired by Uber). He is well known for his challenges to contemporary AI, anticipating many of the current problems decades in advance, and for his research in human language development and cognitive neuroscience. An Emeritus Professor of Psychology and Neural Science at NYU, he is the author of five books, including The Algebraic Mind, Kluge, The Birth of the Mind, and the New York Times bestseller Guitar Zero. He has often contributed to The New Yorker, Wired, and The New York Times. His most recent book, Rebooting AI, written with Ernest Davis, is one of Forbes’s 7 Must-Read Books in AI.
Affiliation: NYU (Emeritus)
Speaker: Nazneen Rajani
Takeaways from a systematic study of 75K models on Hugging Face
Abstract: Transformer-based language models dominate the NLP model landscape, making Hugging Face (HF) the de facto hub for sharing, benchmarking, and evaluating NLP models. The HF hub provides a rich resource for understanding how language models have evolved, opening up research questions such as ‘Is there a correlation between model documentation and its usage?’, ‘How have the models evolved?’, and ‘What do users document about their models?’. In the first part of my talk, I’ll give a macro-level view of how the NLP model landscape has evolved, based on our systematic study of 75K HF models. In the second part, I’ll discuss advances, challenges, and opportunities in evaluating and documenting NLP models developed in an industry setting. Based on the results, do we see a paradigm shift from model-centric to data-centric evaluation and documentation?
Nazneen is a Research Lead at Hugging Face, a startup with a mission to democratize ML, where she leads data-centric ML research that involves systematically analyzing, curating, and automatically annotating data. Before HF, she worked at Salesforce Research with Richard Socher and led a team of researchers focused on building robust natural language generation systems based on LLMs. She completed her Ph.D. in CS at UT-Austin with Prof. Ray Mooney.
Nazneen has had over 30 papers accepted at ACL, EMNLP, NAACL, NeurIPS, and ICLR, and her research has been covered by Quanta Magazine, VentureBeat, SiliconAngle, ZDNet, and Datanami. She is also teaching a course on interpreting ML models with CoRise: http://corise.com/go/nazneen. More details about her work can be found at https://www.nazneenrajani.com/