Abstract: With increasingly vast stores of textual data readily available, the field of Natural Language Processing offers the potential to extract, organize, and repackage knowledge revealed either directly or indirectly. For decades, one of the holy grails of the field has been the vision of accomplishing these tasks with minimal human knowledge engineering through machine learning. Yet with each new wave of machine learning research, the same tension is experienced between investment in knowledge engineering and integration know-how on the one hand and production of knowledge and insight on the other. This talk explores techniques for injecting insight into data representations to increase model performance, especially in cross-domain settings. Recent work in neural-symbolic approaches to NLP is one such direction: in some cases it reports advances from incorporating formal representations of language and knowledge, while in other cases it reveals challenges in identifying high-utility abstractions and strategic exceptions, which frequently require exogenous data sources, as well as in managing the interplay between these formal representations and the bottom-up generalizations apparent from endogenous sources. More recently, Large Language Models (LLMs) have been used, with greater success, to produce textual augmentations to data representations. Couched within these tensions, this talk reports on recent work toward increased availability of both formal and informal representations of language and knowledge, as well as explorations within this space of tensions to use that knowledge in effective ways.
Attendees will learn about new techniques for leveraging LLMs in their approaches to text mining and conversational data mining.
Bio: Carolyn's research focuses on better understanding the social and pragmatic nature of conversation, and on using this understanding to build computational systems that improve the efficacy of conversation, both between people and between people and computers. In pursuit of these goals, she draws on approaches from computational discourse analysis and text mining, conversational agents, and computer-supported collaborative learning. Carolyn grounds her research in the fields of language technologies and human-computer interaction, and is fortunate to work closely with students and post-docs from the LTI and the Human-Computer Interaction Institute, as well as to direct her own lab, TELEDIA.