Abstract: This advanced Natural Language Processing (NLP) workshop focuses on text summarization: you will automatically generate news headlines with a model fine-tuned on Reuters News data. You’ll also get a glimpse into the emerging field of Explainable AI.
NLP is one of the fastest-growing fields within AI. It can tackle a wide variety of tasks, such as text classification, question answering (e.g. chatbots), translation, topic modeling, sentiment analysis, and summarization. In this workshop, we focus on text summarization because it is rarely showcased in tutorials despite being a powerful and challenging application of NLP.
We see a trend towards pre-training Deep Learning models on a large text corpus and then fine-tuning them for a specific downstream task (an approach known as transfer learning). In this hands-on workshop, you’ll get the opportunity to apply a state-of-the-art summarization model to generate news headlines. We fine-tuned this model on Reuters news data, which is professionally produced by journalists and strictly follows rules of integrity, independence and freedom from bias.
The move towards more complex models for NLP tasks makes the need for AI explainability more apparent. How can we increase trust in what a model generates? With this workshop, we’ll bring you a step closer to answering this question.
We use the Python programming language, which has a huge community across various industries and has become a standard in applied NLP. We chose Google Colab to host our code and training material so that participants can avoid technical setup challenges.
The NLP topics introduced, text summarization and explainable AI, are reinforced through guided hands-on exercises, supervised by mentors with several years of industry experience. At the end of this session, you will walk away with an interactive notebook that gives you a head start in applying what you have learned to your own challenges.
Bio: Nadja Herger is a Data Scientist at Thomson Reuters Labs, based in Switzerland. She focuses primarily on Deep Learning PoCs within the Labs, where she works on applied NLP projects in the legal and news domains, applying her skills to text classification, metadata extraction, and summarization tasks. Before joining Thomson Reuters, she obtained her Ph.D. in Climate Science from the University of New South Wales, Australia. She has successfully made the transition, on the job, from working with spatio-temporal data to working with text-based data. Nadja is passionate about education, which is reflected in her ongoing mentorship of students within Thomson Reuters, as well as of South African students from previously disadvantaged groups who aspire to get into Data Science.