Abstract: Natural language generation is one of the key areas of Natural Language Processing, with applications ranging from dialogue generation and question answering to machine translation and summarisation. Most recently, controlled text generation techniques have been actively applied for data augmentation in NLP, a field notorious for data sparsity, making the task an essential tool for any Data Science or AI practitioner. The current state of the art in language generation predominantly relies on pre-trained Transformer-based language models. Despite the power of these models, controlling what they generate remains a challenge and depends largely on best practices.
By completing this workshop, you will gain practical skills in controlling the text generated by popular Transformer-based models, both by conditioning on prompt text or keywords and by adjusting the diversity of the output with various sampling approaches.
We will build Python code to fine-tune three state-of-the-art text generation models (GPT-2, DialoGPT and T5) from the Hugging Face library (https://huggingface.co) for the controlled generation of text conditioned on prompts, previous dialogue utterances or keywords, respectively. You will also learn how to adjust the diversity of the generated text using different decoding strategies: greedy search, temperature sampling, top-k sampling, top-p (nucleus) sampling and beam search. We will use publicly available movie-subtitle data from OpenSubtitles (https://opus.nlpl.eu/OpenSubtitles.php).
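As a taste of the sampling techniques covered in the workshop, the following is a minimal sketch of temperature, top-k and top-p (nucleus) sampling over a toy logit vector, using only the Python standard library. The function names and toy values are illustrative assumptions, not the workshop's actual materials.

```python
import math
import random

def softmax(logits, temperature=1.0):
    # Scale logits by temperature: values below 1 sharpen the
    # distribution (less diverse output), values above 1 flatten it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    # Keep only the k most probable tokens, then renormalise.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(ranked[:k])
    filtered = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]

def top_p_filter(probs, p):
    # Keep the smallest set of top tokens whose cumulative
    # probability reaches p (the "nucleus"), then renormalise.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = set(), 0.0
    for i in ranked:
        keep.add(i)
        cum += probs[i]
        if cum >= p:
            break
    filtered = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(filtered)
    return [q / total for q in filtered]

def sample(probs, rng=random):
    # Draw one token index from the (filtered) distribution.
    r = rng.random()
    cum = 0.0
    for i, q in enumerate(probs):
        cum += q
        if r <= cum:
            return i
    return len(probs) - 1
```

Greedy search is the degenerate case `top_k_filter(probs, 1)`. In the Hugging Face `transformers` library these knobs correspond to the `temperature`, `top_k`, `top_p` and `num_beams` arguments of `model.generate`, which the workshop uses in practice.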
Bio: Julia Ive is a Lecturer in Natural Language Processing at Queen Mary University of London, UK. She is the author of many mono- and multimodal text generation approaches in Machine Translation and Summarisation. Currently, she is working on the theoretical aspects of style preservation and privacy-safety in artificial text generation.