Controlled Text Generation with Transformer-based Language Models

Abstract: 

Natural language generation is one of the key areas of Natural Language Processing, with applications such as dialogue generation, question answering, machine translation, and summarisation. Most recently, controlled text generation techniques have been actively applied to data augmentation, since data sparsity is a well-known problem across NLP. This makes controlled generation a key tool in the toolkit of any Data Science or AI practitioner. The current state of the art in language generation predominantly relies on pre-trained Transformer-based language models. Despite the progress of these powerful models, controlling text generation remains a challenge and mainly relies on best practices.

By completing this workshop, you will gain practical skills in controlling the text generated by popular Transformer-based models, both by conditioning on prompt text or keywords and by adjusting the diversity of the output with various sampling approaches.

Session Outline:
We will build Python code to fine-tune three state-of-the-art text generation models (GPT-2, DialoGPT and T5) from the Hugging Face library (https://huggingface.co) for controlled generation of text conditioned on prompts, previous dialogue utterances, or keywords, respectively. You will also learn how to adjust the diversity of the generated text using different decoding and sampling techniques (greedy search, temperature sampling, top-k sampling, top-p sampling, beam search). We will use publicly available movie subtitle data (https://opus.nlpl.eu/OpenSubtitles.php).
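
To give a feel for how the techniques listed above differ, here is a minimal sketch in plain Python of a single next-token decision. It is not the workshop's code: the five-word vocabulary, the scores and all function names are made up for illustration, and beam search is left out because it needs a model that scores whole sequences rather than one step.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; temperature < 1 sharpens, > 1 flattens."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(probs):
    """Greedy search step: always pick the most probable token."""
    return max(range(len(probs)), key=lambda i: probs[i])


def top_k_filter(probs, k):
    """Keep the k most probable tokens, zero out the rest, renormalise."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]


def sample(probs, rng):
    """Draw one token index at random, weighted by probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical next-token scores over a toy vocabulary (assumption, not real model output).
vocab = ["the", "a", "movie", "scene", "zebra"]
logits = [2.0, 1.5, 1.0, 0.5, -1.0]

rng = random.Random(0)  # fixed seed so repeated runs draw the same tokens
probs = softmax(logits)
print("greedy:", vocab[greedy(probs)])
print("temperature 0.5:", vocab[sample(softmax(logits, 0.5), rng)])
print("top-k (k=2):", vocab[sample(top_k_filter(probs, 2), rng)])
print("top-p (p=0.9):", vocab[sample(top_p_filter(probs, 0.9), rng)])
```

In the Hugging Face Transformers library, the same choices are exposed as arguments to the `generate` method (`do_sample`, `temperature`, `top_k`, `top_p`, `num_beams`), so in practice you would set those arguments rather than filter the distribution by hand.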

Background Knowledge:
Python

Bio: 

Julia Ive is a Lecturer in Natural Language Processing at Queen Mary University of London, UK. She is the author of many mono- and multimodal text generation approaches in Machine Translation and Summarisation. Currently, she is working on the theoretical aspects of style preservation and privacy-safety in artificial text generation.

Open Data Science
One Broadway
Cambridge, MA 02142
info@odsc.com
