Abstract: In this 90-minute workshop, we will describe how to apply Deep Learning (DL) to a few Natural Language Processing (NLP) tasks. It is an introductory workshop, where the target audience is expected to have basic knowledge of neural networks but little or no NLP experience. The format is a mix of presenting theory and walking through programming examples.
You will learn about language models and word embeddings and get an idea of why they work. You will see examples of how to implement a seemingly complicated task like natural language translation using 300 lines of code, both in TensorFlow and PyTorch. The overall goal of this workshop is to connect the dots between basic neural networks and more complicated NLP architectures like BERT and GPT.
Lessons 1 and 2:
We start by reviewing neural network fundamentals, including recurrent networks. We then move on to language models, which underpin many NLP tasks, and show how to implement one using a neural network.
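As a minimal illustration of what a language model estimates, here is a count-based bigram model, i.e., P(next word | current word). This is a simpler stand-in for the neural language model covered in the workshop, and the toy corpus is invented for the example:

```python
from collections import Counter

# Toy corpus; a real language model trains on vastly more text.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count bigrams (adjacent word pairs) and the words that start them.
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])

def bigram_prob(w1, w2):
    """P(w2 | w1) estimated from bigram counts."""
    return bigrams[(w1, w2)] / unigrams[w1]

print(bigram_prob("the", "cat"))  # 0.5: two of the four "the" are followed by "cat"
```

A neural language model replaces these counts with a learned function, which generalizes to word sequences never seen in training.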
When using DL for NLP tasks, it is common to represent words using word embeddings, a concept popularized by the word2vec algorithm. You will learn the basics of word embeddings, how they work, and their relationship to language models.
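The key property of embeddings is that related words end up near each other in vector space, often measured by cosine similarity. A sketch with hand-picked 3-dimensional vectors (real embeddings such as word2vec's are learned and typically have 100-300 dimensions):

```python
import numpy as np

# Hand-picked illustrative vectors; learned embeddings would come from
# training on a large corpus.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.7, 0.9]),
    "apple": np.array([0.1, 0.9, 0.2]),
}

def cosine(u, v):
    """Cosine similarity: 1.0 for parallel vectors, 0.0 for orthogonal ones."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Semantically related words should be closer than unrelated ones.
print(cosine(emb["king"], emb["queen"]) > cosine(emb["king"], emb["apple"]))  # True
```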
Language models and word embeddings are two fundamental building blocks used for natural language translation systems. We demonstrate how to combine two neural networks into a sequence-to-sequence model that can translate simple sentences from one language to another. We provide example implementations in both TensorFlow and PyTorch.
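The data flow of such a sequence-to-sequence model can be sketched in a few lines of NumPy: an encoder RNN compresses the source sentence into a state vector, and a decoder RNN generates target tokens from that state. The weights below are random and untrained (the workshop's TensorFlow and PyTorch implementations learn them from sentence pairs), so the output token ids are meaningless; the sketch only shows the architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, emb_dim, hidden = 10, 8, 16  # arbitrary toy sizes

# Untrained random parameters; a real system learns these during training.
E   = rng.normal(size=(vocab, emb_dim))           # embedding table
W_x = rng.normal(size=(emb_dim, hidden)) * 0.1    # input-to-hidden weights
W_h = rng.normal(size=(hidden, hidden)) * 0.1     # hidden-to-hidden weights
W_o = rng.normal(size=(hidden, vocab)) * 0.1      # hidden-to-output weights

def rnn_step(h, token):
    """One step of a simple tanh RNN."""
    return np.tanh(E[token] @ W_x + h @ W_h)

def translate(src_tokens, max_len=5):
    # Encoder: fold the source sentence into a single state vector.
    h = np.zeros(hidden)
    for t in src_tokens:
        h = rnn_step(h, t)
    # Decoder: greedily generate target tokens from that state.
    out, token = [], 0  # token id 0 doubles as a START symbol here
    for _ in range(max_len):
        h = rnn_step(h, token)
        token = int(np.argmax(h @ W_o))
        out.append(token)
    return out

print(translate([1, 2, 3]))  # five token ids from the untrained model
```

Note how the decoder sees the source sentence only through one fixed-size vector, which is the bottleneck that attention (below) addresses.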
We conclude the workshop by describing how to improve language translation using a mechanism known as attention, and we show how attention is used in the transformer architecture. This architecture is the basis for popular language models such as BERT and GPT.
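The core of that mechanism, scaled dot-product attention, fits in a few lines: each query scores its similarity to every key, the scores are normalized with softmax, and the result is a weighted sum of the values. A NumPy sketch with random inputs (matrix sizes are arbitrary):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: each query attends to all keys
    and returns a weighted sum of the corresponding values."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # query-key similarities
    weights = softmax(scores)         # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))  # 2 queries of dimension 4
K = rng.normal(size=(3, 4))  # 3 keys
V = rng.normal(size=(3, 4))  # 3 values, one per key
out, w = attention(Q, K, V)
print(out.shape, w.sum(axis=-1))  # (2, 4), weights per query sum to 1
```

In a translation model, the decoder's queries attend over all encoder states rather than a single summary vector; the transformer builds its entire architecture out of this operation.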
The difficulty level of this workshop is Beginner from an NLP perspective but Intermediate from a DL perspective. The following prerequisite skills are needed:
- Basic understanding of how a neural network operates and how it is trained
- Some exposure to recurrent networks
- Basic Python skills to enable following the programming examples
Bio: Magnus Ekman is a Director of Architecture at NVIDIA, where he leads an engineering team working on CPU performance and power efficiency. As the deep learning (DL) field exploded in the past few years, fueled by NVIDIA's GPU technology and CUDA, he found himself in the midst of a company expanding beyond computer graphics and becoming a DL powerhouse. As part of that journey, he challenged himself to stay up to date with the most recent developments in the field. In collaboration with the NVIDIA Deep Learning Institute (DLI), he recently published the book "Learning Deep Learning: Theory and Practice of Neural Networks, Computer Vision, Natural Language Processing, and Transformers Using TensorFlow."