Abstract: BERT has revolutionized the field of Natural Language Processing (NLP): with BERT, you can achieve high accuracy on a variety of NLP tasks with relatively little design effort.
In this workshop, I’ll walk us through illustrations and example Python code covering the fundamentals of applying BERT to text applications, including:
- BERT’s strengths, applications, and weaknesses
- The concepts of “pre-training” and “fine-tuning”
- The basics of BERT’s architecture
- How to format text to feed into BERT
- How to “fine-tune” BERT for text classification with PyTorch and the Huggingface “transformers” library
== Part 1: Overview of the BERT model ==
To motivate our discussion, we’ll start by looking at the significance of BERT and where you’ll find it the most powerful and useful. Despite its broad applicability, BERT isn’t *always* the right answer, so I’ll also cover its limitations and weaknesses.
Next, we’ll get a basic understanding of the components of BERT’s architecture and how data flows through it. This should let us work with the model without it being a complete “black box” mystery.
== Part 2: Preparing Text for BERT ==
To use BERT effectively, you’ll want to understand how a text string gets converted to BERT’s required format.
This will mark the start of our example code. We’ll take an example text classification dataset and walk through the steps for tokenizing, encoding, and padding the text samples.
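As a preview of these steps, here is a simplified, pure-Python sketch of tokenizing, encoding, and padding. In the workshop we’ll use the library’s BertTokenizer for this; the toy vocabulary and sentence below are made up purely for illustration (the IDs for the special tokens [CLS], [SEP], and [PAD] match BERT’s real vocabulary, but the word IDs are arbitrary).

```python
# A toy vocabulary mapping tokens to integer IDs. BERT's real vocabulary
# has ~30,000 WordPiece entries; this one is just for illustration.
vocab = {'[PAD]': 0, '[UNK]': 100, '[CLS]': 101, '[SEP]': 102,
         'the': 1037, 'cat': 4937, 'sat': 2938, 'here': 2182}

def prepare(sentence, max_len=8):
    # 1. Tokenize, adding BERT's special [CLS] and [SEP] tokens.
    tokens = ['[CLS]'] + sentence.lower().split() + ['[SEP]']
    # 2. Encode: map each token to its vocabulary ID.
    ids = [vocab.get(t, vocab['[UNK]']) for t in tokens]
    # 3. Pad to a fixed length, with an attention mask marking real tokens.
    mask = [1] * len(ids) + [0] * (max_len - len(ids))
    ids = ids + [vocab['[PAD]']] * (max_len - len(ids))
    return ids, mask

ids, mask = prepare('The cat sat here')
# ids  → [101, 1037, 4937, 2938, 2182, 102, 0, 0]
# mask → [1, 1, 1, 1, 1, 1, 0, 0]
```

The attention mask is what lets BERT ignore the padding tokens, so every sample in a batch can share the same fixed length.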
== Part 3: Fine-Tuning BERT ==
With our text prepared, we’ll next implement the training loop for fine-tuning BERT on our text data. This includes batching the training samples, performing a forward pass, calculating the error, and backpropagation to calculate and apply weight updates. Fortunately, PyTorch makes all of the above pretty straightforward!
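The loop structure we’ll build looks like the sketch below. To keep the mechanics visible, this example uses random toy data and a tiny linear model standing in for BERT; the real workshop code swaps in our encoded text and a pre-trained BERT classifier, but the five steps per batch are the same.

```python
# A minimal sketch of the fine-tuning loop, with a tiny stand-in model
# in place of BERT. Data, model, and hyperparameters are illustrative.
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Toy data standing in for our encoded text: 32 samples, 8 features, 2 classes.
X = torch.randn(32, 8)
y = torch.randint(0, 2, (32,))
loader = DataLoader(TensorDataset(X, y), batch_size=8, shuffle=True)

model = torch.nn.Linear(8, 2)  # stand-in for the BERT classification model
optimizer = torch.optim.AdamW(model.parameters(), lr=0.05)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(3):
    for batch_X, batch_y in loader:      # batching the training samples
        optimizer.zero_grad()            # clear gradients from the last step
        logits = model(batch_X)          # forward pass
        loss = loss_fn(logits, batch_y)  # calculate the error
        loss.backward()                  # backpropagation: compute gradients
        optimizer.step()                 # apply the weight updates
```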
After training BERT, we’ll apply it to our test set and see how it does!
== Part 4: Trainer Class ==
Once you have a good understanding of this whole process, the ‘transformers’ library includes a “Trainer” class which allows you to implement most of the above in just a few lines of code.
So why didn’t we just start with that?! There’s a catch--the Trainer takes a long list of arguments, and setting them properly requires first understanding the details of the training process.
With the knowledge you’ve gained in the workshop, you’ll be ready, and we’ll conclude by re-implementing the example using this streamlined approach.
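To give a sense of the shape of this approach, here is a configuration sketch. It assumes the ‘transformers’ library plus the train and test datasets built earlier; the argument values shown (epochs, batch size, learning rate, output directory) are illustrative placeholders, and picking them well is exactly what the earlier parts prepare you for.

```python
# A sketch of the streamlined Trainer approach. Assumes 'train_dataset'
# and 'test_dataset' were prepared as in Part 2; values are illustrative.
from transformers import (BertForSequenceClassification,
                          Trainer, TrainingArguments)

model = BertForSequenceClassification.from_pretrained(
    'bert-base-uncased', num_labels=2)

training_args = TrainingArguments(
    output_dir='./results',          # where checkpoints are saved
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,     # assumed: built in Part 2
    eval_dataset=test_dataset,
)

trainer.train()
```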
We will be programming in Python, and running our code using Google Colab. Colab is essentially a Jupyter Notebook running in Google’s cloud (rather than on your laptop) and with free access to a GPU!
So, Python is a requirement, and familiarity with Jupyter Notebooks is recommended.
I’ll also assume that you are familiar with neural network concepts such as “layers”, “embeddings”, and “backpropagation”.
Knowledge of PyTorch is *not* required--you'll get to pick up some of the basics in this workshop!
Bio: Chris is an author of eBooks, tutorial videos, and example code on a variety of Machine Learning topics--particularly on challenging subjects in NLP. He’s best known for his word2vec blog posts (recommended reading for Stanford's NLP class), BERT architecture YouTube series, and example code for a variety of BERT applications.
Chris earned his B.S. from Stanford in 2006 and began his career as a software engineer; he has been working in the areas of computer vision, machine learning, and NLP since 2012.
His writing and speaking styles are characterized by levity and by positioning himself as a fellow learner rather than an authority. Chris loves to create the tutorials he wishes he could have read--with an emphasis on thoroughness, while still being easy to follow. You’ll often find his simple and colorful illustrations reused around the web. His example code follows the same principles--working code is always a great start, but he further prioritizes explanation and readability, with thoughtful organization and detailed comments at every step.