Abstract: In this workshop, I will introduce the basics of Natural Language Processing, including the structure of a typical NLP project, with a focus on topic modeling. We will build a topic modeling system using the BBC news dataset. After the workshop, you will have a good grasp of the structure of an NLP project and the methods used in NLP. You will also have built a topic modeling project: preprocessing and vectorizing the data, building the topic model, and visualizing and evaluating it.
Lesson 1. Learn about the structure of an NLP project and the approaches currently used in NLP. At the end of this lesson, you will be able to tell which NLP architecture and approach to use for a given NLP task.
Lesson 2. Learn how to preprocess text data so that it can be used in a model. At the end of this lesson, you will be able to clean, preprocess, and vectorize the data we will be using for the topic modeling project.
Lesson 3. Learn about different topic modeling approaches, including Latent Dirichlet Allocation (LDA), and how to choose the number of topics. At the end of this lesson, you will be able to build a topic model using LDA.
Lesson 4. Learn about topic modeling visualization and evaluation. At the end of this lesson, you will be able to create a graphical visualization of your topic model and evaluate it using different methods.
Prerequisites: Python, basics of machine learning
Bio: Zhenya Antić is an NLP consultant and founder of Practical Linguistics Inc. Her projects include document summarization, information extraction, topic modeling, sentiment analysis of consumer reviews, and document similarity. She is the author of the recently published Python Natural Language Processing Cookbook. Zhenya holds a PhD in Linguistics from the University of California, Berkeley and a BS in Computer Science from the Massachusetts Institute of Technology.