Exploratory Text Analysis in Python Using spaCy and textacy


The Python ecosystem has many libraries for natural language processing (NLP), which can make it hard to know where to start when analyzing text as data. This workshop introduces spaCy, a powerful, opinionated NLP library that facilitates analysis of text data, along with textacy, a library that adds information retrieval and corpus analysis features.

By completing this workshop, you will develop core skills in asking questions of text and identifying interesting features through spaCy's tokenization, part-of-speech tagging, and named entity recognition. You will also learn to expand that analysis and scale it to many documents through textacy.

Session Outline
Lesson 1: Working with a single document, perform word and sentence tokenization, part-of-speech tagging, and named entity recognition while forming analytical questions.

Lesson 2: Working with a small set of documents and the textacy library, learn to extract information at the corpus level based on the same grammatical features identified in Lesson 1.
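
To give a rough sense of what the two lessons cover, here is a minimal sketch. It is not taken from the workshop materials. The helper names (load_pipeline, describe, count_terms) and the sample sentences are hypothetical. It assumes spaCy v3 and, ideally, the small English model en_core_web_sm, which is installed separately with "python -m spacy download en_core_web_sm". textacy provides its own Corpus class for corpus-level work; to keep the example short, the corpus step below uses only spaCy's nlp.pipe and a Counter as a stand-in.

```python
from collections import Counter

import spacy


def load_pipeline(name="en_core_web_sm"):
    """Load a trained pipeline, or fall back to tokenization only."""
    try:
        return spacy.load(name)
    except OSError:
        # Model not installed: a blank English pipeline still tokenizes
        # and, with the sentencizer, splits sentences, but it assigns
        # no part-of-speech tags and finds no named entities.
        nlp = spacy.blank("en")
        nlp.add_pipe("sentencizer")
        return nlp


def describe(doc):
    """Lesson 1: collect the basic features of a single document."""
    return {
        "tokens": [token.text for token in doc],
        "sentences": [sent.text for sent in doc.sents],
        "pos": [(token.text, token.pos_) for token in doc],
        "entities": [(ent.text, ent.label_) for ent in doc.ents],
    }


def count_terms(nlp, texts, pos_tags=("NOUN", "PROPN")):
    """Lesson 2 (stand-in): count terms across several documents.

    Keeps alphabetic, non-stopword tokens. If the pipeline assigned
    part-of-speech tags, only tokens whose tag is in pos_tags count.
    """
    counts = Counter()
    for doc in nlp.pipe(texts):
        tagged = doc.has_annotation("POS")
        for token in doc:
            if not token.is_alpha or token.is_stop:
                continue
            if tagged and token.pos_ not in pos_tags:
                continue
            counts[token.lower_] += 1
    return counts


if __name__ == "__main__":
    nlp = load_pipeline()

    # Made-up sample text, for illustration only.
    single = nlp("The library hosts workshops. They introduce text analysis.")
    for name, values in describe(single).items():
        print(name, values)

    corpus = [
        "The library hosts workshops.",
        "Workshops introduce text analysis.",
    ]
    print(count_terms(nlp, corpus).most_common(5))
```

With the trained model installed, the "pos" and "entities" entries are filled in and the counts are restricted to nouns and proper nouns. Without it, the sketch still runs but reports only tokens, sentences, and unfiltered term counts.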

Background Knowledge
Participants should have a reasonable grasp of basic Python syntax, including control flow, functions, and list operations. Knowledge of English syntax, such as parts of speech, will be helpful but not necessary for successful participation.


Scott Bailey is the Digital Research and Scholarship Librarian at the NC State University Libraries, where he collaborates with faculty and other scholars in applying digital and computational tools and methods to open new possibilities in research and learning. He regularly teaches workshops using programming languages like Python and R to introduce data analysis and visualization, machine learning, and computational approaches to text data. He was previously the Head of Social Science Data and Software in the Center for Interdisciplinary Digital Research (CIDR) at Stanford Libraries, where he oversaw a group of Ph.D. students in delivering expert consultation on statistical computing, organized and taught in the CIDR workshop series, and collaborated with colleagues across Stanford University to provide better access to data and support for data-driven research.

Open Data Science




