Abstract: spaCy is an open-source library for advanced Natural Language Processing in Python. Central to its design is developer productivity – we want to empower developers by giving them the tools they need to implement custom NLP solutions. On the one hand, this means providing pretrained models and reasonable default settings so they can hit the ground running. On the other hand, spaCy provides a powerful configuration system that allows developers to easily plug in custom models or algorithms they’ve developed.
In this talk, I will first give an overview of the built-in functionality available in spaCy, using pretrained models. I will showcase how linguistic information such as part-of-speech tags and dependency parses can help you identify interesting patterns or phrases in your documents and ultimately perform document classification or other information retrieval tasks.
In the second part of the talk, I will switch gears and explain how to fully customize your own NLP solution – ranging from training your own models to implementing a custom component or even defining your own model architecture. This functionality will be illustrated by implementing a custom relation extraction component within the spaCy framework, using our own Machine Learning library Thinc.
Finally, I will provide a quick overview of our new open-source workflow management system called ‘weasel’, which allows you to manage end-to-end data science workflows, such as orchestrating a project that involves preprocessing data and then training, evaluating, packaging and serving your models. Using spaCy in combination with weasel ensures that data scientists have a smooth path from prototype to production for all their NLP solutions.
Bio: Sofie is a machine learning and NLP engineer who firmly believes in the power of data to transform decision making in industry. She has a Master's degree in Computer Science (software engineering) and a PhD in Sciences (Bioinformatics), and more than 16 years of experience in Natural Language Processing and Machine Learning, including in the pharmaceutical and food industries. In 2019, she joined Explosion to work on the open-source NLP library spaCy. She currently leads the open-source team developing and maintaining spaCy, as well as various other open-source developer tools for data scientists.
Sofie Van Landeghem, PhD
Natural Language Processing & Machine Learning Expert | OxyKodit / Explosion