spaCy: a customizable NLP toolkit designed for developers

Abstract: 

spaCy is an open-source library for advanced Natural Language Processing in Python. Central to its design is developer productivity: we want to empower developers by giving them the tools they need to implement custom NLP solutions. On the one hand, this means providing pretrained models and reasonable default settings so they can hit the ground running. On the other hand, spaCy provides a powerful configuration system that lets developers plug in custom models or algorithms they have developed.

In this talk, I will first give an overview of the built-in functionality available in spaCy, using pretrained models. I will showcase how linguistic information such as part-of-speech tags and dependency parses can help you identify interesting patterns or phrases in your documents and ultimately perform document classification or other information retrieval tasks.
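As a rough illustration of the kind of pattern matching described above, the sketch below uses spaCy's rule-based `Matcher` over part-of-speech tags and then reads a subject off the dependency labels. It is not taken from the talk. To keep it runnable without downloading a pretrained pipeline, the tags, heads and dependency labels are annotated by hand (an assumption made for this example); in practice a pretrained pipeline would predict them.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc
from spacy.util import filter_spans

# Blank English pipeline: provides a vocab and tokenizer, no trained weights.
nlp = spacy.blank("en")

# Hand-written annotations standing in for a pretrained pipeline's predictions
# (assumption for this sketch; heads are absolute token indices).
words = ["The", "new", "model", "improves", "parsing", "accuracy"]
pos = ["DET", "ADJ", "NOUN", "VERB", "NOUN", "NOUN"]
heads = [2, 2, 3, 3, 5, 3]
deps = ["det", "amod", "nsubj", "ROOT", "compound", "dobj"]
doc = Doc(nlp.vocab, words=words, pos=pos, heads=heads, deps=deps)

# Pattern: an optional adjective followed by one or more nouns.
matcher = Matcher(nlp.vocab)
matcher.add("NOUN_PHRASE", [[{"POS": "ADJ", "OP": "?"}, {"POS": "NOUN", "OP": "+"}]])

# Keep only the longest non-overlapping matches, in document order.
phrases = [span.text for span in filter_spans(matcher(doc, as_spans=True))]

# Use the dependency labels to pick out grammatical subjects.
subjects = [token.text for token in doc if token.dep_ == "nsubj"]

print(phrases)
print(subjects)
```

Phrases and subjects extracted this way could then serve as features for a downstream task such as document classification.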

In the second part of the talk, I will switch gears and explain how to fully customize your own NLP solution – ranging from training your own models to implementing a custom component or even defining your own model architecture. This functionality will be illustrated by implementing a custom relation extraction component within the spaCy framework, using our own Machine Learning library Thinc.
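To give a flavour of what implementing a custom component involves, here is a minimal sketch of a stateless pipeline component. It is far simpler than the relation extraction component mentioned above and does not use Thinc; the component name `capital_counter` and the extension attribute `n_capitalized` are hypothetical names chosen for this example.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Hypothetical custom attribute, accessible as doc._.n_capitalized.
Doc.set_extension("n_capitalized", default=0, force=True)


@Language.component("capital_counter")
def capital_counter(doc):
    # A component receives a Doc, may modify it, and must return it.
    doc._.n_capitalized = sum(1 for token in doc if token.is_title)
    return doc


# Register the component on a blank pipeline (tokenizer only).
nlp = spacy.blank("en")
nlp.add_pipe("capital_counter", last=True)

doc = nlp("Sofie works on spaCy at Explosion.")
print(nlp.pipe_names, doc._.n_capitalized)
```

A trainable component follows the same registration idea but additionally wraps a model and is typically wired in through the configuration system.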

Finally, I will provide a quick overview of our new open-source workflow management system called ‘weasel’, which allows you to manage end-to-end data science workflows, for example a project that involves preprocessing data and then training, evaluating, packaging and serving your models. Using spaCy in combination with weasel gives data scientists a smooth path from prototype to production for all their NLP solutions.

Bio: 

Sofie is a machine learning and NLP engineer who firmly believes in the power of data to transform decision making in industry. She has a Master's degree in Computer Science (software engineering) and a PhD in Sciences (Bioinformatics), and more than 16 years of experience in Natural Language Processing and Machine Learning, including in the pharmaceutical and food industries. In 2019, she joined Explosion to work on the open-source NLP library spaCy. She currently leads the open-source team that develops and maintains spaCy, as well as various other open-source developer tools for data scientists.

Open Data Science
