Abstract: spaCy is an open-source library for advanced Natural Language Processing in Python. Central to its design is developer productivity – we want to empower developers by giving them the tools they need to implement custom NLP solutions. On the one hand, this means providing pretrained models and reasonable default settings so they can hit the ground running. On the other hand, spaCy provides a powerful configuration system that allows developers to easily plug in custom models or algorithms they’ve developed.
In this talk, I will first give an overview of spaCy's built-in functionality, which is based on pretrained supervised models. I will show how linguistic information such as part-of-speech tags and dependency parses can help you identify interesting patterns or phrases in your documents and ultimately perform document classification or other information retrieval tasks.
In the second part of the talk, I will switch gears and demonstrate how Large Language Models (LLMs) can be integrated into your NLP pipelines. Due to their impressive natural language capabilities, recent LLMs like GPT-4 are paving the way for fast prototyping of NLP applications in any business domain. Most practical use cases, however, will benefit from a structured pipeline approach in which LLMs are complemented with supervised models or even rule-based approaches. I'll show how to build such pipelines for a realistic business application, using spaCy and its recently published extension 'spacy-llm'.
Finally, I will discuss how to balance the different (and often conflicting) requirements of an NLP solution, such as accuracy, speed, memory usage, reliability, maintainability and customizability, and how you can transform a quick prototype into a robust, production-ready solution.
Bio: Sofie is a machine learning and NLP engineer who firmly believes in the power of data to transform decision making in industry. She has a Master's degree in Computer Science (software engineering) and a PhD in Sciences (Bioinformatics), and more than 16 years of experience in Natural Language Processing and Machine Learning, including in the pharmaceutical and food industries. In 2019, she joined Explosion to work on the open-source NLP library spaCy. She currently leads the open-source team that develops and maintains spaCy, as well as various other open-source developer tools for data scientists.