Training a Fast, Accurate & Scalable Natural Language Processing Pipeline


This talk explains how to build an end-to-end system that makes non-trivial deductions from natural language text, using only open-source software. Infrastructure components include Kafka, Spark Streaming, Spark, and Elasticsearch. Data science components include spaCy, custom annotators, machine-learned annotators, and deep-learned ontologies. Source code will be made freely available.
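
As a rough illustration of the annotator-pipeline idea described above, the following minimal sketch chains a tokenizer and a rule-based custom annotator over a stream of documents. It is not the talk's code: it uses only the Python standard library, the regex tokenizer stands in for spaCy, the in-memory lists stand in for a Kafka topic and an Elasticsearch index, and every name in it (`tokenize`, `annotate_symptoms`, `SYMPTOM_TERMS`, `run_pipeline`) is hypothetical.

```python
import re
from typing import Callable, Dict, Iterable, Iterator, List

# A document is a plain dict that annotators enrich with new fields.
Document = Dict[str, object]
Annotator = Callable[[Document], Document]


def tokenize(doc: Document) -> Document:
    """Split the text into word tokens (stand-in for a spaCy tokenizer)."""
    doc["tokens"] = re.findall(r"[A-Za-z0-9]+", str(doc["text"]))
    return doc


# Hypothetical keyword lexicon for a rule-based custom annotator.
SYMPTOM_TERMS = {"fever", "cough", "headache"}


def annotate_symptoms(doc: Document) -> Document:
    """Tag lexicon terms found among the tokens (assumed example annotator)."""
    tokens: List[str] = doc["tokens"]  # type: ignore[assignment]
    doc["symptoms"] = sorted({t.lower() for t in tokens if t.lower() in SYMPTOM_TERMS})
    return doc


def run_pipeline(docs: Iterable[Document], annotators: List[Annotator]) -> Iterator[Document]:
    """Apply each annotator in order to every incoming document."""
    for doc in docs:
        for annotate in annotators:
            doc = annotate(doc)
        yield doc


# In-memory stand-ins: 'incoming' for a message stream, 'indexed' for a search index.
incoming = [
    {"id": 1, "text": "Patient reports fever and a dry cough."},
    {"id": 2, "text": "No complaints today."},
]
indexed = list(run_pipeline(incoming, [tokenize, annotate_symptoms]))
```

Keeping each annotator as a function from document to document means a machine-learned annotator could be slotted into the same list without changing the surrounding pipeline, which is the property that makes this kind of design straightforward to distribute.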


David Talby has been building real-world big data analytics systems in healthcare, finance and e-commerce for over a decade. David has extensive experience in building and operating web-scale data science and business platforms, as well as building world-class, Agile, distributed teams. Prior to joining the startup world, he was with Microsoft’s Bing group, where he led business operations for Bing Shopping in the US and Europe. Earlier, he worked at Amazon both in Seattle and the UK, where he built and ran distributed teams that helped scale Amazon’s financial systems. David holds a PhD in computer science and master’s degrees in both computer science and business administration.
