Beyond Word2vec: Recent Developments in Document Embedding


It is easy to be amazed by the seemingly magical power of word2vec. But in real business use cases we rarely need to understand single words. So how do we apply the power of word2vec to phrases, sentences, paragraphs or entire documents? In this workshop we will go through various techniques for generating useful representations of documents of indeterminate length, and look at ways of comparing those techniques.

We will start with bag-of-words approaches and TF-IDF. From there we will look at dimensionality reduction techniques such as LSA and NMF. After that, we will look at word2vec and sense2vec and various ways to aggregate those word vectors, including summing, weighting, clustering, Gensim Doc2vec and developing parse tree representations. Finally we will look at RNN methods such as LSTMs using Keras. Along the way we will look at ways to evaluate each of these methods and discuss their strengths and weaknesses.
WARNING: this workshop will run much more smoothly if you download several large files beforehand, including a 3.6 GB pre-trained word2vec model.
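
To make the "summing" and "weighting" aggregation ideas in the outline concrete, here is a minimal sketch of turning word vectors into a document vector with an IDF-weighted mean. It is an illustration under stated assumptions, not the workshop's own material: the three-dimensional `TOY_VECTORS` table is made up and stands in for a real pre-trained word2vec model, and the helper names `idf_weights`, `embed` and `cosine` are hypothetical.

```python
import math
from collections import Counter

import numpy as np

# Toy 3-dimensional "word vectors": made-up values standing in for a
# real pre-trained word2vec model (assumption for illustration only).
TOY_VECTORS = {
    "cat": np.array([1.0, 0.0, 0.0]),
    "dog": np.array([0.9, 0.1, 0.0]),
    "stock": np.array([0.0, 1.0, 0.0]),
    "market": np.array([0.0, 0.9, 0.1]),
    "the": np.array([0.3, 0.3, 0.3]),
}


def idf_weights(docs):
    """Smoothed inverse document frequency for every token in the corpus."""
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    return {
        tok: math.log((1 + n_docs) / (1 + df)) + 1.0
        for tok, df in doc_freq.items()
    }


def embed(doc, vectors, weights=None):
    """Weighted mean of word vectors; out-of-vocabulary tokens are skipped."""
    dim = len(next(iter(vectors.values())))
    total = np.zeros(dim)
    weight_sum = 0.0
    for tok in doc:
        if tok not in vectors:
            continue
        w = 1.0 if weights is None else weights.get(tok, 1.0)
        total += w * vectors[tok]
        weight_sum += w
    # An empty or fully out-of-vocabulary document maps to the zero vector.
    return total / weight_sum if weight_sum > 0 else total


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


docs = [
    ["the", "cat"],
    ["the", "dog"],
    ["the", "stock", "market"],
]
weights = idf_weights(docs)
doc_vecs = [embed(d, TOY_VECTORS, weights) for d in docs]
```

Comparing `cosine(doc_vecs[0], doc_vecs[1])` with `cosine(doc_vecs[0], doc_vecs[2])` gives a quick check that the two animal documents land closer together than an animal document and the finance one. Passing `weights=None` falls back to a plain unweighted mean, which makes it easy to see how much the weighting changes the result.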


At Metis, Andrew has taught the fundamentals of Machine Learning and Data Science in a three-month bootcamp to over 100 students and advised on nearly 500 student projects.

Andrew came to Metis from LinkedIn, where he worked as a Data Scientist on the Education, Skills and then NLP teams. He is passionate about helping people make rational decisions and building cool data products.

Prior to that he worked on fraud modeling at IMVU (the lean startup) and studied applied physics at Cornell. He loves snowboarding, traveling, scotch and reading about all kinds of nerdy topics.
