Building a Semantic Search Engine

Abstract: 

Most production information retrieval systems are built on top of Lucene, which uses tf-idf and BM25 for ranking.

Current state-of-the-art techniques use embeddings for retrieval. This workshop aims to demystify what is involved in building such a system.

This tutorial is broken into four sections:

1) Intro
- Search retrieval concepts: approaches, evaluation metrics, etc.
- Overview of common production retrieval stack
- Environment Setup (10 mins)
- Walk through the notebooks and environment setup

2) Non-deep-learning-based retrieval
- Overview of tf-idf and BM25
- How production systems use Elasticsearch / Solr
- Hands-on lab experience:
-- Indexing some documents with PySolr
-- Reviewing Retrieval Results from tf-idf
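
As a rough illustration of the scoring covered in this section, here is a minimal pure-Python sketch of BM25. It is not the lab notebook code: the function name `bm25_scores`, the toy documents, and the whitespace tokenization are assumptions made for the example. The defaults `k1=1.2` and `b=0.75` and the idf form follow Lucene's BM25 similarity.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Lucene-style idf, which stays non-negative for common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer documents need more matches.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical toy corpus, tokenized by whitespace.
docs = [
    "red running shoes for men".split(),
    "blue running jacket".split(),
    "leather office shoes".split(),
]
scores = bm25_scores("running shoes".split(), docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking, scores)
```

The first document matches both query terms and ranks highest; a document that shares no terms with the query would score zero, which is the gap embedding-based retrieval tries to close.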

3) Embeddings and Vector Similarity Overview (60 min)
- Brief review of common embedding techniques: word2vec, BERT
- Brief discussion of how to train your own custom embeddings
- Vector Similarity and Evaluation metrics
- Hands-on lab experience:
-- Use a pre-trained BERT embedding from the Hugging Face transformers library
-- Compare results of non-deep-learning retrieval and vector similarity
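
To make the vector similarity step concrete, the sketch below ranks documents by cosine similarity. The three-dimensional vectors are hand-written stand-ins, not real BERT outputs (which typically have hundreds of dimensions), and the names `cosine_similarity`, `top_k`, and `doc_vecs` are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Toy vectors standing in for embeddings (assumption for illustration).
doc_vecs = {
    "sneakers": [0.9, 0.1, 0.0],
    "trainers": [0.8, 0.2, 0.1],
    "laptop": [0.0, 0.1, 0.9],
}
query_vec = [1.0, 0.0, 0.0]  # imagine this is the embedding of "running shoes"
results = top_k(query_vec, doc_vecs, k=2)
print(results)
```

Neither "sneakers" nor "trainers" shares a token with "running shoes", so a purely lexical scorer would miss them, while nearby vectors can still retrieve them.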


4) Serving Vector Similarity using Approximate Nearest Neighbors
- Why Vector Similarity needs ANN
- Review common Approximate Nearest Neighbors techniques in FAISS
- Overview of managed services: Vertex AI, Pinecone, Milvus
- Hands-on lab experience:
-- Building a FAISS index
-- Loading the index into Milvus
-- Comparing the recall vs. latency tradeoff
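
The recall vs. latency tradeoff can be sketched without any library. The example below loosely mirrors the inverted-file (IVF) idea: vectors are bucketed by nearest centroid, and a query scans only the `nprobe` closest buckets. This is an illustration, not FAISS's API: the function names are hypothetical, the centroids are sampled at random rather than trained with k-means, and the number of vectors scanned is used as a stand-in for latency.

```python
import random

def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def exact_search(query, vectors, k):
    """Brute-force nearest neighbors: scans every vector."""
    return sorted(range(len(vectors)), key=lambda i: sq_dist(query, vectors[i]))[:k]

def build_ivf(vectors, centroids):
    """Assign each vector id to the bucket of its nearest centroid."""
    buckets = [[] for _ in centroids]
    for i, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))
        buckets[nearest].append(i)
    return buckets

def ivf_search(query, vectors, centroids, buckets, k, nprobe):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in buckets[c]]
    top = sorted(candidates, key=lambda i: sq_dist(query, vectors[i]))[:k]
    return top, len(candidates)

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors that the approximate search found."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

rng = random.Random(0)
vectors = [[rng.random() for _ in range(8)] for _ in range(500)]
centroids = rng.sample(vectors, 10)  # assumption: random centroids, not k-means
buckets = build_ivf(vectors, centroids)
query = [rng.random() for _ in range(8)]
exact = exact_search(query, vectors, k=10)

for nprobe in (1, 3, 10):
    approx, scanned = ivf_search(query, vectors, centroids, buckets, k=10, nprobe=nprobe)
    print(f"nprobe={nprobe} scanned={scanned} recall@10={recall_at_k(approx, exact):.2f}")
```

Probing more buckets scans more vectors and can only keep or raise recall; probing all of them reduces to exact search.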


By the end of the session, attendees will be able to build a production information retrieval system that uses embeddings and vector similarity served through ANN.
This will allow them to apply state-of-the-art techniques on top of traditional information retrieval systems.

Bio: 

Nidhin is a Machine Learning Engineer at Walmart, where he works on Walmart's e-commerce search engine. Before Walmart, he worked for two startups.

Open Data Science
One Broadway
Cambridge, MA 02142
info@odsc.com