Analyzing Unstructured Data at 10B Scale with a Vector Database


Driven by the popularity of ChatGPT, Llama2, and other LLMs, we've seen a huge surge of interest in vector databases through 2023 and 2024. Vector databases are commonly used to supply LLMs with relevant documents through a process called retrieval-augmented generation (RAG). RAG has seen widespread adoption, from single-person startups to Fortune 500 companies.

Although vector databases are best known for their use with LLMs, they apply more broadly to many types of unstructured data, i.e., any data that does not conform to a predefined data model (such as text, images, video, molecules, and graphs). In this talk, we'll discuss the challenges of embedding and analyzing unstructured data at 10B scale, diving into the vector database features that matter and the pitfalls to watch out for. We'll also discuss the many applications of vector search beyond RAG.


Frank Liu is Head of AI & ML at Zilliz, with a decade of industry experience in machine learning and hardware engineering. He presents at major industry events like the Open Source Summit and writes tech content for leading publications such as Towards Data Science, The New Stack, the Sequence, and DZone. Frank holds MS and BS degrees in Electrical Engineering from Stanford University.

Open Data Science



