Founder & CTO at Nomic AI
Andriy is the co-founder and CTO of Nomic AI, a venture-backed startup on a mission to democratize access to artificial intelligence. Before Nomic, Andriy was an early engineer at RadAI, where he trained multi-billion-parameter LLMs to assist radiologists, and a Ph.D. student at NYU's Courant Institute of Mathematical Sciences. He cares about making AI systems, and the data they are trained on, more accessible to everyone.
All Sessions by Andriy Mulyar
Training an OpenAI Quality Text Embedding Model from Scratch
LLMs | Intermediate-Advanced
Text embeddings are an integral component of modern NLP applications, powering retrieval-augmented generation (RAG) for LLMs and semantic search. High-quality text embedding models are typically closed source, with access gated behind the APIs of leading AI companies. This talk describes how Nomic AI trained nomic-embed-text-v1, the first fully auditable text embedding model with open data, open weights, and open training code that outperforms OpenAI's Ada-002. You will learn how text embedding models are trained, which training decisions affect model capabilities, and tips for using these models successfully in production applications.