Andriy Mulyar

Founder & CTO at Nomic AI

    Andriy is the co-founder and CTO of Nomic AI, a venture-backed startup on a mission to democratize access to artificial intelligence. Prior to Nomic, Andriy was an early engineer at RadAI, where he trained multi-billion-parameter LLMs to assist radiologists, and a Ph.D. student at NYU's Courant Institute of Mathematical Sciences. He cares about making AI systems and the data they are trained on more accessible to everyone.

    All Sessions by Andriy Mulyar

    Day 3 04/25/2024
    11:00 am - 11:30 am

    Training an OpenAI Quality Text Embedding Model from Scratch

    LLMs

    Text embeddings are an integral component of modern NLP applications, powering retrieval-augmented generation (RAG) for LLMs and semantic search. High-quality text embedding models are closed source, and access to them is gated behind the APIs of leading AI companies. This talk describes how Nomic AI trained nomic-embed-text-v1, the first fully auditable text embedding model with open data, open weights, and open training code that outperforms OpenAI Ada-002. You will learn how text embedding models are trained, which training decisions affect model capabilities, and tips for using them successfully in production applications.

    Open Data Science

    One Broadway
    Cambridge, MA 02142
    info@odsc.com
