Matt Dzugan

Director of Data at Muck Rack

    Matt Dzugan is the director of data at Muck Rack, the software platform that enables thousands of organizations, including Google, Golin and Duolingo, to build trust, tell their stories and demonstrate the unique value of earned media. In this role, he oversees teams of engineers who advance data insight and delivery architectures across Muck Rack’s various AI-powered data platforms. He also manages customer activity and workflow data to help PR teams use data effectively and maximize their output using AI. Before joining Muck Rack, Matt was the director of data science at project44, a supply chain technology company, where he spent two years building and expanding the organization’s data science department. Before that, he was a data science manager at Uptake Technologies and a systems engineer at The Boeing Company. Matt holds master’s and bachelor’s degrees in electrical engineering and computer science from Northwestern University. He lives in Chicago.

    All Sessions by Matt Dzugan

    Day 2 extra event 04/24/2024
    5:45 pm - 9:30 pm

    Roulette Lightning Talks

    The East 2024 Roulette Lightning Talks put a twist on the usual presentation format, injecting a dose of surprise and fun into the conference: speakers must be prepared to speak about any slide in the deck, whether or not it is their own.

    Day 3 04/25/2024
    11:00 am - 11:30 am

    Trial, Error, Triumph: Lessons Learned using LLMs for Creating Machine Learning Training Data

    Deep Learning

    We've all been in situations where we'd like to build a model but lack the labeled training data to do so. I plan to discuss how the advent of Large Language Models (LLMs) like GPT-4 has opened new avenues for generating training data. Traditionally, the creation of NLP datasets relied heavily on manual, crowdsourced hand-labeling, often through platforms like Mechanical Turk. This approach, while effective, presented significant challenges in terms of cost, time, and scalability.

    In this talk, I will share a comprehensive narrative of our journey from initial trials and errors to eventual triumphs in using LLMs for NLP data generation. The shift from manual to AI-assisted data creation marks a pivotal change in how we approach NLP model training. My team and I navigated various challenges, experimenting with different strategies and learning valuable lessons along the way. I will discuss how we harnessed LLMs to generate vast amounts of diverse, nuanced data, significantly reducing time and cost compared to traditional methods.

    The talk will cover practical insights into fine-tuning these models for specific domains, ensuring data quality, and avoiding common pitfalls such as biases and overfitting. I will also highlight how LLMs can be used creatively to simulate real-world scenarios, providing richer and more contextually relevant training data. This not only improves the performance of traditional NLP models but also opens up possibilities for exploring new problem spaces within NLP.

    Attendees will leave with a deeper understanding of the potential and limitations of using LLMs in NLP data generation. They will gain actionable insights and strategies that can be applied in their own NLP projects, accelerating their journey from trial to triumph in AI-powered data science.

    Open Data Science
    One Broadway
    Cambridge, MA 02142
