TRAINING & WORKSHOPS
Learn the latest data science concepts, tools, and techniques from the best. Forge a connection with these rockstars from industry and academia, who are passionate about molding the next generation of data scientists.

Featured World-Class Data Science Experts

Oliver Zeigermann
Oliver Zeigermann has been developing software with different approaches and programming languages for more than 3 decades. In the past decade, he has been focusing on Machine Learning and its interactions with humans.
MLOps: Monitoring and Managing Drift(Training)

Matt Harrison
Matt Harrison has been using Python since 2000. He runs MetaSnake, a Python and Data Science consultancy and corporate training shop. In the past, he has worked across the domains of search, build management and testing, business intelligence, and storage.
He has presented and taught tutorials at conferences such as Strata, SciPy, SCALE, PyCON, and OSCON as well as local user conferences.

Hao Zhang, PhD
Hao is currently a postdoctoral researcher at the Sky Lab, UC Berkeley, working with Prof. Ion Stoica. He has recently been working on the Alpa project and the Sky project, which aim to democratize large models like GPT-3. He joins the Halıcıoğlu Data Science Institute and the Department of Computer Science and Engineering (affiliate) at UC San Diego as an Assistant Professor in Fall 2023.
His research focuses primarily on large-scale distributed ML in the joint context of ML and systems, concerning performance, usability, cost, and privacy. His work spans distributed ML algorithms, large models, parallelisms, performance optimizations, system architectures, ML privacy, and AutoML, with applications in computer vision, natural language processing, and healthcare.

Jeff Tao
Jeff Tao is the founder and CEO of TDengine. He has a background as a technologist and serial entrepreneur, having previously conducted research and development on mobile Internet at Motorola and 3Com and established two successful tech startups. Foreseeing the explosive growth of time-series data generated by machines and sensors now taking place, he founded TDengine in May 2017 to develop a high-performance time-series database purpose-built for modern IoT and IIoT businesses.
What is a Time-series Database and Why do I Need One?(Workshop)

Alison Cossette
Alison Cossette is a dynamic Data Science Strategist, Educator, and Podcast Host. As a Developer Advocate at Neo4j specializing in Graph Data Science, she brings a wealth of expertise to the field. With her strong technical background and exceptional communication skills, Alison bridges the gap between complex data science concepts and practical applications.
Alison’s passion for responsible AI shines through in her work. She actively promotes ethical and transparent AI practices and believes in the transformative potential of responsible AI for industries and society. Through her engagements with industry professionals, policymakers, and the public, she advocates for the responsible development and deployment of AI technologies.
Alison’s academic journey includes pursuing her Master of Science in Data Science program, specializing in Artificial Intelligence, at Northwestern University and research with Stanford University Human-Computer Interaction Crowd Research Collective. Alison combines academic knowledge with real-world experience. She leverages this expertise to educate and empower individuals and organizations in the field of data science.
Overall, Alison Cossette’s multifaceted background, commitment to responsible AI, and expertise in data science make her a respected figure in the field. Through her role as a Developer Advocate at Neo4j and her podcast, she continues to drive innovation, education, and responsible practices in the exciting realm of data science and AI.
Bridging the Gap: Light Code Solutions to Uniting Social Science and Modern Knowledge Graphs(Workshop)
From Nodes to Natural Language: Grounding LLMs with Graphs & Graph Data Science(Talk)

Dr. Andre Franca
Andre joined causaLens from Goldman Sachs, where he was an executive director in the Model Risk Management group in Hong Kong and Frankfurt. Today he works with industry-leading global organisations to apply cutting-edge Causal AI research in production-level solutions that empower individuals and teams to make better decisions. Andre received his PhD in theoretical physics from the University of Munich, where he studied the interplay between quantum mechanics and general relativity in black holes.
Causal AI: from Data to Action(Workshop)

Geeta Shankar
Geeta Shankar is a software engineer who specializes in leveraging data for business success. With expertise in computer science, data science, machine learning, and artificial intelligence, she stays updated with the latest data-driven innovations. Her Indian classical music background has taught her the value of sharp thinking, spontaneity, and connecting with diverse individuals. Geeta uses these skills to translate complex data into meaningful insights that enhance performance and customer experiences.

Rajiv Shah, PhD
Rajiv Shah is a machine learning engineer at Hugging Face who focuses on enabling enterprise teams to succeed with AI. Rajiv is a leading expert in the practical application of AI. Previously, he led data science enablement efforts across hundreds of data scientists at DataRobot. He was also a part of data science teams at Snorkel AI, Caterpillar, and State Farm. Rajiv is a widely recognized speaker on AI, has published over 20 research papers, and has received over 20 patents in areas including sports analytics, deep learning, and interpretability. Rajiv holds a PhD in Communications and a Juris Doctor from the University of Illinois at Urbana-Champaign. While earning his degrees, he received a fellowship in Digital Government from the John F. Kennedy School of Government at Harvard University. He also has a large following for his AI-related short videos on TikTok and Instagram at @rajistics.
Evaluation Techniques for Large Language Models(Tutorial)

Brian Lucena, PhD
Brian Lucena is Principal at Numeristical, where he advises companies of all sizes on how to apply modern machine learning techniques to solve real-world problems with data. He is the creator of three Python packages: StructureBoost, ML-Insights, and SplineCalib. In previous roles he has served as Principal Data Scientist at Clover Health, Senior VP of Analytics at PCCI, and Chief Mathematician at Guardian Analytics. He has taught at numerous institutions including UC-Berkeley, Brown, USF, and the Metis Data Science Bootcamp.
Uncertainty Quantification: Approaches and Methods(Training)

Jeffrey Yau, PhD
Jeffrey Yau is currently Chief Data & A.I. Officer at Fanatics Collectibles. Most recently, he served as Global Head of Data Science, Analytics & Engineering at Amazon Music where he oversaw multiple teams who developed both insights-packed analytics and end-to-end statistical and machine learning systems. Prior to Amazon, Jeffrey worked at WalmartLabs as the VP of Data Science & Engineering where he led the team responsible for powering Walmart store mobile apps and the entire store finance system. Further, his team created end-to-end machine learning systems for key business initiatives and had a multi-billion dollar impact annually on Walmart U.S.
Over the years, he has held various senior-level positions in quantitative finance at global investment management firm AllianceBernstein, consulting firm Silicon Valley Data Science, multinational financial services company Charles Schwab Corporation, and the world’s leading professional services firm KPMG. He began his career as a tenure-track Assistant Professor of Economics at Virginia Tech, and he was an adjunct professor at UC Berkeley, Cornell, and NYU, teaching machine learning and advanced statistical modeling for finance and business.

Vincent Granville
Vincent Granville is a pioneering GenAI scientist and machine learning expert, co-founder of Data Science Central (acquired by a publicly traded company in 2020), Chief AI Scientist at MLTechniques.com, former VC-funded executive, author, and patent owner, with one patent related to LLMs. Vincent’s past corporate experience includes Visa, Wells Fargo, eBay, NBC, Microsoft, and CNET.
Vincent is also a former post-doc at Cambridge University, and the National Institute of Statistical Sciences (NISS). He published in Journal of Number Theory, Journal of the Royal Statistical Society (Series B), and IEEE Transactions on Pattern Analysis and Machine Intelligence. He is the author of multiple books, including “Synthetic Data and Generative AI” (Elsevier, 2024). Vincent lives in Washington state, and enjoys doing research on stochastic processes, dynamical systems, experimental math and probabilistic number theory. He recently launched a GenAI certification program, offering state-of-the-art, enterprise grade projects to participants.
GenAI Breakthrough: Fast, High Quality Tabular Data Synthetization(Tutorial)

Jerry Liu
Jerry is the co-founder/CEO of LlamaIndex, an open-source tool that provides a central data management/query interface for your LLM application. Before this, he spent his career at the intersection of ML, research, and startups. He led the ML monitoring team at Robust Intelligence, did self-driving AI research at Uber ATG, and worked on recommendation systems at Quora. He graduated from Princeton in 2017 with a degree in CS.
Building LLM-powered Knowledge Workers over your Data with LlamaIndex(Workshop)

Parul Pandey
Parul Pandey has a background in Electrical Engineering and currently works as a Principal Data Scientist at H2O.ai. Prior to this, she was working as a Machine Learning Engineer at Weights & Biases. Parul is one of the co-authors of the book Machine Learning for High-Risk Applications, which focuses on the responsible implementation of AI. She is also a Kaggle Grandmaster in the notebooks category and was one of LinkedIn’s Top Voices in the Software Development category in 2019. Parul has written multiple articles focused on Data Science and Software Development for various publications, and she mentors, speaks, and delivers workshops on topics related to Responsible AI.
Machine Learning for High-Risk Applications – Techniques for Responsible AI(Tutorial)

Sinan Ozdemir
Sinan Ozdemir is a mathematician, data scientist, NLP expert, lecturer, and accomplished author. He is currently applying his extensive knowledge and experience in AI and Large Language Models (LLMs) as the founder and CTO of LoopGenius, transforming the way entrepreneurs and startups market their products and services.
Simultaneously, he is providing advisory services in AI and LLMs to Tola Capital, an innovative investment firm. He has also worked as an AI author for Addison Wesley and Pearson, crafting comprehensive resources that help professionals navigate the complex field of AI and LLMs.
Previously, he served as the Director of Data Science at Directly, where his work significantly influenced their strategic direction. As an official member of the Forbes Technology Council from 2017 to 2021, he shared his insights on AI, machine learning, NLP, and emerging technologies-related business processes.
He holds a B.A. and an M.A. in Pure Mathematics (Algebraic Geometry) from The Johns Hopkins University, and he is an alumnus of the Y Combinator program. Sinan actively contributes to society through various volunteering activities.
Sinan’s skill set is strongly endorsed by professionals from various sectors and includes data analysis, Python, statistics, AI, NLP, theoretical mathematics, data science, function analysis, data mining, algorithm development, machine learning, game-theoretic modeling, and various programming languages.
Aligning Open-source LLMs Using Reinforcement Learning from Feedback(Workshop)

Martin Musiol
Long before the buzz surrounding generative AI, Martin Musiol was already advocating for its significance in 2015. Since then, he has been a frequent speaker at conferences, podcasts, and panel discussions, addressing the technological advancements, practical applications, and ethical considerations of generative AI. Martin Musiol is a founder of generativeAI.net, a lecturer on AI to over 3000 students, and publisher of the newsletter ‘Generative AI: Short & Sweet’. As the lead for GenAI Projects in Europe at Infosys Consulting (previously at IBM), Martin Musiol helps companies globally harness the power of generative AI to gain a competitive advantage. Find him on LinkedIn at https://www.linkedin.com/in/martinmusiol1/ and on his webpage at https://generativeai.net/

Sandeep Singh
Sandeep Singh is a leader in applied AI and computer vision in Silicon Valley’s mapping industry, and he is at the forefront of developing cutting-edge technology to capture, analyze and understand satellite imagery, visual and location data. With a deep expertise in computer vision algorithms, machine learning and image processing and applied ethics, Sandeep is responsible for creating innovative solutions that enable mapping and navigation software to accurately and efficiently identify and interpret features to remove inefficiencies of logistics and mapping solutions. His work includes developing sophisticated image recognition systems, building 3D mapping models, and optimizing visual data processing pipelines for use in logistics, telecommunications and autonomous vehicles and other mapping applications. With a keen eye for detail and a passion for pushing the boundaries of what’s possible with AI and computer vision, Sandeep’s leadership is driving the future of applied AI forward.
Stable Diffusion: A New Frontier for Text-to-Image Paradigm(Workshop)

Mark Saroufim
Mark Saroufim is an engineer on PyTorch at Meta working on open infrastructure, compilers and community. Mark is fond of hot takes and shares them on his blog https://marksaroufim.substack.com/. Prior to Meta, Mark worked as a Machine Learning engineer at Graphcore, Microsoft and yuri.ai.

Jonas Mueller
Jonas Mueller is Chief Scientist and Co-Founder at Cleanlab, a software company providing data-centric AI tools to efficiently improve ML datasets. Previously, he was a senior scientist at Amazon Web Services developing AutoML and Deep Learning algorithms which now power ML applications at hundreds of the world’s largest companies. In 2018, he completed his PhD in Machine Learning at MIT, also doing research in NLP, Statistics, and Computational Biology.
Jonas has published over 30 papers in top ML and Data Science venues (NeurIPS, ICML, ICLR, AAAI, JASA, Annals of Statistics, etc). This research has been featured in Wired, VentureBeat, Technology Review, World Economic Forum, and other media. He has also contributed open-source software, including the fastest-growing open-source libraries for AutoML (https://github.com/awslabs/autogluon) and Data-Centric AI (https://github.com/cleanlab/cleanlab).
How to Practice Data-Centric AI and Have AI improve its Own Dataset(Tutorial)

Suhas Pai
Suhas Pai is an NLP researcher and co-founder/CTO at Bedrock AI, a Toronto-based startup. At Bedrock AI, he works on text ranking, representation learning, and productionizing LLMs. He is also currently writing a book on Designing Large Language Model Applications with O’Reilly Media. Suhas has been active in the ML community, serving as Chair of the TMLS (Toronto Machine Learning Summit) conference since 2021 and as NLP lead at Aggregate Intellect (AISC). He was also co-lead of the Privacy working group at Big Science, as part of the BLOOM project.
Beyond Demos and Prototypes: How to Build Production-Ready Applications Using Open-Source LLMs(Workshop)

Fabiana Clemente
Fabiana Clemente is the co-founder and CDO of YData, combining Data Understanding, Causality, and Privacy as her main fields of work and research, with the mission to make data actionable for organizations. Passionate about data, Fabiana has vast experience leading data science teams in startups and multinational companies. She hosts the “When Machine Learning meets privacy” podcast, has been a guest speaker on Datacast and Privacy Please, is a previous WebSummit speaker, and was recently awarded “Founder of the Year” by the South Europe Startup Awards.
Missing Data: A Synthetic Data Approach for Missing Data Imputation(Workshop)

Amy Hodler
Amy Hodler is an evangelist for graph analytics and responsible AI. She’s the co-author of O’Reilly books on Graph Algorithms and Knowledge Graphs as well as a contributor to the Routledge book, Massive Graph Analytics and Bloomsbury book, AI on Trial. Amy has decades of experience in emerging tech at companies such as Microsoft, Hewlett-Packard (HP), Hitachi IoT, Neo4j, Cray, and RelationalAI. Amy is the founder of GraphGeeks.org promoting connections everywhere.

Mike Taylor
Mike is a data-driven, technical marketer who built a 50 person marketing agency (Ladder), and 300k people have taken his online courses (LinkedIn, Udemy, Vexpower). He now works freelance on generative AI projects, and is writing a book on Prompt Engineering for O’Reilly Media.

Shashank Prasanna
Shashank is an engineer, educator and doodler. He writes and talks about machine learning, specialized machine learning hardware (AI Accelerators) and AI Infrastructure in the cloud. He previously worked at Meta, AWS, NVIDIA, MathWorks (MATLAB) and Oracle in developer relations and marketing, product management, and software development roles, and he holds an M.S. in electrical engineering.

Thomas Nield
Thomas Nield is the founder of Nield Consulting Group and Yawman Flight, as well as an instructor at University of Southern California. He enjoys making technical content relatable and relevant to those unfamiliar with or intimidated by it. Thomas regularly teaches classes on data analysis, machine learning, mathematical optimization, and practical artificial intelligence. At USC he teaches AI System Safety, developing systematic approaches for identifying AI-related hazards in aviation and ground vehicles. He’s authored three books, including Essential Math for Data Science (O’Reilly) and Getting Started with SQL (O’Reilly).
He is also the founder and inventor of Yawman Flight, a company developing universal handheld flight controls for flight simulation and unmanned aerial vehicles.
Introduction to Math for Data Science(Bootcamp)

Wes Madrigal
Wes is a machine learning expert with over a decade of experience delivering business value with AI. Wes’s experience spans multiple industries, but always with an MLOps focus. His recent areas of focus and interest are graphs, distributed computing, and scalable feature engineering pipelines.
Using Graphs for Large Feature Engineering Pipelines(Workshop)

Michelle Yi
Michelle is a technology leader who specializes in machine learning and cloud computing. She has 15 years of experience in the technology industry, contributed to the original IBM Watson showcased on Jeopardy, and enjoys building and leading teams that develop and deploy AI solutions to solve real-world problems. Michelle is passionate about diversity and STEM education/careers for our minority communities, and she serves both on the board of Women in Data and as an avid volunteer for Girls Who Code.
Building Generative AI Applications: An LLM Case Study(Talk)

James Phoenix
James is a full-stack engineer who specialises in automating marketing and business processes with AI-based solutions.

Serg Masis
Serg Masís has been at the confluence of the internet, application development, and analytics for the last two decades. He’s an Agronomic Data Scientist at Syngenta, a leading agribusiness company with a mission to improve global food security. Before that role, he co-founded a search engine startup, incubated by Harvard Innovation Labs, that combined the power of cloud computing and machine learning with principles in decision-making science to expose users to new places and events efficiently. Whether concerning leisure activities, plant diseases, or customer lifetime value, Serg is passionate about providing the often-missing link between data and decision-making. He wrote the bestselling book “Interpretable Machine Learning with Python” and is currently working on a new book titled “DIY AI” with do-it-yourself projects for AI hobbyists and practitioners alike.

Ayush Thakur
Ayush Thakur is an MLE at Weights and Biases and a Google Developer Expert in Machine Learning. He is interested in everything computer vision and representation learning. For the past 8 months he’s been working with LLMs and has covered RLHF and the how and what of building LLM-based systems.
Deep Dive into End-to-end MLOps using Weights and Biases(Training)

Gwendolyn D. Stripling, Ph.D.
Gwendolyn Stripling, Ph.D., is an Artificial Intelligence and Machine Learning Content Developer at Google Cloud. Stripling is author of the widely popular YouTube video, “Introduction to Generative AI” and of the O’Reilly Media book “Low-Code AI: A Practical Project Driven Approach to Machine Learning”. They are also the author of the LinkedIn Learning video “Introduction to Neural Networks”. Stripling is an Adjunct Professor and member of Golden Gate University’s Masters in Business Analytics Advisory Board. Stripling enjoys speaking on AI/ML, having presented at Dominican University of California’s Barowsky School of Business Analytics, Golden Gate University’s Ageno School of Business Analytics, and numerous Tech conferences.
No-Code and Low-Code AI: A Practical Project Driven Approach to ML(Tutorial)

Andrew Dai
Andrew Dai did his PhD at the University of Edinburgh before joining Google Brain in 2014, where he did research on language models, story generation and conversational agents, and worked on products including SmartReply. He moved to Google Health in 2017 to research deep learning for medical records. He then returned to continue research at Google Brain (now Google DeepMind) in 2020 and since then has co-led the development and training of LLMs including PaLM 2 and GLaM. Andrew is also a lead for Google SGE modelling, Gemini and data research and is excited by the new abilities we see from LLMs.
A Background to LLMs and Intro to PaLM 2: A Smaller, Faster and More Capable LLM(Tutorial)

Bob Foreman
Bob has worked with the HPCC Systems technology platform and the ECL programming language for over a decade and has been a technical trainer for over 30 years. He is the developer and designer of the HPCC Systems Online Training Courses and is the Senior Instructor for all classroom and remote based training.
Data for Social Good – Find Your Paradise!(Workshop)
HPCC Systems – The Kit and Kaboodle for Big Data and Data Science(Solution Showcase)
Abstract:
Learn why the truly open source HPCC Systems platform is better at Big Data and offers an end-to-end solution for Developers and Data Scientists. Learn how ECL can empower you to build powerful data queries with ease. HPCC Systems, a comprehensive and dedicated data lake platform, makes combining different types of data easier and faster than competing platforms, even data stored in massive, mixed-schema data lakes, and it scales very quickly as your data needs grow. Topics include HPCC Architecture, Embedded Languages and external datastores, Machine Learning Library, Visualization, Application Security and more.

Ramon Perez
Ramon is a data scientist, researcher, and educator currently working in the Developer Relations team at Seldon in London. Prior to joining Seldon, he worked as a freelance data professional and as a Senior Product Developer at Decoded, where he created custom data science tools, workshops, and training programs for clients in various industries. Before freelancing, Ramon wore different research hats in the areas of entrepreneurship, strategy, consumer behavior, and development economics in industry and academia. Outside of work, he enjoys giving talks and technical workshops and has participated in several conferences and meetup events. In his free time, you will most likely find him traveling to new places, mountain biking, or both.
Architecting Data: A Deep Dive Into the World of Synthetic Data(Training)

Valentina Alto
Valentina is a Data Science MSc graduate and Cloud Specialist at Microsoft, focusing on Analytics and AI workloads within the manufacturing and pharmaceutical industry since 2022. She has been working on customers’ digital transformations, designing cloud architecture and modern data platforms, including IoT, real-time analytics, Machine Learning, and Generative AI. She is also a tech author, contributing articles on machine learning, AI, and statistics, and recently published a book on Generative AI and Large Language Models.
In her free time, she loves hiking and climbing around the beautiful Italian mountains, running, and enjoying a good book with a cup of coffee.
The AI Paradigm Shift: Under the Hood of a Large Language Models(Workshop)

Philip Wauters
Philip Wauters is Customer Success Manager and Value Engineer at Tangent Works, working on practical applications of time series machine learning for customers from various industries, such as Siemens, BASF, Borealis and Volkswagen. With a commercial background and experience in data engineering, analysis and data science, his goal is to find and extract the business value in the enormous amounts of time-series data that exist at companies today.
Learn how to Efficiently Build and Operationalize Time Series Models in 2023(Workshop)
The Tangent Information Modeler, time series modeling reinvented(Solution Showcase)
Abstract:
Modeling time series data is difficult due to its large quantities and constantly evolving nature. Existing techniques have limitations in scalability, agility, explainability, and accuracy. Despite 50 years of research, current techniques often fall short when applied to time series data. The Tangent Information Modeler (TIM) offers a game-changing approach with efficient and effective feature engineering based on Information Geometry. This multivariate modeling co-pilot can handle a wider range of time series use cases with award-winning results and incredible performance.
During this demo session we will showcase how best-in-class and very transparent time series models can be built with just one iteration through the data. We will cover several concrete use cases for advanced time series forecasting, anomaly detection and root cause analysis.

Greg Loughnane
Dr. Greg Loughnane is the Founder & CEO of AI Makerspace, where he serves as lead instructor for their LLM Ops: LLMs in Production course. Since 2021 he has built and led industry-leading Machine Learning & AI bootcamp programs. Previously, he worked as an AI product manager, a university professor teaching AI, an AI consultant and startup advisor, and an ML researcher. He loves trail running and is based in Dayton, Ohio.

Chris Alexiuk
Chris Alexiuk is the Head of LLMs at AI Makerspace, where he serves as a programming instructor, curriculum developer, and thought leader for their flagship LLM Ops: LLMs in Production course. During the day, he’s a Founding Machine Learning Engineer at Ox. He is also a solo YouTube creator and Dungeons & Dragons enthusiast, and is based in Toronto, Canada.

Krishnaram Kenthapadi
Krishnaram Kenthapadi is the Chief AI Officer & Chief Scientist of Fiddler AI, an enterprise startup building a responsible AI and ML monitoring platform. Previously, he was a Principal Scientist at Amazon AWS AI, where he led the fairness, explainability, privacy, and model understanding initiatives in the Amazon AI platform. Prior to joining Amazon, he led similar efforts at the LinkedIn AI team, and served as LinkedIn’s representative in Microsoft’s AI and Ethics in Engineering and Research (AETHER) Advisory Board. Previously, he was a Researcher at Microsoft Research Silicon Valley Lab. Krishnaram received his Ph.D. in Computer Science from Stanford University in 2006. He serves regularly on the senior program committees of FAccT, KDD, WWW, WSDM, and related conferences, and co-chaired the 2014 ACM Symposium on Computing for Development. His work has been recognized through awards at NAACL, WWW, SODA, CIKM, ICML AutoML workshop, and Microsoft’s AI/ML conference (MLADS). He has published 50+ papers, with 7000+ citations and filed 150+ patents (70 granted). He has presented tutorials on privacy, fairness, explainable AI, model monitoring, responsible AI, and generative AI at forums such as ICML, KDD, WSDM, WWW, FAccT, and AAAI, given several invited industry talks, and instructed a course on responsible AI at Stanford.
Deploying Trustworthy Generative AI(Tutorial)

Amit Sangani
Amit Sangani is the Director of Partner Engineering leading the Applied AI Platforms team at Meta. Amit has been with Meta for 8+ years and manages developer-facing engineering teams working on Gen AI platforms such as Llama 2 and PyTorch. Amit’s mission is to democratize AI and increase the adoption of these platforms by making it easier for developers to integrate them into their products and spur innovation and increased productivity.
Building Using Llama 2(Workshop)

Vino Duraisamy
Vino is a Developer Advocate for Snowflake. She started as a software engineer at NetApp, and worked on data management applications for NetApp data centers when on-prem data centers were still a cool thing. She then hopped onto the cloud and big data world and landed at the data teams of Nike and Apple. There she worked mainly on batch processing workloads as a data engineer, built custom NLP models as an ML engineer and even touched upon MLOps a bit for model deployments. When she is not working with data, you can find her doing yoga or strolling through Golden Gate Park and Ocean Beach.
The Rise of a Full Stack Data Scientist: Powered by Python(Workshop)

Sheamus McGovern
Sheamus McGovern is the founder of ODSC (The Open Data Science Conference). He is also a software architect, data engineer, and AI expert. He started his career in finance by building stock and bond trading systems and risk assessment platforms and has worked for numerous financial institutions and quant hedge funds. Over the last decade, Sheamus has consulted with dozens of companies and startups to build leading-edge data-driven applications in finance, healthcare, eCommerce, and venture capital. He holds degrees from Northeastern University, Boston University, Harvard University, and a CQF in Quantitative Finance.
Data Wrangling with Python(Bootcamp)

Fabio Buso
Fabio Buso is a co-founder and VP of Engineering at Hopsworks, leading the Feature Store development team. Fabio holds a master’s degree in Cloud Computing and Services with a focus on data intensive applications.
Personalizing LLMs with a Feature Store(Workshop)
More Instructors Coming Soon
Past Training & Workshop Sessions
Half-Day Training: MLOps: Monitoring and Managing Drift
Workshop: What is a Time-series Database and Why do I Need One?
Workshop: Causal AI: from Data to Action
Workshop: Building LLM-powered Knowledge Workers over your Data with LlamaIndex
Workshop: Aligning Open-source LLMs Using Reinforcement Learning from Feedback
Workshop: Stable Diffusion: A New Frontier for Text-to-Image Paradigm
Tutorial: Evaluation Techniques for Large Language Models
Workshop: Idiomatic Pandas
Tutorial: How to Practice Data-Centric AI and Have AI improve its Own Dataset
Workshop: Overview of Mojo🔥: Usability of Python, Performance of C
Tutorial: Machine Learning for High-Risk Applications – Techniques for Responsible AI
Workshop: Using Graphs for Large Feature Engineering Pipelines
Tutorial: Automating Business Processes Using LangChain
Tutorial: No-Code and Low-Code AI: A Practical Project Driven Approach to ML
Workshop: Data for Social Good – Find Your Paradise!
Workshop: The AI Paradigm Shift: Under the Hood of a Large Language Models
Workshop: Learn how to Efficiently Build and Operationalize Time Series Models in 2023
Tutorial: Deploying Trustworthy Generative AI
Workshop: Building Using Llama 2
Half-Day Training: Introduction to Math for Data Science
Workshop: Personalizing LLMs with a Feature Store
Workshop: Bridging the Gap: Light Code Solutions to Uniting Social Science and Modern Knowledge Graphs
Workshop: Anomaly Detection for CRM Production Data
Half-Day Training: Generative AI, Autonomous AI Agents, and AGI – How new Advancements in AI will Improve the Products we Build
Tutorial: Massively Speed-Up your Learning Algorithm, with Stochastic Thinning
Workshop: Machine Learning with XGBoost
Half-Day Training: Uncertainty Quantification: Approaches and Methods
Tutorial: Machine Learning Has Become Necromancy
Workshop: Beyond Demos and Prototypes: How to Build Production-Ready Applications Using Open-Source LLMs
Tutorial: Prompt Optimization with GPT-4 and LangChain
Workshop: Missing Data: A Synthetic Data Approach for Missing Data Imputation
Workshop: Graphs: The Next Frontier of GenAI Explainability
Workshop: Facial Recognition from Scratch with Python and JS
Tutorial: A Background to LLMs and Intro to PaLM 2: A Smaller, Faster and More Capable LLM
Half-Day Training: Architecting Data: A Deep Dive Into the World of Synthetic Data
Workshop: The Rise of a Full Stack Data Scientist: Powered by Python
Half-Day Training: Retrieval Augmented Generation (RAG) 101: Building an Open-Source “ChatGPT for Your Data” with Llama 2, LangChain, and Pinecone
Half-Day Training: Statistics for Data Science
Machine Learning
Meta-learning for Machine Learning
Self-Supervised Learning: New Techniques
Federated Learning for Data Privacy
Explainable AI and Bias in machine learning
Machine Learning at Scale using Apache Spark
Safety & Robustness in Machine Learning Modeling
Semi-supervised learning
Causal Inference with Machine Learning
Deep Learning
Deep Reinforcement learning
Deep Learning with PyTorch & TensorFlow
Deep Learning Deep Dive
Computer Vision 1/2 Day Training
Deep Learning with Keras
Introduction to Deep learning
Deepfakes Tutorial
Graph Representation Learning
NLP & LLMs and Generative AI
Pretraining and Fine-Tuning LLMs
Prompt Engineering
Self-Supervised Learning: New Techniques
LLMs: LangChain and AI Agents
NLP with LSTMs (Deep learning)
Transfer Learning in NLP
Introduction to NLP and Topic Modeling
State-of-the-Art NLP with PyTorch and TensorFlow
Semi-supervised learning
Hugging Face Transformer Library Workshop
Applications of NLP: Sentiment Analysis, Dialog Systems, and Semantic Search
ADDITIONAL TUTORIALS & WORKSHOPS
Machine Learning for Cyber Security
Real-time Streaming Analytics
MLOps and Machine Learning Pipelines
Introduction to Machine Learning Using scikit-learn
Auto Machine Learning (AutoML)
Distributed Machine Learning
Introduction to Data Analysis with Python Pandas
Machine Learning Workflow with Kubeflow & Kubernetes
Challenges in Deployable Generative AI
Foundational models and LLMs
Reinforcement Learning from Human Feedback (RLHF)
Open-Source LLM Chat Models
Need A Refresher?
Check out our pre-conference warmup workshops, free to training pass holders (Gold Pass and above).
Hosted on Ai+ Training and included FREE as part of your ODSC Mini-Bootcamp Pass.
Pre-Bootcamp Workshop Dates *
Pre-Bootcamp Warmup Workshops are available both live and on-demand (post-date).
* Schedule is subject to change.
Data Primer – available on-demand
SQL – available on-demand
Programming Primer Course with Python – available on-demand
AI Primer – Thursday, October 5th, 2023
Data Wrangling with Python – Thursday, October 19th, 2023
LLMs, Prompt Engineering, & Gen AI – Date to be Announced
Ai+ is the only hands-on training platform solely developed for AI practitioners. Keep training with the top names in the industry.
Need More Reason To Sign Up?
ODSC Training Includes
Opportunities to form working relationships with some of the world’s top data scientists.
Access to 40+ training sessions and 70+ workshops.
Hands-on experience with the latest frameworks and breakthroughs in data science.
Get Certified – Showcase your new skill sets with certificate courses from ODSC and Ai+. Ai+ Training Certification Courses (Included in Bootcamp/VIP Passes)
Affordable training – equivalent training at other conferences costs much more.
Professionally prepared learning materials, custom-tailored to each course.
Opportunities to connect with other ambitious, like-minded data scientists.
ODSC Newsletter
Stay current with the latest news and updates in open source data science.
In addition, we’ll inform you about our many upcoming virtual and in-person events in Boston, NYC, São Paulo, San Francisco, and London.
And keep a lookout for special discount codes, only available to our newsletter subscribers!