TRAINING & WORKSHOPS
Learn the latest data science concepts, tools, and techniques from the best. Forge a connection with these rockstars from industry and academia, who are passionate about molding the next generation of data scientists.
Taught By World-Class Data Science Experts

Irina Rish, PhD
Irina Rish is a Full Professor in the Computer Science and Operations Research Department at the Université de Montréal (UdeM) and a core faculty member of MILA – Quebec AI Institute. She holds a Canada Excellence Research Chair (CERC) in Autonomous AI and a Canadian Institute for Advanced Research (CIFAR) Canada AI Chair. She received her MSc and PhD in AI from the University of California, Irvine, and an MSc in Applied Mathematics from the Moscow Gubkin Institute. Dr. Rish's research focuses on machine learning, neural data analysis, and neuroscience-inspired AI. Before joining UdeM and MILA in 2019, Irina was a research scientist at the IBM T.J. Watson Research Center, where she worked on various projects at the intersection of neuroscience and AI and led the Neuro-AI challenge. She received multiple IBM awards, including the IBM Eminence & Excellence Award and the IBM Outstanding Innovation Award in 2018, the IBM Outstanding Technical Achievement Award in 2017, and the IBM Research Accomplishment Award in 2009. Dr. Rish holds 64 patents and has published over 80 research papers in peer-reviewed conferences and journals, several book chapters, three edited books, and a monograph on Sparse Modeling.

Matt Harrison
Matt Harrison has been using Python since 2000. He runs MetaSnake, a Python and Data Science consultancy and corporate training shop. In the past, he has worked across the domains of search, build management and testing, business intelligence, and storage.
He has presented and taught tutorials at conferences such as Strata, SciPy, SCALE, PyCON, and OSCON as well as local user conferences.
Machine Learning with XGBoost (Workshop)
Idiomatic Pandas (Workshop)

Dr. Jon Krohn
Jon Krohn is Chief Data Scientist at the machine learning company untapt. He authored the book Deep Learning Illustrated, which was released by Addison-Wesley in 2019 and became an instant #1 bestseller that was translated into six languages. Jon is renowned for his compelling lectures, which he offers in-person at Columbia University, New York University, and the NYC Data Science Academy, as well as online via O’Reilly, YouTube, and his A4N podcast on A.I. news. Jon holds a doctorate in neuroscience from Oxford and has been publishing on machine learning in leading academic journals since 2010.

Stefanie Molin
Stefanie Molin is a software engineer and data scientist at Bloomberg in New York City, where she tackles tough problems in information security, particularly those revolving around data wrangling/visualization, building tools for gathering data, and knowledge sharing. She is also the author of "Hands-On Data Analysis with Pandas," which is currently in its second edition. She holds a bachelor of science degree in operations research from Columbia University's Fu Foundation School of Engineering and Applied Science, as well as a master's degree in computer science, with a specialization in machine learning, from Georgia Tech. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers.

Thomas J. Fan
Thomas J. Fan is a Senior Software Engineer at Quansight Labs, working to sustain and evolve the PyData open-source ecosystem. He is a maintainer for scikit-learn, an open-source machine learning library written for Python. Previously, he worked at Columbia University, improving the interoperability between scikit-learn and AutoML systems. Thomas holds a Masters in Physics from Stony Brook University and a Masters in Mathematics from New York University.

Leonardo De Marchi
Leonardo De Marchi holds a master's degree in Artificial Intelligence and has worked as a Data Scientist in the sports world, with clients such as the New York Knicks. He now works at Thomson Reuters as VP of Labs and also provides consultancy and training for small and large companies. His previous experience includes serving as Head of Data Science and Analytics at Bumble, the largest dating site with over 500 million users, where he led the team through an acquisition and an IPO.

Brian Lucena, PhD
Brian Lucena is Principal at Numeristical, where he advises companies of all sizes on how to apply modern machine learning techniques to solve real-world problems with data. He is the creator of three Python packages: StructureBoost, ML-Insights, and SplineCalib. In previous roles he has served as Principal Data Scientist at Clover Health, Senior VP of Analytics at PCCI, and Chief Mathematician at Guardian Analytics. He has taught at numerous institutions including UC-Berkeley, Brown, USF, and the Metis Data Science Bootcamp.
Advanced Gradient Boosting (I): Fundamentals, Interpretability, and Categorical Structure (Training)
Advanced Gradient Boosting (II): Calibration, Probabilistic Regression and Conformal Prediction (Training)

Tejaswini Pedapati
Tejaswini Pedapati works at IBM Research. Her research focuses on interpretability and automating deep learning. To that end, she was involved in developing tools and algorithms that provide these capabilities for IBM products. She has a master's degree from Columbia University.
Introduction to AutoML: Hyperparameter Optimization and Neural Architecture Search (Tutorial)

Aric LaBarr, PhD
A Teaching Associate Professor in the Institute for Advanced Analytics, Dr. Aric LaBarr is passionate about helping people solve challenges using their data. At the Institute, home of the nation's first Master of Science in Analytics degree program, he helps design an innovative curriculum that prepares a modern workforce to communicate wisely about, and handle, a data-driven future. He teaches courses in predictive modeling, forecasting, simulation, financial analytics, and risk management. Previously, he was Director and Senior Scientist at Elder Research, where he mentored and led a team of data scientists and software engineers. As director of the Raleigh, NC office, he worked closely with clients and partners to solve problems in the fields of banking, consumer product goods, healthcare, and government. Dr. LaBarr holds a B.S. in economics, as well as a B.S., M.S., and Ph.D. in statistics, all from NC State University.
Advanced Fraud Modeling & Anomaly Detection with Python & R (Training)

Jacob Andreas, PhD
Jacob Andreas is the X Consortium Assistant Professor at MIT. His research aims to build intelligent systems that can communicate effectively using language and learn from human guidance. Jacob earned his Ph.D. from UC Berkeley, his M.Phil. from Cambridge (where he studied as a Churchill scholar) and his B.S. from Columbia. As a researcher at Microsoft Semantic Machines, he founded the language generation team and helped develop core pieces of the technology that powers conversational interaction in Microsoft Outlook. He has been the recipient of Samsung’s AI Researcher of the Year award, MIT’s Kolokotrones teaching award, and paper awards at NAACL and ICML.

Daniel Lenton, PhD
Daniel Lenton is the creator of Ivy, which is an open-source framework with an ambitious mission to unify all other ML frameworks. Prior to starting Ivy, Daniel was a PhD student at Imperial College London, where he published research in the areas of machine learning, robotics and computer vision.
Unifying ML With One Line of Code (Tutorial)

Adam Breindel
Adam Breindel consults and teaches widely on Apache Spark and other technologies. Adam’s experience includes work with banks on neural-net fraud detection, streaming analytics, cluster management code, and web apps, as well as development at a variety of startup and established companies in the travel, productivity, and entertainment industries. He is excited by the way that Spark and other modern big-data tech remove so many old obstacles to system design and make it possible to explore new categories of interesting, fun, hard problems.

Akash Tandon
Akash Tandon is co-founder and CTO of Looppanel where he builds software to help product teams record, store and analyze user research data. He is a co-author of Advanced Analytics with PySpark, published by O’Reilly. Previously, Akash worked as a senior data engineer at Atlan, SocialCops and RedCarpet where he built data infrastructure for enterprise, government and finance use-cases. He has also been a participant and mentor in the Google Summer of Code program with the R Project for Statistical Computing.
Introduction to Large-scale Analytics with PySpark (Workshop)

Nikolay Manchev, PhD
Nikolay is an experienced Data Science professional who currently leads the EMEA Data Science team at Domino Data Lab. He holds an MSc in Software Technologies, an MSc in Data Science, and is currently undertaking postgraduate research at King’s College London. His area of expertise is Statistics, Mathematics, and Data Science in general, and his research interests are in Neural Networks with emphasis on biological plausibility. He writes articles and blogs regularly and speaks at various European conferences (ODSC, Big Data Spain, Strata, Big Data London etc.) to build awareness about data science and artificial intelligence. He is also the organizer of the London Data Science and Machine Learning meetup and recipient of several technical mastery awards like the Oracle ACE Award and the IBM Outstanding Technical Achievement Award.

Moez Ali
Innovator, technologist, and data scientist turned product manager with a proven track record of building and scaling data products, platforms, and communities. Experienced in building and leading teams of data scientists, data engineers, and product managers. Strongly opinionated tech visionary and thought partner to C-level leadership.
Moez Ali is the inventor and creator of PyCaret, an open-source, low-code machine learning library. PyCaret is ranked in the top 1%, with 8M+ downloads, 7K+ GitHub stars, 100+ contributors, and 1,000+ citations.
Globally recognized for his open-source work on PyCaret. Keynote speaker and top-ten most-read writer in the field of artificial intelligence. Teaches AI and ML courses at Cornell (NY) and Queen's University (Canada). Currently building the world's first hyper-focused data and ML platform.

Benjamin Batorsky, PhD
Ben is a Senior Data Scientist at the Institute for Experiential AI at Northeastern University. He obtained his Masters in Public Health (MPH) from Johns Hopkins and his PhD in Policy Analysis from the Pardee RAND Graduate School. Since 2014, he has been working in data science for government, academia and the private sector. His major focus has been on Natural Language Processing (NLP) technology and applications. Throughout his career, he has pursued opportunities to contribute to the larger data science community. He has presented his work at conferences, published articles, taught courses in data science and NLP, and is co-organizer of the Boston chapter of PyData. He also contributes to volunteer projects applying data science tools for public good.

Freddy Boulton
Freddy Boulton started his career as a data scientist at Nielsen, where he built predictive models of television viewing behavior to make television ratings more accurate. This gave him a firsthand view of one of the biggest challenges faced by industry data scientists: being able to easily communicate and share machine learning models with stakeholders. He is currently solving that problem by working on Gradio, an open-source Python library that lets data scientists create fully interactive demos of machine learning models with just a few lines of code.
A Practical Tutorial on Building Machine Learning Demos with Gradio (Workshop)

Panos Alexopoulos, PhD
Panos Alexopoulos has been working since 2006 at the intersection of data, semantics, and software, building intelligent systems that deliver value to business and society. Born and raised in Athens, Greece, he currently works as Head of Ontology at Textkernel, in Amsterdam, Netherlands, where he leads a team of Data Professionals in developing and delivering a large cross-lingual Knowledge Graph in the HR and Recruitment domain. Panos holds a PhD in Knowledge Engineering and Management from National Technical University of Athens, and has published more than 60 papers at international conferences, journals and books. He is the author of the book “Semantic Modeling for Data – Avoiding Pitfalls and Breaking Dilemmas” (O’Reilly, 2020), and a regular speaker and trainer in both academic and industry venues.
Mastering Adversarial Evaluation for NLP: A Practical Workshop (Workshop)

Julien Simon
Julien is currently Chief Evangelist at Hugging Face. He recently spent six years at Amazon Web Services, where he was the Global Technical Evangelist for AI & Machine Learning. Prior to joining AWS, Julien served for 10 years as CTO/VP Engineering in large-scale startups.

Chandra Khatri
Chandra Khatri is the Chief Scientist and Head of AI at Got It AI, where his team is transforming the AI space by leveraging state-of-the-art technologies to deliver the world's first fully autonomous Conversational AI system. Under his leadership, Got It AI is democratizing Conversational AI and related ecosystems through automation. Prior to Got It AI, Chandra led various applied AI and research groups at Uber, Amazon Alexa, and eBay.
At Uber, he led Conversational AI, Multi-modal AI, and Recommendation Systems. At Amazon, he was a founding member of the Alexa Prize Competition and Alexa AI, where he led R&D and helped significantly advance the field of Conversational AI, particularly open-domain dialog systems, which are considered the holy grail of Conversational AI and remain one of the open-ended problems in AI. At eBay, he drove applied research projects in NLP, Deep Learning, and Recommendation Systems.
He graduated from Georgia Tech with a specialization in Deep Learning in 2015 and holds an undergraduate degree from BITS Pilani, India. His current areas of research include Artificial and General Intelligence, Democratization of AI, Reinforcement Learning, Language and Multi-modal Understanding, and Introducing Common Sense within Artificial Agents.
More instructors added weekly

Track Spotlight


The Topic:
Deepfakes: How’re They Made, Detected, and How They Impact Society
Deepfake photos and videos are already impacting many industries and sectors of society, in both positive and negative ways. In this session, I’ll weave between the social context of deepfakes (how they’ve been used and what impact they’ve had) and the technical side of them (how they’re made, and some approaches to detecting them). This is the multifaceted story of deepfakes. No technical background is needed—the discussion of how they’re made and detected will be done at a broad overview level focusing on the concepts, with a brief tour at the end of more specific resources for those interested in digging in deeper and exploring some relevant Python tools.
The Instructor:
Noah Giansiracusa, PhD – Associate Professor of Mathematics and Data Science at Bentley University
Confirmed Workshop/Training Sessions
(more sessions added weekly)
Advanced Fraud Modeling & Anomaly Detection with Python & R
Intro to Deep Learning with PyTorch and TensorFlow
NLP Fundamentals
Anomaly Detection with Python and R
Unifying ML With One Line of Code
Bagging to BERT – A Tour of Applied NLP
Advanced Gradient Boosting: Probabilistic Regression and Categorical Structure
Hyper-productive NLP with Hugging Face Transformers
Machine Learning with XGBoost
Introduction to Large-scale Analytics with PySpark
Beyond the Basics: Data Visualization in Python
Mastering Adversarial Evaluation for NLP: A Practical Workshop
A Practical Tutorial on Building Machine Learning Demos with Gradio
Getting Started with Hyperparameter Optimisation
Beginner to Advanced Level Training
From the Leading Instructors in the Industry
Machine Learning
Meta-learning for Machine Learning
Self-Supervised Learning: New Techniques
Federated Learning for Data Privacy
Explainable AI and Bias in machine learning
Machine Learning at Scale using Apache Spark
Safety & Robustness in Machine Learning Modeling
Semi-supervised learning
Causal Inference with Machine Learning
Deep Learning
Deep Reinforcement learning
Deep Learning with PyTorch & TensorFlow
Deep Learning Deep Dive
Computer Vision 1/2 Day Training
Deep Learning with Keras
Introduction to Deep learning
Deepfakes Tutorial
Graph Representation Learning
NLP
Self-Supervised Learning: New Techniques
Transfer Learning in NLP
Introduction to NLP and Topic Modeling
NLP Pre-trained Transformer Models with BERT, ERNIE, and GPT-2
State-of-the-Art NLP with PyTorch and TensorFlow
Semi-supervised learning
Hugging Face Transformer Library Workshop
Applications of NLP; Sentiment Analysis, Dialog Systems, and Semantic Search
ADDITIONAL TUTORIALS & WORKSHOPS
Machine Learning for Cyber Security
Real-time Streaming Analytics
MLOps and Machine Learning Pipelines
Introduction to Machine Learning Using scikit-learn
Auto Machine Learning (AutoML)
Distributed Machine Learning
Introduction to Data Analysis with Python Pandas
Machine Learning Workflow with Kubeflow & Kubernetes
Training Tracks








Ai+ is the only hands-on training platform solely developed for AI practitioners. Keep training with the top names in the industry.
EARLY BIRD OFFER | SAVE 60%
Choose your Pass
Virtual Bootcamp Orientation Sessions for in-person and virtual attendees
Monday | Virtual Mini-Bootcamp Training Sessions
Premium 1-Year Subscription to AI+ Training (value = $700)
Access to All Virtual Sessions & Events (Tue-Thu)
ODSC Keynotes & Talks (Wed-Thu)
4 Prep-Bootcamp live tutorials on Data Literacy, AI Literacy, Programming, and SQL (Value $796)
On-demand Access to All Conference recordings
Access to AI Solution Showcase Expo Area (Wed-Thu)
Access to ODSC In-Person Workshops & Training Sessions (Tue&Thu)
Access to In-person Mini-Bootcamp Training Sessions
Need More Reasons to Sign Up?
ODSC Training Includes
Opportunities to form working relationships with some of the world’s top data scientists.
Access to 40+ training sessions and 70+ workshops.
Hands-on experience with the latest frameworks and breakthroughs in data science.
Affordable training: equivalent training at other conferences costs much more.
Professionally prepared learning materials, custom-tailored to each course.
Opportunities to connect with other ambitious, like-minded data scientists.
East 2022 Preliminary Training and Workshop Schedule
We are delighted to announce our East 2022 Schedule!
Training | Virtual | Bootcamp | Machine Learning | Intermediate-Advanced
Abstract Coming Soon!
Mona is a Data Science Manager at Greenhouse Software in New York City, where they contribute to data-informed decision making across the company and machine learning solutions to improve the hiring process for Greenhouse customers. They’ve previously worked in government, creating analytics and machine learning solutions to improve the lives of New Yorkers, and continue to be involved in civic projects through a number of volunteer and non-profit organizations. They’ve also been a statistics and data science educator with DataCamp, Emeritus, and in university settings. They hold a graduate degree in Developmental Psychology, and are passionate about contributing to the ethical use of data science methodology in the public and private sector.
Training | Virtual | NLP | Machine Learning | All Levels
In this course, we will go through Natural Language Processing fundamentals, such as pre-processing techniques, tf-idf, embeddings, and more. This will be followed by practical coding examples in Python that show how to apply the theory to real use cases. The goal of this workshop is to provide attendees with all the basic tools and knowledge they need to solve real problems and to understand the most recent and advanced NLP topics…
Leonardo De Marchi holds a master's degree in Artificial Intelligence and has worked as a Data Scientist in the sports world, with clients such as the New York Knicks. He now works at Thomson Reuters as VP of Labs and also provides consultancy and training for small and large companies. His previous experience includes serving as Head of Data Science and Analytics at Bumble, the largest dating site with over 500 million users, where he led the team through an acquisition and an IPO.
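For readers new to tf-idf, one of the fundamentals listed in the NLP training abstract above, here is a minimal pure-Python sketch of the textbook weighting: relative term frequency times log(N / document frequency). The helper names `tokenize` and `tf_idf` are hypothetical, and the whitespace tokenizer stands in for real pre-processing; this is an illustration, not the course material.

```python
from __future__ import annotations

import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Minimal pre-processing: lowercase, split on whitespace, strip surrounding punctuation.
    tokens = (tok.strip(".,!?;:") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # tf = count / document length; idf = log(N / df), without smoothing.
        weights.append(
            {term: (count / total) * math.log(n_docs / df[term]) for term, count in counts.items()}
        )
    return weights


docs = ["The cat sat.", "The dog sat.", "The cat ran."]
weights = tf_idf(docs)
# "the" occurs in every document, so its idf (and weight) is zero;
# "dog" occurs in only one document, so it outweighs "sat" there.
print(weights[1])
```

Library implementations such as scikit-learn's `TfidfVectorizer` default to a smoothed idf and L2-normalized rows, so their numbers will differ from this unsmoothed version.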
Training | In-person | Machine Learning | MLOps & Data Engineering | All Levels
Data professionals (analysts, scientists, operators, …) use data to extract insights and subsequently make decisions that impact day-to-day operations as well as long-term strategy for organizations. Going from data to insights and then to decisions typically involves (a) extracting data from varied structured and unstructured sources; (b) normalizing, cleaning, and stitching those varied sources together to obtain "ground truth"; (c) extracting structure within the data, interacting with it, and visualizing it to obtain insights; (d) prediction, optimization, and scenario analysis to make decisions; and (e) automating all of the above while allowing for human-in-the-loop intervention. In this hands-on workshop, we will discuss all of these with the help of illustrative datasets in a low-code/no-code AI environment…
Devavrat Shah, PhD is the founding director of Statistics and Data Science at MIT. He is also a member of IDSS, LIDS, CSAIL and ORC at MIT. He co-founded Celect, Inc. (now part of Nike) in 2013 to help retailers decide what to put where by accurately predicting demand using omni-channel data. He is a co-founder and CTO of IkigaiLabs with the mission to build self-driving organizations by enabling data-driven operations with human-in-the-loop.
His research focuses on statistical inference and stochastic networks. His contributions span a variety of areas including resource allocation in communications networks, inference and learning on graphical models, algorithms for social data processing including ranking, recommendations and crowdsourcing and more recently causal inference. He has made foundational contributions to the development of “gossip” protocols and “message-passing” algorithms for statistical inference which have been the building blocks of modern distributed data processing systems. His work spans a range of areas across electrical engineering, computer science and operations research.
His work has received broad recognition, including prize paper awards in Machine Learning, Operations Research, and Computer Science, and career prizes including the 2010 Erlang Prize from the INFORMS Applied Probability Society, awarded every two years to a young researcher who has made outstanding contributions to applied probability. He is a distinguished alumnus of his alma mater, IIT Bombay, from which he graduated with the honor of the President of India Gold Medal. His work has been covered in the popular press, including the NY Times, Forbes, Wired, and Reddit.
Tutorial | Virtual | Machine Learning | Deep Learning | Intermediate
In this presentation, we will share our experience with handling missing data. We will start with classical methods (single imputation, multiple imputation, likelihood-based methods) developed in the inferential framework, where the aim is to estimate the parameters and their variance as well as possible in the presence of missing data. Then we will present recent results in a supervised-learning setting…
Julie Josse is a senior researcher in statistics and machine learning applied to health at Inria, a French research institute in digital sciences, and Professor at Ecole Polytechnique (Paris). She is an expert in the treatment of missing values (inference, multiple imputation, matrix completion, MNAR, supervised learning with missing values) and has created a website on the topic (https://rmisstastic.netlify.app/) for users. Her research also focuses on causal inference techniques (causal inference with missing values, combining RCT and observational data) for personalized medicine. Julie Josse is dedicated to reproducible research with R statistical software: she has developed packages including FactoMineR and missMDA to transfer her work.
Gaël Varoquaux is a research director working on data science and health at Inria (the French national research institute for computer science). His research focuses on using data and machine learning for scientific inference, with applications to health and social science, as well as developing tools that make it easier for non-specialists to use machine learning. He has long applied it to brain-imaging data to understand cognition. Years before the NSA, he was hoping to make bleeding-edge data processing available across new fields, and he has been working on a mastermind plan of building easy-to-use open-source software in Python. He is a core developer of scikit-learn, joblib, Mayavi, and nilearn, a nominated member of the PSF, and often teaches scientific computing with Python using the scipy lecture notes.
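As a small illustration of why the missing-data abstract above goes beyond single imputation, the sketch below fills gaps with the mean of the observed values, one of the simplest single-imputation schemes (chosen here for illustration; it is not the presenters' method). The toy data is made up.

```python
import statistics


def mean_impute(values):
    """Replace None entries with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


# Toy data with two missing entries (made up for illustration).
data = [2.0, 4.0, None, 6.0, None, 8.0]
observed = [v for v in data if v is not None]
imputed = mean_impute(data)

print("mean:", statistics.fmean(observed), "->", statistics.fmean(imputed))
print("sample variance:", statistics.variance(observed), "->", statistics.variance(imputed))
```

Here the mean stays at 5.0 while the sample variance drops from 20/3 (about 6.67) to 4.0: filling every gap with the same value understates variability, which is one reason multiple imputation or likelihood-based methods are preferred when the goal is inference.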
Training | Virtual | Machine Learning | Intermediate
We will learn about evaluating, calibrating, and inspecting models during this training. Model evaluation is an essential piece of the ML workflow. We will cover multiple metrics and see how they behave on various combinations of datasets and models. We will explore scikit-learn's plotting API to visualize a model's performance. Next, we will learn how to calibrate a machine learning model with scikit-learn. We will see how models behave before and after calibrating by visualizing an estimator's calibration. Next, we will explore techniques to inspect machine learning models. Specifically, we will see how to examine open-box machine learning models, such as linear models and random forests…
Thomas J. Fan is a Senior Software Engineer at Quansight Labs, working to sustain and evolve the PyData open-source ecosystem. He is a maintainer for scikit-learn, an open-source machine learning library written for Python. Previously, he worked at Columbia University, improving the interoperability between scikit-learn and AutoML systems. Thomas holds a Masters in Physics from Stony Brook University and a Masters in Mathematics from New York University.
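Visualizing an estimator's calibration, as described in the training above, usually means plotting a reliability curve. The sketch below computes one in plain Python, assuming equal-width bins: for each non-empty bin it compares the mean predicted probability with the observed fraction of positives. The function name `reliability_curve` and the toy inputs are hypothetical; scikit-learn provides `sklearn.calibration.calibration_curve` for this, and its bin-edge handling may differ slightly from this sketch.

```python
def reliability_curve(y_true, y_prob, n_bins=10):
    """Return (fraction_of_positives, mean_predicted_probability) for each
    non-empty equal-width bin of predicted probabilities in [0, 1]."""
    # Per-bin accumulators: [sum of predicted probabilities, number of positives, count].
    bins = [[0.0, 0, 0] for _ in range(n_bins)]
    for label, prob in zip(y_true, y_prob):
        idx = min(int(prob * n_bins), n_bins - 1)  # prob == 1.0 falls in the last bin
        bins[idx][0] += prob
        bins[idx][1] += label
        bins[idx][2] += 1
    frac_pos, mean_pred = [], []
    for prob_sum, positives, count in bins:
        if count:  # skip empty bins
            frac_pos.append(positives / count)
            mean_pred.append(prob_sum / count)
    return frac_pos, mean_pred


# An overconfident toy model: it predicts 0.9 for four cases, but only half are positive.
y_true = [1, 0, 1, 0, 0, 0]
y_prob = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1]
frac_pos, mean_pred = reliability_curve(y_true, y_prob, n_bins=5)
# A well-calibrated model would have frac_pos close to mean_pred in every bin.
print(list(zip(mean_pred, frac_pos)))
```

In the top bin the model predicts about 0.9 but only half the cases are positive; calibration methods aim to close exactly this kind of gap.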
Tutorial | Virtual | Deep Learning | Machine Learning | Intermediate – Advanced
In this tutorial, we will introduce RLlib (http://rllib.io/), an open-source RL library with a proven track record for solving real-life industry problems at scale. We will walk through different industrial RL use cases and the solutions RLlib offers for those. In particular, we will build a recommender system using offline RL, show how to train policies that master complex multi-agent games, and demonstrate how you can connect external simulators to RLlib at scale for faster learning…
Richard Liaw is an engineering manager at Anyscale, where he leads a team building open-source machine learning libraries on top of Ray. He is on leave from the PhD program at UC Berkeley, where he worked at the RISELab advised by Ion Stoica, Joseph Gonzalez, and Ken Goldberg. During his time in the PhD program, he was part of the Ray team, building scalable ML libraries on top of Ray.
Christy Bergman is an ML and RL Developer Advocate on the Ray AIR and Ray RLlib teams at Anyscale. Her work involves creating demos and tutorials on how to use Ray and Anyscale. Before that, she was a senior AI/ML specialist solutions architect at AWS and a data scientist at several other companies.
Avnish Narayan is an ML Engineer at Anyscale where he works on RLlib. He’s passionate about exploring where RL can improve upon existing solutions in industrial applications. He previously received his MS in Computer Science at USC, where he did research on the applications of RL in robotic manipulation problems.
Training | Virtual | Bootcamp | Data Visualization | Intermediate-Advanced
In this workshop, we will build an interactive data visualization from scratch using d3.js in the browser. The possibilities shown in d3 examples are exciting, but the API surface of d3 and the various browser standards like HTML, CSS, SVG, and JavaScript can be overwhelming. Think of this workshop as a guided tour that will point out the important things to pay attention to as we go step-by-step from CSV file to interactive visualization…
Ian Johnson is a User Experience Engineer at Google. He is also an organizer of Bay Area d3. His workshop starts with SVG and then dives deep into d3, including DOM manipulation, categorical and quantitative scales, axes, brushes, color schemes, events, and histograms. Ian likes to make sense of data by exploring it visually with D3.js!
Workshop | Virtual | Machine Learning | All Levels
Data scientists can spend 60 to 80% of their time exploring and cleaning data. When they're given an updated data set, this process should be repeated, but often it isn't. This can lead to a model that poorly describes the system it represents. However, there is something that you can do about this. The "feature type" system in OCI Data Science's Accelerated Data Science (ADS) SDK classifies data based on what they represent, not how they're stored in memory. It also gives you the tools to compute custom statistics, create visualizations, use a validator and a warning system, and select columns based on the feature types…
A modern polymath, John holds advanced degrees in mechanical engineering, kinesiology and data science, with a focus on solving novel and ambiguous problems. As a senior applied data scientist at Amazon, John worked closely with engineering to create machine learning models to arbitrate chatbot skills, entity resolution, search, and personalization.
As a principal data scientist for Oracle Cloud Infrastructure, he is now defining tooling for data science at scale. John frequently gives talks on best practices and reproducible research. To that end, he has developed an approach to improve validation and reliability by using data unit tests and has pioneered Data Science Design Thinking. He also coordinates SoCal RUG, the largest R meetup group in Southern California.
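To make the idea of typing data by what it represents, rather than how it is stored, concrete, here is a hypothetical pure-Python sketch of semantic feature types with a validator, a simple warning loop, and type-based column selection. None of these names (`FeatureType`, `EMAIL`, `US_ZIP`, `validate`, `columns_of_type`) come from the ADS SDK; they are illustrative assumptions only. A ZIP code shows why this matters: stored as an integer, `02139` silently loses its leading zero.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FeatureType:
    """A hypothetical semantic type: what a value represents, not how it is stored."""
    name: str
    is_valid: Callable[[str], bool]


# Illustrative types only; these are not the ADS SDK's built-in feature types.
EMAIL = FeatureType("email", lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None)
US_ZIP = FeatureType("us_zip", lambda v: re.fullmatch(r"\d{5}(-\d{4})?", v) is not None)


def validate(values: List[str], ftype: FeatureType) -> List[int]:
    """Return the positions of values that do not fit the feature type."""
    return [i for i, v in enumerate(values) if not ftype.is_valid(v)]


def columns_of_type(assigned: Dict[str, FeatureType], ftype: FeatureType) -> List[str]:
    """Select the columns whose assigned feature type matches ftype."""
    return [col for col, t in assigned.items() if t == ftype]


table = {
    "contact": ["ana@example.com", "not-an-email"],
    "zip": ["02139", "9021"],  # stored as strings so the leading zero survives
}
assigned = {"contact": EMAIL, "zip": US_ZIP}

# A simple warning system: flag rows that break their column's feature type.
for col, ftype in assigned.items():
    bad = validate(table[col], ftype)
    if bad:
        print(f"warning: column {col!r} has values that are not {ftype.name}: rows {bad}")

print(columns_of_type(assigned, US_ZIP))
```

Because the checks key off the semantic type, they can be rerun unchanged whenever an updated data set arrives.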
Workshop | Virtual | MLOps & Data Engineering | Machine Learning | Intermediate
This session is all about vector databases. If you are a data scientist or a data/software engineer, this session should interest you. You will learn how to easily run your favourite ML models with the vector database Weaviate. You'll get an overview of what a vector database like Weaviate can offer, such as semantic search, question answering, data classification, named entity recognition, multimodal search, and much more. After this session, you will be able to load your own data and query it with your preferred ML model!
Laura is a Data Scientist at SeMI, which builds the open-source vector search engine Weaviate. She researches new machine learning features for Weaviate and works on everything UX/DX related to Weaviate; for example, she is responsible for the GraphQL API design. She is in close contact with the open-source community. She also likes to solve custom use cases with Weaviate and introduces Weaviate to other people through meetups, talks, and presentations.
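Underneath features like semantic search, a vector database answers nearest-neighbor queries over embeddings. The sketch below is a hypothetical brute-force index in plain Python that ranks stored vectors by cosine similarity. It is not Weaviate's API, and the three-dimensional vectors are hand-made stand-ins for embeddings that an ML model would produce.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorIndex:
    """Hypothetical brute-force vector index: scores every stored vector on each query."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, Sequence[float]]] = []

    def add(self, item_id: str, vector: Sequence[float]) -> None:
        self._items.append((item_id, vector))

    def query(self, vector: Sequence[float], k: int = 1) -> List[str]:
        # Rank all stored items by cosine similarity to the query, highest first.
        ranked = sorted(self._items, key=lambda item: cosine(vector, item[1]), reverse=True)
        return [item_id for item_id, _ in ranked[:k]]


# Hand-made 3-d "embeddings" standing in for the output of a real ML model.
index = TinyVectorIndex()
index.add("cat", [0.9, 0.1, 0.0])
index.add("dog", [0.8, 0.2, 0.1])
index.add("car", [0.0, 0.1, 0.9])

print(index.query([1.0, 0.0, 0.0], k=2))
print(index.query([0.0, 0.0, 1.0]))
```

Scanning every vector is fine for a toy, but it grows linearly with the collection; production vector databases use approximate nearest-neighbor indexes instead (Weaviate, for example, uses HNSW).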
Training | Virtual | Machine Learning | Beginner
We will start this training by learning about scikit-learn's API for supervised machine learning. scikit-learn's API mainly consists of three methods: fit to build models, predict to make predictions from models, and transform to modify data. This consistent and straightforward interface abstracts away the underlying algorithm, thus enabling us to focus on our particular problems. We will learn about the importance of splitting your data into train and test sets for model evaluation. Next, we will learn about combining preprocessing techniques with machine learning models using scikit-learn's Pipeline…
Thomas J. Fan is a Senior Software Engineer at Quansight Labs, working to sustain and evolve the PyData open-source ecosystem. He is a maintainer of scikit-learn, an open-source machine learning library for Python. Previously, he worked at Columbia University, improving the interoperability between scikit-learn and AutoML systems. Thomas holds a master’s degree in Physics from Stony Brook University and a master’s degree in Mathematics from New York University.
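To make the fit/predict/transform convention from the session above concrete, here is a toy, dependency-free sketch. The classes Standardizer, CentroidClassifier, and SimplePipeline are hypothetical stand-ins written for illustration, not scikit-learn code; in scikit-learn itself you would use StandardScaler, an estimator of your choice, and sklearn.pipeline.Pipeline (or make_pipeline).

```python
from statistics import mean, pstdev

class Standardizer:
    """Toy transformer: fit learns per-feature mean/std, transform rescales."""
    def fit(self, X, y=None):
        columns = list(zip(*X))
        self.means_ = [mean(c) for c in columns]
        self.stds_ = [pstdev(c) or 1.0 for c in columns]  # avoid division by zero
        return self

    def transform(self, X):
        return [[(v - m) / s for v, m, s in zip(row, self.means_, self.stds_)]
                for row in X]

class CentroidClassifier:
    """Toy classifier: fit stores one centroid per class, predict picks the closest."""
    def fit(self, X, y):
        self.centroids_ = {}
        for label in set(y):
            rows = [row for row, target in zip(X, y) if target == label]
            self.centroids_[label] = [mean(c) for c in zip(*rows)]
        return self

    def predict(self, X):
        def dist2(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids_, key=lambda lab: dist2(row, self.centroids_[lab]))
                for row in X]

class SimplePipeline:
    """Chain transformers and a final estimator, mimicking the Pipeline idea."""
    def __init__(self, *steps):
        self.steps = steps

    def fit(self, X, y):
        for step in self.steps[:-1]:
            X = step.fit(X).transform(X)   # learn on training data, then transform it
        self.steps[-1].fit(X, y)
        return self

    def predict(self, X):
        for step in self.steps[:-1]:
            X = step.transform(X)          # reuse statistics learned during fit
        return self.steps[-1].predict(X)

X_train = [[1.0, 200.0], [2.0, 180.0], [8.0, 20.0], [9.0, 10.0]]
y_train = ["a", "a", "b", "b"]
model = SimplePipeline(Standardizer(), CentroidClassifier()).fit(X_train, y_train)
print(model.predict([[1.5, 190.0], [8.5, 15.0]]))  # ['a', 'b']
```

The pipeline calls fit only on training data and reuses the learned statistics at prediction time, which is the reason pipelines help avoid leaking test-set information into preprocessing.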
Workshop | Virtual | Big Data Analytics | Machine Learning | All Levels
This talk will preview some of the latest features in Streamlit and Streamlit Cloud, and will finish by deploying a machine learning model in the cloud so that others can see and interact with the results…
Adrien is co-founder and CEO of Streamlit, which is pioneering next-generation tools for machine learning engineers. Dr. Treuille has been VP of Simulation at Zoox, led a Google X project, and was a Professor of Computer Science and Robotics at Carnegie Mellon. He gives talks around the world, including to the President’s Council of Advisors on Science and Technology, and has won numerous scientific awards, including the MIT TR35. Adrien and his work have been featured in the documentaries “What Will the Future Be Like” by PBS/NOVA and “Lo and Behold” by Werner Herzog.
Training | Virtual | NLP | Machine Learning | Intermediate-Advanced
This half-day training will teach attendees how to use the Hugging Face Hub, as well as the Transformers and Datasets libraries, to efficiently prototype and productize machine learning models…
Patrick von Platen is a research engineer at Hugging Face and one of the core maintainers of the popular Transformers library. He specializes in speech recognition, encoder-decoder models and long-range sequence modeling. Before joining Hugging Face, Patrick conducted research in speech recognition at Uber AI, Cambridge University, and RWTH Aachen University.
Workshop | In-person | Deep Learning | Beginner – Intermediate
Thanks to packages like Keras and Torch, you can get started with neural networks with only a few lines of R code. Once you understand the basic concepts, you will be able to use deep learning to make humorous AI-generated content! In this workshop, you’ll build a neural network on text data that generates pet names. By the end of the workshop, you should feel comfortable using neural networks in a variety of contexts…
Dr. Jacqueline Nolis is a data science leader with 15 years of experience in running data science teams and projects at companies ranging from Airbnb to Boeing. She is the Chief Product Officer at Saturn Cloud, where she helps design products for data scientists. Jacqueline has a PhD in Industrial Engineering, and her academic research focused on optimization under uncertainty. Data science is also her hobby: for example, she made an R package that mails physical postcards of your plots.
Workshop | In-person | Machine Learning | All Levels
This talk will cut through some of the biggest issues I’ve seen with Pandas code after working with the library for a while and writing three books on it…
Matt Harrison has been using Python since 2000. He runs MetaSnake, a Python and Data Science consultancy and corporate training shop. In the past, he has worked across the domains of search, build management and testing, business intelligence, and storage.
He has presented and taught tutorials at conferences such as Strata, SciPy, SCALE, PyCon, and OSCON, as well as at local user conferences.
Training | In-Person | Machine Learning Safety and Security | Responsible AI | Intermediate
In this workshop, aimed at both a Data Science audience who may want to learn DFIR (Digital Forensics and Incident Response) and a DFIR audience who may want to learn Data Science, Jess Garcia will explain the fundamentals of Data Science and DFIR and lead the audience through every step of an end-to-end investigation using only Data Science tools and techniques. Along the way, Jess will introduce multiple forensic artifacts and explain the value each provides to the overall investigation…
Jess Garcia is the Founder of the global Cybersecurity/DFIR firm One eSecurity and a Senior Instructor with the SANS Institute. During his 25 years in the field, Jess has led a myriad of complex multinational investigations for Fortune 500 companies and global organizations. He is one of the most prolific and veteran SANS instructors, having taught 10+ different highly technical Cybersecurity/DFIR courses at hundreds of conferences worldwide over the last 19 years. Jess is also an active Cybersecurity/DFIR researcher. With the mission of bringing Data Science/AI to the DFIR field, in 2020 Jess launched the DS4N6 initiative (www.ds4n6.io), under which he leads the development of multiple open-source tools, standards, and analysis platforms for DS/AI+DFIR interoperability.
David Contreras is a Senior Forensic Analyst at One eSecurity, where he works in incident response and leads the research team and internal product development. David has more than six years of DFIR experience, having worked on multiple notable incidents at international organizations and on many other projects related to threat hunting, SOCs, and more. He also contributes to research for the DS4N6 project (www.ds4n6.io), helping to provide Data Science and Machine Learning content to the Cybersecurity community.
Training | In-Person | Machine Learning | Deep Learning | Beginner
Have you ever wondered about how those data scientists at Facebook and LinkedIn make friend recommendations? Or how epidemiologists track down patient zero in an outbreak? If so, then this tutorial is for you. In this tutorial, we will use a variety of datasets to help you understand the fundamentals of network thinking, with a particular focus on constructing, summarizing, and visualizing complex networks…
Eric is an Investigator at the Novartis Institutes for Biomedical Research, where he solves biological problems using machine learning. He obtained his Doctor of Science (ScD) from the Department of Biological Engineering, MIT, and was an Insight Health Data Fellow in the summer of 2017. He has taught Network Analysis at a variety of data science venues, including PyCon USA, SciPy, PyData, and ODSC, and has also co-developed the Python Network Analysis curriculum on DataCamp. As an open-source contributor, he has made contributions to PyMC3, matplotlib, and bokeh. He has also led the development of the graph visualization package nxviz and the data cleaning package pyjanitor (a Python port of the R package janitor).
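As a taste of the network thinking this tutorial covers, one classic and very simple way to suggest new friends is to count common neighbours: people who share many friends with you but are not yet connected to you rank highest. The sketch below assumes an undirected graph stored as a dict of sets; the graph and the function name are hypothetical, and we do not claim this is how Facebook or LinkedIn actually make recommendations.

```python
from collections import Counter

def recommend_friends(graph, person, k=3):
    """Rank non-friends of `person` by how many friends they share with them."""
    friends = graph[person]
    counts = Counter()
    for friend in friends:
        for candidate in graph[friend]:
            if candidate != person and candidate not in friends:
                counts[candidate] += 1
    return counts.most_common(k)

# Hypothetical undirected friendship graph (every edge is listed in both directions).
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan", "eve"},
    "dan": {"bob", "cat"},
    "eve": {"cat"},
}
print(recommend_friends(graph, "ann"))  # [('dan', 2), ('eve', 1)]
```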
Workshop | In-Person | Machine Learning | MLOps & Data Engineering | Beginner-Intermediate
Although powerful, modern machine learning models can be sensitive. Seemingly subtle changes in a data distribution can destroy the performance of otherwise state-of-the-art models, which is especially problematic when ML models are deployed in production. In this workshop, we will give a hands-on overview of drift detection, the discipline focused on detecting such changes. We will start by building an understanding of the ways in which drift can occur and why it pays to detect it. We’ll then explore the anatomy of a drift detector and learn how drift detectors can be used to detect drift in a principled manner…
Ed Shee is Head of Developer Relations at Seldon. Having previously led a tech team at IBM, Ed comes from a cloud computing background and is a strong believer in making deployments as easy as possible for developers. With an education in computational modelling and an enthusiasm for machine learning, Ed has blended his work in ML and cloud-native computing to establish himself firmly in the emerging field of MLOps.
Ashley is a data science research engineer at Seldon, where he works on developing production-ready tools for drift, adversarial, and outlier detection. Prior to joining Seldon, he spent a number of years as a Research Fellow at The Alan Turing Institute. There, he explored the use of machine learning for tackling aerospace engineering problems, with a focus on explainability and uncertainty quantification. Ashley also completed a PhD at the University of Cambridge and is a keen proponent of open-source software.
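One standard, principled way to detect drift in a single numeric feature is a two-sample Kolmogorov-Smirnov (KS) test: compare the feature's values at training time with the values arriving in production, and flag drift when the largest gap between the two empirical CDFs exceeds a critical value. The sketch below is our own stdlib illustration, not the workshop's code or any Seldon API; it uses the large-sample critical value with c(alpha) = 1.358 (roughly a 5% level), which is only approximate for small samples.

```python
import bisect
import math

def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic D: the largest vertical gap
    between the two empirical CDFs (checked at every observed value)."""
    ref, cur = sorted(reference), sorted(current)
    d = 0.0
    for x in ref + cur:
        cdf_ref = bisect.bisect_right(ref, x) / len(ref)
        cdf_cur = bisect.bisect_right(cur, x) / len(cur)
        d = max(d, abs(cdf_ref - cdf_cur))
    return d

def ks_critical_value(n, m, c_alpha=1.358):
    """Large-sample critical value for D; c_alpha = 1.358 is roughly the 5% level."""
    return c_alpha * math.sqrt((n + m) / (n * m))

def drift_detected(reference, current):
    """Flag drift when D exceeds the critical value for these sample sizes."""
    d = ks_statistic(reference, current)
    return d > ks_critical_value(len(reference), len(current))

reference = [0.1 * i for i in range(50)]   # feature values seen at training time
same = list(reference)                     # identical distribution
shifted = [x + 2.0 for x in reference]     # every value moved up by 2.0
print(drift_detected(reference, same), drift_detected(reference, shifted))  # False True
```

Real detectors also have to handle many features at once (for example by testing each feature and correcting for multiple comparisons) and high-dimensional inputs such as images or text, which is where the workshop's broader discussion of detector anatomy comes in.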
Workshop | In-person | Machine Learning | Intermediate
This workshop is a hands-on introduction to Bayesian Decision Analysis (BDA), which is a framework for using probability to guide decision-making under uncertainty. I start with Bayes’s Theorem, which is the foundation of Bayesian statistics, and work toward the Bayesian bandit strategy, which is used for A/B testing, medical tests, and related applications. For each step, I provide a Jupyter notebook where you can run Python code and work on exercises…
Allen Downey is a Professor of Computer Science at Olin College of Engineering in Needham, MA. He is the author of several books related to computer science and data science, including Think Python, Think Stats, Think Bayes, and Think Complexity. Prof. Downey has taught at Colby College and Wellesley College, and in 2009 he was a Visiting Scientist at Google. He received his Ph.D. in Computer Science from U.C. Berkeley, and M.S. and B.S. degrees from MIT.
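The two endpoints of this workshop, Bayes’s Theorem and the Bayesian bandit, can be sketched in a few lines of standard-library Python. This is our own illustration, not the workshop’s notebooks: the medical-test numbers are hypothetical, and the bandit uses Thompson sampling with Beta(1, 1) priors on each arm’s conversion rate, one common form of the Bayesian bandit strategy.

```python
import random

def posterior(prior, likelihood, false_positive_rate):
    """Bayes's Theorem for a binary hypothesis:
    P(H|E) = P(E|H) P(H) / [P(E|H) P(H) + P(E|not H) P(not H)]."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical medical test: 1% prevalence, 90% sensitivity, 5% false positives.
print(round(posterior(0.01, 0.90, 0.05), 3))  # 0.154

def thompson_sampling(true_rates, rounds=2000, seed=0):
    """Bayesian bandit with Beta(1, 1) priors: each round, draw a plausible
    conversion rate for every arm from its Beta posterior and play the best draw."""
    rng = random.Random(seed)
    wins = [0] * len(true_rates)
    losses = [0] * len(true_rates)
    for _ in range(rounds):
        draws = [rng.betavariate(1 + n_win, 1 + n_loss)
                 for n_win, n_loss in zip(wins, losses)]
        arm = draws.index(max(draws))
        if rng.random() < true_rates[arm]:  # simulated conversion
            wins[arm] += 1
        else:
            losses[arm] += 1
    return [n_win + n_loss for n_win, n_loss in zip(wins, losses)]  # plays per arm

print(thompson_sampling([0.2, 0.6]))  # most plays should go to the second arm
```

Even a positive result from a fairly accurate test leaves only about a 15% chance of disease when the condition is rare, and the bandit shifts traffic toward the better arm as its posterior sharpens instead of splitting traffic evenly as a fixed A/B test would.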
Training | In-person | Machine Learning | Data Visualization | Intermediate-Advanced
This session will equip you with the skills to make customized visualizations for your data using Python. While there are many plotting libraries to choose from, the prolific Matplotlib library is always a great place to start. Since various Python data science libraries use Matplotlib under the hood, familiarity with Matplotlib itself gives you the flexibility to fine-tune the resulting visualizations (e.g., add annotations, animate, etc.). This session will also introduce interactive visualizations using HoloViz, which provides a higher-level plotting API capable of using Matplotlib and Bokeh (a Python library for generating interactive, JavaScript-powered visualizations) under the hood…
Stefanie Molin is a software engineer and data scientist at Bloomberg in New York City, where she tackles tough problems in information security, particularly those revolving around data wrangling/visualization, building tools for gathering data, and knowledge sharing. She is also the author of “Hands-On Data Analysis with Pandas,” which is currently in its second edition. She holds a bachelor of science degree in operations research from Columbia University’s Fu Foundation School of Engineering and Applied Science, as well as a master’s degree in computer science, with a specialization in machine learning, from Georgia Tech. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers.
Workshop | In-person | Machine Learning | Deep Learning | Intermediate
Learn how to pair traditional AutoML workflows with orchestration (the automated configuration, management, and coordination of data and models) and experiment tracking (management of a system of record for our data and models) to provide yourself with the tools to turn your machine learning models into an operational machine learning workflow which can plug into common DevOps platforms…
Anish loves turning ML ideas into ML products. Anish started his career working with multiple Data Science teams within SAP, using traditional ML, deep learning, and recommendation systems before landing at Weights & Biases. With the art of programming and a little bit of magic, Anish crafts ML projects to help better serve our customers, turning “oh nos” into “a-ha”s!
Bootcamp | Virtual | Machine Learning | Beginner
In this workshop, we cover the basics of modern R. You’ll learn how to read data from a CSV file using the readr package, manipulate data with dplyr, and make compelling visualizations with ggplot2…
Tutorial | In-person | Responsible AI | Intermediate
In this tutorial, Olivier Blais, Project Editor of the “ISO/IEC TS 5471 – Quality evaluation guidelines for AI Systems” technical specification, will share modern best practices and techniques to improve the quality evaluation of AI systems. This session targets AI practitioners, AI project managers, and AI leads, since AI system quality is paramount in every AI project, delivery process, and expert’s toolbox. Participants will learn about the upcoming approach and methods for AI system quality evaluation that will inspire future certifications…
Olivier is co-founder and VP of decision science at Moov AI. He is the editor of the international ISO standard that defines the quality of artificial intelligence systems, where he leads a team of 50 AI professionals from around the world.
His cutting-edge AI and machine learning knowledge has led him to establish a data culture in various industries and to support digital transformation projects at many companies, such as Pratt & Whitney, Metro, Sharethrough, Merck, and Premier Tech.
He is an AI mentor at Creative Destruction Lab and coaches several start-ups. As a speaker, his topics of choice are adopting and applying AI, and responsible AI.
Olivier is the recipient of the prestigious “30 under 30” award (2019) and is co-author of a patent for an advanced algorithm that evaluates a borrower’s creditworthiness.
Workshop | In-person
In this workshop, you’ll learn an easy way to incorporate data science and AI/ML into an OpenShift development workflow. As an example, you’ll use an object detection model to detect ‘dog(s)’ in an image…
Audrey Reznik has been in the IT industry (private and public sectors) for 27 years in multiple verticals. In the last 4 years, she worked as a Data Scientist at ExxonMobil where she created a Data Science Enablement team to help data scientists easily deploy ML models in a Hybrid Cloud environment. Audrey was instrumental in educating scientists about what the OpenShift platform was and how to use OpenShift containers (images) to organize, run, and visualize data analysis results. Audrey now works as a Data Scientist with the Red Hat Data Science Team where she is focused on next-generation applications. She is passionate about Data Science and in particular the current opportunities with ML and Federated Data.
Prasanth Anbalagan is a Senior Principal Software Engineer (QE and Analysis) on the Red Hat OpenShift Data Science team. Prasanth earned his M.S. and Ph.D. in Computer Science from North Carolina State University, focusing on Software Reliability Engineering, Predictive Modeling, and Automated Software Engineering. As a member of the AI team at Red Hat, Prasanth focuses on developing services and tools to analyze, manipulate, and visualize data and to execute automated operations as part of an Analytics, Machine Learning, and AI platform.
Tutorial | In-person | Deep Learning | All Levels
This is an introductory and hands-on guided tutorial of Ray, which provides powerful yet easy-to-use abstractions for implementing distributed systems in Python. This tutorial includes a brief talk to provide an overview of concepts in Ray Core, why you should use Ray for distributing Python and machine learning workloads, and a brief discussion on Ray’s library ecosystem.
Primarily, the tutorial will focus on Ray Core APIs to write remote functions and actors. Attendees will walk away with an understanding of why distributed computing is a necessity today, the common design patterns for writing distributed Python applications in Ray, and a basic understanding of how Ray works under the hood…
Stephanie is a final-year PhD student at UC Berkeley and a software engineer at Anyscale. She is interested in abstractions for distributed computing and problems in fault tolerance. Towards this end, she is also a maintainer for the open-source project Ray, which provides a simple, universal API for building distributed applications in Python.
Workshop | In-person | Deep Learning | Machine Learning
Some of the most exciting tech research happening now is in the area of deep learning, but how do we get started with hands-on practice and how do we gain a basic understanding of what is going on within all of those deep learning layers?
This lesson will help a beginner navigate this new landscape. We’ll start by identifying some of the major breakthroughs that make deep learning what it is today. Then we’ll get a chance to learn about the math and architecture behind neural nets. Lastly, we’ll talk about how you can create and tune your own neural networks via Keras…
Julia Lintern currently works as an instructor for the Metis Data Science Flex Program. Previously, she worked as a Data Scientist for the New York Times. Julia began her career as a structures engineer designing repairs for damaged aircraft. Julia holds an MA in applied math from Hunter College, where she focused on visualizations of various numerical methods and discovered a deep appreciation for the combination of mathematics and visualizations. During certain seasons of her career, she has also worked on creative side projects such as Lia Lintern, her own fashion label.
Workshop | In-person | Data Analytics | All Levels
This workshop will introduce you to optimization as a powerful tool in your analytics toolbox. You’ll learn what optimization is, how to think about a problem through an optimization lens, and how to formulate the problem. With supply chains in the news over the past two years because of shipping delays and labor shortages, it’s only fitting to discuss an optimization problem in this domain. You will see how to formulate that problem mathematically, how to add complexity to it step by step, and how to solve it using Python and Gurobi, a leading commercial optimization solver…
Ehsan is a Principal Operations Research Scientist at Decision Spot, with expertise in the logistics and transportation industries. Over the years, he has worked with several Fortune 500 companies, including GE, Norfolk Southern, and C.H. Robinson. Ehsan has worked on a variety of supply chain projects and has focused primarily on network optimization and routing. Before joining Decision Spot, he worked at Opex Analytics, which was acquired by Llamasoft, and later by Coupa.
He holds a PhD in Industrial Engineering and has been an Adjunct Lecturer in Northwestern’s Master of Science in Analytics (MSiA) program since Fall 2019.
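The optimization workshop above promises to formulate a supply chain problem mathematically. As a hedged illustration only, here is the classic transportation problem, a standard textbook starting point for such models; the notation (supply $s_i$ at warehouse $i \in I$, demand $d_j$ at customer $j \in J$, unit shipping cost $c_{ij}$, and shipped quantity $x_{ij}$) is ours and not necessarily the workshop's:

```latex
\begin{aligned}
\min_{x}\quad & \sum_{i \in I} \sum_{j \in J} c_{ij}\, x_{ij} \\
\text{s.t.}\quad & \sum_{j \in J} x_{ij} \le s_i && \forall i \in I \quad \text{(supply)} \\
& \sum_{i \in I} x_{ij} \ge d_j && \forall j \in J \quad \text{(demand)} \\
& x_{ij} \ge 0 && \forall i \in I,\ j \in J
\end{aligned}
```

Adding complexity step by step usually means adding variables and constraints, for example a fixed cost for opening each warehouse modelled with a binary variable, which turns this linear program into a mixed-integer program of the kind commercial solvers such as Gurobi are built to handle.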
Workshop | In-person | NLP | Machine Learning | Intermediate
In this workshop, I will demonstrate an end-to-end NER application that identifies medications in social media data using the spaCy NLP library. We will begin with an overview of neural NER models, including recent transformer-based models (e.g., BERT). We will then dig into spaCy’s project structure and how to set up its base neural model for NER. Our first objective will be to identify medication entities in social media conversations on Twitter. Once we’ve run the model and examined its results, we will see how to achieve significant performance boosts by tweaking project parameters and using a transformer architecture…
Ben is a Senior Data Scientist at the Institute for Experiential AI at Northeastern University. He obtained his Master of Public Health (MPH) from Johns Hopkins and his PhD in Policy Analysis from the Pardee RAND Graduate School. Since 2014, he has been working in data science for government, academia, and the private sector. His major focus has been on Natural Language Processing (NLP) technology and applications. Throughout his career, he has pursued opportunities to contribute to the larger data science community. He has presented his work at conferences, published articles, taught courses in data science and NLP, and is co-organizer of the Boston chapter of PyData. He also contributes to volunteer projects applying data science tools for public good.
Workshop | Virtual | NLP | Machine Learning | Intermediate
To build a proper understanding of modeling, I will explain traditional NLP techniques based on TF-IDF and go into the details of different deep learning architectures, such as feed-forward neural networks and convolutional neural networks (CNNs). Along with these concepts, I will show code snippets in Keras for building the classifier. I will conclude with some of the metrics commonly used to measure the classifier’s performance…
Sanghamitra Deb is a Staff Data Scientist at Chegg, where she works on problems related to school and college education to sustain and improve the learning process. Her work involves recommendation systems, computer vision, graph modeling, deep NLP analysis, data pipelines, and machine learning. Previously, Sanghamitra was a data scientist at Accenture, where she worked on a wide variety of problems related to data modeling, architecture, and visual storytelling. She is an avid fan of Python and has been programming for more than a decade.
Trained as an astrophysicist (she holds a PhD in physics), she applies her analytical mind not only to work across domains such as education, healthcare, and recruitment but also to her leadership style. She mentors junior data scientists at her current organization and coaches students from various fields who want to transition into Data Science. Sanghamitra enjoys addressing technical and non-technical audiences at conferences and encourages women to pursue careers in tech. She is passionate about diversity and has organized Women In Data Science meetups.
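The NLP classification workshop above starts from TF-IDF. Here is a minimal, dependency-free sketch of the basic weighting, assuming whitespace tokenization and the unsmoothed idf = ln(N / df), where N is the number of documents and df the number of documents containing the term: terms that appear in every document get weight zero, while rarer terms are boosted. The documents are hypothetical. Note that scikit-learn's TfidfVectorizer uses a smoothed idf, ln((1 + N) / (1 + df)) + 1, and L2-normalizes each row by default, so its numbers will differ.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document, using raw term counts
    times idf = ln(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term at most once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights

docs = ["the cat sat", "the dog sat", "the cat ran"]
w = tfidf(docs)
print(w[1])  # 'the' gets 0.0; 'dog' (in one doc only) gets the largest weight
```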
Workshop | Virtual
In this workshop, you’ll learn an easy way to incorporate data science and AI/ML into an OpenShift development workflow. As an example, you’ll use an object detection model to detect ‘dog(s)’ in an image…
Audrey Reznik has been in the IT industry (private and public sectors) for 27 years in multiple verticals. In the last 4 years, she worked as a Data Scientist at ExxonMobil where she created a Data Science Enablement team to help data scientists easily deploy ML models in a Hybrid Cloud environment. Audrey was instrumental in educating scientists about what the OpenShift platform was and how to use OpenShift containers (images) to organize, run, and visualize data analysis results. Audrey now works as a Data Scientist with the Red Hat Data Science Team where she is focused on next-generation applications. She is passionate about Data Science and in particular the current opportunities with ML and Federated Data.
Prasanth Anbalagan is a Senior Principal Software Engineer (QE and Analysis) on the Red Hat OpenShift Data Science team. Prasanth earned his M.S. and Ph.D. in Computer Science from North Carolina State University, focusing on Software Reliability Engineering, Predictive Modeling, and Automated Software Engineering. As a member of the AI team at Red Hat, Prasanth focuses on developing services and tools to analyze, manipulate, and visualize data and to execute automated operations as part of an Analytics, Machine Learning, and AI platform.
Workshop | Virtual
Existing ETL and MLOps tools claim to solve orchestration problems, but none of them does it the right way. In this hands-on workshop, we’ll go through a sample standard ML data pipeline that represents the typical data science use case: extracting data from multiple sources (a database and a data warehouse), transforming it, viewing the data, and cleaning it. Then we’ll make sure it meets quality standards and start training the model. During each of these phases, we will discuss testing (unit and integration tests). As a pre-pipeline step, we’ll cover optional data preparation flows and some strategies to accelerate the whole process by setting quality gates, testing data, and using some of the labeling services available…
Ido Michael co-founded Ploomber to help data scientists build faster. He previously worked at AWS, leading data engineering/science teams; together with his team, he built hundreds of data pipelines during customer engagements. He came to New York for his MS at Columbia University. He focused on building Ploomber after repeatedly finding that projects dedicated about 30% of their time just to refactoring development work (prototypes) into production pipelines.
Tutorial | Virtual | Machine Learning | Advanced
In this tutorial, I’ll discuss how machine learning tools can be rigorously integrated into observational study analyses, and how they interact with classical statistical ideas around randomization, semiparametric modeling, double robustness, etc. I’ll also survey some recent advances in methods for treatment heterogeneity and illustrate them with example applications in R. When deployed carefully, machine learning enables us to develop causal estimators that reflect an observational study design more closely than basic linear-regression-based methods…
Stefan Wager is an Associate Professor of Operations, Information and Technology at Stanford Graduate School of Business, and an Associate Professor of Statistics (by courtesy). He received his PhD in Statistics from Stanford in 2016, and has worked with or consulted for several Silicon Valley companies, including Dropbox, Facebook, Google and Uber. His research lies at the intersection of causal inference, optimization, and statistical learning. He is particularly interested in developing new solutions to problems in statistics, economics and decision making that leverage recent advances in machine learning.
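The causal inference tutorial above mentions double robustness. As a hedged illustration, here is one standard doubly robust estimator of the average treatment effect, augmented inverse-propensity weighting (AIPW); the notation is ours, and we do not know which estimators the tutorial itself covers. With $n$ units, covariates $X_i$, binary treatment $W_i$, outcome $Y_i$, outcome-model estimates $\hat{\mu}_w(x)$ of $E[Y \mid X = x, W = w]$, and a propensity-score estimate $\hat{e}(x)$ of $P(W = 1 \mid X = x)$:

```latex
\hat{\tau}_{\mathrm{AIPW}}
  = \frac{1}{n} \sum_{i=1}^{n} \left[
      \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i)
      + \frac{W_i \,\bigl(Y_i - \hat{\mu}_1(X_i)\bigr)}{\hat{e}(X_i)}
      - \frac{(1 - W_i)\,\bigl(Y_i - \hat{\mu}_0(X_i)\bigr)}{1 - \hat{e}(X_i)}
    \right]
```

The first two terms are a regression (outcome-model) estimate, and the weighted residual terms correct its bias using the propensity score; the estimate stays consistent if either the outcome models or the propensity model is estimated consistently, which is what "double robustness" refers to. When $\hat{\mu}_w$ and $\hat{e}$ are fit with flexible machine learning models, they are commonly estimated with cross-fitting (fit on some folds, evaluate on the held-out fold).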
Tutorial | Virtual | NLP | Beginner – Intermediate
This talk gives an overview of the suite of NLP methods grounded in neural network architectures, including recurrent neural networks (RNNs), transformers, and convolutional neural networks (CNNs). We will connect them by diving into their similarities and differences. You will come away with a big-picture view of NLP and a grasp of the theoretical essence that underpins NLP methods. The goal is to give you foundational NLP knowledge and lower the barrier to jumpstarting your NLP projects…
Chengyin Eng is a Senior Data Science Consultant on the Machine Learning Practice team at Databricks. She is experienced in developing end-to-end scalable machine learning solutions for cross-functional clients. She also teaches deep learning and ML in production courses and regularly gives talks at universities and conferences. Prior to Databricks, she worked in the life insurance industry, where she contributed to risk modeling and marketing pipelines. She holds an MS in Computer Science from the University of Massachusetts, Amherst, and bachelor’s degrees in Statistics and Environmental Studies from Mount Holyoke College.
Workshop | Virtual | Responsible AI | Machine Learning Safety & Security
In this workshop, we’ll walk you through several real-world use cases for synthetic data. You’ll learn how to balance a biased medical dataset to improve early cancer detection in women, generate realistic time-series financial data for forecasting, and more. You can test the examples yourself – some with Gretel-synthetics, a fully open-source package, and some using Gretel Blueprints, a collection of notebooks and sample code that leverage the open-source package through Gretel’s client…
Lipika Ramaswamy is a Senior Applied Scientist at Gretel.ai where she focuses on developing advanced synthetic data generation technologies that include privacy guarantees. Prior to Gretel.ai, she worked as a data scientist at LeapYear, a differential privacy software company. Lipika attended Bryn Mawr College for her undergrad, where she began her STEM career, and holds a Master’s in Data Science from Harvard University.
Workshop | Virtual | Machine Learning | NLP | Beginner – Intermediate
This talk will present self-supervised speech representation learning approaches and their connection to related research areas. Since many current methods focus solely on automatic speech recognition as a downstream task, we will review recent efforts to benchmark learned representations and extend their application beyond speech recognition…
Abdelrahman Mohamed (PhD) is a research scientist at Meta AI Research (previously, Facebook AI Research (FAIR)). He was a principal scientist/manager in Amazon Alexa and a researcher in Microsoft Research. Abdelrahman was part of the team that started the Deep Learning revolution in Spoken Language Processing in 2009. He is the recipient of the IEEE Signal Processing Society Best Journal Paper Award for 2016. His current research interest focuses on improving, using, and benchmarking learned speech representations, e.g. HuBERT, Wav2vec 2.0, TextlessNLP, and SUPERB.
Training | Virtual | Machine Learning | Intermediate-Advanced
During this training, we will learn about processing text data, working with imbalanced data, and Poisson regression. We will start by processing text data with scikit-learn’s vectorizers. Since the output of these vectorizers is sparse, we will also review scikit-learn estimators that can handle sparse data. Next, we will explore how to work with imbalanced data, where one of the classes appears more frequently than the others. We will look at estimators with class weights, resampling techniques provided by imbalanced-learn, and a bagging classifier with balancing…
Thomas J. Fan is a Senior Software Engineer at Quansight Labs, working to sustain and evolve the PyData open-source ecosystem. He is a maintainer of scikit-learn, an open-source machine learning library for Python. Previously, he worked at Columbia University, improving the interoperability between scikit-learn and AutoML systems. Thomas holds a master’s degree in Physics from Stony Brook University and a master’s degree in Mathematics from New York University.
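For the class-weights part of the imbalanced-data training above, the sketch below computes the "balanced" heuristic that scikit-learn documents for class_weight="balanced": each class gets n_samples / (n_classes * count), so every class contributes the same total weight. It is a standard-library re-implementation for illustration, not scikit-learn's code, and the labels are hypothetical.

```python
from collections import Counter

def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

y = ["ok"] * 90 + ["fraud"] * 10
weights = balanced_class_weights(y)
print(weights)  # 'ok' gets about 0.556, 'fraud' gets 5.0
```

With these weights, the 90 majority samples and the 10 minority samples each sum to a total weight of 50, so the classifier's loss no longer favours the majority class simply because it is more common.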
Training | Virtual | NLP | Machine Learning | Intermediate
Named Entity Recognition (NER) and Relationship Extraction (RE) are foundational for many downstream NLP tasks such as Information Retrieval and Knowledge Base construction. While pre-trained models exist for both NER and RE tasks, they are usually specialized for some narrow application domain. If your application domain is different, your best bet is to train your own models. However, the costs associated with training, specifically generating training data, can be a significant deterrent for doing so…
Sujit Pal builds intelligent systems around research content that help researchers and medical professionals achieve better outcomes. His areas of interest are Information Retrieval, Natural Language Processing and Machine Learning (including Deep Learning). As an individual contributor in the Elsevier Labs team, he works with diverse product teams to help them solve tough problems in these areas, as well as build proofs of concept at the cutting edge of applied research.
Training | Virtual | Machine Learning | Beginner-Intermediate
Scikit-learn is a Python machine learning library used by data science practitioners from many disciplines. During this training, we will learn about cross-validation, tuning machine learning algorithms, and pandas interoperability. Cross-validation enables us to evaluate our machine learning models by splitting our data into multiple training and testing datasets. We will learn to handle missing values with imputation using univariate and multivariate techniques. Next, we will explore tuning algorithms in scikit-learn with grid search and random search. We will learn about categorical features and how to use scikit-learn’s encoders to convert these categorical features into numerical features for a machine learning algorithm to consume…
Thomas J. Fan is a Senior Software Engineer at Quansight Labs, working to sustain and evolve the PyData open-source ecosystem. He is a maintainer of scikit-learn, an open-source machine learning library for Python. Previously, he worked at Columbia University, improving the interoperability between scikit-learn and AutoML systems. Thomas holds a master’s degree in Physics from Stony Brook University and a master’s degree in Mathematics from New York University.
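The cross-validation idea from the training above, splitting one dataset into several train/test pairs, can be sketched without any library. This hypothetical helper mirrors the unshuffled k-fold scheme: every sample lands in exactly one test fold, and the first n_samples % n_splits folds get one extra sample, which matches how scikit-learn's KFold behaves with shuffle=False. In practice you would use KFold or cross_val_score directly.

```python
def k_fold_indices(n_samples, n_splits=5):
    """Yield (train_idx, test_idx) pairs for unshuffled k-fold cross-validation.
    The first n_samples % n_splits folds get one extra sample."""
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

for train_idx, test_idx in k_fold_indices(7, n_splits=3):
    print(train_idx, test_idx)
# [3, 4, 5, 6] [0, 1, 2]
# [0, 1, 2, 5, 6] [3, 4]
# [0, 1, 2, 3, 4] [5, 6]
```

Averaging a model's score over the folds gives a less noisy estimate than a single train/test split, and the same splitting underlies grid search and random search when tuning hyperparameters.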
Training | Virtual | Machine Learning | MLOps & Data Engineering | Intermediate
In this session, we will provide a tutorial on TensorFlow and Keras and guide you through a series of hands-on examples, ranging from the basic MNIST dataset to time series processing for model building. We will also cover data input and output processing with TensorFlow, from simple CSV files to cloud data warehouse services such as Google Cloud BigQuery. As a bonus, we will cover integrating TensorFlow with Apache Kafka to illustrate the streaming data pipelines used broadly across the industry…
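TensorFlow's `tf.data` input pipelines follow a read, parse, and batch pattern, whether the records come from a CSV file, a BigQuery table, or a Kafka topic. The plain-Python generator below mimics that pattern over an in-memory CSV so the idea is visible without TensorFlow installed. It is a conceptual stand-in, not the `tf.data` API. The function name `csv_batches` and the column names `x1`, `x2`, and `y` are hypothetical.

```python
import csv
import io
from typing import Dict, Iterator, List, TextIO, Tuple

Features = Dict[str, float]

def csv_batches(source: TextIO, label_column: str,
                batch_size: int) -> Iterator[Tuple[List[Features], List[float]]]:
    """Stream rows from a CSV source and group them into (features, labels) batches."""
    reader = csv.DictReader(source)  # the first row is treated as the header
    features: List[Features] = []
    labels: List[float] = []
    for row in reader:
        labels.append(float(row.pop(label_column)))
        features.append({name: float(value) for name, value in row.items()})
        if len(labels) == batch_size:
            yield features, labels
            features, labels = [], []
    if labels:  # keep the final partial batch rather than dropping it
        yield features, labels

data = io.StringIO("x1,x2,y\n1,2,0\n3,4,1\n5,6,0\n7,8,1\n9,10,0\n")
for batch_features, batch_labels in csv_batches(data, label_column="y", batch_size=2):
    print(len(batch_labels), batch_labels)
# 2 [0.0, 1.0]
# 2 [0.0, 1.0]
# 1 [0.0]
```

The generator emits a smaller final batch, as `tf.data.Dataset.batch` does with its default `drop_remainder=False`. Because it yields lazily, the same shape of code works for a file handle or any other line-oriented stream.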
Yong Tang is the Director of Engineering at MobileIron. His most recent focus is on data processing in machine learning. He is a maintainer and the SIG I/O lead of the TensorFlow project. He received the Open Source Peer Bonus Award from Google for his contributions to TensorFlow and is the author of the Kafka Dataset module in TensorFlow. In addition to TensorFlow, Yong Tang contributes to many other open-source projects; he is a maintainer of Docker, CoreDNS, and SwarmKit. Yong Tang received his PhD in Computer Science & Engineering from the University of Florida.
Tutorial | In-person | Machine Learning Safety & Security | Responsible AI | Beginner – Intermediate
After presenting a brief history of cybersecurity data science, I'll discuss different cybersecurity data science specializations, such as malware detection and intrusion detection. The workshop will also introduce prominent practitioners and companies within the cybersecurity data science field. Finally, I'll cover pathways to becoming a cybersecurity data scientist…
John Speed Meyers is a security data scientist at Chainguard. His interests include software supply chain security, open source software security and applications of data science to cybersecurity. He has a PhD in policy analysis from the Pardee RAND Graduate School.
Tutorial | In-person | Deep Learning | Machine Learning | Beginner – Intermediate
In this talk, Sabrina Smai (Program Manager, Microsoft) shares the latest updates to PyTorch Profiler. She also gives a demo and tips for using the Profiler API to quickly locate and address common bottlenecks, as well as a look at what's to come…
Sabrina Smai is a Product Manager in Microsoft’s AI Frameworks team. She works with all things PyTorch and ONNX Runtime.
Tutorial | In-person | Machine Learning for Biotech & Pharma | Beginner – Intermediate
In this tutorial session, attendees will learn how a set of open-source tools can be used to standardize, characterize, and assess the quality of data from various health data sources. Open-source tools, including Synthea, ETL-Synthea, Achilles, Data Quality Dashboard, and Ares, will be reviewed and demonstrated in a data operations pipeline. We will demonstrate how the global health information community uses this strategy to ensure research-ready health data…
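Tools such as Achilles and the Data Quality Dashboard run against data standardized to the OMOP Common Data Model. The Data Quality Dashboard, for example, groups its checks into categories such as conformance, completeness, and plausibility and reports how many rows fail each one. The sketch below is loosely modeled on that idea in plain Python and does not reproduce the Data Quality Dashboard's actual implementation or output format. The rows, check names, thresholds, and the `run_check` helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[int]]

@dataclass
class CheckResult:
    name: str
    violated: int
    total: int
    threshold: float  # largest tolerated fraction of violating rows

    @property
    def fraction_violated(self) -> float:
        return self.violated / self.total if self.total else 0.0

    @property
    def passed(self) -> bool:
        return self.fraction_violated <= self.threshold

def run_check(name: str, rows: List[Record],
              is_violation: Callable[[Record], bool], threshold: float) -> CheckResult:
    """Count the rows that violate a rule and compare that share against a threshold."""
    return CheckResult(name, sum(1 for r in rows if is_violation(r)), len(rows), threshold)

# Hypothetical person rows: one is missing a birth year, one has an impossible one.
persons: List[Record] = [
    {"person_id": 1, "year_of_birth": 1980},
    {"person_id": 2, "year_of_birth": None},
    {"person_id": 3, "year_of_birth": 2150},
    {"person_id": 4, "year_of_birth": 1955},
]
this_year = date.today().year

results = [
    # Completeness: tolerate up to 30% missing birth years.
    run_check("year_of_birth is present", persons,
              lambda r: r["year_of_birth"] is None, threshold=0.30),
    # Plausibility: no birth year may lie in the future.
    run_check("year_of_birth is not in the future", persons,
              lambda r: r["year_of_birth"] is not None and r["year_of_birth"] > this_year,
              threshold=0.0),
]
for res in results:
    status = "PASS" if res.passed else "FAIL"
    print(f"{res.name}: {res.violated}/{res.total} rows violated -> {status}")
# year_of_birth is present: 1/4 rows violated -> PASS
# year_of_birth is not in the future: 1/4 rows violated -> FAIL
```

Expressing each check as a rule plus a tolerance keeps the report comparable across data sources of very different sizes.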
Frank DeFalco is the Director of Epidemiology Analytics at Janssen Research and Development, where he architects software solutions and data platforms for the analysis and application of observational data sources. He is currently the leader and Benevolent Dictator of the OHDSI open-source architecture working group. Frank is a presenter and panelist at OHDSI symposiums and has served as faculty for OHDSI symposium tutorial classes on architecture and common data model vocabulary.
In addition to leading the OHDSI Architecture working group, Frank initiated development of a standardized platform for observational analytics known as ATLAS. He is an active contributor to the open-source software repositories developed and released by OHDSI, including ATLAS, WebAPI, Achilles, Circe, Arachne, Visualizations, Hermes, Helios, and others. Frank's areas of expertise include computational epidemiology, large-scale data platforms, software development and architecture, data visualization, and informatics.
Prior to joining Janssen Research and Development, Frank held the position of Senior Principal and Director of Collaboration and Analytics at British Telecom, where he was a strategic advisor for multiple Fortune 100 companies across sectors including Consumer Products, Telecommunications, and Pharmaceuticals. Frank received his undergraduate degrees in Computer Science and Psychology from Rutgers University.
Training | In-person | Machine Learning | Intermediate
This tutorial will show how to use XGBoost. It will demonstrate model creation, model tuning, model evaluation, and model interpretation…
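XGBoost is an optimized implementation of gradient-boosted decision trees. Each new tree is fit to correct the errors of the ensemble built so far, and its contribution is scaled by a learning rate. The sketch below shows only that core loop, using plain gradient boosting with one-feature decision stumps on squared error. It is a simplified stand-in, not XGBoost's algorithm, which adds second-order gradient information, regularization, and many systems optimizations. The function names and toy data are hypothetical.

```python
from typing import List, Tuple

# A stump is (threshold, value when x <= threshold, value when x > threshold).
Stump = Tuple[float, float, float]

def fit_stump(xs: List[float], residuals: List[float]) -> Stump:
    """Choose the single split that minimizes the squared error of the residuals."""
    overall = sum(residuals) / len(residuals)
    best: Stump = (float("inf"), overall, overall)  # fallback: constant prediction
    best_sse = float("inf")
    for threshold in sorted(set(xs))[:-1]:  # splitting at the max would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if sse < best_sse:
            best_sse, best = sse, (threshold, left_mean, right_mean)
    return best

def predict_stump(stump: Stump, x: float) -> float:
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value

def gradient_boost(xs: List[float], ys: List[float], n_rounds: int = 50,
                   learning_rate: float = 0.1) -> Tuple[float, float, List[Stump]]:
    """Fit stumps one after another, each to the residuals of the ensemble so far."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    preds = [base] * len(ys)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # For the loss 0.5 * (y - p)**2, the negative gradient is exactly y - p.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink each correction by the learning rate before adding it.
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, learning_rate, stumps

def predict(model: Tuple[float, float, List[Stump]], x: float) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
model = gradient_boost(xs, ys)
print(round(predict(model, 2.0), 2), round(predict(model, 7.0), 2))  # 1.01 4.99
```

In XGBoost's scikit-learn-style interface, the corresponding knobs on `xgboost.XGBRegressor` are `n_estimators`, `learning_rate`, and `max_depth` (a stump is a tree with `max_depth=1`). Tuning them trades fitting speed against overfitting.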
Matt Harrison has been using Python since 2000. He runs MetaSnake, a Python and Data Science consultancy and corporate training shop. In the past, he has worked across the domains of search, build management and testing, business intelligence, and storage.
He has presented and taught tutorials at conferences such as Strata, SciPy, SCALE, PyCon, and OSCON, as well as at local user conferences.
Workshop | In-person | MLOps & Data Engineering | Intermediate
This session will explain:
Why a knowledge graph approach was chosen to power Beamery AI
Why the RDF standard was chosen
How our core ontology was designed
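RDF models knowledge as subject-predicate-object triples, and queries work by matching patterns against those triples. The toy store below illustrates that model in plain Python. The `ex:` identifiers, the `hasSkill` predicate, and the `TripleStore` class are hypothetical and say nothing about Beamery's actual ontology. In practice, a library such as rdflib and the SPARQL query language would play these roles.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]

class TripleStore:
    """A toy in-memory store of RDF-style (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()  # a set, so duplicate triples are stored once

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard, like a SPARQL variable."""
        for triple in self.triples:
            if all(q is None or q == t for q, t in zip((s, p, o), triple)):
                yield triple

store = TripleStore()
store.add("ex:alice", "rdf:type", "ex:Person")
store.add("ex:alice", "ex:hasSkill", "ex:Python")
store.add("ex:bob", "rdf:type", "ex:Person")
store.add("ex:bob", "ex:hasSkill", "ex:SQL")
store.add("ex:Python", "rdf:type", "ex:Skill")

# Who has the skill ex:Python?  SPARQL: SELECT ?who WHERE { ?who ex:hasSkill ex:Python }
print(sorted(s for s, _, _ in store.match(p="ex:hasSkill", o="ex:Python")))
# ['ex:alice']

# Everything known about ex:alice.
print(sorted(store.match(s="ex:alice")))
# [('ex:alice', 'ex:hasSkill', 'ex:Python'), ('ex:alice', 'rdf:type', 'ex:Person')]
```

Because every fact has the same three-part shape, new entity types and relationships can be added without a schema migration. That flexibility is a common argument for graph models of people, skills, and occupations.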
Henri is a Senior Knowledge Engineer at Beamery, a full-talent-lifecycle scaleup making sense of enterprise people data, based in London and the US. At work, he specialises in ontology design, entity reconciliation, and data provenance for semantic web databases. Henri has worked at a number of HRTech startups and is passionate about modelling and serving AI models in the domain of people, skills, companies, and occupations.
Workshop | In-person | Machine Learning Safety & Security | Machine Learning | Beginner – Intermediate
In this talk, I aim to help advance the AI for Cybersecurity discipline by summarizing the state of the field and promising future directions. I will offer a multidisciplinary AI for Cybersecurity roadmap centered on major themes such as cybersecurity applications and data, advanced AI methodologies for cybersecurity, and AI-enabled decision making. I will also present examples of recent research at the intersection of AI and cybersecurity, particularly on detecting vulnerable code in GitHub repositories and detecting emerging threats on the Dark Web to support proactive cyber threat intelligence…
Dr. Sagar Samtani is an Assistant Professor and Grant Thornton Scholar in the Department of Operations and Decision Technologies at Indiana University. Dr. Samtani received his Ph.D. from the AI Lab at the University of Arizona. His research interests are in AI for Cybersecurity, including developing deep learning approaches for cyber threat intelligence, vulnerability assessment, open-source software, AI risk management, and Dark Web analytics. He has received funding from NSF's SaTC, CICI, and SFS programs and has published over 40 peer-reviewed articles in leading information systems, machine learning, and cybersecurity venues. He is deeply involved with industry, serving on the Board of Directors for the DEFCON AI Village and the Executive Advisory Council for the CompTIA ISAO.
Training | In-person | NLP | Machine Learning for Biotech & Pharma | Intermediate – Advanced
Spark NLP is a Natural Language Processing (NLP) library built on top of Apache Spark ML. It provides simple, performant, and accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment. Spark NLP comes with 3,000+ pretrained pipelines and models in more than 200 languages. It supports nearly all NLP tasks, with modules that can be used seamlessly in a cluster. Spark NLP is downloaded more than 1 million times every month and has grown 20x over the past year. It is the world's most widely used NLP library in the enterprise and is used by 54% of healthcare organizations. In this hands-on session, Veysel will go over the library's healthcare components and show how to solve a wide range of NLP problems in healthcare with state-of-the-art methods and practices from across the industry. He will also explain best practices for building production-grade solutions around the latest research…
Veysel is a well-known thought leader in healthcare NLP and works as a Lead Data Scientist and ML Engineer at John Snow Labs, improving the Spark NLP for Healthcare library and delivering hands-on projects in Healthcare and Life Science. He is a seasoned data scientist with over ten years of experience and a strong background in every aspect of data science, including NLP, machine learning, deep learning, and big data. He is also pursuing his Ph.D. in ML at Leiden University, Netherlands, and delivers graduate-level lectures in AutoML and Distributed Data Processing. He has broad consulting experience in Statistics, Data Science, Software Architecture, MLOps, Machine Learning, and AI, advising start-ups, boot camps, and companies around the globe. He speaks at Data Science & AI events, conferences, and workshops and has delivered more than a hundred talks at international and national conferences and meetups.
Training | In-person | Machine Learning Safety & Security | Responsible AI | Intermediate
We can easily trick a classifier into making embarrassingly false predictions. When this is done systematically and intentionally, it is called an adversarial attack. More specifically, this kind of attack is called an evasion attack. In this session, we will examine an evasion use case and elaborate on other forms of attacks. Then, we will explain two defense methods: spatial smoothing preprocessing and adversarial training. Lastly, we will demonstrate one robustness evaluation method and one certification method to ascertain that the model can withstand such attacks…
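The sketch below makes the attack and one of the defenses concrete. First, it uses the fast gradient sign method (FGSM), one common way to craft an evasion attack, against a tiny logistic-regression model. Second, it applies a median filter, one simple form of spatial smoothing, to remove an isolated spike from a small image. The weights, input, epsilon, and image are invented for illustration, and the session may use different attacks, models, and tooling.

```python
import math
from statistics import median
from typing import List

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(w: List[float], b: float, x: List[float], y: int, eps: float) -> List[float]:
    """Fast gradient sign method: step each feature by eps along the sign of d(loss)/dx.

    For logistic regression with cross-entropy loss, d(loss)/dx = (p - y) * w.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]

def median_smooth(image: List[List[float]], window: int = 3) -> List[List[float]]:
    """Spatial smoothing: replace each pixel with the median of its window (edges clamped)."""
    h, w, r = len(image), len(image[0]), window // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            neighborhood = [image[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                            for di in range(-r, r + 1) for dj in range(-r, r + 1)]
            row.append(median(neighborhood))
        out.append(row)
    return out

w, b = [2.0, -1.0, 0.5], -0.5
x, y = [0.6, 0.2, 0.4], 1           # correctly classified as class 1
x_adv = fgsm(w, b, x, y, eps=0.3)
print(round(predict_proba(w, b, x), 3), round(predict_proba(w, b, x_adv), 3))  # 0.668 0.413

noisy = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
print(median_smooth(noisy))         # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

Because FGSM moves every feature by the full ε, the model's score shifts by ε times the sum of the absolute weights (here 0.3 × 3.5 = 1.05), which is enough to flip the prediction. Median smoothing tends to be most effective against sparse, high-magnitude perturbations such as the isolated spike above.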
Serg Masís has been at the confluence of the internet, application development, and analytics for the last two decades. Currently, he’s a Climate and Agronomic Data Scientist at Syngenta, a leading agribusiness company with a mission to improve global food security. Before that role, he co-founded a search engine startup, incubated by Harvard Innovation Labs, that combined the power of cloud computing and machine learning with principles in decision-making science to expose users to new places and events efficiently. Whether it pertains to leisure activities, plant diseases, or customer lifetime value, Serg is passionate about providing the often-missing link between data and decision-making. He wrote the bestselling book “Interpretable Machine Learning with Python” and is currently working on a new book titled “DIY AI” for Addison-Wesley for a broader audience of curious developers, makers, and hackers.