For East 2023, please refer to the event app for the in-person conference and to our virtual platform for the most current virtual schedule.
We are delighted to announce our East 2023 Schedule!
All sessions are scheduled in Eastern Time (ET).
- ODSC Talks/Keynotes are scheduled from Tuesday May 9 to Thursday May 11. In-person sessions are available to Gold, Platinum, Mini-Bootcamp, and VIP Pass holders. Business talks are available to Ai x Pass holders. Virtual sessions are available to Virtual Premium, Virtual Platinum, and Virtual Mini-Bootcamp pass holders.
- ODSC Trainings are scheduled from Tuesday May 9 to Thursday May 11. In-person sessions are available to Platinum, Mini-Bootcamp, and VIP Pass holders. Virtual sessions are available to Virtual Platinum and Virtual Mini-Bootcamp pass holders.
- ODSC Workshops/Tutorials are scheduled from Tuesday May 9 to Thursday May 11. All in-person sessions are available to VIP, Platinum, Mini-Bootcamp, and Gold Pass holders. Silver Pass holders can attend only on Wednesday and Thursday. Virtual sessions are available to Virtual Premium, Virtual Platinum, and Virtual Mini-Bootcamp pass holders.
- ODSC Bootcamp sessions are held VIRTUALLY on Monday May 8 as pre-conference training. They are ONLY available to in-person Mini-Bootcamp and VIP Pass holders and to Virtual Mini-Bootcamp pass holders.
Speakers and session times are subject to change.
Please Note: In-Person attendees will have access to virtual sessions. If you have a virtual pass, please note that we will not live-stream any in-person sessions. Only virtual sessions will be recorded.
The prerequisites for the workshop and training sessions are available HERE.
Please review the final schedule:
- In-person: download the TBM Engage app from the App Store, then enter the app code east2023.
- Virtual: live.odsc.com (agenda section).
Virtual | Bootcamp | Machine Learning | Beginner
The Introduction to Machine Learning Workshop will build upon the attendee’s foundation of math and coding knowledge to develop a basic understanding of the most popular machine learning algorithms used in industry today. We will answer such questions as: What are the different types of ML algorithms? What is overfitting, and how can we avoid it? Why does XGBoost consistently outperform other algorithms?
Julia Lintern currently works as a Director of Data Science at Gartner. Previously, she worked as a Data Scientist for the New York Times. Julia began her career as a structures engineer designing repairs for damaged aircraft. Julia holds an MA in applied math from Hunter College, where she focused on visualizations of various numerical methods and discovered a deep appreciation for the combination of mathematics and visualizations. During certain seasons of her career, she has also worked on creative side projects such as Lia Lintern, her own fashion label.
Virtual | Bootcamp | Machine Learning | MLOps | Intermediate
In this training, you will learn how to accelerate your data analyses using the Python language and Pandas, a library specifically designed for tabular data analysis. We start by learning the core Pandas data structures, the Series and DataFrame. From these foundations, we will learn to use the split-apply-combine paradigm for grouped computations, manipulate time series, and perform advanced joins between datasets, with a focus on loading, filtering, grouping, and transforming data. Having completed this workshop, you will understand the fundamentals and advanced features of Pandas, be aware of common pitfalls, and be ready to perform your own analyses.
Daniel Gerlanc has worked as a data scientist for more than a decade and has been writing software for nearly 20 years. He frequently teaches live trainings on oreilly.com and is the author of the video course Programming with Data: Python and Pandas. He has coauthored several open source R packages, published in peer-reviewed journals, and is a graduate of Williams College.
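The split-apply-combine paradigm mentioned in the Pandas training can be sketched without any library. The snippet below is a minimal, hypothetical illustration in plain Python (the function name `split_apply_combine` and the sample data are made up for this example); in Pandas the same grouped mean would typically be written as `df.groupby("region")["amount"].mean()`.

```python
from collections import defaultdict
from statistics import mean


def split_apply_combine(rows, key, value, func):
    """Group rows by `key`, apply `func` to each group's `value`s, and combine into a dict."""
    groups = defaultdict(list)
    for row in rows:  # split: bucket values by their group key
        groups[row[key]].append(row[value])
    # apply func to each bucket, then combine the results into one mapping
    return {k: func(vals) for k, vals in groups.items()}


# Hypothetical sample data
sales = [
    {"region": "east", "amount": 10.0},
    {"region": "west", "amount": 4.0},
    {"region": "east", "amount": 6.0},
]
print(split_apply_combine(sales, "region", "amount", mean))
# {'east': 8.0, 'west': 4.0}
```

Pandas layers vectorized execution, index alignment, and many built-in aggregations on top of this basic pattern.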
Virtual | Bootcamp | Machine Learning | Beginner
Data science uses a combination of mathematics, statistics, and computer science to help us answer important questions in a large number of fields. In this workshop we will introduce the underlying mathematical principles of the field, with example problems gleaned from a number of different industries. By the end of the workshop, participants will know enough data science to explore their own problems and be ready for more intermediate and advanced courses.
Eric Eager is the VP of Research and Development at SumerSports, a football analytics startup founded by Paul Tudor Jones and Jack Jones. Prior to joining Sumer, he held similar roles at Pro Football Focus, where he was responsible for many of the insights that have helped grow the game of American football. Eric holds a PhD in Mathematical Biology from the University of Nebraska, has taught at Wharton, DataCamp, and the University of Wisconsin – La Crosse, and has published over 25 academic papers during his career.
In-person | Half-Day Training | NLP | Machine Learning | Intermediate-Advanced
Large Language Models like GPT-4 are transforming the world in general, and the field of data science in particular, at an unprecedented pace. This training introduces deep learning transformer architectures, including LLMs. Critically, it also demonstrates the breadth of capabilities that state-of-the-art LLMs like GPT-4 can deliver, including revolutionizing the development of machine learning models and commercially successful data-driven products, accelerating the creative capacities of data scientists, and pushing them in the direction of being data product managers. Brought to life via hands-on code demos that leverage the Hugging Face and PyTorch Lightning Python libraries, this training covers the full lifecycle of LLM development, from training to production deployment.
Jon Krohn is Co-Founder and Chief Data Scientist at the machine learning company Nebula. He authored the book Deep Learning Illustrated, an instant #1 bestseller that was translated into seven languages. He is also the host of SuperDataScience, the data science industry’s most listened-to podcast. Jon is renowned for his compelling lectures, which he offers at leading universities and conferences, as well as via his award-winning YouTube channel. He holds a PhD from Oxford and has been publishing on machine learning in prominent academic journals since 2010.
In-person | Full-Day Training | Machine Learning | All Tracks | Beginner
In this training session we will work through the entire process of training a machine learning model in R. We will start with the scaffolding of cross-validation, then move on to exploratory data analysis, feature engineering, model specification, parameter tuning, and model selection. We then take the finished model and deploy it as an API in a Docker container for production use.
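The cross-validation scaffolding that this R training starts from boils down to partitioning the data into folds and scoring a model on each held-out fold. The sketch below shows that loop in Python rather than R, with hypothetical helper names (`k_fold_indices`, `cross_val_mse`) and a deliberately trivial model that always predicts the training mean. It uses contiguous, unshuffled folds for clarity; real workflows usually shuffle or stratify first.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous, unshuffled folds over n samples."""
    # The first n % k folds get one extra sample so every index is used exactly once.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_val_mse(xs, ys, k, fit, predict):
    """Fit on k-1 folds and score mean squared error on the held-out fold, once per fold."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(predict(model, xs[i]) - ys[i]) ** 2 for i in test]
        scores.append(sum(errors) / len(errors))
    return scores


# Hypothetical baseline model: always predict the mean of the training targets.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(cross_val_mse(xs, ys, 3, fit_mean, predict_mean))  # [9.25, 0.25, 9.25]
```

The large errors on the outer folds show how cross-validation exposes a model that cannot extrapolate, which is exactly the signal used later for parameter tuning and model selection.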
In-person | Full-Day Training | Data Visualization & Data Analysis | Machine Learning | Intermediate-Advanced
The human brain excels at finding patterns in visual representations, which is why data visualizations are essential to any analysis. Done right, they bridge the gap between those analyzing the data and those consuming the analysis. However, learning to create impactful, aesthetically pleasing visualizations can often be challenging. This session will equip you with the skills to make customized visualizations for your data using Python.
Stefanie Molin is a software engineer and data scientist at Bloomberg in New York City, where she tackles tough problems in information security, particularly those revolving around data wrangling/visualization, building tools for gathering data, and knowledge sharing. She is also the author of “Hands-On Data Analysis with Pandas,” which is currently in its second edition. She holds a Bachelor of Science degree in operations research from Columbia University’s Fu Foundation School of Engineering and Applied Science, as well as a master’s degree in computer science, with a specialization in machine learning, from Georgia Tech. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers.
In-person | Half-Day Training | Machine Learning | Intermediate-Advanced
Gradient Boosting remains the most effective method for classification and regression problems on tabular data. This session is the second of two parts and covers advanced topics that are newer and may be less familiar. First, we will discuss how to calibrate the probabilities of classification models, reviewing the major techniques. Next, we will discuss Probabilistic Regression, wherein the goal is to predict the full probability distribution of the numerical target given the features, and demonstrate different approaches to this problem. Finally, we will present tools for Conformal Prediction, a hot topic that can provide prediction intervals with strong theoretical guarantees.
Brian Lucena is Principal at Numeristical, where he advises companies of all sizes on how to apply modern machine learning techniques to solve real-world problems with data. He is the creator of three Python packages: StructureBoost, ML-Insights, and SplineCalib. In previous roles he has served as Principal Data Scientist at Clover Health, Senior VP of Analytics at PCCI, and Chief Mathematician at Guardian Analytics. He has taught at numerous institutions including UC-Berkeley, Brown, USF, and the Metis Data Science Bootcamp.
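To make the Conformal Prediction idea concrete, here is a minimal sketch of split conformal prediction with absolute-residual scores, one standard variant of the method. The session does not say which tools it presents, so the function name and data below are hypothetical. The only assumption is a point prediction for the new example plus predictions on a held-out calibration set the model was not trained on.

```python
import math


def split_conformal_interval(cal_preds, cal_targets, new_pred, alpha=0.1):
    """Symmetric split-conformal prediction interval around a point prediction.

    If calibration and test points are exchangeable, the interval covers the
    true target with probability at least 1 - alpha.
    """
    # Nonconformity scores: absolute residuals on the calibration set.
    scores = sorted(abs(y - p) for p, y in zip(cal_preds, cal_targets))
    n = len(scores)
    # Finite-sample-corrected rank of the (1 - alpha) quantile.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return (-math.inf, math.inf)  # too few calibration points for this alpha
    q = scores[rank - 1]
    return (new_pred - q, new_pred + q)


cal_preds = [0] * 9
cal_targets = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical residuals 1..9
print(split_conformal_interval(cal_preds, cal_targets, new_pred=100, alpha=0.25))
# (92, 108)
```

The guarantee holds regardless of how good the underlying gradient boosting model is; a weaker model simply produces wider intervals.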
In-person | Workshop | Data Visualization & Data Analysis | Beginner-Intermediate
By completing this workshop, you will learn how to create compelling visualizations using directed and undirected graphs, dynamic graphs, and clustering. You will also learn about centrality metrics and network density. Additionally, you will learn about different layout algorithms, as well as strategies for interpreting and communicating graph data in meaningful ways.
Tamilla Triantoro is an Associate Professor of Computer Information Systems at Quinnipiac University and a leader of the Masters Program in Business Analytics. She was previously an Academic Director of Data Analytics at the University of Connecticut. Dr. Triantoro is an author, speaker, researcher, and educator in the fields of artificial intelligence, data analytics, user experience with technology, and the future of work. She received her Ph.D. from the City University of New York where she researched online user behavior. Dr. Triantoro presents her research around the world, attempting to demystify the complexity of today’s digital world and to make it understandable and relevant to business professionals and the general audience.
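Network density and degree centrality, two of the metrics this workshop mentions, have short closed-form definitions. The sketch below computes them in plain Python for a hypothetical star graph. It assumes a simple graph (no self-loops, each edge listed once). The workshop does not say which tool it uses; these definitions match the ones used by, for example, NetworkX’s `density` and `degree_centrality`.

```python
def density(n_nodes, edges, directed=False):
    """Fraction of possible edges that are present (simple graph assumed)."""
    possible = n_nodes * (n_nodes - 1)
    if not directed:
        possible //= 2  # each undirected pair can hold only one edge
    return len(edges) / possible if possible else 0.0


def degree_centrality(nodes, edges):
    """Undirected degree centrality: each node's degree divided by (n - 1)."""
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(nodes)
    return {v: (d / (n - 1) if n > 1 else 0.0) for v, d in degree.items()}


# Hypothetical star graph: "a" is connected to everyone else.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("a", "d")]
print(density(len(nodes), edges))            # 0.5
print(degree_centrality(nodes, edges)["a"])  # 1.0
```

A centrality of 1.0 means the hub touches every other node, which a layout algorithm would typically place at the center of the drawing.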
In-person | Half-Day Training | Machine Learning | Deep Learning | Beginner
Scikit-learn is a Python machine-learning library used by data science practitioners from many disciplines. We will start this training by learning about scikit-learn’s API for supervised machine learning. scikit-learn’s API mainly consists of three methods: fit to build models, predict to make predictions from models, and transform to modify data.
Thomas J. Fan is a Staff Software Engineer at Quansight Labs and is a maintainer for scikit-learn, an open-source machine learning library for Python. Previously, Thomas worked at Columbia University to improve interoperability between scikit-learn and AutoML systems. He is a maintainer for skorch, a neural network library that wraps PyTorch. Thomas has a Masters in Mathematics from NYU and a Masters in Physics from Stony Brook University.
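The fit/predict/transform convention described in the scikit-learn training can be illustrated with two toy classes. This is a minimal sketch of the convention only, not scikit-learn code: `MeanCenterer` and this `NearestCentroid` are hypothetical stand-ins written in plain Python, although scikit-learn ships real counterparts such as `sklearn.preprocessing.StandardScaler` and `sklearn.neighbors.NearestCentroid`. Following scikit-learn style, `fit` returns `self`, and learned attributes end with an underscore.

```python
class MeanCenterer:
    """Transformer: learns column means in fit, subtracts them in transform."""

    def fit(self, X, y=None):
        n = len(X)
        self.means_ = [sum(col) / n for col in zip(*X)]
        return self  # returning self allows chaining, e.g. MeanCenterer().fit(X)

    def transform(self, X):
        return [[x - m for x, m in zip(row, self.means_)] for row in X]


class NearestCentroid:
    """Predictor: learns one centroid per class in fit, predicts the closest one."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, x in enumerate(row):
                acc[j] += x
            counts[label] = counts.get(label, 0) + 1
        self.centroids_ = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [min(self.centroids_, key=lambda c: sq_dist(row, self.centroids_[c])) for row in X]


X = [[0, 0], [0, 2], [10, 10], [10, 12]]
y = ["a", "a", "b", "b"]
scaler = MeanCenterer().fit(X)
clf = NearestCentroid().fit(scaler.transform(X), y)
print(clf.predict(scaler.transform([[1, 1], [9, 11]])))  # ['a', 'b']
```

Because both objects share the same small interface, they can be swapped or chained without changing the surrounding code, which is the main payoff of scikit-learn’s consistent API.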
In-person | Workshop | Machine Learning | Beginner
Pandas can be tricky, and there is a lot of bad advice floating around. This tutorial will cut through some of the biggest issues I’ve seen with Pandas code after working with the library for a while and writing three books on it.
Matt Harrison has been using Python since 2000. He runs MetaSnake, a Python and Data Science consultancy and corporate training shop. In the past, he has worked across the domains of search, build management and testing, business intelligence, and storage.
He has presented and taught tutorials at conferences such as Strata, SciPy, SCALE, PyCon, and OSCON, as well as at local user conferences.
In-person | Half-Day Training | Machine Learning | Intermediate
Workshop participants will get a chance to complete a set of tasks revolving around various optimization techniques and observe the outcomes. The tasks will include hyperparameter optimization for a deep neural network and optimization of the parameters of an ensemble model (Random Forest).
Nikolay is an experienced Data Science professional who currently leads the EMEA Data Science team at Domino Data Lab. He holds an MSc in Software Technologies and an MSc in Data Science, and is currently undertaking postgraduate research at King’s College London. His areas of expertise are statistics, mathematics, and data science in general, and his research interests are in neural networks, with an emphasis on biological plausibility. He writes articles and blogs regularly and speaks at various European conferences (ODSC, Big Data Spain, Strata, Big Data London, etc.) to build awareness about data science and artificial intelligence. He is also the organizer of the London Data Science and Machine Learning meetup and a recipient of several technical mastery awards, such as the Oracle ACE Award and the IBM Outstanding Technical Achievement Award.
In-person | Half-Day Training | Machine Learning Safety and Security | All Levels
This course outlines the typical fraud framework at an organization, identifies where data science can play a role, and lays out how to build an analytically advanced fraud system.
A Teaching Associate Professor in the Institute for Advanced Analytics, Dr. Aric LaBarr is passionate about helping people solve challenges using their data. At the Institute, home of the nation’s first Master of Science in Analytics degree program, he helps design an innovative program that prepares a modern workforce to communicate wisely about and handle a data-driven future. He teaches courses in predictive modeling, forecasting, simulation, financial analytics, and risk management. Previously, he was Director and Senior Scientist at Elder Research, where he mentored and led a team of data scientists and software engineers. As director of the Raleigh, NC office, he worked closely with clients and partners to solve problems in the fields of banking, consumer product goods, healthcare, and government. Dr. LaBarr holds a B.S. in economics, as well as a B.S., M.S., and Ph.D. in statistics, all from NC State University.
Virtual | Half-Day Training | NLP | Intermediate
This workshop is designed to explore how artificial intelligence can be used to generate creative outputs and to inspire technical audiences to use their skills in new and creative ways. The workshop will also include a series of code exercises designed to give participants hands-on experience working with AI models to generate creative outputs.
Leonardo De Marchi holds a Master’s in Artificial Intelligence and has worked as a Data Scientist in the sports world, with clients such as the New York Knicks. He now works at Thomson Reuters as VP of Labs, and also provides consultancy and training for small and large companies. His previous experience includes serving as Head of Data Science and Analytics at Bumble, the largest dating site with over 500 million users, where he led the team through an acquisition and an IPO.
Virtual | Tutorial
This talk centers on the use of MATLAB’s Big Data functionality. Specifically, it demonstrates the efficient use of the MATLAB computational platform for Exploratory Data Analysis (EDA) and Confirmatory Data Analysis (CDA).
Dr Hezekiah O Babatunde is a faculty member at the University of Virginia’s College at Wise, VA, USA, and a Machine Learning Consultant. He received his PhD in Computer Science from the School of Computer and Security Science, Edith Cowan University, Perth, WA, Australia, in 2015. He is a certified Big Data Consultant and Scientist from Arcitura Certification. He also holds a B.Sc. degree in Mathematical Sciences (computer major) and three M.Sc. degrees, in Applied Mathematics, Computer Sciences, and Organizational Leadership, from FUNNAB, the University of Ibadan, and Charleston Southern University (USA), respectively. He worked as a Postdoctoral Research Associate in Systems Biology in Professor John Yin’s laboratory at the University of Wisconsin, Madison, USA.
Virtual | Full-Day Training | NLP | Beginner
In this course we will go through Natural Language Processing fundamentals, such as pre-processing techniques, tf-idf, embeddings, and more. This will be followed by practical coding examples in Python that show how to apply the theory to real use cases.
Leonardo De Marchi holds a Master’s in Artificial Intelligence and has worked as a Data Scientist in the sports world, with clients such as the New York Knicks. He now works at Thomson Reuters as VP of Labs, and also provides consultancy and training for small and large companies. His previous experience includes serving as Head of Data Science and Analytics at Bumble, the largest dating site with over 500 million users, where he led the team through an acquisition and an IPO.
Laura Skylaki is a Manager of Applied Research at Thomson Reuters Labs, where she leads advanced machine learning projects in the domain of Legal and Tax AI. With a career spanning more than a decade at the intersection of research and practical application, she has contributed technical expertise in diverse fields such as bioinformatics and stem cell biology, image processing, and natural language processing. She holds a doctorate in stem cell bioinformatics from the University of Edinburgh, UK, and has been publishing on machine learning applications in leading academic journals since 2012.
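Tf-idf, one of the fundamentals listed for this NLP course, weights a term by how often it appears in a document and discounts terms that appear in many documents. The sketch below uses one common textbook variant (term frequency = count / document length, idf = log(N / df)) with naive whitespace tokenization; the function name and corpus are hypothetical. Libraries differ in the details: scikit-learn’s `TfidfVectorizer`, for instance, smooths idf and L2-normalizes rows by default.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (tf = count/len, idf = log(N/df))."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


docs = ["the cat sat", "the dog sat", "the cat ran"]  # hypothetical corpus
for w in tf_idf(docs):
    print({t: round(v, 3) for t, v in w.items()})
```

A word like "the" that occurs in every document gets weight 0, while a word unique to one document ("dog") gets the highest weight in that document.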
Virtual | Half-Day Training | Machine Learning | Deep Learning | Data Engineering | All Levels
This workshop will provide you with a crash course in how to leverage graphs to expand your data science workflows. The hands-on section focuses on concrete examples of how to interact with the Neo4j database from a Python notebook, create graph-based features, work with graph-specific ML algorithms, and leverage the results. Newcomers to graph data science as well as experienced graph practitioners are welcome to join and dive into the GDS (Graph Data Science) Library.
Alison Cossette is a dynamic Data Science Strategist, Educator, and Podcast Host. As a Developer Advocate at Neo4j specializing in Graph Data Science, she brings a wealth of expertise to the field. With her strong technical background and exceptional communication skills, Alison bridges the gap between complex data science concepts and practical applications.
Alison’s passion for responsible AI shines through in her work. She actively promotes ethical and transparent AI practices and believes in the transformative potential of responsible AI for industries and society. Through her engagements with industry professionals, policymakers, and the public, she advocates for the responsible development and deployment of AI technologies.
Alison’s academic journey includes pursuing a Master of Science in Data Science, specializing in Artificial Intelligence, at Northwestern University, and research with the Stanford University Human-Computer Interaction Crowd Research Collective. Alison combines academic knowledge with real-world experience, and she leverages this expertise to educate and empower individuals and organizations in the field of data science.
Overall, Alison Cossette’s multifaceted background, commitment to responsible AI, and expertise in data science make her a respected figure in the field. Through her role as a Developer Advocate at Neo4j and her podcast, she continues to drive innovation, education, and responsible practices in the exciting realm of data science and AI.
Virtual | Workshop | NLP | Machine Learning | Deep Learning | Data Engineering & Big Data | Beginner-Intermediate
This workshop will introduce you to the fundamentals of PySpark (Spark’s Python API), the Spark NLP library, and best practices in Spark programming when working with textual or natural language data.
Akash Tandon is co-founder and CTO of Looppanel, where he builds software to help product teams record, store, and analyze user research data. He is a co-author of Advanced Analytics with PySpark, published by O’Reilly. Previously, Akash worked as a senior data engineer at Atlan, SocialCops, and RedCarpet, where he built data infrastructure for enterprise, government, and finance use cases. He has also been a participant and mentor in the Google Summer of Code program with the R Project for Statistical Computing.
Virtual | Workshop | NLP | Beginner-Intermediate
This workshop will equip participants with the skills and knowledge to conduct adversarial evaluation of NLP systems. Through active exercises and examples, we will discuss how to identify and address system weaknesses and explore how this approach can improve accuracy, reduce risk, and uncover potential blind spots. Participants will gain a greater understanding of how to use adversarial evaluation to detect and prevent errors in their NLP systems.
Panos Alexopoulos has been working since 2006 at the intersection of data, semantics, and software, building intelligent systems that deliver value to business and society. Born and raised in Athens, Greece, he currently works as Head of Ontology at Textkernel, in Amsterdam, Netherlands, where he leads a team of Data Professionals in developing and delivering a large cross-lingual Knowledge Graph in the HR and Recruitment domain. Panos holds a PhD in Knowledge Engineering and Management from National Technical University of Athens, and has published more than 60 papers at international conferences, journals and books. He is the author of the book “Semantic Modeling for Data – Avoiding Pitfalls and Breaking Dilemmas” (O’Reilly, 2020), and a regular speaker and trainer in both academic and industry venues.
In-person | Workshop | NLP | Intermediate
Leaving this workshop, you will understand each of these topics and will have gained the practical, hands-on expertise to start integrating modern NLP into your domain. Participants will fine-tune and prompt-engineer state-of-the-art models like BART and XLM-RoBERTa, and they will peer behind the curtain of world-shaking technologies like ChatGPT to understand their utility and architectures.
Daniel Whitenack (aka Data Dan) is a Ph.D. trained data scientist working with SIL International on NLP and speech technology for local languages in emerging markets. He has more than ten years of experience developing and deploying machine learning systems at scale. Daniel co-hosts the Practical AI podcast, has spoken at conferences around the world (Applied Machine Learning Days, O’Reilly AI, QCon AI, GopherCon, KubeCon, and more), and occasionally teaches data science/analytics at Purdue University.
In-person | Workshop | NLP | Deep Learning | Intermediate
In this workshop we will explore some popular NLP techniques that have broad applicability. From the basics of bagging and word vectors to creating contextualized representations of words and sentences, the workshop will equip participants with the tools they need to turn raw text data into useful insights.
Ben is a Senior Data Scientist at the Institute for Experiential AI at Northeastern University. He obtained his Masters in Public Health (MPH) from Johns Hopkins and his PhD in Policy Analysis from the Pardee RAND Graduate School. Since 2014, he has been working in data science for government, academia and the private sector. His major focus has been on Natural Language Processing (NLP) technology and applications. Throughout his career, he has pursued opportunities to contribute to the larger data science community. He has presented his work at conferences, published articles, taught courses in data science and NLP, and is co-organizer of the Boston chapter of PyData. He also contributes to volunteer projects applying data science tools for public good.
In-person | Workshop | Machine Learning | Intermediate
This workshop will show how to use XGBoost. It will demonstrate model creation, model tuning, model evaluation, and model interpretation.
Matt Harrison has been using Python since 2000. He runs MetaSnake, a Python and Data Science consultancy and corporate training shop. In the past, he has worked across the domains of search, build management and testing, business intelligence, and storage.
He has presented and taught tutorials at conferences such as Strata, SciPy, SCALE, PyCon, and OSCON, as well as at local user conferences.
In-person | Bootcamp | Machine Learning | All Levels
Are you looking for new and effective ways to learn data science? Join our workshop on competency-based education (CBE) in data science and explore cutting-edge approaches to learning in the fast-paced world of technology. Our workshop offers a unique perspective on virtual CBE and explores how the latest online learning resources and mentoring models can be used to support those looking to pick up a new skill.
Daniel J. Smith, PhD, MBA, has worked at WGU for 3 years. He has analytics experience up to the director level in several industries, including insurance, health care administration, and higher education. His experience is in AI and machine learning applications in industry using R, Tableau, SAS, and Python. He enjoys working with students to improve their analytical, programming, and communication skills.
Leticia Rabor worked as a professional Software and Systems Engineer in the Defense and Aerospace industries for over 13 years. She has designed, implemented, and tested various image formation subsystem components for ground system development.
She has also worked in academia since 2012, in roles that include program chair and instructor. Leticia is currently an adjunct professor at Fort Hays State University and a full-time senior instructor at Western Governors University.
She has a Master of Science degree in Information Assurance and a bachelor’s degree in Computer Science. Her yearly activities include conducting an external one-hour workshop on both mobile development and JavaScript at the Geek Girls Tech Conference at the University of San Diego (USD). She participated as one of the expert panelists for “The future of mobile development” at the Geek Girls Tech Conference in San Diego, California. She is a member of Women Who Code (WWC) and a recipient of the “Faculty of the Year” award in 2017.
In-person | Tutorial | Data Engineering & Big Data | Machine Learning | Intermediate-Advanced
The Google Cloud Big Data Essentials Workshop offers industry professionals an opportunity to explore Google Cloud and its big data capabilities. They will gain hands-on experience with various big data analytics tools, such as Dataproc, BigQuery, Cloud Storage, and Compute Engine.
Mohammad Soltanieh-ha is a Clinical Assistant Professor in the Information Systems department at Boston University’s Questrom School of Business. He specializes in data science programming, big data analytics, and business applications. He earned his Ph.D. in computational physics from Northeastern University and currently focuses his research on computer vision applications in cancer diagnosis, macroeconomic forecasting, and high-performance computing. Mohammad holds leadership roles at Google and the American Physical Society (APS). He founded APS’s data science unit in 2019 and serves as a Faculty Expert at Google Cloud, where he supports cloud computing education and best practices for fellow faculty members.
In-person | Workshop | Machine Learning | All Levels
This workshop presents Taipy, a new low-code Python package that allows you to create complete data science applications, including graphical visualization and the management of algorithms, pipelines, and scenarios.
Florian Jacta is a specialist in Taipy, a low-code, open-source Python package that enables any Python developer to easily build a production-ready AI application. He handles pre-sales and after-sales functions for the package. He is a data scientist for Groupe Les Mousquetaires (Intermarche) and ATOS, and he has developed several predictive models as part of strategic AI projects. Florian earned his master’s degree in Applied Mathematics from INSA, with a major in Data Science and Mathematical Optimization.
Albert applies his skills in machine learning and big data to solve (financial) optimization problems. He developed projects of different skill levels for Taipy’s tutorial videos. He holds a Bachelor of Science from McGill University, with a major in Computer Science & Statistics and a minor in Finance.
In-person | Workshop | Machine Learning | Beginner
This tutorial workshop will cover both statistical and neural network-based models for time series analysis. It will be introductory in nature and focus on the discussion of a couple of workhorse statistical and neural network-based time series models that are frequently applied to solving time series forecasting problems.
Jeffrey Yau is currently Chief Data & A.I. Officer at Fanatics Collectibles. Most recently, he served as Global Head of Data Science, Analytics & Engineering at Amazon Music where he oversaw multiple teams who developed both insights-packed analytics and end-to-end statistical and machine learning systems. Prior to Amazon, Jeffrey worked at WalmartLabs as the VP of Data Science & Engineering where he led the team responsible for powering Walmart store mobile apps and the entire store finance system. Further, his team created end-to-end machine learning systems for key business initiatives and had a multi-billion dollar impact annually on Walmart U.S.
Over the years, he has held various senior-level positions in quantitative finance at global investment management firm AllianceBernstein, data science consulting firm Silicon Valley Data Science, multinational financial services company Charles Schwab Corporation, and the world’s leading professional services firm KPMG. He began his career as a tenure-track Assistant Professor of Economics at Virginia Tech, and he was an adjunct professor at UC Berkeley, Cornell, and NYU, teaching machine learning and advanced statistical modeling for finance and business.
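The time series session does not name the specific statistical models it covers. As one illustration of a classic statistical forecasting model, here is a minimal sketch of simple exponential smoothing, where the smoothed level is updated as level = alpha * y_t + (1 - alpha) * level. The function name and the initialization at the first observation are assumptions made for this example; production libraries typically estimate alpha and the initial level from the data.

```python
def ses_forecast(series, alpha):
    """Simple exponential smoothing; returns the final level, i.e. the flat forecast.

    The level starts at the first observation (an assumption for this sketch) and is
    updated as level = alpha * y + (1 - alpha) * level for each later observation.
    """
    if not series:
        raise ValueError("series must be non-empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


print(ses_forecast([10, 12, 11, 13], alpha=0.5))  # 12.0
```

Larger alpha values track recent observations more closely, while smaller values smooth out noise; with alpha = 1 the forecast is simply the last observation.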
In-person | Career Workshop | Beginner
Get a crash course on the most common types of technical interview questions that show up in FAANG Data Science, ML, and Data Analyst interviews, and how to best solve them. Then, practice what you learned, by collaboratively solving a real SQL, Statistics, ML, and Product Analytics interview question with Nick Singh, an Ex-Facebook & Google employee turned best-selling author of Ace the Data Science Interview.
Nick Singh is an Ex-Facebook & Google Engineer turned best-selling author of Ace the Data Science Interview, and founder of SQL Interview Platform DataLemur.com. His career advice on LinkedIn has earned him 100,000 followers, and he’s successfully career coached 578 people to land their dream job in data!
Virtual | Tutorial | Deep Learning | Intermediate-Advanced
The tutorial will provide the audience with a review of the major algorithmic principles and their integration with deep learning architectures.
Virtual | Tutorial | Machine Learning | MLOps | Intermediate
In this tutorial, we will build a production-quality Model Observability pipeline with an open-source Python stack. ML engineers, data scientists, and researchers can use this framework to further extend and develop a comprehensive Model Observability platform.
Rajeev Prabhakar is a Machine Learning Platform Engineer at Lyft, currently focused on building model observability at scale for a wide range of ML applications across teams at Lyft. Prior to Lyft, he worked on the ML platform team at Quantcast, where his work included enabling distributed computing with Spark and notebooks on Kubernetes (k8s), building control systems for optimal spend budget allocation, and optimizing real-time prediction latency in a low-latency serving environment.
Anindya Saha is a Staff Machine Learning Platform Engineer at Lyft, focusing on distributed computing solutions for machine learning and data engineering. He led and implemented Spark on Kubernetes support in the ML platform, enabling feature engineering at scale with ephemeral Spark clusters on k8s. He is currently working on enabling scalable distributed model training on the ML platform.
Virtual | Tutorial | Deep Learning | Intermediate-Advanced
In this tutorial, we introduce Colossal-AI, a unified parallel training system designed to seamlessly integrate different parallelization paradigms, including data parallelism, pipeline parallelism, multiple forms of tensor parallelism, and sequence parallelism.
James Demmel is the Dr. Richard Carl Dehmel Distinguished Professor of Computer Science and Mathematics at the University of California at Berkeley, and former Chair of the EECS Dept. He also serves as Chief Strategy Officer for the start-up HPC-AI Tech, whose goal is to make large-scale machine learning much more efficient, with little programming effort required by users. Demmel’s research is in high performance computing, numerical linear algebra, and communication avoiding algorithms. He is known for his work on the widely used LAPACK and ScaLAPACK linear algebra libraries. He is a member of the National Academy of Sciences, National Academy of Engineering, and American Academy of Arts and Sciences; a Fellow of the AAAS, ACM, AMS, IEEE and SIAM; and winner of the IPDPS Charles Babbage Award, IEEE Computer Society Sidney Fernbach Award, the ACM Paris Kanellakis Award, the J. H. Wilkinson Prize in Numerical Analysis and Scientific Computing, and numerous best paper prizes.
Yang You is a Presidential Young Professor at the National University of Singapore (NUS), an early career track at NUS for exceptional young academics with great potential to excel. He received his PhD in Computer Science from UC Berkeley. His advisor is Prof. James Demmel, former chair of the Computer Science Division and EECS Department. Yang You’s research interests include Parallel/Distributed Algorithms, High Performance Computing, and Machine Learning. His current research focuses on scaling up deep neural network training on distributed systems and supercomputers. In 2017, his team broke the world record for ImageNet training speed, a result covered by outlets such as NSF, ScienceDaily, Science NewsLine, and i-programmer. In 2019, his team broke the world record for BERT training speed, and those BERT training techniques have been used by many tech giants, including Google, Microsoft, and NVIDIA. Yang You’s LARS and LAMB optimizers are available in the industry benchmark MLPerf. He won the IPDPS 2015 Best Paper Award (0.8%), the ICPP 2018 Best Paper Award (0.3%), and the ACM/IEEE George Michael HPC Fellowship. Yang You is a Siebel Scholar and a winner of the Lotfi A. Zadeh Prize. He was one of two UC Berkeley nominees for the ACM Doctoral Dissertation Award, out of 81 Berkeley EECS PhD students who graduated in 2020. He also made the Forbes 30 Under 30 Asia list (2021) and won the IEEE CS TCHPC Early Career Researchers Award for Excellence in High Performance Computing. For more information, please check his lab’s homepage at https://ai.comp.nus.edu.sg/
Virtual | Tutorial | ML for Biotech and Pharma | Intermediate
Patient testimonials provide valuable insights for defining and characterizing quality of life and patients’ perspectives on their disease, whether about their symptoms, their advice to others, or what makes them suffer. Extracting information from these stories is therefore relevant to improving care.
Mélissa is a data scientist and engineer. Over the past 7 years at Quinten Health, working in the healthcare sector as a Project Manager in data science, she has participated in developing several AI-powered decision support solutions, e.g., for rare disease diagnosis, disease progression modelling and endotyping, or evaluation of population heterogeneity. She has led multiple studies of real-world data using advanced analytics methods to characterize phenotypes and disease progression in neurological conditions, cardiovascular diseases, and oncology for pharma companies, research organizations, and care providers. She currently manages the development at Quinten Health of AI-powered solutions that support clients’ R&D decisions using real-world data.
In-person | Half-Day Training | Machine Learning | Intermediate-Advanced
Gradient Boosting remains the most effective method for classification and regression problems on tabular data. This session is Part One of two. We will start with the fundamentals of how boosting works and best practices for model building and hyper-parameter tuning. Next, we will discuss how to interpret the model, understanding what features are important generally and for a specific prediction. Finally, we will discuss how to exploit categorical structure, when the different values of a categorical variable have a known relationship to one another.
Brian Lucena is Principal at Numeristical, where he advises companies of all sizes on how to apply modern machine learning techniques to solve real-world problems with data. He is the creator of three Python packages: StructureBoost, ML-Insights, and SplineCalib. In previous roles he has served as Principal Data Scientist at Clover Health, Senior VP of Analytics at PCCI, and Chief Mathematician at Guardian Analytics. He has taught at numerous institutions including UC-Berkeley, Brown, USF, and the Metis Data Science Bootcamp.
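The boosting fundamentals mentioned in the Gradient Boosting session above can be illustrated with a minimal sketch. It assumes squared-error loss, a single numeric feature, depth-1 regression trees ("stumps") as weak learners, and a learning rate of 0.1; it is not the session's material and omits everything a production boosting library adds. Each round fits a stump to the current residuals, which are the negative gradient of the squared-error loss, and adds a shrunken copy of that stump to the ensemble.

```python
from dataclasses import dataclass


@dataclass
class Stump:
    """A depth-1 regression tree on a single numeric feature."""
    threshold: float
    left_value: float
    right_value: float

    def predict(self, x: float) -> float:
        return self.left_value if x <= self.threshold else self.right_value


def fit_stump(xs, targets):
    """Choose the split that minimizes the squared error of the targets."""
    best, best_err = None, float("inf")
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        err = sum((t - left_mean) ** 2 for t in left) + sum((t - right_mean) ** 2 for t in right)
        if err < best_err:
            best, best_err = Stump(threshold, left_mean, right_mean), err
    if best is None:  # all feature values identical: fall back to a constant
        mean = sum(targets) / len(targets)
        best = Stump(xs[0], mean, mean)
    return best


def gradient_boost(xs, ys, n_rounds=100, learning_rate=0.1):
    base = sum(ys) / len(ys)  # the mean is the best constant under squared error
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss (with the usual 1/2 factor), the negative
        # gradient with respect to each prediction is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump.predict(x) for p, x in zip(preds, xs)]

    def predict(x):
        return base + learning_rate * sum(s.predict(x) for s in stumps)

    return predict


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = gradient_boost(xs, ys)
print(round(model(1), 3), round(model(6), 3))
```

On this toy data every round splits at x = 3 and the training residuals shrink by a factor of 0.9, so predictions approach the two group means, 1 and 5.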
In-person | Workshop | Data Visualization & Data Analysis | Machine Learning | Deep Learning | NLP | Data Engineering | All Levels
In today’s data-driven world, static dashboards are no longer sufficient for the needs of data consumers and businesses. People need quick access to information and must be able to rapidly create and deploy data applications for both creating and consuming analytics, models, and more.
Mingo is a Senior Sales Engineer at Plotly. After graduating from Bowdoin College with a degree in computer science, he started working with organizations in the master data management and data science spaces. Throughout his career, Mingo has partnered with large financial institutions, life sciences organizations, retail companies, and government agencies to help them better understand their data and more effectively serve their customers. Mingo enjoys building relationships with people to understand their pain points and help them solve their most challenging business and technical problems.
In-person | Half-Day Training | Machine Learning Safety and Security | All Levels
This course outlines the typical fraud framework within an organization and where data science can play a role in it, and lays out how to build an analytically advanced fraud system.
A Teaching Associate Professor in the Institute for Advanced Analytics, Dr. Aric LaBarr is passionate about helping people solve challenges using their data. At the Institute, home to the nation’s first Master of Science in Analytics degree program, he helps design the innovative program that prepares a modern workforce to communicate wisely about, and handle, a data-driven future. He teaches courses in predictive modeling, forecasting, simulation, financial analytics, and risk management. Previously, he was Director and Senior Scientist at Elder Research, where he mentored and led a team of data scientists and software engineers. As director of the Raleigh, NC office, he worked closely with clients and partners to solve problems in the fields of banking, consumer product goods, healthcare, and government. Dr. LaBarr holds a B.S. in economics, as well as a B.S., M.S., and Ph.D. in statistics, all from NC State University.
In-person | Workshop | Machine Learning | Data Analytics | Beginner-Intermediate
HPCC Systems is a completely free, open-source Big Data/Data Lake platform created by LexisNexis Risk Solutions and used by companies globally. Workshop attendees will be provided with interactive code examples and solutions on an actual cluster created for ODSC attendees.
Bob has worked with the HPCC Systems technology platform and the ECL programming language for over a decade and has been a technical trainer for over 30 years. He is the developer and designer of the HPCC Systems Online Training Courses and is the Senior Instructor for all classroom and remote based training.
In-person | Tutorial | NLP | Intermediate-Advanced
This tutorial is targeted at learners who have experience with neural network models and are interested in gaining a deeper understanding of how they work.
Jacob Andreas is the X Consortium Assistant Professor at MIT. His research aims to build intelligent systems that can communicate effectively using language and learn from human guidance. Jacob earned his Ph.D. from UC Berkeley, his M.Phil. from Cambridge (where he studied as a Churchill scholar) and his B.S. from Columbia. As a researcher at Microsoft Semantic Machines, he founded the language generation team and helped develop core pieces of the technology that powers conversational interaction in Microsoft Outlook. He has been the recipient of Samsung’s AI Researcher of the Year award, MIT’s Kolokotrones teaching award, and paper awards at NAACL and ICML.
In-person | Full-Day Training | Deep Learning | Machine Learning | Beginner-Intermediate
Obscure until recently, Deep Learning is ubiquitous today across data-driven applications as diverse as machine vision, natural language processing, generative A.I., and superhuman game-playing.
This training is an introduction to Deep Learning that brings high-level theory to life with interactive examples featuring PyTorch, TensorFlow 2, and Keras, the three principal Python libraries for Deep Learning. Essential theory will be covered in a manner that provides students with a complete intuitive understanding of Deep Learning’s underlying foundations.
Jon Krohn is Co-Founder and Chief Data Scientist at the machine learning company Nebula. He authored the book Deep Learning Illustrated, an instant #1 bestseller that was translated into seven languages. He is also the host of SuperDataScience, the data science industry’s most listened-to podcast. Jon is renowned for his compelling lectures, which he offers at leading universities and conferences, as well as via his award-winning YouTube channel. He holds a PhD from Oxford and has been publishing on machine learning in prominent academic journals since 2010.
In-person | Workshop | Deep Learning | All Levels
Deep learning is an area of machine learning that has become synonymous with artificial intelligence. PyTorch provides a comprehensive framework for the development of deep learning models. However, project requirements often extend beyond the model development process. SAS has a rich set of established and unique capabilities that support model development and deployment, including some new features that use the TorchScript language. In this workshop, we will demonstrate how to integrate PyTorch with SAS to leverage the benefits of both technologies. The workshop will focus on computer vision applications, but the framework can easily be extended to other deep learning tasks.
Robert teaches machine learning for SAS and specializes in neural networks. Before joining SAS, Robert worked under the Senior Vice Provost at North Carolina State University where he built models pertaining to student success, faculty development and resource management. Prior to working in academia, Robert was a member of the research and development group on the Workforce Optimization team at Travelers Insurance. His models at Travelers focused on forecasting and optimizing resources. Robert graduated with a master’s degree in Business Analytics and Project Management from the University of Connecticut and a master’s degree in Applied and Resource Economics from East Carolina University.
Ari Zitin holds bachelor’s degrees in both physics and mathematics from UNC-Chapel Hill. His research focused on collecting and analyzing low energy physics data to better understand the neutrino. Ari taught introductory and advanced physics and scientific programming courses at UC-Berkeley while working on a master’s in physics with a focus on nonlinear dynamics. While at SAS, Ari has worked to develop courses that teach how to use Python code to control SAS analytical procedures.
In-person | Workshop | Machine Learning | Intermediate-Advanced
In this workshop, I will explain the core principles of Scruff and the main programming concepts. I will then demonstrate how we used Scruff to create a tool for wildfire risk assessment and mitigation that includes climate models, historical fire data, and fire propagation simulators. Finally, we will work through a hands-on session of getting up and running with Scruff and implementing and running simple models.
Dr. Avi Pfeffer is Chief Scientist at Charles River Analytics. Dr. Pfeffer is a leading researcher on a variety of computational intelligence techniques including probabilistic reasoning, machine learning, and computational game theory. Dr. Pfeffer has developed numerous innovative probabilistic representation and reasoning frameworks, such as probabilistic programming, which enables the development of probabilistic models using the full power of programming languages, and statistical relational learning, which provides the ability to combine probabilistic and relational reasoning. He is the lead developer of Charles River Analytics’ Figaro™ probabilistic programming language. As an Associate Professor at Harvard, he developed IBAL, the first general-purpose probabilistic programming language. While at Harvard, he also produced systems for representing, reasoning about, and learning the beliefs, preferences, and decision making strategies of people in strategic situations. Prior to joining Harvard, he invented object-oriented Bayesian networks and probabilistic relational models, which form the foundation of the field of statistical relational learning. Dr. Pfeffer serves as Action Editor of the Journal of Machine Learning Research and served as Associate Editor of Artificial Intelligence Journal and as Program Chair of the Conference on Uncertainty in Artificial Intelligence. He has published many journal and conference articles and is the author of a text on probabilistic programming. Dr. Pfeffer received his Ph.D. in computer science from Stanford University and his B.A. in computer science from the University of California, Berkeley.
Dr. Sanja Cvijic is a Senior Scientist at Charles River Analytics who leads the company’s Probabilistic AI Representations and Reasoning Systems group and has pioneered the application of Scruff to real-world problems in ISR and maintenance. Dr. Cvijic’s research centers on applications of probabilistic programming to condition monitoring, fault detection, and prediction systems. In Scruff, she developed a prognostic health management tool for assessing the health and status of power transformers, as well as a probabilistic tool for improved space domain awareness that assesses risks to satellites in space. Previously, she worked as a Director of Software and a Consultant in the power industry at New Electricity Transmission Software Solutions. She earned her Doctoral degree in Electrical and Computer Engineering, Power Systems, at Carnegie Mellon University in 2013, and her Bachelor’s in Electrical and Computer Engineering at the University of Belgrade, Serbia, in 2008.
In-person | Workshop | Machine Learning | Intermediate
This session aims to equip attendees with the skills and knowledge needed to apply feature engineering techniques effectively in their Data Science projects. We will cover the fundamental principles and practical applications of feature engineering as they pertain to time series forecasting, exploring topics such as formulating a feature hypothesis, implementing SQL and Python code to construct features, and integrating your features into an existing machine learning pipeline. We will also provide hands-on experience with the tools and technologies that aid in the design, organization, and automation of feature learning.
Dr. Joshua Gordon is a data scientist specializing in statistical modeling, machine learning, and deep learning technologies. While earning his Ph.D. in Statistics from the University of California Los Angeles, he published six articles focused on geospatial time series algorithms for earthquake prediction as well as residual analysis for spatio-temporal models.
Joshua has over 10 years of experience as a Data Scientist in the insurance and manufacturing industries, prior to joining dotData as a Senior Data Scientist. At dotData, he leads customer-facing data science projects and is responsible for customers’ success with dotData technologies.
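As a sketch of constructing time series features in Python, in the spirit of the feature engineering workshop above, the snippet below builds a lag feature, a trailing rolling mean, and a calendar feature with pandas. The daily `units` series and the column names are hypothetical, not taken from the workshop. Shifting by one step before rolling ensures each row only sees past values, which avoids leaking the target into its own features.

```python
import pandas as pd

# Hypothetical daily unit sales.
sales = pd.DataFrame(
    {"units": [10, 12, 9, 15, 14, 18]},
    index=pd.date_range("2023-01-01", periods=6, freq="D"),
)

past_units = sales["units"].shift(1)  # value from the previous day only
features = sales.assign(
    lag_1=past_units,
    rolling_mean_3=past_units.rolling(window=3).mean(),  # mean of the 3 previous days
    day_of_week=sales.index.dayofweek,  # Monday=0 ... Sunday=6
)

# Rows whose features need more history than is available are dropped before training.
model_ready = features.dropna()
print(model_ready)
```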
In-person | Tutorial | Machine Learning | Intermediate
This tutorial is targeted towards data scientists and machine learning engineers who work on machine learning and deep learning models. Given a task, one is interested in finding a well-performing model to solve it. Very often, this involves tweaking the model, either by changing its hyperparameters or modifying its architecture, to find a better-performing model. In the past, this was done manually. But with the advent of Automated Machine Learning, we can now leave that to the machines. In this tutorial, we will provide an overview of Hyperparameter Optimization (HPO) and Neural Architecture Search (NAS).
Tejaswini Pedapati works at IBM Research. Her research is focused on interpretability and automating deep learning. To that end, she was involved in developing tools and algorithms to provide these capabilities for IBM products. She has a master’s degree from Columbia University.
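Random search is one of the simplest HPO strategies of the kind the AutoML tutorial above surveys. In this minimal sketch, `validation_loss` is a hypothetical stand-in for training a model and scoring it on a validation set, and the search space (a log-uniform learning rate and an integer layer count) is an assumption chosen for illustration; NAS is not shown.

```python
import math
import random


def validation_loss(learning_rate: float, num_layers: int) -> float:
    """Hypothetical stand-in for 'train a model, return its validation loss'.
    This toy function is minimized at learning_rate=1e-3, num_layers=4."""
    return (math.log10(learning_rate) + 3) ** 2 + 0.1 * (num_layers - 4) ** 2


def random_search(n_trials: int, seed: int = 0):
    rng = random.Random(seed)  # seeded so repeated runs draw the same trials
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {
            # Sample the learning rate log-uniformly between 1e-5 and 1e-1.
            "learning_rate": 10 ** rng.uniform(-5, -1),
            "num_layers": rng.randint(1, 8),
        }
        loss = validation_loss(**params)
        if loss < best_loss:  # keep the best configuration seen so far
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = random_search(n_trials=200)
print(best_params, best_loss)
```

Sampling the learning rate on a log scale spreads trials evenly across orders of magnitude, which is usually more informative than a uniform draw over the same range.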
In-person | Workshop | Machine Learning | Deep Learning | NLP | Data Engineering | Intermediate
This workshop will cover the basics of Apache Arrow and Apache Parquet, how to load data to and from pyarrow arrays, CSV, and Parquet files, and how to use pyarrow to quickly perform analytic operations such as filtering, aggregation, joining, and sorting. In addition, you will experience the benefits of the open Arrow ecosystem and see how Arrow enables fast, efficient interoperability with pandas, Polars, DataFusion, DuckDB, and other technologies that support the Arrow memory format.
Andrew Lamb is the chair of the Apache Arrow Program Management Committee (PMC) and a Staff Software Engineer at InfluxData. He works on InfluxDB IOx, a time series database engine written in Rust that heavily uses the Apache Arrow ecosystem. He actively contributes to many open source software projects, including the Apache Arrow Rust implementation and the Apache Arrow DataFusion query engine.
In-person | Half-Day Training | Machine Learning | Deep Learning | Intermediate
Scikit-learn is a Python machine learning library used by data science practitioners from many disciplines. We will learn about pandas interoperability, categorical data, parameter tuning, and model evaluation. For pandas interoperability, the ColumnTransformer applies data transformations to different columns of a pandas DataFrame.
Thomas J. Fan is a Staff Software Engineer at Quansight Labs and a maintainer of scikit-learn, an open-source machine learning library for Python. Previously, Thomas worked at Columbia University to improve interoperability between scikit-learn and AutoML systems. He is a maintainer of skorch, a neural network library that wraps PyTorch. Thomas has a master’s in Mathematics from NYU and a master’s in Physics from Stony Brook University.
Virtual | Tutorial | Responsible AI | Machine Learning | All Levels
Data Cards are transparency artifacts that provide structured summaries of ML datasets with explanations of processes and rationale that shape the data. They also describe how the data may be used to train or evaluate ML models. In practice, two critical factors determine the success of a transparency artifact: (1) the ability to identify the information decision-makers use and (2) the establishment of processes and guidance needed to acquire that information. To initiate practice-oriented foundations in transparency that support responsible AI development in cross-functional groups and organizations, we created the Data Cards Playbook, an open-sourced, self-service, comprehensive toolkit consisting of participatory activities, frameworks, and guidance designed to address specific challenges faced by teams, product areas, and companies when setting up an AI dataset transparency effort.
Andrew Zaldivar is a member of the Responsible AI & Human-Centered Technology organization in Google Research. His role is to advocate for the responsible development and use of AI by disseminating and democratizing research findings from his organization. Andrew works with researchers and designers that are examining and shaping the socio-technical processes underpinning AI technologies through participatory, culturally-inclusive, and intersectional equity-oriented approaches. Before joining Google Research, Andrew was a Senior Strategist in Google’s Trust and Safety team, protecting the integrity of some of Google’s key products by utilizing machine learning to scale, optimize, and automate abuse fighting efforts. Andrew also holds a doctorate in cognitive neuroscience from the University of California, Irvine and was an Insight Data Science fellow.
Mahima Pushkarna is a design lead at the People + AI Research Initiative at Google. She brings design thinking and human-centered design into Human-AI Research. Her work explores advanced technologies, including generative AI, and draws from a mix of human-centered, participatory, and speculative design practices to bridge the gap between upstream developer practices and their impact on end user experiences and society. Mahima has designed tools and frameworks for explainability and interpretability that are widely used across industries and academia. She believes design can be a powerful tool for understanding and addressing the needs of people impacted by technology. Mahima is also interested in exploring the intersection of design, technology, and society, and is always looking for new ways to use design to make the world a better place. Mahima holds a master’s degree in Information Design and Data Visualization from Northeastern University, Boston, MA. She has published in leading academic journals and conferences, including IEEE Vis, FAccT, and workshops at NeurIPS. Prior to Google, Mahima worked as a product designer at Innovation by Design, a global think-tank, consulted at MIT’s Design Lab, and designed visualization tools at Ion Interactive. This bio was written with assistance from a language-driven model.
Virtual | Tutorial | Data Visualization & Data Analysis | All Levels
This tutorial will discuss ways to level-up your quick plots for use in published papers, automated reporting, and any other scenario where crisp and/or custom visualizations are called for.
Melanie Veale, Ph.D. is a recovering Astrophysicist, currently working as a Data Solutions Architect at Anomalo. Her Ph.D. research on galaxy dynamics introduced her to statistical and computational python, as well as other languages and tools like C++, Fortran, IDL, R, bash, SLURM, and others. She has also dabbled in AWS infrastructure, Kubernetes, Docker, Spark, Ray, Dask and more as a Field Engineer and Field Data Scientist at Domino Data Lab, helping analytics and machine learning teams modernize their collaboration and deployment workflows. Nowadays she is a troubleshooting enthusiast anywhere on the Data, Analytics, and MLOps tech stacks, and enjoys melding her passions for crisp technical communication, good visualizations, and first-principles thinking into helping organizations get the most out of their data.
Virtual | Tutorial | NLP | Machine Learning | Deep Learning | Intermediate
In this tutorial, we show an approach to creating a custom vocabulary that can then be used for any NLP task.
Swagata is a data professional with over 6 years of experience in the healthcare, retail, and platform integration industries. She is an avid blogger and writes about state-of-the-art developments in the AI space. She is particularly interested in Natural Language Processing and focuses on researching how to make NLP models work in practical settings. In her spare time, she loves to play her guitar, sip masala chai, and find new spots for doing yoga. Connect with her here – https://www.linkedin.com/in/swagata-ashwani/
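The custom vocabulary tutorial above does not spell out its method here, so the following is only a generic sketch of one common approach: tokenize a domain corpus, keep tokens that appear at least `min_freq` times, and reserve IDs for special tokens. The regex tokenizer, the `<pad>`/`<unk>` tokens, and the sample corpus are all assumptions for illustration.

```python
import re
from collections import Counter

SPECIAL_TOKENS = ["<pad>", "<unk>"]  # assumed special tokens, given the lowest IDs


def tokenize(text: str) -> list[str]:
    # Lowercase, then keep runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(corpus: list[str], min_freq: int = 2) -> dict[str, int]:
    counts = Counter(tok for doc in corpus for tok in tokenize(doc))
    # Most frequent first; ties broken alphabetically so the result is deterministic.
    kept = sorted((t for t, c in counts.items() if c >= min_freq), key=lambda t: (-counts[t], t))
    return {tok: idx for idx, tok in enumerate(SPECIAL_TOKENS + kept)}


def encode(text: str, vocab: dict[str, int]) -> list[int]:
    unk = vocab["<unk>"]  # out-of-vocabulary tokens map to <unk>
    return [vocab.get(tok, unk) for tok in tokenize(text)]


corpus = [
    "The patient reported mild pain.",
    "Mild fever and pain were reported.",
    "The patient recovered.",
]
vocab = build_vocab(corpus)
print(vocab)
print(encode("The patient had mild fever", vocab))
```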
In-person | Workshop | Data Engineering and Big Data | Machine Learning | Deep Learning | All Levels
This workshop is based on a precision agriculture scenario and demo featuring drones, 5G slices, edge computing, computer vision, and operations research, all orchestrated and developed on OpenShift and OpenShift Data Science.
Guillaume Moutier is a Sr. Principal Data Engineering Architect at Red Hat, focusing his work on data services, AI/ML workloads and data science platforms. Former Project Manager, Architect and CTO for large organizations, he is constantly looking for and promoting new and innovative solutions, always with a focus on usability and business alignment brought by 20 years of IT architecture and management experience. When he’s not tinkering with IT, electronics or other high tech toys, Guillaume plays music (guitar, bass, drums, keyboards), is a video games enthusiast, and a reading addict.
In-person | Workshop | Machine Learning Safety and Security | Machine Learning | Deep Learning | Data Engineering | Intermediate
The workshop will present an overview of the VM’s operations. Sample illustrations of AI for cybersecurity will be demonstrated, including detecting vulnerable code on GitHub repositories and emerging threats from the Dark Web for proactive cyber threat intelligence capabilities.
Dr. Sagar Samtani is an Assistant Professor and Grant Thornton Scholar in the Department of Operations and Decision Technologies at Indiana University. Dr. Samtani received his Ph.D. from the AI Lab at the University of Arizona. Dr. Samtani’s research interests are in AI for Cybersecurity, developing deep learning approaches for cyber threat intelligence, vulnerability assessment, open-source software, AI risk management, and Dark Web analytics. He has received funding from NSF’s SaTC, CICI, and SFS programs and has published over 40 peer-reviewed articles in leading information systems, machine learning, and cybersecurity venues. He is deeply involved with industry, serving on the Board of Directors for the DEFCON AI Village and the Executive Advisory Council for the CompTIA ISAO.
In-person | Workshop | Machine Learning | Deep Learning | All Levels
During this workshop, you will build complex time series forecasting and anomaly detection models from the ground up, perform feature engineering and selection, assess the accuracy, and utilize the ModelZoo browser and root cause analysis functionalities to investigate the outcomes.
Philip Wauters is a Customer Success Manager and Value Engineer at Tangent Works, working on practical applications of time series machine learning with customers from various industries, such as Siemens, BASF, Borealis, and Volkswagen. With a commercial background and experience in data engineering, analysis, and data science, his goal is to find and extract the business value in the enormous amounts of time series data that exist at companies today.
In-person | Tutorial | ML for Biotech and Pharma | Machine Learning | Intermediate-Advanced
In this tutorial we will review the fundamentals of standardized data sources in federated health data networks and describe the data standards that enable open-source software development. We will demonstrate how to use a suite of patient level prediction tools to develop data-driven prediction models using standardized observational health data.
Frank DeFalco is the Director of Epidemiology Analytics at Janssen Research and Development where he architects software solutions and data platforms for the analysis and application of observational data sources. He is currently the leader and Benevolent Dictator of the OHDSI open source architecture working group. Frank is a presenter and panelist at OHDSI symposiums and has served as faculty for OHDSI symposium tutorial classes on architecture and common data model vocabulary. In addition to leading the OHDSI Architecture working group, Frank initiated development of a standardized platform for observational analytics known as ATLAS. He is an active contributor to the open source software repositories developed and released by OHDSI, including ATLAS, WebAPI, Achilles, Circe, Arachne, Visualizations, Hermes, Helios, and others. Frank’s areas of expertise include computational epidemiology, large scale data platforms, software development and architecture, data visualization, and informatics. Prior to joining Janssen Research and Development, Frank held the position of Senior Principal and Director of Collaboration and Analytics at British Telecom, where he was a strategic advisor for multiple Fortune 100 companies across sectors including Consumer Products, Telecommunications, and Pharmaceuticals. Frank received his undergraduate degrees in Computer Science and Psychology at Rutgers University.
Jenna Reps is a Director at Janssen Research and Development where she is focusing on developing novel solutions to personalize risk prediction. Jenna’s areas of expertise include applying machine learning and data mining techniques to develop solutions for various healthcare problems. She is currently working within the patient level prediction OHDSI workgroup with the aim of developing open source and user friendly software for developing risk models using data sets in the OMOP Common Data Model format. Prior to joining Janssen Research and Development, Jenna was a Senior Research Fellow at the University of Nottingham where she developed supervised learning techniques to signal adverse drug reactions using UK primary care data and acted as a data consultant to other researchers within the University. Jenna received her BSc in Mathematics and MSc in Mathematical Biology at the University of Bath and her PhD in Computer Science at the University of Nottingham.
In-person | Workshop | Deep Learning | NLP | Machine Learning | Beginner-Intermediate
Recommendation describes suggesting, or recommending, items tailored to a particular user. As generative AI creates an explosion of digital content, personalization will be more important than ever! Whether the application is sneaker designs, blog posts, or even pre-trained machine learning model weights, most recommendation tasks have a similar underlying structure. We need some way to represent items and users, typically as vectors, as well as a way to index them for fast computation. We also need to design intuitive APIs that interface the recommendation system to application developers. Weaviate is an open-source vector search database that has many unique search and database features.
Connor Shorten is a Research Scientist at Weaviate, an open-source vector search database. Connor has had a role in the development of Ref2Vec, Hybrid Search, Generative Search, Weaviate’s Pipe API, and Re-Ranking. Connor has also hosted 34 episodes of the Weaviate podcast featuring guests from OpenAI, Cohere, You.com, MosaicML, Jina AI, Deepset, Neural Magic and many others! Connor also co-hosts Weaviate meetups in Boston and New York City! Before joining Weaviate, Connor earned a Ph.D. in Computer Science from Florida Atlantic University, where his research focused primarily on Data Augmentation in Deep Learning and Applications of Deep Learning for COVID-19. Connor’s publication “A survey on image data augmentation in deep learning” has received over 5,000 citations.
In-person | Tutorial | Deep Learning | All Levels
Deepfake photos and videos are already impacting many industries and sectors of society, in both positive and negative ways. In this session I’ll weave between the social context of deepfakes (how they’ve been used and what impact they’ve had) and the technical side of them (how they’re made, and some approaches to detecting them). This is the multifaceted story of deepfakes…more details
Noah Giansiracusa (PhD in math from Brown University) is a tenured associate professor of mathematics and data science at Bentley University, a business school near Boston. His research interests range from algebraic geometry to machine learning to empirical legal studies. After publishing the book How Algorithms Create and Prevent Fake News in July 2021, Noah has gotten more involved in public writing and policy discussions concerning data-driven algorithms and their role in society. He’s written op-eds for Barron’s, Boston Globe, Wired, Slate, and Fast Company and is currently working on a second book, Robin Hood Math: How to Fight Back When the World Treats You Like a Number, with a Foreword by Nobel Prize-winning economist Paul Romer.
In-person | Workshop | Machine Learning | Deep Learning | Data Engineering | NLP | Beginner
In this workshop, we will cover how to build machine learning web applications using the Gradio (www.gradio.dev) library…more details
Freddy Boulton started his career as a data scientist at Nielsen, where he built predictive models of television viewing behavior to make television ratings more accurate. This gave him a firsthand view of one of the biggest challenges faced by industry data scientists: being able to easily communicate and share machine learning models with stakeholders. He is currently solving that problem by working on Gradio, an open-source Python library that lets data scientists create fully interactive demos of machine learning models with just a few lines of code.
In-Person | AiX Keynote | Deep Learning | Machine Learning | All Levels
In this session, we will first arrive at a definition of Edge and its different environments. This will enable us to discuss what hardware (compute) is available for data science (model training and execution) usage in these environments. Lastly, we will examine interesting data science acceleration alternatives in the areas of data augmentation and data curation strategies, containerized models and applications…more details
Audrey is a Sr. Principal Software Engineer on the Red Hat OpenShift Data Science team within Red Hat Cloud Services, focusing on helping customers with managed services, AI/ML workloads and next-generation platforms. She holds a degree in Computer Information Systems and has worked in the IT industry for over 20 years, in roles ranging from full-stack development to data science. Audrey is passionate about data science, and in particular about the current opportunities for AI/ML at the Edge and in open source technologies.
In-Person | Keynote | Machine Learning | All Levels
Large Language Models (LLMs) like ChatGPT have taken the world by storm with their ability to answer questions, write essays and even compose lyrics. These tools have profound implications for industries like financial services, retail, and healthcare. However, most organizations have yet to take advantage of LLMs. Why? The amount of compute, data, and knowledge required to build a proprietary model is daunting for many, yet the alternative is to rely on LLMs that are accessible only behind API paywalls, at the expense of data privacy. In this keynote, we will explore why training, deploying, and owning your own LLM is critical (and at times even imperative). We will discuss how you can train and deploy your own models while protecting your data and your business IP. Spoiler alert: contrary to common wisdom, owning your own LLM is within reach for most organizations and provides major benefits in security, flexibility, and accuracy…more details
Hagay Lupesko is the VP of Engineering at MosaicML, where he focuses on making generative AI training and inference efficient, fast, and accessible. Prior to MosaicML, Hagay held AI engineering leadership roles at Meta, AWS, and GE Healthcare. He shipped products across various domains: from 3D medical imaging, through global-scale web systems, and up to deep learning systems that power apps and services used by billions of people worldwide.
Jay is a VP in the Artificial Intelligence and Machine Learning organization at Oracle Cloud. He completed a degree in neuroscience and started his career in technology at Oracle, holding the view that these two paths would eventually converge.
In-person | Keynote | Responsible AI | Machine Learning Safety and Security | All Levels
In this talk, I will describe our open-source platform MC2 (multi-party confidential computing), which enables data owners to encrypt their data and data scientists to run analytics or machine learning on the encrypted data without having access to the data itself. MC2 is based on years of research at UC Berkeley and on publications at top-tier security and privacy conferences…more details
Raluca Ada Popa is the Robert E. and Beverly A. Brooks associate professor of computer science at UC Berkeley working in computer security, systems, and applied cryptography. She is a co-founder and co-director of the RISELab and SkyLab at UC Berkeley, as well as a co-founder of Opaque Systems and PreVeil, two cybersecurity companies. Raluca has received her PhD in computer science as well as her Masters and two BS degrees, in computer science and in mathematics, from MIT. She is the recipient of the 2021 ACM Grace Murray Hopper Award, a Sloan Foundation Fellowship award, Jay Lepreau Best Paper Award at OSDI 2021, Distinguished Paper Award at IEEE Euro S&P 2022, Jim and Donna Gray Excellence in Undergraduate Teaching Award, NSF Career Award, Technology Review 35 Innovators under 35, Microsoft Faculty Fellowship, and a George M. Sprowls Award for best MIT CS doctoral thesis.
Virtual | Keynote | Machine Learning | All Levels
AI is red hot, but in practice many projects still fail. This talk will cover some of the key things you need to know to succeed, including:
– What current AI is and is not good for
– The difference between a demo and a product
– Pitfalls to avoid
– Organizing AI teams…more details
Pedro Domingos is a professor emeritus of computer science and engineering at the University of Washington and the author of The Master Algorithm. He is a winner of the SIGKDD Innovation Award and the IJCAI John McCarthy Award, two of the highest honors in data science and AI. He is a Fellow of the AAAS and AAAI, and has received an NSF CAREER Award, a Sloan Fellowship, a Fulbright Scholarship, an IBM Faculty Award, several best paper awards, and other distinctions. Pedro received an undergraduate degree (1988) and M.S. in Electrical Engineering and Computer Science (1992) from IST, in Lisbon, and an M.S. (1994) and Ph.D. (1997) in Information and Computer Science from the University of California at Irvine. He is the author or co-author of over 200 technical publications in machine learning, data mining, and other areas. He is a member of the editorial board of the Machine Learning journal, co-founder of the International Machine Learning Society, and past associate editor of JAIR. He was program co-chair of KDD-2003 and SRL-2009, and has served on the program committees of AAAI, ICML, IJCAI, KDD, NIPS, SIGMOD, UAI, WWW, and others. He has written for the Wall Street Journal, Spectator, Scientific American, Wired, and others. He helped start the fields of statistical relational AI, data stream mining, adversarial learning, machine learning for information integration, and influence maximization in social networks.
In-person | AiX Track Keynote | Responsible AI | All Levels
This session fleshes out the scope and function of AI ethics principles. It introduces the Box, a tool designed by AI Ethics Lab and further developed and deployed by the Institute for Experiential AI at Northeastern University to put these principles into action, and presents a pathway for developing other tools on the main framework of the Box. The session also situates the Box and AI ethics principles within the broader framework of Responsible AI practice for an end-to-end integration of AI ethics into AI innovation…more details
Cansu joined the Institute for Experiential AI as the ethics lead and a research associate professor. She also has an affiliation with the Department of Philosophy and Religion and the Ethics Institute in the College of Social Sciences and Humanities. She has a doctorate in philosophy specializing in applied ethics.
Cansu is the founder of AI Ethics Lab, one of the first initiatives focusing exclusively on advising practitioners and conducting multidisciplinary research on the ethics of artificial intelligence. She remains the director of the AI Ethics Lab, where she leads teams of computer scientists, philosophers, legal scholars, and other experts in research, the development of toolkits, and consulting.
Cansu developed the Puzzle-solving in Ethics (PiE) Model, a dynamic and collaborative model for integrating ethics into the AI innovation cycle that organizations have implemented through consulting work with the AI Ethics Lab. She brings the PiE Model to EAI along with her industry experience and her background in philosophy.
Cansu serves as an ethics expert on various ethics, advisory, and editorial boards. She is a founding editor of the international peer-reviewed journal AI & Ethics (Springer Nature), serves as an ethics expert for EU-funded research projects focusing on the ethics of AI, robotics, human enhancement, and law enforcement AI technologies, and chairs the Institute of Electrical and Electronics Engineers (IEEE) AI Experts Network Criteria Committee.
In-person | Talk | Data Engineering & Big Data | MLOps | Beginner-Intermediate
Houseplants can be hard – in many cases, over- and under-watering can have the same symptoms. Take away the guesswork involved in caring for your houseplants while also gaining valuable experience in building a practical, event-driven pipeline in your own home! This talk explores the process of building a houseplant monitoring and alerting system using a Raspberry Pi and Apache Kafka…more details
Danica Fine is a Senior Developer Advocate at Confluent where she helps others get the most out of Kafka and their event-driven pipelines. In her previous role as a software engineer on a streaming infrastructure team, she predominantly worked on Kafka Streams- and Kafka Connect-based projects to support computing financial market data at scale. She can be found on Twitter, tweeting about tech, plants, and baking @TheDanicaFine.
In-person | Business Talk | ML for Biotech and Pharma | Intermediate-Advanced
This session will introduce attendees to the benefits and challenges of using machine learning for biotech and pharmaceutical applications. We will discuss various machine learning techniques and how they can be used to drive innovation in drug discovery, disease diagnosis, genomics, and personalized medicine. We will also explore the need for large datasets and the complexity of evaluating model performance. Attendees of this session will gain an understanding of the potential of machine learning to revolutionize the way biotechnology and pharmaceutical companies develop new therapeutics…more details
Tomasz Adamusiak, MD, PhD, is a Chief Scientist in the Clinical Insights & Innovation Cell at MITRE. He leads a multi-disciplinary group driving high-impact contributions to the private and public sectors in Clinical and Genomic Data Science. Before MITRE, Tomasz was the Head of Data Science in the Pfizer Innovation Research (PfIRe) Lab. His team was responsible for developing novel digital endpoints, designing decentralized approaches for clinical trials, and applying AI/machine learning methods to generate novel insights from clinical data. Tomasz has served in leadership and advisory roles in the American Medical Informatics Association, SNOMED International, and the Epic Research Data Network.
In-person | Talk | NLP | Data Visualization & Data Analysis | Beginner-Intermediate
Finding common topics discussed in a set of text responses is often performed using techniques that learn how often words occur together in a set of responses. In real-world cases, the number of words in each response may be small, for example, finding which common topics are discussed in social media comments that mention a particular brand. This case poses a problem for these traditional methods of topic modeling: if two words with similar meanings rarely appear together in the dataset, the model will not be able to learn that they represent a common topic. Here, using a pre-trained large language model (LLM) can help. Because LLMs are trained on a much larger dataset, they contain richer information about when words typically appear together in the wild, beyond a limited dataset…more details
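The limitation the abstract describes can be seen with a toy word co-occurrence count, the kind of statistic traditional topic models learn from. In the minimal sketch below, the four short responses are hypothetical: "cheap" and "affordable" appear in identical contexts yet never in the same response, so a count-based model gets no direct evidence that they belong to one topic. This is only an illustration of the problem, not the speaker's LLM-based method.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Tuple


def cooccurrence_counts(responses: Iterable[str]) -> Counter:
    """Count how many responses contain each unordered pair of distinct words."""
    counts: Counter = Counter()
    for text in responses:
        words = sorted(set(text.lower().split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


def pair(w1: str, w2: str) -> Tuple[str, ...]:
    """Canonical (alphabetical) key for an unordered word pair."""
    return tuple(sorted((w1, w2)))


# Hypothetical short responses, e.g. brand-related social media comments.
responses = ["cheap shoes", "affordable shoes", "cheap price", "affordable price"]
counts = cooccurrence_counts(responses)
print(counts[pair("cheap", "affordable")])  # 0: the synonyms never appear together
print(counts[pair("cheap", "shoes")])       # 1
```

Embeddings from a pre-trained language model would typically place "cheap" and "affordable" close together regardless of this small dataset, which is the prior knowledge the abstract says an LLM can contribute.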
Matt Bezdek is a Senior Data Scientist at Elder Research. In his work, he empowers commercial clients to make better business decisions, with expertise in machine learning, forecast modeling, natural language processing, and visualization. He has a PhD in Cognitive Psychology from Stony Brook University and has conducted neuroimaging research at Georgia Tech and Washington University in St. Louis.
In-person | Talk | Data Visualization & Data Analysis | Machine Learning | Intermediate
In this talk, we’ll break down some of the key concepts to consider when communicating data, a short list of DOs & DON’Ts, and survey a variety of techniques that current practitioners use to communicate data effectively. This talk will be heavily example-driven: communication is both an art and a science, and sometimes the best way to improve is to observe lots of great examples…more details
Matt is Director of Data Science with over a decade of experience solving complex business problems with data, modeling and simulation. Over the past year at project44, Matt has scaled the data science team from a few disparate efforts to a full department of 30 team members around the globe. The data science team at project44 uses the billions of shipments tracked through project44’s platform to extract insights that help customers make data-driven decisions: everything from “estimated time of delivery” to “impact of the latest disruptions”. Project44’s data science team uses state-of-the-art machine learning techniques to capture the dynamic trends and patterns of today’s supply chain. Despite the pandemic and the global nature of Matt’s team, the data scientists at project44 routinely hold “virtual whiteboarding sessions” where they brainstorm, trade ideas about statistical techniques, and discuss their latest Netflix favorites.
In-person | Keynote | NLP | Machine Learning | All Levels
In this keynote, Ted Kwartler will discuss the connections between brain regions responsible for various tasks such as executive functioning, language processing, and prediction-making. Drawing on his extensive experience as a Field CTO at DataRobot and instructor at the Harvard University Extension School, Kwartler will present a novel approach for flexible, forward-thinking GPT workflows called predict-GPT…more details
Ted Kwartler is the Field CTO at DataRobot. Ted sets product strategy for explainable and ethical uses of data technology. Ted brings unique insights and experience in using data, business acumen and ethics to his current role and to his previous positions at Liberty Mutual Insurance and Amazon. In addition to offering four DataCamp courses, he teaches graduate courses at the Harvard Extension School and is the author of “Text Mining in Practice with R.” Ted is an advisor to the US Government Bureau of Economic Affairs, sitting on a Congressionally mandated committee called the “Advisory Committee for Data for Evidence Building” that advocates for data-driven policies.
In-person | Talk | ML for Biotech and Pharma | Machine Learning | Deep Learning | All Levels
I will discuss some of the ways Moderna has embedded AI-fueled technology into our business processes. We will explore one solution for Regulatory Operations in detail that uses natural language processing to accelerate our communications with health authority organizations…more details
Rebecca Vislay-Wade is a Principal Data Scientist at Moderna, where she leads a team of scientists developing AI applications for clinical operations, regulatory science, and pharmacovigilance. Prior to Moderna, she worked as Senior Research Data Scientist at Highmark Health. Rebecca holds a PhD in biochemistry from Harvard University and did postdoctoral work in neuroscience at the NIH and Children’s National Medical Center in Washington, DC. She currently lives in the Boston area with her family.
Virtual | Keynote | Machine Learning | All Tracks | All Levels
Modern medicine has given us effective tools to treat some of the most significant and burdensome diseases. At the same time, it is becoming steadily more challenging and more expensive to develop new therapeutics. A key factor in this trend is that we simply don’t understand the underlying biology of disease, nor which interventions might meaningfully modulate clinical outcomes, and in which patients. To close this gap, we are bringing together large amounts of high-content data, taken both from humans and from human-derived cellular systems generated in our own lab…more details
In-person | Business Talk | ML for Biotech and Pharma | All Levels
This presentation is aimed at technical leaders in biotech organizations who are ready to take on the challenge of making their data teams, their bench scientists and everyone in between work more effectively with data and digital tools. Maybe you entered biotech from a tech background. Or maybe you became a data expert starting from a background in biology or chemistry. Either way, if you know what an organization that uses data effectively looks like, the principles in this presentation will help you build that within your own organization…more details
Jesse Johnson is Vice President of Data Science and Data Engineering at Dewpoint Therapeutics, a drug-development biotech startup founded in 2019 around a scientific field called biomolecular condensates. In this role, Jesse’s diverse experience, spanning academic math departments, engineering teams at Google, and data science teams at large, medium and small life science companies, provides a unique perspective on the ways that data and wet lab teams communicate differently, or sometimes don’t communicate at all.
Virtual | Talk | Machine Learning | ML for Biotech & Pharma | Intermediate
The talk will also cover the qualitative and quantitative measures that can be used to assess the quality of synthetic data, including data fidelity and privacy, in the context of clinical trial data…more details
Afrah Shafquat is a Senior Data Scientist II at Medidata, a Dassault Systèmes company, where she leads synthetic data solutions in clinical trials. At Medidata, her work focuses on innovative solutions to generate synthetic data, synthetic data evaluation (fidelity and privacy metrics), and new use cases for synthetic data. She has a Ph.D. in Computational Biology from Cornell University and an S.B. in Biological Engineering from Massachusetts Institute of Technology.
Virtual | Talk | Responsible AI | All Tracks | All Levels
In this talk, we will discuss Responsible AI tools and best practices you can apply in your machine learning lifecycle and share state-of-the-art open source tools you can incorporate to implement Responsible AI in practice…more details
Minsoo is a Senior Product Manager at Microsoft Azure Machine Learning designing and building out Responsible AI tools for data scientists. She’s worked with OSS tools such as InterpretML, Fairlearn, Responsible AI Toolbox and contributed to the UX of the Responsible AI dashboard now released in Azure Machine Learning. She has bachelor’s degrees in Applied Mathematics and Painting from Brown University and Rhode Island School of Design (RISD). Coming from an interdisciplinary background with experience in building machine learning models and products, analyzing data, and designing UX, she is always finding work at the intersection of AI/ML, design, and social sciences to empower data and ML practitioners to work ethically and responsibly end-to-end.
Mehrnoosh Sameki is a principal PM manager at Microsoft, where she leads emerging Responsible AI technology and tools for the Azure Machine Learning platform. She cofounded Error Analysis, Fairlearn and Responsible AI Toolbox and has been a contributor to the InterpretML offering. She earned her PhD in computer science at Boston University, where she currently serves as an adjunct assistant professor, offering courses in responsible AI. Previously, she was a data scientist in the retail space, incorporating data science and machine learning to enhance customers’ personalized shopping experiences.
Virtual | Talk | NLP | Beginner-Intermediate
Semantic search is especially helpful for multilingual and cross-lingual search. Previously, you had to spend a lot of time tuning lexical search systems for each language individually to handle, e.g., synonyms, spelling variations, and spelling mistakes. With semantic search, this is greatly simplified: within minutes, you get a system that works amazingly well on 100+ languages…more details
Nils Reimers is an NLP / deep learning researcher with extensive experience in representing text in dense vector spaces and using those representations for various applications. During his research career, he created sentence-transformers, which became the foundation for many of today’s semantic search applications.
In 2022, Nils joined Cohere.com to lead the team on smarter semantic search technologies and how to connect LLMs to enterprise data. Here, his teams develop new foundation models that can understand and reason over complex data.
Virtual | Career Talk | Big Data Analytics | Cross Industry | All Levels
Organizations with a strong data culture can synthesize information effectively, and glean insights at scale, giving them an incredible edge over their competition. Some of the challenges that organizations face in their efforts to build a data-driven culture are around data silos, data hygiene, data transparency and data literacy…more details
A multi-faceted leader with in-depth experience in technology, marketing, finance, and operations, Vatsala strives to connect the dots between strategy and execution. She works at Stanford University’s Graduate School of Business as the Managing Director of Technology and Finance for Executive Education. As an ICF-credentialed leadership coach, she works with people–especially women executives, entrepreneurs, and youth–to uncover possibilities within themselves that even they are surprised by. She is an award-winning speaker and panel moderator, and has most recently spoken at the Women in Technology Conference, Argyle Forum, Dreamforce, and Stanford’s IT Unconference.
Virtual | Talk | Machine Learning | All Levels
In this talk, we’ll cover why deep learning needs human expertise to succeed and why blending human intelligence and technology to build scalable models for production is the best approach. We’ll also explore where humans can (and should) be used in the model development process and, once models are deployed, how you can make sure that they are labeling the right things. Finally, we’ll discuss how automation and HITL should coexist in your machine learning operations…more details
Originally from Cambridge, Matt now helps clients move to a data-centric ML approach, having worked with clients across autonomous vehicles, green energy and fintech while providing meaningful work in the developing world. Away from work, Matt has a passion for photography, traveling and unusual cars. In fact, his passion for unusual cars led him to import a Nissan Stagea from Japan to the UK.
In-person | Talk
OKRs, Precision, KPIs, and F1 score — are more related than you’d think. As an ML Engineer and business leader, Liran talks about the frustration that data scientists and business stakeholders experience in turning ML projects into great, revenue-driving ML products…more details
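For readers who want the model-side half of that comparison in front of them, below are the standard definitions of precision, recall and F1 for a binary classifier, written as a small sketch with hypothetical labels. How these numbers map onto OKRs and KPIs is the subject of the talk and is not modeled here.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Binary precision, recall and F1, treating 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # misses
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical ground truth and predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
```

Turning these into a business metric usually means attaching a cost or value to each false positive and false negative; which trade-off matters is a business decision rather than a property of the metric itself.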
Liran Hason is the Co-Founder and CEO of Aporia, a full-stack ML observability platform used by Fortune 500 companies and data science teams across the world to ensure responsible AI. Prior to founding Aporia, Liran was an ML Architect at Adallom (acquired by Microsoft), and later an investor at Vertex Ventures. Liran created Aporia after seeing first-hand the effects of AI without guardrails. In 2022, Forbes named Aporia one of the “Next Billion-Dollar Companies.”
In-person | Talk | Machine Learning | All Levels
This presentation examines the value of the information contained in two networks: sell-side analyst coverage networks and corporate board networks. Sell-side analyst coverage networks have two principal use cases: i) identifying which stocks are likely to outperform or underperform their peers and ii) measuring the strength of economic relationships between companies. The analysis will then explore the role board networks play in the ESG outcomes of corporations…more details
Temilade (“Temi”) Oyeniyi, CFA is Vice President at S&P Global Market Intelligence’s Quantamental Research Group, which is responsible for building global equity strategies for institutional investors.
In-person | Talk | Deep Learning | Machine Learning | MLOps | Intermediate-Advanced
As neural networks increasingly make critical decisions in high-stakes settings, monitoring and explaining their behavior in an understandable and trustworthy manner has become a necessity. One commonly used type of explainer is post hoc feature attribution, a family of methods that assign each feature in a model’s input a score corresponding to that feature’s influence on the model’s output. A major limitation of this family of explainers in practice is that different methods can disagree on which features are more important than others. Our contribution in this paper is a method of training models with this disagreement problem in mind…more details
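The disagreement problem is easy to see with a simple top-k overlap score between two explainers. The sketch below only measures disagreement for one prediction; it is not the training method the talk proposes, and the feature names and attribution scores are hypothetical.

```python
from typing import Dict, Set


def top_k_features(attributions: Dict[str, float], k: int) -> Set[str]:
    """Names of the k features with the largest absolute attribution scores."""
    ranked = sorted(attributions, key=lambda name: abs(attributions[name]), reverse=True)
    return set(ranked[:k])


def top_k_agreement(attr_a: Dict[str, float], attr_b: Dict[str, float], k: int) -> float:
    """Fraction of the top-k features two explainers share (1.0 means full agreement)."""
    shared = top_k_features(attr_a, k) & top_k_features(attr_b, k)
    return len(shared) / k


# Hypothetical attribution scores for the same prediction from two explainers.
explainer_1 = {"age": 0.50, "income": -0.30, "zip_code": 0.10, "tenure": 0.05}
explainer_2 = {"age": 0.10, "income": -0.40, "zip_code": 0.35, "tenure": 0.02}
print(top_k_agreement(explainer_1, explainer_2, k=2))  # 0.5: they share only "income"
```

Ranking by absolute value treats strongly negative attributions as important, which is one common convention; a sign-aware variant would also check that shared features push the output in the same direction.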
Max Cembalest is a researcher at Arthur focused on simplifying and explaining machine learning models. Previously, he received an M.S. in Data Science from Harvard University, where he concentrated on interpretability and graph-based models. He is particularly excited about recent advances in applying abstract algebra, topology, and category theory to neural network design.
In-person | Talk | Deep Learning | NLP | Machine Learning | MLOps and Data Engineering | Intermediate
In this presentation, Amber Roberts, Machine Learning Engineer at Arize AI, will present findings from research on ways to measure vector/embedding drift for image and language models. With lessons learned from testing different approaches (including Euclidean and Cosine distance) across billions of streams and use cases, Roberts will dive into how to detect whether two unstructured language datasets are different — and, if so, how to understand that difference using techniques such as UMAP…more details
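As a minimal sketch of the two distance measures the abstract mentions, one simple approach (an assumption here, not necessarily the method Arize uses) compares the centroid, or mean vector, of a baseline set of embeddings with the centroid of a production set. The 2-D embeddings are hypothetical; real model embeddings have hundreds of dimensions, and the UMAP step is not covered.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Straight-line distance; sensitive to both direction and magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical embeddings from a training baseline and from production traffic.
baseline = [[1.0, 0.0], [0.9, 0.1], [1.1, -0.1]]
production = [[0.0, 1.0], [0.1, 0.9], [-0.1, 1.1]]

base_c, prod_c = centroid(baseline), centroid(production)
print("euclidean drift:", euclidean_distance(base_c, prod_c))
print("cosine drift:", cosine_distance(base_c, prod_c))
```

Cosine distance ignores vector length and reacts only to changes in direction, while Euclidean distance also reacts to changes in magnitude; a drift threshold for either would have to be calibrated on data where no drift is expected.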
Amber Roberts is an ML Growth Lead at Arize AI, an ML observability company built for maintaining models in production. Previously, Amber was a product manager of AI at Splunk and the Head of Artificial Intelligence at Insight Data Science. A Carnegie Fellow, Amber has an MS in Astrophysics from the Universidad de Chile.
In-person | Track Keynote | MLOps | Machine Learning | All Levels
The biggest challenges for developers of AI applications very often lie in building and delivering software to be used as a decision-making tool by operational staff. We will present how these challenges have been addressed in two successful projects: a cash flow prediction application (for one of Europe’s largest retailers) and a sales prediction app for a quick-service restaurant. Taipy, a novel Python application builder, played an essential role in the success of these applications. We will highlight the core concepts and benefits of using such a framework in the context of real industrial AI applications…more details
Florian Jacta is a specialist in Taipy, a low-code, open-source Python package that enables any Python developer to easily build a production-ready AI application, and he handles pre-sales and after-sales functions for the package. He has worked as a data scientist for Groupe Les Mousquetaires (Intermarché) and ATOS, and has developed several predictive models as part of strategic AI projects. Florian holds a master’s degree in Applied Mathematics from INSA, with a major in Data Science and Mathematical Optimization.
Albert applies his skills in machine learning and big data to solve (financial) optimization problems. He developed projects at different skill levels for Taipy’s tutorial videos. He holds a Bachelor of Science from McGill University, with a major in Computer Science & Statistics and a minor in Finance.
In-person | Talk | Machine Learning | MLOps | Data Engineering & Big Data | Beginner-Intermediate
In this short talk, we’ll cover how to easily run SQL from Jupyter Notebooks through popular open-source tools (JupySQL). The main focus will be a real-life industry use case (predicting customer churn) where we need to connect to the data source (we’ll use an example data set, but we can easily use a database, data warehouse, or a data lake), and query the data. Once our data is ready, we will run exploratory data analysis, which includes plotting the data and gaining insights about it…more details
Ido Michael, a seasoned data engineering and science professional, co-founded Ploomber & JupySQL with the mission of empowering data scientists to build faster and more efficient solutions. Prior to this, he led data engineering and science teams at Amazon Web Services (AWS), where he played an instrumental role in building hundreds of data pipelines during various customer engagements, working closely with his team.
A proud alumnus of Columbia University, Ido moved to New York to pursue his Master’s degree in Computer Engineering. It was during his time at Columbia that he identified the challenges in working with multiple data sources and Jupyter notebooks for reliable model development. This realization inspired him to concentrate on building Ploomber, a platform designed to address these issues and streamline the data science workflow.
In-person | Talk | Data Engineering & Big Data | All Levels
In this talk, we will explore how the adoption of an event-based data architecture can enable an organization’s sustainable transition to Data Mesh. This will include an overview of event-based architecture, architectural patterns for event-based data systems, and organizational considerations…more details
Elliott is an expert in data engineering, data warehousing, information management, and technology innovation with a passion for helping transform data into powerful information. He has more than a decade of experience implementing cutting-edge, data-driven applications. He has a passion for helping organizations understand the true potential in their data by working as a leader, architect, and hands-on contributor.
Elliott has built nearly a dozen cloud-native data platforms on AWS, ranging from data warehouses and data lakes, to real-time activation platforms in companies ranging from small startups to large enterprises.
In-person | Talk | Deep Learning | Beginner
In this talk, I will present examples of collision bias and show how it can be caused by a biased sampling process or induced by inappropriate statistical controls; and I will introduce causal diagrams as a tool for representing causal hypotheses and diagnosing collision bias…more details
Allen Downey is a Staff Scientist at DrivenData and professor emeritus at Olin College. He is the author of several books related to computer science and data science, including Think Python, Think Stats, Think Bayes, and Think Complexity. His blog, Probably Overthinking It, features articles about Bayesian statistics. He received his Ph.D. in Computer Science from U.C. Berkeley, and M.S. and B.S. degrees from MIT.
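A small simulation can illustrate the biased-sampling mechanism the abstract mentions. Two traits that are independent in the full population become negatively correlated once the sample is restricted to cases whose combined score passes a threshold. The combined score is the collider. The trait names ("talent", "luck"), the 1.5 cutoff, and the sample size are illustrative assumptions, not the talk's own example.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 100_000

# Two independent, hypothetical traits drawn from a standard normal.
talent = [rng.gauss(0, 1) for _ in range(n)]
luck = [rng.gauss(0, 1) for _ in range(n)]

# Biased sampling: only cases whose combined score (the collider) is high are kept.
selected = [(t, l) for t, l in zip(talent, luck) if t + l > 1.5]
sel_talent = [t for t, _ in selected]
sel_luck = [l for _, l in selected]

r_all = pearson(talent, luck)
r_selected = pearson(sel_talent, sel_luck)
print(f"correlation in full population: {r_all:+.3f}")
print(f"correlation among selected:     {r_selected:+.3f}")
```

The correlation in the full population is close to zero. Among the selected cases it is strongly negative, although neither trait causes the other.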
In-person | Track Keynote | Machine Learning | Deep Learning | All Levels
The past decade has seen rapid development of Artificial Intelligence (AI) and Machine Learning (ML) across different industries and for a multitude of successful use cases. However, one key challenge many businesses face for larger-scale adoption of AI and ML is that their data is often not ready for AI/ML. Automated feature engineering is a technology that aims to address the fundamental challenges of data readiness for AI. In this talk, we will review automated feature engineering technology and discuss how data scientists can use it to transform their data and enable AI applications…more details
Aaron is currently the Vice President of Data Science and Solutions at dotData. As a data science practitioner with 14 years of research and industrial experience, he has held various leadership positions spearheading new product development in the fields of data science and business intelligence. At dotData, Aaron leads the data science team in working directly with clients and solving their most challenging problems.
Prior to joining dotData, he was a Data Science Principal Manager with Accenture Digital, responsible for architecting data science solutions and delivering business value for the tech industry on the West Coast. He was instrumental in the strategic expansion of Accenture Digital’s footprint in the data science market in North America.
Aaron received his Ph.D. degree in Applied Physics from Northwestern University.
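One common ingredient of automated feature engineering is enumerating aggregation primitives over a related table. The sketch below applies a fixed set of primitives to every numeric column of a hypothetical `transactions` table, producing one feature row per customer. It is a generic illustration, not dotData's method. The table, the column names, and the primitive set are assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical child table: one row per transaction, keyed by customer_id.
transactions = [
    {"customer_id": "a", "amount": 20.0},
    {"customer_id": "a", "amount": 35.0},
    {"customer_id": "b", "amount": 5.0},
    {"customer_id": "b", "amount": 15.0},
    {"customer_id": "b", "amount": 10.0},
]

# Aggregation primitives applied automatically to every numeric column.
PRIMITIVES = {"count": len, "sum": sum, "mean": mean, "max": max, "min": min}


def aggregate_features(rows, key, numeric_cols):
    """Generate one feature per (primitive, column) pair for each key value."""
    groups = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for col in numeric_cols:
            groups[row[key]][col].append(row[col])
    return {
        k: {
            f"{name}({col})": fn(values)
            for col, values in cols.items()
            for name, fn in PRIMITIVES.items()
        }
        for k, cols in groups.items()
    }


feats = aggregate_features(transactions, "customer_id", ["amount"])
print(feats["b"])
```

Real systems also search over joins, time windows, and filters, and then prune the resulting candidates. Without that pruning, enumeration like this quickly produces bloated feature lists.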
In-person | Career Talk | Beginner
Want to land your dream job in data? Learn what makes a data resume stand out, how a portfolio project becomes a job-hunting cheat code when you avoid these 6 mistakes, why cold email is a networking superpower, and how to craft a winning personal story for the behavioral interview. These tips led Nick Singh, best-selling author of Ace the Data Science Interview, to work at Facebook & Google, and have helped 200+ of his coaching clients land top jobs in tech…more details
Nick Singh is an ex-Facebook & Google engineer turned best-selling author of Ace the Data Science Interview, and the founder of the SQL interview platform DataLemur.com. His career advice on LinkedIn has earned him 100,000 followers, and he’s successfully coached 578 people to land their dream jobs in data!
In-person | Track Keynote | MLOps | All Tracks | All Levels
We’ll discuss how to manage and monitor application pipelines at scale. We’ll show how to use GPUs to maximize application performance while protecting your investment in AI infrastructure. We’ll share how to make the whole process efficient, effective and collaborative. The session will include a live demo of transfer learning and deployment of a transformer model (using Hugging Face and MLRun)…more details
Yaron Haviv is a serial entrepreneur who has been applying his deep technological experience in AI, cloud, data and networking to leading startups and enterprises since the late 1990s. As the Co-Founder and CTO of Iguazio, Yaron drives the strategy for the company’s MLOps platform and led the shift toward a production-first approach to data science that caters to real-time AI use cases. He also initiated and built Nuclio, a leading open source serverless framework with over 4,000 GitHub stars, and MLRun, a cutting-edge open source MLOps orchestration framework. Prior to co-founding Iguazio in 2014, Yaron was the Vice President of Datacenter Solutions at Mellanox (now NVIDIA – NASDAQ: NVDA), where he led technology innovation, software development and solution integrations. He also served as the CTO and Vice President of R&D at Voltaire, a high-performance computing, IO and networking company which floated on the NYSE in 2007 and was later acquired by Mellanox (NASDAQ:MLNX). Yaron is an active contributor to the CNCF Working Group and was one of the foundation’s first members. He sits on the Data Science Committee of the AI Infrastructure Alliance (AIIA), of which Iguazio is a founding member. He is co-authoring a book on Implementing MLOps in the Enterprise for O’Reilly. Yaron presents at major industry events worldwide and writes tech content for leading publications including TheNewStack, Hackernoon, DZone, Towards Data Science and more.
Virtual | Business Talk | Data Engineering & Big Data | All Levels
Based on the new book Winning The Room, this session will provide concrete strategies and practical tips to clarify, simplify, and refine data-driven presentations in a way that maximizes comprehensibility without sacrificing accuracy. It will also utilize instructive and memorable visuals that illustrate how you can drive your data storytelling points home and help your audience understand and retain your message…more details
Bill Franks is the Director of the Center for Statistics and Analytical Research at Kennesaw State University. He is also Chief Analytics Officer for The International Institute For Analytics (IIA) and serves on several corporate advisory boards. Franks is also the author of the books Winning The Room, Taming The Big Data Tidal Wave, The Analytics Revolution, and 97 Things About Ethics Everyone In Data Science Should Know. He is a sought-after speaker and frequent blogger who has over the years been ranked a top global big data influencer, a top global artificial intelligence and big data influencer, a top AI influencer, and was an inaugural inductee into the Analytics Hall of Fame. His work, including several years as Chief Analytics Officer for Teradata (NYSE: TDC), has spanned clients in a variety of industries for companies ranging in size from Fortune 100 companies to small non-profit organizations. You can learn more at http://www.bill-franks.com.
In-person | Business Talk | NLP | Beginner
In this talk, Christina explores the impact of ChatGPT on the industry, as well as the moral dilemma that it poses. Should companies embrace or reject this new technology? How should companies respond, given that job candidates and employees are already using these tools? It’s been reported that ChatGPT and other bots like Lensa steal data from writers and artists without their permission. How can we give credit to the creators, while appreciating the useful tools that AI has created?…more details
Christina Qi is the CEO of Databento, an on-demand market data platform. She formerly founded Domeyard LP, a hedge fund focused on high frequency trading (HFT) that traded up to $7.1 billion USD per day. Failing to earn a job offer after a Wall Street internship, Christina started Domeyard from her dorm room with $1000 in savings, about 9 years ago. Her fund was a tiny minnow amongst the tigers of the hedge fund world, but after Michael Lewis’s Flash Boys came out in 2014 and HFT firms hid from the spotlight, Domeyard accidentally found itself in the center of the ring. Over the next decade, her company’s story was featured on the front page of Forbes and Nikkei, and quoted in the Wall Street Journal, Bloomberg, CNN, NBC, and the Financial Times as a result of the controversy and fascination with HFT. By a series of accidents, Christina became a voice in her industry, contributing to the World Economic Forum’s research on AI in finance, guest lecturing at dozens of universities, and teaching Domeyard’s case study at Harvard Business School. She is grateful to be able to open up about her mistakes, and to help people turn failures into opportunities.
No amount of therapy has quashed Christina’s impostor syndrome, but she will always be proud of her non-profit volunteer work. Christina was elected as a Member of the MIT Corporation, MIT’s Board of Trustees. She is Co-Chair of the Board of Invest in Girls, bringing financial literacy education to underserved populations across the US. Christina also sits on the Board of Directors of The Financial Executives Alliance (FEA) Hedge Fund Group, drives entrepreneurship efforts at the MIT Sloan Boston Alumni Association (MIT SBAA), and served on the U.S. Non-Profit Boards Committee of 100 Women in Finance. Although “X Under X” lists are a gimmick, she’ll admit that Forbes 30 Under 30 made a positive impact on her life by giving her a community – friends who dragged her out of bed during the lowest days of her life. Christina holds a Bachelor of Science in Management Science from MIT and is a CAIA Charterholder.
Virtual | Talk | MLOps | Intermediate
This talk will not discuss a specific MLOps tool, but instead present guidelines and mental models for how to think about the problems you and your team are facing, and how to select the best tools for the task. We will review a few example problems, analyze them, and suggest Open Source solutions for them. We will provide a mental framework that will help tackle future problems you might face and extract the concrete value each tool provides…more details
Dean has a background combining physics and computer science. He’s worked on quantum optics and communication, computer vision, software development, and design. He’s currently CEO at DagsHub, where he builds products that enable data scientists to work together and get their models to production, using popular open-source tools.
He’s also the host of the MLOps Podcast, where he speaks with industry experts about ML in production.
Virtual | Talk | NLP | Beginner-Intermediate
In this talk, we will present how we built a multilingual, multi-label topic classification model that supports zero-shot classification, to match reviews with previously unseen user search topics. We will show how fine-tuning BERT-like models on the tourism domain with a small dataset can outperform other pre-trained models, and we will share experimental results for different architectures. Furthermore, we will present how we collected the data using an active learning approach and the Amazon SageMaker Ground Truth tool, and we will show a short demo of the model with explainability using Streamlit…more details
Moran is a machine learning manager at Booking.com, researching and developing computer vision and NLP models for the tourism domain. Moran is a Ph.D. candidate in information systems engineering at Ben Gurion University, researching NLP aspects of temporal graphs. Previously, Moran worked as a Data Science Team Leader at Diagnostic Robotics, building ML solutions for the medical domain and NLP algorithms to extract clinical entities from medical visit summaries.
Virtual | Talk | NLP | Intermediate
In this talk, we will present how to achieve multi-domain QA systems through the collaboration of multiple models. We will further explore how relevance feedback can be used for few-shot document re-ranking, and will finish by introducing UKP-SQuARE, the first online platform that provides an ecosystem for QA research. With UKP-SQuARE, users can deploy, run, analyze, and compare models with a standardized interface from multiple perspectives, such as general behavior, explainability, adversarial attacks, and behavioral tests, enabling a holistic analysis…more details
Iryna Gurevych (PhD 2003, U. Duisburg-Essen, Germany) is professor of Computer Science and director of the Ubiquitous Knowledge Processing (UKP) Lab at the Technical University (TU) of Darmstadt in Germany. Her main research interests are in machine learning for large-scale language understanding and text semantics. Iryna’s work has received numerous awards, including the ACL Fellow award in 2020 and the first Hessian LOEWE Distinguished Chair award (€2.5 million) in 2021. Iryna is co-director of the NLP program within ELLIS, a European network of excellence in machine learning. She is currently the president of the Association for Computational Linguistics. In 2022, she received an ERC Advanced Grant to support her vision for the next big step in NLP: “InterText – Modeling Text as a Living Object in a Cross-Document Context”.
Haritz Puerto is a Ph.D. candidate in Machine Learning & Natural Language Processing at UKP Lab in TU Darmstadt, supervised by Prof. Iryna Gurevych. His main research interests are reasoning for Question Answering and Graph Neural Networks. Previously, he worked at the Coleridge Initiative, where he co-organized the Kaggle Competition Show US the Data. He got his master’s degree from the School of Computing at KAIST, where he was a research assistant at IR&NLP Lab and was advised by Prof. Sung-Hyon Myaeng.
Virtual | Talk | NLP | Machine Learning | All Levels
In this talk, I will cover the hallucination problem and how Truth Checker models can detect hallucinations and deliver accurate experiences, making it possible to use LLMs without a human in the loop…more details
Chandra Khatri is the Chief Scientist and Head of AI at Got It AI, where his team is transforming the AI space by leveraging state-of-the-art technologies to deliver the world’s first fully autonomous Conversational AI system. Under his leadership, Got It AI is democratizing Conversational AI and related ecosystems through automation. Prior to Got It AI, Chandra led various applied AI and research groups at Uber, Amazon Alexa and eBay.
At Uber, he led Conversational AI, Multi-modal AI, and Recommendation Systems. At Amazon, he was a founding member of the Alexa Prize Competition and Alexa AI, where he led R&D and significantly advanced the field of Conversational AI, particularly open-domain dialog systems, which are considered the holy grail of Conversational AI and one of the open-ended problems in AI. At eBay, he drove applied research projects in NLP, Deep Learning, and Recommendation Systems.
He graduated from Georgia Tech with a specialization in Deep Learning in 2015 and holds an undergraduate degree from BITS Pilani, India. His current areas of research include Artificial and General Intelligence, Democratization of AI, Reinforcement Learning, Language and Multi-modal Understanding, and Introducing Common Sense within Artificial Agents.
Virtual | Career Talk | MLOps | All Levels
Join Anna in this session to learn how open source and machine learning coexist and what it means to be part of the machine learning open source community. In addition, she’ll walk through how to navigate the open source space using the popular MLOps project Kubeflow as an example, and share tips on how to set yourself up for success in your future contributions…more details
Anna Jung is a Senior ML Open Source Engineer at VMware, leading the open source team as part of the VMware AI Labs. She currently contributes to various upstream ML-related open source projects focusing on the project’s overall health, adoption, and innovation. She believes in the importance of giving back to the community and is passionate about increasing diversity in open source. When away from the keyboard, Anna is often at film festivals supporting independent filmmakers.
In-person | Talk | Machine Learning | All Levels
In this session, we’ll discuss how graph embeddings build on and enrich graph data science workflows. We’ll review various embedding algorithms, as well as highlight real-world use cases where embeddings can help translate complex data patterns into tangible business value…more details
Katie is a Data Science Solution Architect at Neo4j. She completed her degree in Cognitive Neuroscience at Harvard University. Passionate about people and problem solving, she transitioned to focusing on helping people and businesses leverage data for impactful outcomes. As a customer-facing data scientist, she has had the opportunity to work with large and small organizations across a variety of industries. At Neo4j she helps teams up-level their data science practice with graph data science.
In-person | Talk | Data Engineering & Big Data | Beginner-Intermediate
In this talk, we highlight what it means for Apache Flink to be a general data processor that acts as a data integration hub. Looking under the hood, we demonstrate Flink’s SQL engine as a changelog processor that ships with an ecosystem tailored to processing CDC data and maintaining materialized views. We will discuss the semantics of different data sources and how to perform joins or stream enrichment between them. This talk illustrates how Flink can be used with systems such as Kafka (for upsert logging), Debezium, JDBC, and others…more details
Timo Walther is a Principal Software Engineer at Confluent and a long-time member of Apache Flink’s management committee. He studied Computer Science at TU Berlin and was part of the Database Group there, the origins of Apache Flink. He worked as a software engineer at DataArtisans and led the SQL team at Ververica. He was a co-founder of Immerok, which was acquired by Confluent in 2023. In Flink, he works on various topics in the Table & SQL ecosystem to make stream processing accessible to everyone.
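To make "changelog processor" concrete, the sketch below maintains a `SUM(amount) ... GROUP BY customer` view from a stream of change events. The change-kind labels mirror the short strings Flink prints for its row kinds: `+I` insert, `-U` update-before, `+U` update-after, `-D` delete. The code itself is plain Python, not Flink's API. The order events are hypothetical CDC records of the kind a tool like Debezium might emit.

```python
from collections import defaultdict

# Change kinds modeled on Flink's row-kind short strings.
ADD = {"+I", "+U"}       # insert, update-after: add the row's contribution
RETRACT = {"-U", "-D"}   # update-before, delete: retract the row's contribution


def apply_changelog(events):
    """Maintain SUM(amount) GROUP BY customer as a materialized view."""
    sums = defaultdict(float)
    counts = defaultdict(int)  # rows per group, so empty groups can be dropped
    for kind, row in events:
        if kind in ADD:
            sign = 1
        elif kind in RETRACT:
            sign = -1
        else:
            raise ValueError(f"unknown change kind: {kind}")
        key = row["customer"]
        sums[key] += sign * row["amount"]
        counts[key] += sign
        if counts[key] == 0:
            del sums[key], counts[key]  # group no longer has any rows
    return dict(sums)


# Hypothetical CDC events for an orders table.
events = [
    ("+I", {"order_id": 1, "customer": "alice", "amount": 30.0}),
    ("+I", {"order_id": 2, "customer": "bob", "amount": 10.0}),
    ("-U", {"order_id": 1, "customer": "alice", "amount": 30.0}),
    ("+U", {"order_id": 1, "customer": "alice", "amount": 45.0}),
    ("-D", {"order_id": 2, "customer": "bob", "amount": 10.0}),
]

view = apply_changelog(events)
print(view)
```

An update arrives as a retraction of the old row followed by the new row. The aggregate can therefore be corrected incrementally rather than recomputed from scratch.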
In-person | Talk | Data Engineering & Big Data | MLOps | Beginner-Intermediate
In this talk, we give an overview of the online model serving requirements at Lyft that drove us to build LyftLearn Serving. We showcase various techniques we used to tackle the aforementioned challenges to achieve a low latency, high throughput model serving system powering products of 40+ teams. We will also present design decisions we made for LyftLearn Serving for efficient versioning, deploying, testing, and monitoring ML models and describe tradeoffs that would help and inspire ML Ops practitioners while building similar systems…more details
Hakan is a staff software engineer on the ML Platform team at Lyft. The team builds ML development, training and serving systems that support 40+ teams. Previously, Hakan was a staff engineer at Box, where he helped build cloud content management applications focused on security and also scaled Kubernetes clusters and service meshes in an on-premises infrastructure. He started his career at the hardware level, building ASICs, and transitioned to distributed systems software at a startup. Hakan is passionate about wearing many hats, switching abstraction levels, operational excellence and mentorship, and loves solving challenging problems that take the whole team to address.
Mihir Mathur is the lead Product Manager for Machine Learning at Lyft, where he works on building ML/AI tools that power Lyft’s automated intelligent decisions across realtime pricing, ETAs, fraud detection, safety classification etc. In the past Mihir has worked on building delightful products for millions of users at Quora, Houzz, and Thomson Reuters and spoken about his work at conferences such as MLOps World and ODSC. Mihir graduated magna cum laude from UCLA with a Bachelor’s and Master’s in Computer Science.
In-person | Talk | Responsible AI | Machine Learning Safety and Security | Intermediate
This session will demonstrate new practical tools that enable data scientists and companies to continuously transform their Machine Learning life cycles: making model debugging simpler for AI developers, helping business decision-makers act faster and with more confidence, and helping end users gain trust in AI systems…more details
Ruth Yakubu is a Principal Cloud Advocate at Microsoft. Ruth specializes in Java, Advanced Analytics, Data Platforms and Artificial Intelligence (AI).
In addition, she’s been a tech speaker at several conferences like Microsoft Ignite, O’Reilly Velocity, Devoxx UK, Grace Hopper Dublin, TechSummit, Websummit and numerous other developer conferences. Prior to Microsoft, she worked for great companies like Unisys, Accenture and DIRECTV, where she gained a lot of experience with software architectural design and programming. She was named a DZone.com Most Valuable Blogger.
In-person | Talk | Machine Learning | Intermediate
Location data is a powerful tool. The places people go reflect who they are and what they care about, especially during the holiday season, when shopping is at a high. During this period, shoppers often deviate from their normal habits, shopping more frequently and making larger purchases…more details
Ali Rossi is a Data Science Tech Lead at Foursquare, working closely with their first-party foot traffic panel to deliver insights against a broad range of client business questions. She is passionate about consumer behavioral data, with experience building consumer panels, researching normalization methodologies, and developing methods to derive actionable insights. Previously, she worked in product management at Foursquare, Amazon and Nielsen, mainly focused on building analytics products using consumer-sourced data. She studied chemistry and mathematics at the University of Connecticut and is currently pursuing a Master of Science in computer science at the Georgia Institute of Technology.
In-person | Talk | Machine Learning | Machine Learning Safety & Security | All Levels
For machine learning to continue to drive impact in new applications, we will need to address this problem directly. We start with testing. ML is software, and good tests are an irreplaceable tool for building a resilient system. We will explore how to design end-to-end simulations to assess our models’ resilience…more details
As the Head of Machine Learning at Abnormal Security, Dan builds cybercrime detection algorithms to keep people and businesses safe. Before joining Abnormal, Dan worked at Twitter: first as an ML researcher working on recommendation systems, and then as the head of web ads machine learning. Before Twitter, Dan built smartphone sensor algorithms at TrueMotion and Computer Vision systems at the Serre Lab.
In-person | Talk | Machine Learning | Deep Learning | NLP | Data Engineering | All Levels
Feature engineering is vital to AI success, especially for tabular data. Yet feature engineering is little more than a footnote in most popular machine learning education courses. One of the challenges in teaching and practicing ML feature engineering is the lack of a systematic approach based on understanding of data semantics and database structure. As a result, feature lists are often extremely bloated, containing unexplainable features which are difficult to maintain and make sense of…more details
Sergey is a data scientist with a background in physics and neurobiology. FeatureByte is Sergey’s second startup. He was one of the first employees at DataRobot, where he created and led a professional services group and helped the company grow into a unicorn. Sergey is widely known as a Kaggle Grandmaster who has held the #1 rank on Kaggle. He has been named one of the top data scientists by various publications multiple times. Sergey’s passion is in machine learning, predictive modeling and inventive feature engineering.
In-person | Talk | Data Engineering & Big Data | All Levels
We will explore the critical table stakes and imperatives for effective MLOps, including considerations for model versioning, automated deployment pipelines, monitoring, and governance. We will discuss the challenges that organizations face in operationalizing machine learning models and the best practices that address these challenges. We will also explore the options available on Oracle Cloud Infrastructure (OCI) that facilitate MLOps practices. Specifically, we will introduce OCI Data Science (OCI DS), a cloud-based platform that provides a comprehensive suite of tools and services for developing, training, and deploying machine learning models. OCI DS enables seamless collaboration between data scientists, engineers, and other stakeholders, while ensuring scalability, security, and cost-effectiveness…more details
Allen is a Principal Machine Learning Architect and AI Researcher working for Oracle Cloud Infrastructure.
In-person | Talk
The faster and more incremental the delivery of business value through AI, the higher the likelihood of successful adoption and implementation of AI within enterprises…more details
Shan is a data analytics, AI, and management consulting practitioner with over 20 years of experience in solving complex business problems for clients through innovative technology solutions. Shan joined Fujitsu in 2017 to grow and lead the data analytics business at Fujitsu North America. In his current role as the Head of AI offerings, Shan is responsible for shaping the AI go-to-market strategy, and product offerings, and for promoting AI adoption amongst Fujitsu’s clients. Shan has published thought leadership articles/whitepapers for the Fujitsu Global blog on technology and industry topics such as AI Enabled Trusted Society, AI and Advanced Analytics, Mobility Industry, and Smart Cities. Shan lives in Dallas, Texas, and is an avid NBA fan and motorcyclist.
In-person | Business Talk | All Levels
In this talk, I will present findings from a series of interviews with 20 data science hiring managers at leading organizations across industry (e.g. FAANG, finance, startups). We will discuss common patterns that emerged around both challenges and best practices, and make some actionable recommendations for data science teams looking to improve their hiring processes…more details
Isaac is a co-founder and Principal Data Scientist at DrivenData, Inc, where he leads client engagements and spearheads development of the data science competition platform. He holds a master’s in Computational Science and Engineering from Harvard’s School of Engineering and Applied Sciences and a BS in Operations Research from the U.S. Coast Guard Academy, and previously spent seven years as a Coast Guard officer serving in a variety of operational and quantitative roles.
Virtual | Talk | Data Visualization & Data Analysis | All Tracks | All Levels
The talk will overview the general influence of geometry on data and data science, starting with metric geometry and embeddings of text and image data. We’ll then dive into network geometry, survey data analytics, and the geometry of supervised learning algorithms for small data methods. We’ll end with some exciting geometry tools in quantum computing for network science and image analytics…more details
Colleen M. Farrelly is a lead data scientist whose expertise spans generative AI, topological data analysis, network science, and NLP, among others. She’s recently focused her research on the geometry of generative AI models and how this impacts their performance on tasks such as bias detection, and her volunteer work includes mentoring African machine learning students. She and Dr. Yae Gaba are the authors of The Shape of Data, an overview of machine learning from a geometric perspective.
Virtual | Talk | NLP
Abstract Coming Soon!
Dan Roth is the Eduardo D. Glandt Distinguished Professor at the Department of Computer and Information Science, University of Pennsylvania, a VP/Distinguished Scientist at Amazon AWS, and a Fellow of the AAAS, the ACM, AAAI, and the ACL.
In 2017 Roth was awarded the John McCarthy Award, the highest award the AI community gives to mid-career AI researchers. Roth was recognized “for major conceptual and theoretical advances in the modeling of natural language understanding, machine learning, and reasoning.”
Roth has published broadly in machine learning, natural language processing, knowledge representation and reasoning, and learning theory, and has developed advanced machine learning based tools for natural language applications that are being used widely. Until February 2017, Roth was the Editor-in-Chief of the Journal of Artificial Intelligence Research (JAIR). Roth has been involved in several startups; most recently he was a co-founder and chief scientist of NexLP, a startup that leverages the latest advances in Natural Language Processing (NLP), Cognitive Analytics, and Machine Learning in the legal and compliance domains. NexLP was acquired by Reveal in 2020. Prof. Roth received his B.A. summa cum laude in Mathematics from the Technion, Israel, and his Ph.D. in Computer Science from Harvard University in 1995.
Virtual | Track Keynote | Deep Learning | All Levels
This talk describes how to extend the GPT paradigm to learning to act by watching (training on) videos. We trained a neural network to play Minecraft by Video PreTraining (VPT) on a massive unlabeled video dataset of human Minecraft play, while using only a small amount of labeled contractor data. With fine-tuning, our model can learn to craft diamond tools, a task that usually takes proficient humans over 20 minutes (24,000 actions). Our model uses the native human interface of keypresses and mouse movements, making it quite general, and represents a step towards general computer-using agents. Ultimately this paradigm could help automate much of the work humans do on computers. …more details
Jeff Clune is an Associate Professor of Computer Science at the University of British Columbia, a Faculty Member at the Vector Institute, and a Senior Research Advisor at DeepMind.
Previously, he was a Research Team Leader at OpenAI. Before that, he was a Senior Research Manager and founding member of Uber AI Labs, which was formed after Uber acquired his startup. Prior to Uber, he was the Loy and Edith Harris Associate Professor in Computer Science at the University of Wyoming.
He conducts research in three related areas of machine learning (and combinations thereof):
– Deep Learning: Improving our understanding of deep neural networks, harnessing them in novel applications, and advancing deep reinforcement learning
– Evolving Neural Networks: Investigating open questions in evolutionary biology regarding how intelligence evolved and harnessing those discoveries to improve our ability to evolve more complex, intelligent neural networks
– Robotics: Making robots more like animals in being adaptable and resilient
A good way to learn about Jeff’s research is by visiting his Google Scholar page, which lists all of his publications.
Virtual | Talk | Data Engineering | MLOps | Beginner
In this talk, I’ll introduce techniques and principles for getting started with data quality that are applicable across the vast majority of organizations and datasets. I’ll also show you how to implement those using Great Expectations OSS: a Python-based data quality platform that you can use across an extremely broad range of data and tech stacks…more details
Alex Sherstinsky is a staff machine learning and data products engineer on the team developing the core platform of Great Expectations, the leading open source data quality platform. Previously, Alex developed augmented intelligence systems that harness machine learning and gig work models to transform and scale customer service at Directly, Inc. He was a product and technical co-founder at GrowthHackers.com and Qualaroo, and a product/engineering executive at other venture capital-backed startups. Alex earned his Ph.D. in machine learning from MIT, with research conducted at the Media Lab. His scientific publications appear in refereed journals and conference proceedings; he holds 5 U.S. patents.
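The data quality principles in this talk can be pictured as small, named, declarative checks. Great Expectations packages checks like these as named expectations, such as `expect_column_values_to_not_be_null`, `expect_column_values_to_be_between` and `expect_column_values_to_be_unique`. The sketch below reimplements the idea in plain Python and does not use the Great Expectations API. The records, the column names, and the 0 to 120 age range are hypothetical.

```python
# Hypothetical batch of records to validate.
records = [
    {"user_id": 1, "age": 34},
    {"user_id": 2, "age": None},
    {"user_id": 3, "age": 212},
    {"user_id": 3, "age": 41},
]


def make_result(name, failures):
    """Uniform result shape: which check, whether it passed, and which rows failed."""
    return {"expectation": name, "success": not failures, "failed_rows": failures}


def expect_not_null(rows, col):
    """Every row must have a non-null value in `col`."""
    failures = [i for i, row in enumerate(rows) if row.get(col) is None]
    return make_result(f"{col} is not null", failures)


def expect_between(rows, col, low, high):
    """Non-null values in `col` must fall in [low, high]; nulls are left to expect_not_null."""
    failures = [
        i
        for i, row in enumerate(rows)
        if row.get(col) is not None and not low <= row[col] <= high
    ]
    return make_result(f"{col} in [{low}, {high}]", failures)


def expect_unique(rows, col):
    """Flag every row whose value in `col` already appeared in an earlier row."""
    seen, failures = set(), []
    for i, row in enumerate(rows):
        value = row.get(col)
        if value in seen:
            failures.append(i)
        seen.add(value)
    return make_result(f"{col} is unique", failures)


suite = [
    expect_not_null(records, "age"),
    expect_between(records, "age", 0, 120),
    expect_unique(records, "user_id"),
]
for check in suite:
    status = "PASS" if check["success"] else "FAIL"
    print(f"{status} {check['expectation']}: failed rows {check['failed_rows']}")
```

Each check returns the offending row indices rather than raising an exception. A pipeline can then decide whether to block, quarantine, or merely report a failing batch.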
Virtual | Talk | Deep Learning | ML Safety and Security | Intermediate
In this talk, we will explore the various ways in which Machine Learning (ML) systems can be broken, leading to incorrect predictions, bias, or even security vulnerabilities. We will discuss ten common ways in which ML systems can fail, including data poisoning attacks, adversarial examples, concept drift, and model inversion attacks…more details
Bhakti is a Responsible AI Tech Lead at Google Research, where she develops fair, safe, and robust AI systems. She has spearheaded numerous projects across Google products, including YouTube, Maps, Android, and Ads, making significant advancements to ensure that ML in these applications is fair, transparent, and safe for all. She is also a strong supporter of open-source technology and is the maintainer of several offerings in the TF Responsible AI toolkit, used globally by industry developers to make their ML workflows more responsible.
In-person | Talk | NLP | Machine Learning | ML Safety (AI Safety) & ML Security | MLOps | Data Engineering & Big Data | Data Visualization & Data Analysis | Beginner-Intermediate
We’ll talk about automatic data collection with privacy constraints and the infrastructure setup for data ingestion (Kafka), persistence (Delta Lake, Azure Databricks Lakehouse), and processing (Spark Batch, Spark SQL, Spark Streaming). We’ll walk through the lessons learned in bringing large volumes of data into a single platform for data analytics…more details
Zairah is a Data Scientist at you.com, the AI search engine, where she leverages her expertise in statistical and machine-learning techniques to build analytics and experimentation platforms. She recently spoke at NeurIPS 2022 and shared her expertise on data-driven decision-making in a privacy-focused AI-first startup. Previously, Zairah was a Data Scientist at IBM Research, researching Natural Language Processing (NLP) and AI Fairness topics. She has published research and holds patents in these domains. Zairah obtained her M.S. in Computer Science from the University of Pennsylvania, where she researched scikit-learn model performance. Her findings have since been used as guidelines for applying machine learning to supervised classification tasks. Zairah has published her work in top AI conferences such as AAAI and has over 300 citations. Aside from work, Zairah enjoys adventure sports and poetry.
Virtual | Talk | NLP | Deep Learning | Beginner-Intermediate
We will look at how we solve this problem using transfer learning through Natural Language Processing and Computer vision to create a hierarchical classification Deep Neural Network to categorize products into a hierarchical tree taxonomy. We will dig deeper into modeling challenges and how we came up with specific architecture decisions…more details
Kshetrajna is a Staff Data Scientist at Shopify working in the Merchant Services Org. Over the last 10 years of his career, he has built and productionized many ML models in various domains, including retail, ad-tech, and healthcare. His interests are mainly applied ML and ML systems, and he enjoys solving complex problems to help apply machine learning at scale. Outside of work, Kshetrajna loves spending time with his dogs and playing music on his guitar, and he is an avid gamer.
Virtual | Talk | Intermediate
This talk will describe how high-level ideas from data-centric AI can be operationalized across a wide variety of datasets (image, text, tabular, etc.), and introduce new algorithmic strategies for improving data that we have researched and published papers on, with extensive benchmarks. I will conclude with a discussion of connections to AutoML, where the data-centric AI movement is headed next, and key obstacles that deserve more attention…more details
Jonas Mueller is Chief Scientist and Co-Founder at Cleanlab, a software company providing data-centric AI tools to efficiently improve ML datasets. Previously, he was a senior scientist at Amazon Web Services developing AutoML and Deep Learning algorithms which now power ML applications at hundreds of the world’s largest companies. In 2018, he completed his PhD in Machine Learning at MIT, also doing research in NLP, Statistics, and Computational Biology.
Jonas has published over 30 papers in top ML and Data Science venues (NeurIPS, ICML, ICLR, AAAI, JASA, Annals of Statistics, etc). This research has been featured in Wired, VentureBeat, Technology Review, World Economic Forum, and other media. He has also contributed open-source software, including the fastest-growing open-source libraries for AutoML (https://github.com/awslabs/autogluon) and Data-Centric AI (https://github.com/cleanlab/cleanlab).
Virtual | Talk | Machine Learning Safety and Security | Responsible AI | Intermediate
AI tools are ubiquitous, but most users treat them as a black box: a handy tool that suggests purchases, flags spam, or autocompletes text. While researchers have presented explanations for making AI less of a black box, a lack of metrics makes it hard to optimize explicitly for interpretability. Thus, I propose two metrics for interpretability suitable for unsupervised and supervised AI methods…more details
Jordan is an associate professor in the University of Maryland Computer Science Department (tenure home), Institute of Advanced Computer Studies, iSchool, and Language Science Center. Previously, he was an assistant professor at Colorado’s Department of Computer Science (tenure granted in 2017). He was a graduate student at Princeton with David Blei.
His research focuses on making machine learning more useful, more interpretable, and able to learn from and interact with humans. This helps users sift through decades of documents; discover when individuals lie, reframe, or change the topic in a conversation; or compete against humans in games that are based in natural language.
In-person | Talk | Responsible AI | Beginner
For many AI applications, a prediction is not enough. End-users need to understand the “why” behind a prediction to make decisions and take next steps. Explainable AI techniques today can provide some insight into what your model has learned, but recent research highlights the need for interactivity with XAI tools. End-users need to interact and test “what if” scenarios in order to understand and build trust with an AI system. In this talk, I’ll discuss what human-factors research tells us about human decision making and how users build trust (or lose trust) in systems. I’ll also present interaction design techniques that can be applied to XAI services design…more details
Meg is currently the Lead UXR for Intrinsic.ai, where she focuses her work on making it easier for engineers to adopt and automate with industrial robotics. She is a “Xoogler”, and prior to Intrinsic worked on the Explainable AI services on Google Cloud. Meg has had a varied career working for start-ups and large corporations alike, and she has published on topics such as user research, information visualization, educational-technology design, voice user interface (VUI) design, explainable AI (XAI), and human-robot interaction (HRI). Meg is also a proud alumna of Virginia Tech, where she received her Ph.D. in Human-Computer Interaction.
Virtual | Talk | Machine Learning | Beginner
The game of football is undergoing a significant shift towards the quantitative. Much of the progress made in the analytics space can be attributed to play-by-play data and charting data. However, recent years have given rise to tracking data, which has opened the door for innovation that was not possible before. In this talk I will describe how to gain an edge in player evaluation by building off of traditional charting data with state-of-the-art player tracking data, and foreshadow how such methods will revolutionize the sport of football in the future…more details
Eric Eager is the VP of Research and Development at SumerSports, a football analytics startup founded by Paul Tudor Jones and Jack Jones. Prior to joining Sumer, he held similar roles at Pro Football Focus, where he was responsible for many of the insights that have helped grow the game of American football. Eric holds a PhD in Mathematical Biology from the University of Nebraska, and has taught at Wharton, DataCamp and the University of Wisconsin – La Crosse, publishing over 25 academic papers during his career.
In-person | Talk
In this talk, Peter will answer all these questions. He will explain why the AI revolution must not only learn from the last 15 years of Open Source data science, but that it must build on those principles if we are to achieve a broad vision of human thriving in whatever world lies ahead…more details
Peter Wang is the CEO and co-founder of Anaconda, Inc. Prior to founding Anaconda (formerly Continuum Analytics), Peter spent 15 years in software design and development across a broad range of areas, including 3D graphics, geophysics, large data simulation and visualization, financial risk modeling, and medical imaging. As a creator of the PyData community and conferences, he devotes time and energy to growing the Python data science community and advocating for increasing data literacy around the world. Peter holds a BA in Physics from Cornell University.
In-person | Talk | Machine Learning | Beginner-Intermediate
In this talk, we will explore:
– The challenges we face when building content-based recommendation systems at Shopify.
– How we generated high-quality product embeddings using Universal Sentence Encoder (USE).
– Why we chose USE over other popular options such as BERT.
– How we scaled our approach using Ray Actor Pools to generate recommendations for over 350M products.
– The impact of launching this new model to millions of merchants…more details
Madhav is a Senior Data Scientist at Shopify, where he focuses on building and evaluating recommendation systems. His role includes prototyping potential solutions and scaling them for production. Prior to Shopify, Madhav was a data science consultant focused on NLP projects for pharmaceutical companies. He then transitioned to Disney to develop personalized movie recommendations, which sparked his passion for recommendation systems. In his free time, Madhav hosts free Q&A sessions for aspiring data scientists who are looking to get into this space.
In-person | Talk | Machine Learning | Deep Learning | Big Data | ML Safety | All Levels
In this talk, we’ll discuss some strategies for incorporating human judgment into the prediction-detection loop in order to improve accuracy, produce more true positives and fewer false positives, and improve user satisfaction…more details
Andrew is the head of data science at Bigeye, a data observability company. Prior to joining Bigeye, Andrew built ML-powered tools for Citi and (as a consultant) a range of top consumer banks; he specialized in pricing and underwriting problems. In his free time, Andrew enjoys cooking, travel, and using his TVR Chimaera to escape New York.
In-person | Talk | Machine Learning | Data Engineering & Big Data | Data Visualization & Data Analysis | Beginner-Intermediate
In this talk, I will show the key concepts of a Bayesian approach to Marketing Mix Modeling (MMM), its implementation using Python, and practical tips…more details
Hajime is a data professional with five years of experience in marketing, retail, and eCommerce, working across Japan and the United States.
As a Data Analyst at Procter and Gamble and MIKI HOUSE Americas, Hajime has led data-driven strategy formulation and implemented technology initiatives such as e-commerce expansion, advertising optimization, and the identification of growth opportunities.
As an organizer of PyData NYC, Hajime is dedicated to fostering a vibrant community centered around the exchange of knowledge on open-source technologies in New York. Additionally, Hajime lends his expertise as a contributing technical writer for Towards Data Science.
In-person | Talk | Data Engineering & Big Data | All Levels
We begin by exploring the foundational aspects of location data, including addresses, geo-addressing, and data enrichment, followed by examining the challenges data scientists face in managing and analyzing complex datasets in modern networks. We then delve into how location data can enhance analytics, offering richer context and more accurate insights for better decision-making. A telecom fiber planning example will illustrate the practical applications of location data in improving infrastructure development and reducing costs. The presentation highlights the versatility of location data by discussing its applications across various industries, such as retail, transportation, and environmental management. Businesses can optimize processes, boost efficiency, and better understand their target markets by leveraging location intelligence…more details
Dr. Mohammed Taboun is a Principal Data Scientist at Precisely, where he uses his experience in analytics, optimization, and machine learning to drive innovation and business growth. Over the past 15 years, Mohammed has consistently demonstrated exceptional expertise in his field applied to various industries including technology, oil and gas, energy and utilities and telecommunications. With a strong academic background, Mohammed holds a PhD in Mechanical Engineering, specializing in Intelligent Control Systems, as well as a Master of Applied Science (MASc) and a Bachelor of Applied Science (BASc) in Industrial Engineering, focusing on Operations Research.
In-person | Talk | Machine Learning | Deep Learning | Data Engineering | NLP | All Levels
In this talk we will present a new open-source project called Kangas that allows easy exploration and analysis of data when it is mixed with multimedia datatypes, such as images, video, and audio. Typically, Pandas DataFrame is the go-to tool for EDA (Exploratory Data Analysis)…more details
Dr. Blank is Professor Emeritus at Bryn Mawr College and Head of Research at Comet ML. Doug has 30 years of experience in Deep Learning and Robotics, was one of the founders of the area of Developmental Robotics, and is a contributor to the open source Jupyter Project, a core tool in Data Science. He currently lives in San Francisco, California, along with his family and animals.
In-person | Women Ignite | All Levels
In this presentation I will discuss ClinicalMind’s endeavor to leverage data science and analytics techniques to explore and understand the various voices that exist in the pharma industry…more details
In-person | Talk | Data Visualization & Data Analysis | Beginner-Intermediate
Using tools like QGIS, Python, and SQL, attendees will learn how to build complex themes about the built infrastructure in urban environments, flooding in Pakistan, the Brazilian Rainforest, and post-disaster rebuilding on the island of Puerto Rico by highlighting individual stories, noticing connections, and synthesizing larger concepts and emerging ideas…more details
Bonny is a geospatial analyst and self-described human geographer and social anthropologist. Bonny’s popular public talks and panel discussions explore geographic properties that capture complex interactions, dynamic shifts in ecosystem balance, and how activities influence eco-geomorphic conceptual frameworks across a wide variety of environments.
The ability to apply advanced data analytics, including data engineering and geo-enrichment, to poverty, race, and gender discussions targets judgments about structural determinants, racial equity, and elements of intersectionality to illuminate the confluence of metrics contributing to poverty.
Bonny is the author of the books Python for Geospatial Data Analysis: Theory, Tools, and Practice for Location Intelligence (O’Reilly Media) and Geospatial Analysis with SQL: A hands-on guide to performing geospatial analysis by unlocking the syntax of spatial SQL (Packt). Current projects include a new book in progress with Locate Press, Geospatial Data Science & the Art of Storytelling.
In-person | Career Talk | All Levels
The specific purpose of this session is to help those who are newer to data science and those who currently work as individual contributors to use data culture as a mechanism that will support career growth. In the history of the field there have been few times better than now to enter or level up in a data career…more details
Dr. Adam Ross Nelson is a career coach and a data science consultant. As a career coach, he helps others enter and level up in data-related professions. As a data science consultant, he provides research, data science, machine learning, and data governance services. Previously, he was the inaugural data scientist at The Common Application, which provides undergraduate college application platforms for institutions around the world. He holds a PhD from The University of Wisconsin – Madison in Educational Leadership & Policy Analysis. Adam is also a former attorney with a history of working in higher education, teaching all ages, and working as an educational administrator. Adam sees it as important to focus time, energy, and attention on projects that may promote access, equity, and integrity in the field of data science. This commitment means he strives to find ways for his work to challenge systemic oppression, injustice, and inequity.
In-person | Talk | Data Engineering & Big Data | Machine Learning | Intermediate
This talk covers the strategies and best practices around moving portions of workloads to distributed computing through the open-source Fugue project. The Fugue API has a suite of standalone functions compatible with Pandas, Spark, Dask, and Ray. Collectively, these functions allow users to scale any part of their pipeline when ready for full-scale production workloads on big data…more details
Han Wang is the tech lead of Lyft Machine Learning Platform, focusing on distributed computing solutions. Before joining Lyft, he worked at Microsoft, Hudson River Trading, Amazon and Quantlab. Han is the creator of the Fugue project, aiming at democratizing distributed computing and machine learning.
In-person | Talk | ML for Biotech and Pharma | Machine Learning | Intermediate
In this talk, I will explain the challenges of developing computational models of nonlinear systems and share examples of success stories using machine learning algorithms. There are a lot of problems in biology and medicine that practicing data scientists can address…more details
Joshy George is a bioinformatics researcher with a Ph.D. in Bioinformatics from the University of Melbourne, Australia, and a Master's in Computer Science from the Indian Institute of Science. With his background in data science and machine learning, Dr. George has co-authored over 100 peer-reviewed scientific articles, showcasing expertise in developing principled methods to solve complex biological problems. In his current role, he leads a team that is focused on building predictive models for cancer precision medicine and understanding the molecular mechanisms leading to diseases.
In-person | Business Talk | Cross-Industry | Beginner
Due to the tsunami of data available to organizations today, artificial intelligence (AI) and machine learning (ML) are increasingly important to businesses seeking competitive advantage through digital transformation. That’s why over 75% of enterprises prioritize AI and ML over other IT initiatives…more details
Tendü Yoğurtçu, Ph.D., is the Chief Technology Officer (CTO) at Precisely. In this role, she directs the company’s technology strategy and innovation, leading all product research and development programs.
Prior to becoming Chief Technology Officer, Tendü served as General Manager of Big Data for Syncsort, the precursor to Precisely, leading the global software business for Data Integration, Hadoop, and Cloud. She previously held several engineering leadership roles at the company, directing the development of the Integrate family of products.
Tendü has over 25 years of software industry experience, with a focus on Big Data and Cloud technologies. She has also spent time in academics, working as a Computer Science Adjunct Faculty Member at Stevens Institute of Technology.
In 2019, Tendü was named CTO of the Year at the prestigious Women in IT Awards, and in 2018 was recognized as an Outstanding Executive in Technology by Advancing Women in Technology (AWT).
Tendü received her Ph.D. in Computer Science from Stevens Institute of Technology, NJ, a Master of Science in Industrial Engineering, and a B.S. in Computer Engineering from Bosphorus University in Istanbul.
In-person | Women Ignite | All Levels
In this talk, I will share our research agenda on human-AI collaboration and demonstrate how leveraging the strengths of both humans and AI can lead to better outcomes…more details
Tamilla Triantoro is an Associate Professor of Computer Information Systems at Quinnipiac University and a leader of the Master's Program in Business Analytics. She was previously an Academic Director of Data Analytics at the University of Connecticut. Dr. Triantoro is an author, speaker, researcher, and educator in the fields of artificial intelligence, data analytics, user experience with technology, and the future of work. She received her Ph.D. from the City University of New York, where she researched online user behavior. Dr. Triantoro presents her research around the world, attempting to demystify the complexity of today’s digital world and to make it understandable and relevant to business professionals and the general audience.
Virtual | Talk | Machine Learning | All Levels
The field of AI has advanced at unprecedented speed in the past few years, due to the rise of large-scale, self-supervised pre-trained models (a.k.a. “foundation models”), such as GPT-3, GPT-4, ChatGPT, Chinchilla, LLaMA, CLIP, DALL-E, Stable Diffusion and many others. Impressive few-shot generalization capabilities of such models on a very wide range of novel tasks appear to emerge primarily due to the drastic increase in the size of the models, training data and compute resources…more details
Irina Rish is an Associate Professor in the Computer Science and Operations Research Department at the Université de Montréal (UdeM) and a core faculty member of MILA – Quebec AI Institute. She holds a Canada Excellence Research Chair (CERC) in Autonomous AI and a Canadian Institute for Advanced Research (CIFAR) Canada AI Chair. She received her MSc and PhD in AI from the University of California, Irvine and her MSc in Applied Mathematics from Moscow Gubkin Institute. Dr. Rish’s research focus is on machine learning, neural data analysis and neuroscience-inspired AI. Before joining UdeM and MILA in 2019, Irina was a research scientist at the IBM T.J. Watson Research Center, where she worked on various projects at the intersection of neuroscience and AI, and led the Neuro-AI challenge. She received multiple IBM awards, including the IBM Eminence & Excellence Award and IBM Outstanding Innovation Award in 2018, IBM Outstanding Technical Achievement Award in 2017, and IBM Research Accomplishment Award in 2009. Dr. Rish holds 64 patents and has published over 80 research papers in peer-reviewed conferences and journals, several book chapters, three edited books, and a monograph on Sparse Modeling.
In-person | Women Ignite | All Levels
It is often thought that leadership comes after mastering technical skills, but you can easily gain leadership and learning opportunities through creating a grassroots community. Within 1 year of my career, I was able to contribute to data strategy and data literacy at a Fortune 100 company of 36K employees by creating a 600+ grassroots community…more details
Virtual | Talk | Responsible AI | Intermediate-Advanced
This session is intended for data science practitioners and leaders who need to know what they can & should do today to build AI systems that work safely & correctly in the real world…more details
David Talby is the Chief Technology Officer at John Snow Labs, helping companies apply artificial intelligence to solve real-world problems in healthcare and life science. David is the creator of Spark NLP – the world’s most widely used natural language processing library in the enterprise.
He has extensive experience building and running web-scale software platforms and teams – in startups, for Microsoft’s Bing in the US and Europe, and to scale Amazon’s financial systems in Seattle and the UK.
David holds a Ph.D. in Computer Science and Master’s degrees in both Computer Science and Business Administration. He was named USA CTO of the Year by the Global 100 Awards and GameChangers Awards in 2022.
Virtual | Talk | Deep Learning | Intermediate-Advanced
Embeddings are numerical representations of concepts converted to number sequences, which make it easy for computers to understand the relationships between those concepts. This talk will introduce embeddings that are useful for natural language and code tasks like semantic search, clustering, topic modeling, and classification. Our embeddings are available in the OpenAI API and outperform top models on 3 standard benchmarks, including a 20% relative improvement in code search…more details
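As an illustration of the semantic-search use case mentioned above, the sketch below ranks documents by the cosine similarity between their embedding vectors and a query vector. It is a minimal sketch under stated assumptions: the three-dimensional vectors are hand-written toy values (not output from the OpenAI API or any real model), and cosine similarity is one common similarity measure, not necessarily the one used in the talk.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vec: list[float], corpus: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (text, score) pairs sorted from most to least similar to the query."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical 3-dimensional embeddings, hand-written for illustration only.
# Real embedding models return vectors with hundreds or thousands of dimensions.
corpus = {
    "how to sort a list in python": [0.9, 0.1, 0.0],
    "best pizza in new york": [0.0, 0.2, 0.9],
    "python list sorting tutorial": [0.8, 0.2, 0.1],
}
query = [0.85, 0.15, 0.05]  # hypothetical embedding of "sorting lists in python"

for text, score in semantic_search(query, corpus):
    print(f"{score:.3f}  {text}")
```

Because similar concepts map to vectors pointing in similar directions, the two Python-sorting documents score close to 1.0 while the unrelated pizza document scores near 0.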
Arvind Neelakantan is a Research Lead and Manager at OpenAI working on deep learning research for real-world applications. He got his PhD from UMass Amherst where he was also a Google PhD Fellow. His work has received best paper awards at NeurIPS and at Automated Knowledge Base Construction workshop.
Virtual | Talk | Data Visualization & Data Analysis | Intermediate
Notebooks with an elegant combination of prose, visualization, and code open the door for greater understanding and introspection of data. The ability to explain the data through words and illustrative charts provides a foundation for anyone to understand and critically reason about data and its application. While other tools like an IDE or BI tool provide artifacts, the notebooks compose information in a way that is greater than the sum of individual parts and enable more people to experiment and take action. The future is notebooks…more details
Elijah Meeks is a co-founder and Chief Innovation Officer of Noteable, a startup focused on evolving how we analyze and communicate data. He is known for his pioneering work in the digital humanities while at Stanford, where he was the technical lead for acclaimed works like ORBIS and Kindred Britain. He was Netflix’s first Senior Data Visualization Engineer, and while at Netflix and Apple worked to develop the charting library Semiotic as well as bring cutting-edge data visualization techniques to analytical applications for stakeholders across the organization including A/B testing, conversation flows, algorithms, membership, people analytics, content, image testing and social media. He is a prolific writer, speaker and leader in the field of data visualization and the co-founder and first executive director of the Data Visualization Society.
Carol Willing is the VP of Engineering at Noteable, a three-time Python Steering Council member, a Python Core Developer, PSF Fellow, and a Project Jupyter core contributor. In 2019, she was awarded the Frank Willison Award for technical and community contributions to Python. As part of the Jupyter core team, Carol was awarded the 2017 ACM Software System Award for Project Jupyter’s lasting influence. She’s also a leader in open science and open-source governance serving on Quansight Labs Advisory Board and the CZI Open Science Advisory Board. She’s driven to make open science accessible through open tools and learning materials.
Virtual | Talk | Machine Learning | Deep Learning | Intermediate
This talk will include details of state-of-the-art recommendation systems in practice, M6 and Gemini…more details
Dr. Hongxia Yang, who received her PhD from Duke University, led the team that developed open-source AI platforms and systems such as AliGraph, M6, and Luoxi. Dr. Yang has published nearly 100 top conference and journal papers and holds more than 20 patents. Her awards include the highest prize of the 2019 World Artificial Intelligence Conference, Super AI Leader (SAIL Award); the second prize of the 2020 National Science and Technology Progress Award (China’s top tech award); the first prize of Science and Technology Progress of the Chinese Institute of Electronics in 2021; and a place on the Forbes China Top 50 Women in Science and Technology list in 2022. Previously, she worked as Senior Staff Data Scientist and Director at Alibaba Group, Principal Data Scientist at Yahoo! Inc., and Research Staff Member at the IBM T.J. Watson Research Center, and served as a joint adjunct professor at the Zhejiang University Shanghai Advanced Research Institute.
Virtual | Talk | Responsible AI | Machine Learning | Deep Learning | NLP | Advanced
In this talk, I will provide a succinct overview of our research on trustworthy machine learning, including robustness, privacy, generalization, and their underlying interconnections…more details
Virtual | Talk | Deep Learning | NLP | Intermediate
In this talk, I’ll go over the approaches to overcome challenges in large scale GNN modeling and present methods to scale and productionize GNN based models. I’ll briefly go over our recent publications in WWW and ICLR which provide research directions in this area…more details
In-person | Talk | MLOps | Data Engineering | Intermediate
Abstract Coming Soon!
Pete joined Elementl as head of engineering in early 2022, and took over the reins as CEO in November of that year. Pete was previously co-founder and CEO of Smyte, an anti-abuse provider that was acquired by Twitter. Prior to that, Pete led Instagram’s web team, built Instagram’s business analytics products, and helped to open source Facebook’s React.js.
In-person | Lightning Talks | MLOps | Beginner-Intermediate
Zoox is developing a ground-up robotaxi designed for AI to drive and riders to enjoy. Without a human driver, the Zoox robotaxi relies on an array of sensors and sophisticated ML algorithms to perceive and navigate the world around it. To enable the Zoox robotaxi to continually learn, it is critical to have a well-oiled MLOps ecosystem that can facilitate both model-centric and data-centric ML development. In this talk, RJ He, Director of Perception at Zoox, will share how Zoox is building the ML machinery that will enable Zoox to deploy and scale robotaxis across multiple geofences and operational domains…more details
RJ He is Zoox’s Director of Perception, where he is responsible for the Zoox robotaxi’s ability to see and understand the world around it. He also leads the Zoox Boston office to assemble a world-class team of AI engineers. RJ was previously co-founder & CEO of Strio AI, an agriculture robotics startup, as well as VP Eng at Optimus Ride, an AV startup. RJ has also commanded a mechanized infantry unit, and holds an MIT PhD in autonomous systems.
In-person | Lightning Talks | NLP | Machine Learning | Big Data | All Levels
As organizations continue to adopt new technologies and digital transformations, the complexity of IT operations has grown significantly. One of the biggest challenges facing IT teams today is dealing with the vast amounts of unstructured data generated by IT systems, such as log files and error messages. Traditional approaches to analyzing and understanding this data can be time-consuming, error-prone, and inefficient…more details
Roozbeh Davari is a highly experienced data scientist and technology leader with a diverse background in research, development, software engineering, and product management. He has a track record of developing and deploying innovative solutions that leverage AI and data science to solve complex business problems. He holds a Ph.D. in Astrophysics from the University of California, Riverside and Carnegie Observatories. Currently, he serves as the Director of Data Science at Aisera, a software company that provides an AI-driven service management platform. Prior to joining Aisera, he worked as a Data Scientist at The Honest Company and Happy Money, where he built predictive models and data analytics solutions for various business applications.
In-person | Business Talk | Machine Learning | MLOps | Data Engineering | Responsible AI | All Levels
The audience will walk away with:
· An overview of critical capabilities companies need to successfully realize value from their data and AI initiatives
· How to build a strategic roadmap that will mature their organization to effectively work with data and build AI
· Common pitfalls and challenges companies face during this process, and tactics for avoiding them…more details
Rehgan Avon is the co-founder & CEO of AlignAI, a Knowledge Management Platform helping companies sustainably transform their organizations to effectively work with data & Artificial Intelligence. With a background in Integrated Systems Engineering and a strong focus on building technology to support analytics and machine learning, Rehgan has worked on architecting solutions and products to operationalize machine learning models at scale within the large enterprise. Rehgan’s previous experience has been fueled by a passion for early-stage startups and product development.
Rehgan has built an extensive community of analytics & data experts through Women in Analytics, a global organization she founded in 2016 to provide more visibility to diverse individuals making an impact in this space. She hosts a global annual conference that has put over 250 women on the stage. The community has over 5000 members from around the world that participate in tutorials, learning groups, discussion boards, and mentorship programs. She was also inducted into the inaugural class of Columbus CEO’s Future 50.
In-person | Talk | Data Engineering & Big Data | Machine Learning | NLP | All Levels
In this session, we demonstrate how to get started with a streaming architecture using Delta Lake and Kafka on Azure. However, rather than ingesting data into Delta Lake via Spark jobs, we will use a couple of open-source libraries (yeah, no Spark required…some restrictions apply). By the end of this session, you’ll walk away with a playbook for up-leveling your streaming analytics architecture…more details
Gary Nakanelua is a professional technologist with over 17 years of experience and the author of Experiment or Expire. Gary is the Managing Director of Innovation at Blueprint, a data intelligence company based in Bellevue, WA. He’s responsible for the experimentation and creation of Blueprint’s transformative solutions and accelerators. With his diverse background, Gary brings a different perspective to problems that businesses are facing today to create quantifiable solutions driven through a high level of collaborative thought processing, strategic planning, and cannibalization.
In-person | Talk | Machine Learning | ML Safety and Security | Responsible AI | Intermediate-Advanced
Machine learning (ML) technologies impact our lives in myriad ways, both visible and invisible, and already there is a clear need for “responsible ML” practices, which promote the development and application of auditable and accountable machine learning systems. As ML continues to mature—and as individuals, corporations, and governments accelerate their adoption of ML technologies—this need will only grow more urgent. And as those building and releasing ML into the world, we especially need practical approaches that bias ourselves and our models toward responsible use as part of normal operations…more details
Tom Shafer works as a Lead Data Scientist at Elder Research, a recognized leader in data science, machine learning, and artificial intelligence consulting since its founding in 1995. As a lead scientist, Tom contributes technically to a wide variety of projects across the company, mentors data scientists, and helps to direct the company’s technical vision. His current interests focus on Bayesian modeling, interpretable ML, and data science workflow. Before joining Elder Research, Tom completed a PhD in Physics at the University of North Carolina, modeling nuclear radioactive decays using high-performance computing.
In-person | Lightning Talks | Machine Learning | MLOps | Intermediate
Testing should be at the heart of Data Science. It is, after all, supposed to be Science: reproducible, reliable, consistent, all that. But testing for ML is a terrible experience. Nobody does it well, and some don’t even do it at all. This leaves us all mired in a mess of unreliable models, rickety pipelines, and who-knows-what data in between…more details
Emily is a Staff MLOps Engineer at Intuit Mailchimp, meaning she gets paid to say “it depends” and “well actually.” Professionally she leads a crazy good team focused on helping Data Scientists do higher quality work faster and more intuitively. Non-professionally she paints huge landscapes and hurricanes in oils, crushes sweet V1s (as long as they’re not too crimpy), rides her bike, reads a lot, and bothers her cats. She lives in Atlanta, GA, which is inarguably the best city in the world, with her husband Ryan who’s a pretty darn cool guy.
In-person | Lightning Talk | Intermediate
This talk will discuss both the challenges and successes we have experienced introducing automation and time-series forecasting methodologies in this space, as well as differences in strategy and approach when producing forecasts two weeks, two months, two years, and 20 years in advance…more details
Alex Antony is a senior staff data scientist at GE Aerospace, where he leads modeling, analytic development, reporting, and forecasting for the market intelligence function. He has 10 years of experience in the data science field and 15 years of experience working with the Department of Defense. He holds an MS in Applied Statistics and a PhD from Indiana University, where he focused on Computational and Quantitative Social Science.
In-person | Talk | Machine Learning | All Levels
In this session, you will learn more about vector embeddings, how vector search engines work, why they are so fast, and how they could help you take your search to production. I will also share a live-coding demo, showing you all the steps from setup to query execution…more details
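A minimal sketch of the core idea behind vector search, assuming the embeddings have already been computed: score every stored vector by cosine similarity to the query and keep the top k. This brute-force version is exact but scans every vector; vector databases such as Weaviate instead rely on approximate nearest-neighbor indexes (Weaviate's default is HNSW) to stay fast at scale. The function name `top_k_cosine` and the toy vectors are illustrative only and are not part of any Weaviate API.

```python
import numpy as np


def top_k_cosine(query: np.ndarray, vectors: np.ndarray, k: int = 3) -> list[int]:
    """Return the indices of the k stored vectors most similar to `query`.

    Exact brute-force search: every stored vector is scored.
    """
    # After L2-normalization, a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q
    # Sort by descending similarity and keep the first k indices.
    return [int(i) for i in np.argsort(-scores)[:k]]


# Toy 3-dimensional "embeddings" (real embedding models emit hundreds of dimensions).
docs = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
print(top_k_cosine(np.array([1.0, 0.05, 0.0]), docs, k=2))  # prints [0, 1]
```

An approximate index trades a small chance of missing a true neighbor for query times that grow far more slowly than the number of stored vectors.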
Erika Cardenas is a Developer Advocate at Weaviate, an open-source vector database. She has two master’s degrees, in economics and in data science, from Florida Atlantic University. Erika was part of the NSF-NRT program, where she published a paper on predicting house prices using structured and unstructured text data. She has written several blog posts on topics such as vector databases versus vector libraries, hybrid search, and integrating LangChain and Weaviate for generative search applications.
In-person | Lightning Talks | Machine Learning | Intermediate
In this talk, we present Mab2Rec, an open-source library for building bandit-based recommender systems developed by the AI Center of Excellence at Fidelity Investments. Our approach takes advantage of a modular system design that tightly integrates individual components while maintaining their independence, thus satisfying two of the most important aspects of industrial applications: generality and specificity. This provides a powerful and scalable framework for building and deploying recommender applications, while also allowing individual components to be reused beyond recommender systems…more details
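The abstract does not show Mab2Rec's API, so the following is only a generic, hypothetical sketch of the bandit idea such recommenders build on: each candidate item is an arm, an epsilon-greedy policy occasionally explores a random item and otherwise recommends the item with the best observed average reward, and feedback such as clicks updates those averages. The class name and the click-through rates in the simulation are invented for illustration and are not Mab2Rec or MABWiser code.

```python
import random


class EpsilonGreedyRecommender:
    """Hypothetical toy bandit: each arm is an item that can be recommended."""

    def __init__(self, items: list[str], epsilon: float = 0.1, seed: int = 0):
        self.items = list(items)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {item: 0 for item in self.items}
        self.rewards = {item: 0.0 for item in self.items}

    def mean_reward(self, item: str) -> float:
        # Average observed reward; 0.0 for items never shown.
        n = self.counts[item]
        return self.rewards[item] / n if n else 0.0

    def recommend(self) -> str:
        # Explore with probability epsilon, otherwise exploit the best-known item.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.items)
        return max(self.items, key=self.mean_reward)

    def update(self, item: str, reward: float) -> None:
        # Record feedback, e.g. 1.0 for a click and 0.0 otherwise.
        self.counts[item] += 1
        self.rewards[item] += reward


if __name__ == "__main__":
    # Hypothetical click-through rates, unknown to the recommender.
    true_ctr = {"item_a": 0.05, "item_b": 0.12, "item_c": 0.08}
    env = random.Random(42)
    rec = EpsilonGreedyRecommender(list(true_ctr), epsilon=0.1)
    for _ in range(5000):
        item = rec.recommend()
        clicked = 1.0 if env.random() < true_ctr[item] else 0.0
        rec.update(item, clicked)
    print({item: round(rec.mean_reward(item), 3) for item in rec.items})
```

Epsilon-greedy is only one policy; production systems often prefer contextual policies that condition on user features.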
Bernard is a Director in the AI Center of Excellence at Fidelity Investments working on personalization and recommender systems. His work is primarily concentrated in recommender systems and optimization, and he regularly presents on these topics, most recently at the IJCAI’21 and CPAIOR’21 conferences. He is the lead developer of the open-source libraries Selective, MABWiser, and Mab2Rec. He holds an MS in Computational Science and Engineering from Harvard University.
In-person | Track Keynote | All | All Levels
In this talk, we’ll explore the critical role that decision intelligence workflows play in driving business value with data-derived insights. We’ll discuss the building blocks of these workflows, from ML-based recommendations to user-friendly interfaces that facilitate action-taking, and best practices to influence your decision-makers to act. Join us to learn how to effect change and give your analytical outputs the best chance of driving measurable business value…more details
Joe Dery joined Western Governors University’s College of IT as the VP & Dean of Data Analytics in summer 2022. At WGU, Joe is working to help more than 3,000 current analytics students learn how to effect change in their professional roles, surgically balancing a combination of mathematics, data management, programming, and business influence skills. Prior to joining academia full-time, Joe spent much of his corporate career working for EMC (and later, Dell Technologies), where he joined as a “hands-on-keyboard” Data Scientist in 2011. Joe went on to hold leadership positions in Dell’s Sales, Finance, and Supply Chain organizations, driving efforts in Data Science, Business Intelligence, Digital Strategy, and Digital Transformation. Across these domains, Joe’s efforts touched a wide variety of business problems, including ML-driven sales quota allocations, sales forecasting & opportunity prioritization, customer cross-sell/whitespace targeting, addressable marketing opportunity sizing, sales territory optimization, supply chain planning optimization, data/analytics literacy training, and self-service BI. Building from his experiences, Joe is often invited to speak on the crucial role of decision intelligence frameworks, change management, and “improv” in bringing analytics solutions to life. Joe holds a Ph.D. in Business Analytics & an M.S. in Marketing Analytics, both from Bentley University.
In-person | Lightning Talks | Data Engineering & Big Data | Data Visualization & Data Analysis | All Levels
MLOps surface quality defect detection is an operational data analysis project aimed at capturing quality defects by analyzing sensor data. The solution is a pilot that leverages a cloud service (Databricks) to automate model maintenance and enhance the scalability of the solution. We designed a novel MLOps architecture that can benefit the metal manufacturing industry…more details
Shanshan is a Lead Data Scientist at Novelis. Her work focuses on advanced operational data analytics and AI implementation in aluminum rolling and recycling, and her team is leading the build-out of the AI ecosystem at Novelis. She received her PhD from Missouri S&T and worked at the Center for Intelligent Maintenance Systems, focusing on fault diagnosis, prognosis, and predictive maintenance in IIoT systems.
In-person | Lightning Talks | Deep Learning | NLP | Machine Learning | Intermediate
In this talk, we will provide a comprehensive introduction to the basics of TDA, including key concepts such as topological spaces, homeomorphisms, and persistent homology. We will then delve into the details of how TDA can be applied in the context of machine learning, including the use of tools such as the Mapper algorithm and the TDA package in R…more details
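To make persistent homology slightly more concrete, here is a minimal sketch of its simplest case: 0-dimensional persistence (connected components) of a Vietoris-Rips filtration on a point cloud, written in Python rather than with the R TDA package the talk mentions. It assumes the convention that an edge enters the filtration at its Euclidean length (some libraries use half that length). Every point is born at 0, and a component dies when it merges into another, which reduces to Kruskal's minimum-spanning-tree algorithm. The function name is hypothetical.

```python
import itertools
import math


def h0_persistence(points: list[tuple[float, ...]]) -> list[tuple[float, float]]:
    """0-dimensional persistence pairs (birth, death) of a Vietoris-Rips filtration.

    Assumes an edge appears at its Euclidean length. All points are born at 0.
    """
    n = len(points)
    if n == 0:
        return []
    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Add edges in order of length (Kruskal's algorithm).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    bars = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            bars.append((0.0, length))  # a component dies when two components merge
    bars.append((0.0, math.inf))  # one component survives forever
    return bars


# Two tight clusters far apart: one long finite bar reveals the gap between them.
cloud = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
for birth, death in h0_persistence(cloud):
    print(birth, death)
```

Long bars indicate features that persist across many scales (here, two well-separated clusters), while short bars are usually treated as noise.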
Christian is a Machine Learning Technical Leader at Mercado Libre, the largest e-commerce/fintech company in Latin America, where he dedicates his efforts to creating tools for monitoring machine learning models and assessing their quality. He is a Computer Engineer and holds a Master of Science with a major in Astronomy from UNAM (Universidad Nacional Autonoma de Mexico). He is a “Xoogler” and has more than 15 years of experience in the field of machine learning. He has lectured in almost a dozen countries.
In-person | Talk | NLP | Machine Learning | Deep Learning | All Levels
We discuss real-world use cases and technical integration aspects, emphasizing how Knowledge Graphs are used to provide accurate and relevant responses by using rich domain knowledge, its provenance, and its semantic description. Attendees gain insights into how Knowledge Graphs and LLMs can work together to unlock new possibilities for conversation-driven data exploration and question answering, by harnessing the rich stores of knowledge that Knowledge Graphs provide…more details
Sean’s experience covers multiple aspects of starting and growing a software company, including holding various titles from President through to co-lead dishwasher. He continues in a leadership role as CTO and serves on the board.
In-person | Talk | ML for Biotech and Pharma | Beginner-Intermediate
Background: Therapeutic administration of psychedelic drugs has shown the potential to improve mental health in many historical anecdotal accounts as well as in an extensive and growing scientific literature. A recent randomized double-blind phase-IIb study demonstrated the safety and efficacy of psilocybin in participants with treatment-resistant depression (TRD). While promising, this study also showed that the treatment works for only a portion of the TRD population, and thus early prediction of outcome is a key objective…more details
As the Vice President of Digital Health Research at COMPASS Pathways, Bob is leading the data science and machine learning efforts aimed at improving the safety, efficacy, and scalability of psilocybin therapy. He is an accomplished neuroscientist and engineer with deep expertise in measuring human brain and behavior, and building data-driven solutions to mental health care challenges. Prior to joining COMPASS Pathways, Bob was VP of Research at Mindstrong, leading the research and data science teams in the development of digital biomarkers for mental health. Prior to Mindstrong, Bob was the Research Director of the Stanford Center for Neurobiological Imaging. He has published over one hundred peer-reviewed articles in the fields of psychology, psychiatry, neuroscience, statistics, and magnetic resonance technology over his 30+ year scientific career. Bob completed his PhD in Experimental Psychology at the University of California at Santa Cruz, and postdoctoral fellowships at the University of British Columbia and Stanford University.
Virtual | Talk | Machine Learning
(1) testing if A is positive semidefinite or has minimum eigenvalue sufficiently negative. We give optimal algorithms in the matrix-vector and vector-matrix-vector product models. One of our algorithms implements a new random walk and uses only a single vector-matrix-vector product in each iteration, rather than the usual matrix-vector product in each iteration of classical subspace iteration algorithms…more details
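The abstract does not spell out the optimal algorithms, so the sketch below shows only the classical baseline they improve on: estimating the minimum eigenvalue of a symmetric matrix A using nothing but matrix-vector products. It runs power iteration once to estimate the spectral norm, then again on the shifted operator cI - A, whose largest eigenvalue is c minus the smallest eigenvalue of A. The function names, iteration count, 1% shift margin, and tolerance are assumptions; power iteration converges slowly when the top eigenvalues of the operator are close, and this is not the speaker's random-walk method.

```python
from collections.abc import Callable

import numpy as np

MatVec = Callable[[np.ndarray], np.ndarray]


def _power_iterate(matvec: MatVec, n: int, iters: int, seed: int) -> np.ndarray:
    """Run power iteration from a random unit vector; return the final unit vector."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = matvec(v)
        norm = np.linalg.norm(w)
        if norm == 0.0:
            break  # v is in the null space; nothing more to learn
        v = w / norm
    return v


def min_eigenvalue(matvec: MatVec, n: int, iters: int = 500) -> float:
    """Estimate the smallest eigenvalue of a symmetric A, accessed only via x -> A @ x."""
    # Step 1: estimate ||A||, padded by 1% so that c is (approximately) >= lambda_max.
    v = _power_iterate(matvec, n, iters, seed=0)
    c = 1.01 * float(np.linalg.norm(matvec(v)))

    # Step 2: B = c*I - A has eigenvalues c - lambda_i (approximately >= 0);
    # its largest eigenvalue is c - lambda_min.
    def shifted(x: np.ndarray) -> np.ndarray:
        return c * x - matvec(x)

    u = _power_iterate(shifted, n, iters, seed=1)
    return c - float(u @ shifted(u))  # Rayleigh quotient of B at u


def is_psd(matvec: MatVec, n: int, tol: float = 1e-8) -> bool:
    """Declare A positive semidefinite if its estimated minimum eigenvalue is >= -tol."""
    return min_eigenvalue(matvec, n) >= -tol


A = np.diag([3.0, 1.0, -2.0])
print(min_eigenvalue(lambda x: A @ x, 3), is_psd(lambda x: A @ x, 3))
```

This baseline spends roughly 2 × iters matrix-vector products; the point of the talk is that far fewer products (or cheaper vector-matrix-vector products) can suffice.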
David Woodruff is a professor at Carnegie Mellon University in the Computer Science Department. Before that he was a research scientist at the IBM Almaden Research Center, which he joined in 2007 after completing his Ph.D. at MIT in theoretical computer science. His research interests include data stream algorithms, distributed algorithms, machine learning, numerical linear algebra, optimization, sketching, and sparse recovery. He is the recipient of the 2020 Simons Investigator Award, the 2014 Presburger Award, and Best Paper Awards at STOC 2013, PODS 2010, and PODS 2020. At IBM he was a member of the Academy of Technology and a Master Inventor.
Virtual | Talk
Abstract Coming Soon!
Tom Corcoran is a Principal Solution Architect at Red Hat.
Tom’s areas of specialty are Red Hat Application Services, AI/ML Ops, and OpenShift. Having held this role with Red Hat for the past five years, Tom has managed projects throughout Europe, the US, and ANZ.
Tom has over 15 years of experience in Java development across a variety of industry sectors and geographies, which gives him deep technical expertise across Red Hat’s products and solutions spanning application and AI/ML workloads. His extensive experience lends a sharp focus to the solution architect role, and he brings a passion for delivering outstanding technical and business value to Red Hat’s customers’ projects and products.
Andreas leads Cloud Strategy & Transformation for Red Hat across Australia & New Zealand. His hands-on experience in startups as well as large-scale enterprise transformation programs has given Andreas a solid understanding of business drivers and value creation. Andreas has worked on a wide range of initiatives across different industries in Europe, North America, and APAC, including full-scale ERP migrations, HR, finance and accounting, manufacturing, supply chain logistics transformations, and scalable core banking strategies to support regional business growth strategies. Since joining Red Hat in 2015, Andreas has focused on helping Red Hat customers build the necessary capabilities and make the best-fit technology, methodology, and architecture choices to be a successful digital competitor. Andreas is part of Red Hat’s global #redhatchiefs network and works closely with the CTO office on engineering topics related to emerging technologies. Andreas got his first Commodore 64 when he was 12 years old and started to work as a software developer in 1996 with Krauss-Maffei in Munich, building full mission simulators. Andreas holds an Engineering degree from the University of Ravensburg, Germany.
Virtual | Talk | MLOps | Intermediate-Advanced
This talk includes a highly technical walkthrough of the lower-level system libraries involved in GPU computing. It covers each layer between a Data Scientist’s ML application in a container and the GPU that backs it. It’s everything you need to know to wrangle Docker, NVIDIA, Kubernetes, PyTorch, TensorFlow, and other friends into the same spot…more details
Emily is a Staff MLOps Engineer at Intuit Mailchimp, meaning she gets paid to say “it depends” and “well actually.” Professionally she leads a crazy good team focused on helping Data Scientists do higher quality work faster and more intuitively. Non-professionally she paints huge landscapes and hurricanes in oils, crushes sweet V1s (as long as they’re not too crimpy), rides her bike, reads a lot, and bothers her cats. She lives in Atlanta, GA, which is inarguably the best city in the world, with her husband Ryan who’s a pretty darn cool guy.
Virtual | Talk | Machine Learning
Abstract Coming Soon!
Pradeep Ravikumar is a Professor in the Machine Learning Department, School of Computer Science at Carnegie Mellon University. He was previously an Associate Director at the Center for Big Data Analytics, at the University of Texas at Austin. His thesis has received honorable mentions in the ACM SIGKDD Dissertation award and the CMU School of Computer Science Distinguished Dissertation award. He is a Sloan Fellow, a Siebel Scholar, a recipient of the NSF CAREER Award, and was Program Chair for the International Conference on Artificial Intelligence and Statistics (AISTATS) in 2013. He is Associate Editor-in-Chief for IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), and action editor for the Machine Learning journal, and the Journal of Machine Learning Research.
Dr. Ravikumar’s research group at CMU works on the foundations of statistical machine learning, with recent focus on “next generation” machine learning systems, that are explainable, robust to train and test time corruptions, and resilient to distribution shifts, and are learnt under resource constraints by leveraging or discovering various notions of “structure” and domain knowledge.
In-person | Lightning Talk | MLOps | Data Engineering & Big Data | All Levels
Operational machine learning requires the orchestration of particularly complex data pipelines, because they incorporate every other type of pipeline, from data engineering to model training to performance monitoring. So providing a general framework for MLOps cannot be done from the perspective of any one point in the graph of data technologies, nor of any particular persona. Everything has to come together…more details
In my talk, I will share lessons learned from real-world examples where we had to:
· Design secondary explainer models into workflows that empower decision-makers
· Deploy model monitoring systems to manage risk, bias, and drift
· Test the robustness of models, especially in applications that touch unstructured data
· Optimize for both model performance and the resulting human action and impact / ROI…more details
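As one concrete example of the drift monitoring mentioned in the list above, here is a minimal sketch of the Population Stability Index (PSI), a common statistic for comparing a feature's production distribution with its training-time reference. The talk does not say which drift metrics are used; the decile binning on reference quantiles, the bin count, and the epsilon clipping are assumptions. A frequently quoted rule of thumb treats PSI above about 0.25 as a significant shift, but thresholds should be tuned per use case.

```python
import numpy as np


def population_stability_index(expected, actual, bins: int = 10, eps: float = 1e-6) -> float:
    """PSI between a reference sample (`expected`) and a production sample (`actual`)."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Interior bin edges at quantiles of the reference distribution.
    edges = np.quantile(expected, np.linspace(0.0, 1.0, bins + 1)[1:-1])
    # Map each value to a bin index in [0, bins - 1] and count per bin.
    e_counts = np.bincount(np.searchsorted(edges, expected, side="right"), minlength=bins)
    a_counts = np.bincount(np.searchsorted(edges, actual, side="right"), minlength=bins)
    # Convert to proportions; clip empty bins to avoid log(0) and division by zero.
    e_frac = np.clip(e_counts / e_counts.sum(), eps, None)
    a_frac = np.clip(a_counts / a_counts.sum(), eps, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))


rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 10_000)    # e.g. a feature at training time
production = rng.normal(0.5, 1.0, 10_000)   # the same feature after a mean shift
print(round(population_stability_index(reference, production), 3))
```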
Cal Al-Dhubaib is a data scientist, entrepreneur, and innovator in responsible artificial intelligence, specializing in high-risk sectors such as healthcare, energy, and defense. He is the founder and CEO of Pandata, a consulting company that helps organizations to design and develop AI-driven solutions for complex business challenges. Their clients include globally recognized organizations like the Cleveland Clinic, Progressive Insurance, University Hospitals, and Parker Hannifin.
Cal frequently speaks on topics including AI ethics, change management, data literacy, and the unique challenges of implementing AI solutions in high-risk industries. His insights have been featured in numerous publications such as Forbes, Ohiox, the Marketing AI Institute, Open Data Science, and AI Business News. Cal has also received recognition among Crain’s Cleveland Notable Immigrant Leaders, Notable Entrepreneurs, and most recently, Notable Technology Executives.
In-person | Talk | Machine Learning | Deep Learning | All Levels
In this session, Jon Malloy will discuss how you can fully define the data you need and begin gathering it as quickly and efficiently as possible, allowing Data Science teams to focus on the problems they want to solve…more details
Jon Malloy is a Data Strategist at Snowplow, where he is responsible for helping customers get the most value from their pipeline and derive meaningful insights. Prior to joining Snowplow, Jon spent 4 years as a Technical Analyst in the US health care communications industry and 4 years as a Data Scientist in the US health care communications / finance industry. He holds a Master of Science in Business Analytics from Bentley University and resides in Boston, MA.
In-person | Talk and Career Talk
In this lightning overview, we’ll discuss the most impactful changes you can make to your data science practice. Topics include running a model in shadow mode, data versioning, estimating costs, and communicating impact to a non-technical audience…more details
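Running a model in shadow mode generally means the candidate model receives the same live requests as the production model and its predictions are logged for later comparison, while only the production model's answer is returned to users. A minimal, hypothetical sketch of that routing logic follows; the function name and the in-memory list are illustrative stand-ins for a real serving stack and logging system.

```python
from typing import Any, Callable


def serve_with_shadow(
    request: Any,
    production_model: Callable[[Any], Any],
    shadow_model: Callable[[Any], Any],
    shadow_log: list,
) -> Any:
    """Return the production prediction; run the shadow model on the same input and only log it."""
    result = production_model(request)
    try:
        shadow_result = shadow_model(request)
    except Exception as exc:  # a failing candidate must never affect users
        shadow_result = exc
    # Log both predictions so they can be compared offline.
    shadow_log.append((request, result, shadow_result))
    return result


log: list = []
print(serve_with_shadow(2, lambda x: x * 10, lambda x: x * 11, log))  # prints 20
print(log)  # prints [(2, 20, 22)]
```

In practice the shadow call is often made asynchronously so it adds no user-facing latency; this sketch runs it inline for simplicity.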
Kerstin is CEO and Co-founder of SuperUse, a collaboration platform. She has led data science initiatives at startups across industries, from healthcare to CPG. She takes pride in mentoring fantastic data scientists and nurturing talent. A builder at heart, she regularly pushes code, trains models, and uncovers insights. She has master’s degrees in Mathematical Computer Science and Mathematical Statistics. She is expecting her PhD from Cornell in early 2023. She spends her free time going on long hikes with her two small dogs through the big mountains outside Seattle.
In-person | Lightning Talk | Responsible AI | Intermediate
My goal is to introduce you to explainable AI and encourage you to incorporate explainability into your ML workflows. In this talk, I will share several examples of applications and how explainable AI can enhance them…more details
Haritha Yanam is the Director of Data Science, Innovation at Liberty Mutual Insurance, where she focuses on building AI/ML solutions that help mitigate insurance risk. Before joining Liberty Mutual, Haritha worked at several Fortune 500 companies leading data science and data engineering teams. Haritha has a strong background in Data Science/Machine Learning, Data Engineering & Data Analytics. She enjoys teaching and currently works as a part-time adjunct professor at the University of Maryland Baltimore, teaching data science.
Virtual | Bootcamp | Machine Learning | Beginner
The Introduction to Machine Learning Workshop will build upon the attendee’s foundation of math and coding knowledge to develop a basic understanding of the most popular machine learning algorithms used in industry today. We will answer such questions as: What are the different types of ML algorithms? What is overfitting and how can we avoid it? Why does XGBoost consistently outperform other algorithms?…more details
Julia Lintern currently works as a Director of Data Science at Gartner. Previously, she worked as a Data Scientist for the New York Times. Julia began her career as a structures engineer designing repairs for damaged aircraft. Julia holds an MA in applied math from Hunter College, where she focused on visualizations of various numerical methods and discovered a deep appreciation for the combination of mathematics and visualizations. During certain seasons of her career, she has also worked on creative side projects such as Lia Lintern, her own fashion label.
Virtual | Bootcamp | Machine Learning | MLOps | Intermediate
In this training, you will learn how to accelerate your data analyses using the Python language and Pandas, a library specifically designed for tabular data analysis. We start by learning the core Pandas data structures, the Series and DataFrame. From these foundations, we will learn to use the split-apply-combine paradigm for grouped computations, manipulate time series, and perform advanced joins between datasets. Specifically, we will cover loading, filtering, grouping, and transforming data. Having completed this workshop, you will understand the fundamentals and advanced features of Pandas, be aware of common pitfalls, and be ready to perform your own analyses…more details
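A small, self-contained sketch of the kinds of operations this training lists: split-apply-combine with `groupby`, a group-wise `transform`, a join with `merge`, and time-series resampling. The toy sales data is invented for illustration and is not taken from the course materials; the methods shown (`groupby`, `agg`, `transform`, `merge` with `validate`, `resample`) are standard pandas APIs.

```python
import pandas as pd

# Hypothetical toy data: daily revenue per store, plus a store lookup table.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "date": pd.to_datetime(
        ["2023-05-08", "2023-05-09", "2023-05-08", "2023-05-09", "2023-05-10"]
    ),
    "revenue": [100.0, 150.0, 80.0, 120.0, 60.0],
})
stores = pd.DataFrame({"store": ["A", "B"], "region": ["East", "West"]})

# Split by store, apply two aggregations, combine into one row per store.
summary = sales.groupby("store")["revenue"].agg(["sum", "mean"]).reset_index()

# transform returns a result aligned with the original rows:
# here, each sale's share of its store's total revenue.
sales["share"] = sales["revenue"] / sales.groupby("store")["revenue"].transform("sum")

# Left join on the store key; validate raises if `stores` has duplicate keys.
enriched = sales.merge(stores, on="store", how="left", validate="many_to_one")

# Time series: total revenue across stores per calendar day.
daily = sales.set_index("date").sort_index()["revenue"].resample("D").sum()

print(summary)
print(daily)
```

The `agg` versus `transform` distinction is a common pitfall: `agg` collapses each group to one row, while `transform` broadcasts the group result back to every original row.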
Daniel Gerlanc has worked as a data scientist for more than a decade and has been writing software for nearly 20 years. He frequently teaches live trainings on oreilly.com and is the author of the video course Programming with Data: Python and Pandas. He has coauthored several open source R packages, published in peer-reviewed journals, and is a graduate of Williams College.
Virtual | Bootcamp | Machine Learning | Beginner
Data science uses a combination of mathematics, statistics, and computer science to help us answer important questions in a large number of fields. In this workshop, we will introduce the underlying mathematical principles of the field, with example problems drawn from a number of different industries. By the end of the workshop, participants will know enough data science to explore their own problems and be ready for more intermediate and advanced courses…more details
Eric Eager is the VP of Research and Development at SumerSports, a football analytics startup founded by Paul Tudor Jones and Jack Jones. Prior to joining Sumer, he held similar roles at Pro Football Focus, where he was responsible for many of the insights that have helped grow the game of American football. Eric holds a PhD in Mathematical Biology from the University of Nebraska, and has taught at Wharton, DataCamp, and the University of Wisconsin – La Crosse, publishing over 25 academic papers during his career.
In-person | Half-Day Training | NLP | Machine Learning | Intermediate-Advanced
Large Language Models like GPT-4 are transforming the world in general, and the field of data science in particular, at an unprecedented pace. This training introduces deep learning transformer architectures, including LLMs. Critically, it also demonstrates the breadth of capabilities that state-of-the-art LLMs like GPT-4 can deliver, including dramatically revolutionizing the development of machine learning models and commercially successful data-driven products, accelerating the creative capacities of data scientists, and pushing them in the direction of being data product managers. Brought to life via hands-on code demos that leverage the Hugging Face and PyTorch Lightning Python libraries, this training covers the full lifecycle of LLM development, from training to production deployment…more details
Jon Krohn is Co-Founder and Chief Data Scientist at the machine learning company Nebula. He authored the book Deep Learning Illustrated, an instant #1 bestseller that was translated into seven languages. He is also the host of SuperDataScience, the data science industry’s most listened-to podcast. Jon is renowned for his compelling lectures, which he offers at leading universities and conferences, as well as via his award-winning YouTube channel. He holds a PhD from Oxford and has been publishing on machine learning in prominent academic journals since 2010.
In-person | Full-Day Training | Machine Learning | All Tracks | Beginner
In this training session, we will work through the entire process of training a machine learning model in R. We start with the scaffolding of cross-validation, then move on to exploratory data analysis, feature engineering, model specification, parameter tuning, and model selection. We then take the finished model and deploy it as an API in a Docker container for production use…more details
In-person | Full-Day Training | Data Visualization & Data Analysis | Machine Learning | Intermediate-Advanced
The human brain excels at finding patterns in visual representations, which is why data visualizations are essential to any analysis. Done right, they bridge the gap between those analyzing the data and those consuming the analysis. However, learning to create impactful, aesthetically-pleasing visualizations can often be challenging. This session will equip you with the skills to make customized visualizations for your data using Python…more details
Stefanie Molin is a software engineer and data scientist at Bloomberg in New York City, where she tackles tough problems in information security, particularly those revolving around data wrangling/visualization, building tools for gathering data, and knowledge sharing. She is also the author of “Hands-On Data Analysis with Pandas,” which is currently in its second edition. S