ODSC West Schedule 2020

Day by Day Session Schedule

If you are an attendee, please refer to the most up-to-date schedule at live.staging6.odsc.com

West Trainings
West Workshops/Tutorials
West Talks
09:30 - 12:30
Introduction to Shiny Application Development

Half-Day Training | R-Programming | Data Visualization | Beginner-Intermediate

 

Turning raw data into meaningful information and telling data-driven stories is one of the great challenges of data science. When you are not there to speak for your data in a live setting, your application needs to communicate your message clearly and provide simple interfaces and meaningful interactions that drive your message home to its audience.
In this session you will learn to use Shiny to build a dashboard, from a blank page to an interactive application, using the R programming language, the free RStudio development environment, and Redis. We will use free public data and open-source libraries as we sculpt our dashboard together…

 

Bethany Poulin
Data Science Instructor | General Assembly
09:30 - 12:30
Getting Started with Pandas for Data Analysis

Half-Day Training | Data Science Kick-Starter | Beginner

 

This tutorial offers a comprehensive introduction to the powerful pandas library for data analysis, built on top of the Python programming language. Pandas represents a great step forward for graphical spreadsheet users looking to grow their data manipulation skills. I like to call it “Excel on steroids”. By completing this workshop, you’ll have a strong foundation for using pandas in your day-to-day data analysis needs. We’ll start with the basics (importing datasets, selecting rows and columns, filtering rows by criteria) and progress to advanced concepts like grouping values, joining multiple datasets together, and cleaning text…

Boris Paskhaver
Software Engineer | Stride Consulting
09:30 - 12:30
Keras from Soup to Nuts – An Example Driven Tutorial

Half-Day Training | Deep Learning | Intermediate

 

This workshop aims to convince participants that Keras is a worthwhile addition to their Machine Learning toolbelt. It teaches them how to build their own Keras models, initially using components already available in Keras, then extending them by customizing some of these components, and finally exploiting the underlying TensorFlow platform for maximum flexibility and performance. They will also be able to work with the many cool (sometimes state-of-the-art) models shared by the Keras community…

Sujit Pal
Technology Research Director | Elsevier Labs
10:30 - 13:30
Introduction to Generative Modeling Using Quantum Machine Learning

Half-Day Training | Machine Learning | Intermediate-Advanced

 

Ever wondered how quantum computers work, and how they do machine learning? With quantum computing technologies nearing the era of commercialization and quantum advantage, machine learning has been proposed as one of their most promising applications. One of the areas in which quantum computing is showing great potential is generative modeling for unsupervised and semi-supervised learning. In this training you will develop a basic understanding of quantum computing and how it can be used in machine learning models, with special emphasis on generative models. We will focus on a particular architecture, the quantum circuit Born machine (QCBM), and use it to generate a simple dataset of bars and stripes.

 

Luis Serrano, PhD
Quantum AI Research Scientist | Zapata Computing
Kaitlin Gili
Quantum Applications Intern | Zapata Computing
Alejandro Perdomo, PhD
Senior Quantum Scientist | Zapata Computing
10:30 - 13:30
Data Visualization: From Jupyter to Dashboards

Half-Day Training | Data Visualization | Intermediate

 

Data visualization is fundamental to the data science process. Using plots and graphs to convey a complex idea makes your data more accessible to everyone. In this session, you will learn the fundamentals of plotting with Pandas in Jupyter by building an interactive visualization prototype that can also run as a standalone web application/dashboard. This session is for anyone who wants hands-on familiarity with data visualization using Python, Pandas, Matplotlib, interactive widgets, and Flask…

David Yerrington
Data Science Consultant | Yerrington Consulting
10:30 - 13:30
State of the art AI methods with TensorFlow: Transfer Learning, RL and GANs

Half-Day Training | Deep Learning | Advanced

 

Although supervised learning has dominated industry machine learning implementations, unsupervised and semi-supervised methods are starting to be applied to real-world problems (outside of playing video games). Generative Adversarial Networks (GANs) are being used to augment data and generate dialogue, and Reinforcement Learning (RL) is helping people plan marketing campaigns and control robots. In this training, you will develop a theoretical understanding of these and other related state-of-the-art AI methods, along with the hands-on skills needed to train and use them. You will implement a variety of models in TensorFlow for tasks including object recognition, image generation, and robotics…

10:30 - 17:00
Advanced NLP with TensorFlow and PyTorch: LSTMs, Self-attention and Transformers

Full-Day Training | NLP | Advanced

 

Natural Language Processing (NLP) has recently experienced its own “ImageNet” moment. Rapidly evolving language models have enabled practitioners to decipher long-lost languages, translate speech in one language directly into speech in another without converting to text, generate long-form text that adapts to the style and content of human prompts, and translate between language pairs never seen explicitly by computer systems (among many other impressive results).
In this training, you will develop a theoretical understanding of modern NLP along with the hands-on skills needed to develop state-of-the-art models. You will implement a variety of recurrent and transformer-based architectures in both TensorFlow and PyTorch for tasks including text classification, machine translation, and predictive text…

10:30 - 16:00
Deep Learning (with TensorFlow 2)

Full-Day Training | Deep Learning | Machine Learning | Beginner – Intermediate

 

Relatively obscure a few short years ago, Deep Learning is ubiquitous today across data-driven applications as diverse as machine vision, natural language processing, and super-human game-playing.
This Deep Learning primer brings the revolutionary machine-learning approach behind contemporary artificial intelligence to life with interactive demos featuring TensorFlow 2, the major, cutting-edge revision of the world’s most popular Deep Learning library. To facilitate an intuitive understanding of Deep Learning’s artificial-neural-network foundations, the essential theory will be introduced visually and pragmatically. Paired with tips for overcoming common pitfalls and hands-on Python code run-throughs provided in straightforward Jupyter notebooks, this foundational knowledge empowers you to build powerful state-of-the-art Deep Learning models…

 

Dr. Jon Krohn
Chief Data Scientist, Author of Deep Learning Illustrated | Nebula.io
10:30 - 13:30
Modern and Old Reinforcement Learning Part 1

Half-Day Training | Deep Learning | Machine Learning | Beginner-Intermediate

 

Reinforcement Learning has recently made great progress in industry as one of the best techniques for sequential decision making and control policies. DeepMind used RL to greatly reduce energy consumption in Google’s data centers. RL has been used for text summarization, autonomous driving, dialog systems, and media advertising, and in finance by JPMorgan Chase. We are at the very beginning of the adoption of these algorithms, as systems are required to operate more and more autonomously.
In this workshop we will explore Reinforcement Learning, starting from its fundamentals and ending with creating our own algorithms. We will use OpenAI Gym to try out our RL algorithms. OpenAI is a non-profit organization committed to open-sourcing all of its research on Artificial Intelligence…

Leonardo De Marchi
VP of Labs | Thomson Reuters
13:00 - 16:00
Mathematics for Data Science and Machine Learning

Half-Day Training | Kick-starter | Machine Learning | All Levels

 

The field of machine learning and data science has seen a sudden resurgence in the last few years. The contributions of machine learning to solving data-driven problems and creating intelligent applications cannot be overemphasized. This field, which sits at the intersection of statistics and probability, mathematics, and computer science and algorithms, can be used to learn iteratively from complex data and find hidden insights. Understanding the mathematics behind machine learning allows us to choose the right algorithms for our problem, make good choices on parameter settings and validation strategies, recognize under- and over-fitting, troubleshoot ambiguous results, and put appropriate confidence bounds on results…

Wale Akinfaderin, PhD
Sr. Data Scientist | Duke Energy Corporation
14:00 - 17:00
Remote HPCC Systems/ECL Training

Half-Day Training | NLP | Machine Learning | Beginner-Intermediate

 

Learn why the truly open-source HPCC Systems platform is better at Big Data, and learn how ECL can empower you to build powerful data queries with ease. HPCC Systems is a comprehensive, dedicated data lake platform that makes combining different types of data easier and faster than competing platforms (even data stored in massive, mixed-schema data lakes), and it scales very quickly as your data needs grow…

Bob Foreman
Software Engineering Lead | LexisNexis Risk Solutions
Hugo Watanuki
Senior Technical Support Engineer | LexisNexis Risk Solutions
14:00 - 17:00
Session Title by Alejandro Martinez, PhD Coming Soon!
Alejandro Martinez, PhD
CEO and Co-Founder | MatrixDS
14:00 - 17:00
Modern and Old Reinforcement Learning Part 2

Half-Day Training | Deep Learning | Machine Learning | Beginner-Intermediate

 

We will use OpenAI Gym to try out our RL algorithms. OpenAI is a non-profit organization committed to open-sourcing all of its research on Artificial Intelligence. To foster innovation, OpenAI created a virtual environment, OpenAI Gym, where it’s easy to test Reinforcement Learning algorithms. In particular, we will start with some popular techniques like Multi-Armed Bandits, moving through Markov Decision Processes and Dynamic Programming. We will then explore other RL frameworks and more complex concepts like policy gradient methods and Deep Reinforcement Learning, which recently changed the field of Reinforcement Learning. In particular, we will look at Actor-Critic models and Proximal Policy Optimization, which allowed OpenAI to beat some of the best Dota players. We will also introduce the Deep Learning concepts needed for the course…

Leonardo De Marchi
VP of Labs | Thomson Reuters
10:45 - 12:15
Build an ML pipeline for BERT models with TensorFlow Extended – An end-to-end Tutorial

Workshop | MLOps & Management | NLP | Intermediate-Advanced

 

During the workshop, the audience will gain not only a holistic overview of the TensorFlow ecosystem but also the steps needed to bring ML projects from experimentation to production. With this knowledge, participants can translate their ML projects into TFX pipelines and simplify their ML model production processes.

Hannes Hapke
Machine Learning Engineer | Digits
10:45 - 12:15
Fast Data Access in R and Python with Apache Arrow

Workshop | R Programming | Intermediate

 

Apache Arrow is a cross-language development platform for in-memory analytics. In this tutorial, I’ll show how you can use Arrow in Python and R, both separately and together, to speed up data analysis on datasets that are bigger than memory. We’ll cover the fundamentals of Arrow in Python and R, then explore in depth Arrow’s Dataset feature, which provides fast, efficient querying of large, multi-file datasets. Finally, we’ll discuss Flight, an Arrow-native client-server framework for transporting data, and show how to set up a server and query against it…

Neal Richardson, PhD
Director of Engineering | Ursa Labs / RStudio
10:45 - 12:15
Session Title by Jeffrey Yau, PhD Coming Soon!
Jeffrey Yau, PhD
Chief Data & A.I. Officer | Fanatics Collectibles
10:45 - 12:15
Solving Problems with Both Text and Numerical Data Using Gradient Boosting

Workshop | Machine Learning | Open-source | Intermediate

 

Gradient boosting is a powerful machine-learning technique that achieves state-of-the-art results in a variety of practical tasks. For a number of years, it has remained the primary method for learning problems with heterogeneous features, noisy data, and complex dependencies: web search, recommendation systems, weather forecasting, and many others.
Some problems contain different types of data, including numerical, categorical, and text data. In this case, the best solution is either building new numerical features to replace the text and categorical ones and passing them to gradient boosting, or using out-of-the-box solutions for that…

Stanislav Kirillov
Senior Software Developer | Yandex
10:45 - 12:15
Conversational AI with DeepPavlov

Workshop | Deep Learning | Beginner-Intermediate

 

Adoption of messaging platforms and voice assistants has grown rapidly in recent years. This creates a huge demand for tools that speed up the development of Conversational AI systems. The open-source DeepPavlov framework was created for developing multi-skill conversational agents. It prioritizes efficiency, modularity, and extensibility, with the goal of making it easier to develop dialogue systems from scratch and with limited data. It also supports both modular and end-to-end approaches to implementing conversational agents…

Mikhail Burtsev, PhD
Head of Lab | Moscow Institute of Physics and Technology
10:45 - 12:15
GPU-accelerated Data Science with RAPIDS

Workshop | Machine Learning | ML for Programmers | Beginner-Intermediate

 

The talk will emphasize both data preparation (ETL) and machine learning operations, with a hands-on demonstration of porting a typical workflow from CPU to GPU and measuring the speedup. We’ll go into more detail on real-world applications taking advantage of these speed improvements, including hyperparameter optimization for machine learning models, single-cell genomics analysis, and applications in finance. For large-data users, we’ll discuss some of the options for scaling RAPIDS to multiple GPUs or multiple nodes, emphasizing the tight integration with the Dask ecosystem.

John Zedlewski
Director, GPU-accelerated machine learning | NVIDIA
12:30 - 14:00
Causal Inference in Data Science

Tutorial | Machine Learning | Data Visualization | Beginner-Intermediate

 

Data scientists are often asked questions of the form “Does X drive Y?”: (1) did recent PR coverage drive sign-ups, (2) does customer support increase sales, or (3) did improving the recommendation model drive revenue? Supporting company stakeholders requires every data scientist to learn techniques that can answer questions like these, which center on issues of causality. Often we cannot use A/B testing to answer these questions and must turn to causal inference techniques instead.

Vinod Bakthavachalam
Data Scientist | Coursera
12:30 - 14:00
A Hands-On Tutorial for Training Interpretable Variational Autoencoders Using siVAE

Workshop | Deep Learning | Data Visualization | Intermediate

 

In this hands-on tutorial, we will introduce attendees to the siVAE (scalable, interpretable VAE) model that infers a set of factor loadings that explicitly map latent dimensions to the input features that define them, during training of the VAE model. Using standard datasets from computer vision (MNIST, Fashion-MNIST and CIFAR-10), we will walk attendees through the process of training the siVAE model, visualizing the sample embeddings inferred by classic VAEs, and extracting and visualizing the features that contribute to individual latent dimensions. We will also teach attendees how to estimate and visualize feature awareness, a new metric for measuring the overall importance of individual features for embedding a sample in the latent space. At the end of the tutorial, attendees will be able to train an siVAE model on their own datasets and interpret and visualize the latent dimensions inferred…

Gerald Quon, PhD
Assistant Professor | UC Davis Machine Learning & AI Group
Yongin Choi
PhD Candidate | UC Davis
12:30 - 14:00
Uncertainty Sampling and Diversity Sampling

Workshop | Machine Learning | Beginner-Intermediate

 

Most deployed Machine Learning models use Supervised Learning powered by human training data. Selecting the right data for human review is known as Active Learning. This talk will introduce a set of Active Learning methods that help you understand where your model is currently confused (Uncertainty Sampling) and identify gaps in your model’s knowledge (Diversity Sampling). We’ll cover techniques ranging from a few lines of code to methods that build on recent advances in transfer learning. We’ll use code examples from my open-source PyTorch Active Learning library…

Robert Munro, PhD
CEO | Author of Human-in-the-Loop Machine Learning | Machine Learning Consulting
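To make “where your model is currently confused” concrete, here is a minimal Python sketch of three common uncertainty-sampling scores (least confidence, margin of confidence, and entropy) applied to a model’s predicted label probabilities. It is an illustration only, not code from the presenter’s library: the probability vectors are hypothetical softmax outputs, `select_for_review` is a made-up helper, and Diversity Sampling is not covered.

```python
import math

def least_confidence(probs):
    """1 minus the top predicted probability; higher means more uncertain."""
    return 1.0 - max(probs)

def margin_of_confidence(probs):
    """1 minus the gap between the two most likely labels; higher means more uncertain."""
    top, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - second)

def entropy_score(probs):
    """Shannon entropy of the prediction, normalized to [0, 1] by log(number of labels)."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))

def select_for_review(predictions, score_fn, budget):
    """Hypothetical helper: ids of the `budget` items the model is least sure about."""
    ranked = sorted(predictions, key=lambda item: score_fn(predictions[item]), reverse=True)
    return ranked[:budget]

# Hypothetical softmax outputs for four unlabeled items (three labels each).
predictions = {
    "item_a": [0.90, 0.05, 0.05],
    "item_b": [0.40, 0.35, 0.25],
    "item_c": [0.50, 0.48, 0.02],
    "item_d": [0.34, 0.33, 0.33],
}

for score_fn in (least_confidence, margin_of_confidence, entropy_score):
    print(score_fn.__name__, select_for_review(predictions, score_fn, budget=2))
```

The strategies can disagree: least confidence and entropy pick `item_d` and `item_b`, while margin of confidence prefers `item_c` over `item_b` because its top two labels are nearly tied.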
12:30 - 14:00
State-of-the-Art Natural Language Processing with Spark NLP

Tutorial | NLP | Machine Learning | Intermediate

 

This is a hands-on tutorial on applying the latest advances in deep learning and transfer learning to common NLP tasks such as named entity recognition, document classification, spell checking, and sentiment analysis. Learn to build complete text analysis pipelines using the highly accurate, high-performance, open-source Spark NLP library in Python…

 

David Talby, PhD
CTO | John Snow Labs
12:30 - 14:00
Deep Learning-Driven Text Summarization & Explainability

Workshop | NLP | Deep Learning | Intermediate-Advanced

 

NLP is one of the fastest-growing fields within AI. A wide variety of tasks can be tackled with NLP, such as text classification, question-answering (e.g. chatbots), translation, topic modelling, sentiment analysis, summarization, and so on. In this workshop, we focus on text summarization, as it is not commonly showcased in tutorials despite being a powerful and challenging application of NLP. We see a trend towards pre-training Deep Learning models on a large text corpus and fine-tuning them for a specific downstream task (also known as transfer learning). In this hands-on workshop, you’ll get the opportunity to apply a state-of-the-art summarization model to generate news headlines. We fine-tuned this model on Reuters news data, which is professionally produced by journalists and strictly follows rules of integrity, independence, and freedom from bias…

 

Nadja Herger, PhD
Data Scientist | Thomson Reuters
Nina Hristozova
Junior Data Scientist | Thomson Reuters
Viktoriia Samatova
Head of Technology & Innovation | Thomson Reuters
14:15 - 15:45
Uplift Modeling Tutorial: From Predictive to Prescriptive Analytics

Workshop | Machine Learning | Research Frontiers | Intermediate


The Randomized Controlled Trial (RCT) has been the gold standard for determining the Average Treatment Effect of a program in marketing, medical, economic, political, and policy applications. Uplift modeling goes a step further, using predictive analytics and machine learning to identify Heterogeneous Treatment Effects and thereby uncover the individuals who are most positively influenced by treatment. This approach enables us to identify the likely “persuadables” in order to maximize treatment impact through optimal target selection. It has gained significant attention in recent years in fields such as personalized medicine, political elections, personalized marketing, and personalized healthcare, with a growing number of publications and presentations from industry and academic experts around the world…

Victor Lo, PhD
Senior VP, Data Science & Artificial Intelligence | Fidelity Investments
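The core idea of uplift modeling, estimating who responds because of the treatment rather than who responds at all, can be sketched as a stratified difference in response rates. This Python example is an illustration under stated assumptions, not the presenter’s method: it assumes randomized treatment assignment (as in an RCT), a single hypothetical segment label per person, and made-up counts. Practical uplift models (for example two-model approaches or uplift trees) generalize the same treated-minus-control difference to many features.

```python
from collections import defaultdict

def segment_uplift(records):
    """Estimate uplift per segment as P(response | treated) - P(response | control).

    Each record is (segment, treated, responded). Because treatment is assumed
    to be randomly assigned, the within-segment difference in response rates
    estimates the treatment effect for that segment.
    """
    # For each segment: {treated flag: [responses, total]}
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, responded in records:
        cell = counts[segment][treated]
        cell[0] += int(responded)
        cell[1] += 1
    uplift = {}
    for segment, groups in counts.items():
        (r_t, n_t), (r_c, n_c) = groups[True], groups[False]
        if n_t and n_c:  # both arms are needed to estimate a difference
            uplift[segment] = r_t / n_t - r_c / n_c
    return uplift

def make_records(segment, treated, responders, total):
    """Hypothetical data generator: `responders` of `total` people responded."""
    return [(segment, treated, i < responders) for i in range(total)]

records = (
    make_records("young", True, 30, 100) + make_records("young", False, 10, 100)
    + make_records("senior", True, 20, 100) + make_records("senior", False, 20, 100)
    + make_records("loyal", True, 50, 100) + make_records("loyal", False, 60, 100)
)
uplift = segment_uplift(records)
# Target segments in order of estimated uplift; positive values suggest "persuadables".
ranked = sorted(uplift, key=uplift.get, reverse=True)
print(ranked, uplift)
```

In this made-up data the “young” segment behaves like persuadables (positive uplift), while “loyal” customers respond less often when treated, so a targeting policy based on uplift would skip them even though their raw response rate is the highest.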
14:15 - 15:45
Bayesian Statistics Made Simple

Workshop | Machine Learning

 

Bayesian statistical methods are becoming more common, but there are not many resources to help beginners get started. People who know Python can use their programming skills to get a head start. In this workshop, I introduce Bayesian methods using grid algorithms, which help develop understanding, and MCMC, which is a powerful algorithm for real-world problems.
As the primary example, we will estimate goal-scoring rates in hockey and soccer. This example is meant to be fun, but it is also useful; the same methods apply to any system well-modeled by a Poisson process, including customers arriving at a business, requests coming into a server, and many other applications…

Allen Downey, PhD
Staff Producer | Brilliant.org
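The grid approach mentioned above can be sketched in plain Python: place a uniform prior over a grid of candidate goal-scoring rates, multiply in a Poisson likelihood for each observed game, and normalize. This is a minimal illustration, not the workshop’s material; the grid bounds and the five goal counts are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing k events when the expected count is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def grid_posterior(observed_counts, grid):
    """Posterior probabilities over candidate rates, assuming a uniform prior on the grid."""
    # Uniform (unnormalized) prior over the grid points.
    weights = [1.0 for _ in grid]
    # Bayesian update: multiply in the Poisson likelihood of each observed count.
    for k in observed_counts:
        weights = [w * poisson_pmf(k, lam) for w, lam in zip(weights, grid)]
    total = sum(weights)
    return [w / total for w in weights]

# Candidate scoring rates from 0.1 to 10.0 goals per game (hypothetical grid).
grid = [i / 10 for i in range(1, 101)]
# Hypothetical goal counts from five games (illustrative only).
goals = [3, 1, 2, 4, 2]
posterior = grid_posterior(goals, grid)

posterior_mean = sum(lam * p for lam, p in zip(grid, posterior))
map_rate = grid[posterior.index(max(posterior))]
print(f"posterior mean ~ {posterior_mean:.2f}, most probable rate = {map_rate}")
```

With a uniform prior, the most probable rate on the grid matches the sample mean (2.4 goals per game here), while the posterior mean sits slightly higher, around 2.6, because the posterior is skewed toward larger rates. MCMC replaces the explicit grid when there are too many parameters to enumerate.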
14:15 - 15:45
End to End Modeling & Machine Learning

Workshop | Machine Learning | Intermediate

 

Effective predictive modeling projects follow the analytics life cycle, from data and discovery to deployment and decisions. Data scientists use a variety of tools, both commercial and open-source, to collaborate and develop enterprise applications of analytics and artificial intelligence. SAS Viya provides a unified platform to do all of this from one graphical user interface or through programming APIs. In this workshop, you will load data into memory, prepare input variables for modeling, and build complex analytics pipelines to demonstrate powerful machine learning models. Need to integrate open-source models? No problem. We’ll show you how to do that and deploy any model. Then you can save and package the best-performing model for deployment while keeping the ability to retrain it on new data…

Jordan Bakerman, PhD
Sr. Analytical Training Consultant | SAS
Ari Zitin
Analytical Training Consultant | SAS
14:15 - 15:45
StructureBoost: Gradient Boosting with Categorical Structure

Workshop | Machine Learning | Research Frontiers | Intermediate-Advanced

 

The values of a categorical variable frequently have a structure that is not ordinal or linear in nature. For example, the months of the year have a circular structure, and the US states have a geographical structure. Standard approaches such as one-hot or numerical encoding are unable to effectively exploit the structural information of such variables. In this tutorial, we will introduce the StructureBoost gradient boosting package, wherein the structure of categorical variables can be represented by a graph and exploited to improve predictive performance. Moreover, StructureBoost can make informed predictions on categorical values for which there is little or no data, by leveraging the knowledge of the structure. We will walk through examples of how to configure and train models using StructureBoost and demonstrate other features of the package…

Brian Lucena, PhD
Principal | Numeristical
14:15 - 15:45
Building a ML Serving Platform at Scale for Natural Language Processing

Tutorial | Machine Learning | NLP | Intermediate-Advanced

 

NLP-driven AI is a fast-growing technology domain with diverse applications in customer engagement, employee collaboration, marketing, and social media. Does having an accurate model by itself mean that we have a successful product, service, or solution? No! The most difficult phase begins after that. We still have the following challenges to solve. How do we create a sellable service around this model? How do we scale this service to handle millions of inference requests cost-effectively? How do we build automated deployment pipelines for software and models? How do we add security, privacy, manageability, and observability to the service? How do we track model drift and analyze model performance? In this session, we will discuss these questions and explore the answers. We will go through the unique challenges that NLP serving poses and the solutions and best practices to overcome them…

Kumaran Ponnambalam
Big Data & Data Science & Analytics Leader | Cisco
14:15 - 15:45
Reinforcement Learning Research with the Dopamine Framework

Workshop | Machine Learning | ML for Programmers | Intermediate-Advanced

 

A brief introduction to Reinforcement Learning (RL) and a walkthrough of using the Dopamine library to run RL experiments…

Pablo Samuel Castro, PhD
Staff Research Software Developer | Google
14:15 - 15:45
Introduction on Genomics Using R

Workshop | Research Frontiers | Intermediate

 

Novel single-cell transcriptome sequencing assays allow researchers to measure gene expression levels at the resolution of single cells and offer an unprecedented opportunity to investigate fundamental biological questions at the cellular level, such as stem cell differentiation or the discovery and characterization of rare cell types. The majority of computational methods for analyzing single-cell RNA-Seq data are implemented in R, making it a natural tool for starting to work with single-cell transcriptomic data.

Fanny Perraudeau, PhD
Senior Manager, Software & Data Science | Pendulum Therapeutics
16:00 - 17:30
Session Title by Eric Xing, PhD Coming Soon!
Eric Xing, PhD
Founder & Chief Scientist, Professor | Petuum, Carnegie Mellon University
16:00 - 17:30
Taking Unique Advantage of High Missing Data Scenarios

Workshop | R Programming | Machine Learning | Intermediate

 

With customer privacy laws, missing data is becoming increasingly common. Many approaches are either computationally expensive or throw out the baby with the bathwater. Certain models allow missing data directly, and variants of common models can be adapted to do so. Surprisingly, we can achieve effective predictive performance even in scenarios where data is missing not at random and at rates higher than 80%. This is a hands-on workshop using R or Python…

Anne Lifton
Lead Data Scientist | Logic20/20, Inc.
16:00 - 17:30
Interpretable Machine Learning with Python

Workshop | Machine Learning | Deep Learning | All Levels

 

Leveraging data to produce algorithmic rules is nothing new. However, we used to second-guess these rules, and now we don’t. Therefore, a human used to be accountable, and now it’s the algorithm… But can it be? How do we solve this accountability gap and ethical quandary?…

Serg Masis
Climate Data Scientist | Syngenta
16:00 - 17:30
Interacting with Deep Generative Models for Content Creation

Tutorial | Deep Learning | Machine Learning | Intermediate

 

Deep generative models have made great progress in synthesizing realistic data in various domains such as image generation and speech synthesis. Generative modeling brings a paradigm shift in AI from content classification and regression to content analysis and creation. This tutorial introduces the basics of deep generative models such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs), as well as user interaction with these models for content manipulation and creation. By completing this workshop, you will understand deep generative models, their strengths and weaknesses, and their promising applications in content analytics and creation. The workshop will focus on the generative models used in image synthesis, but the methodology introduced can be extended to other domains…

Bolei Zhou, PhD
Assistant Professor, Department of Information Engineering | The Chinese University of Hong Kong
09:00 - 09:30
ODSC Keynote: Generalized Deep Reinforcement Learning for Solving Combinatorial Optimization Problems

Many problems in systems and chip design take the form of combinatorial optimization on graph-structured data. In this talk, I will motivate taking a learning-based approach to combinatorial optimization problems, with a focus on deep reinforcement learning (RL) agents that generalize. I will discuss our work on a new domain-transferable reinforcement learning methodology for optimizing chip placement, a long pole in hardware design.

Azalia Mirhoseini, PhD
Senior Research Scientist | Google Brain
09:00 - 09:30
ODSC Keynote: Are We Ready for the Era of Analytics Heterogeneity? Maybe… but the Data Says No

 

We live in the era of Analytics Heterogeneity, where analytics is not limited to a single methodology, tool, or algorithm but can leverage the full potential of the fast-growing, rapidly changing ecosystem of analytical solutions and technologies available. However, one byproduct of this evolution has been the increasing complexity of turning analytics into value. Which new capabilities are needed to succeed in this new era? What can we do to be champions and convert this complexity into a competitive advantage? In this session, I will walk you through the best practices to move from sandbox to production…

Marinela Profi
Product Marketing Manager | SAS
09:00 - 09:30
ODSC Keynote: Data for Good: Ensuring the Responsible Use of Data to Benefit Society

AI for Good | Machine Learning | All Levels

Every field has data. We use data to discover new knowledge, to interpret the world, to make decisions, and even to predict the future. The recent convergence of big data, cloud computing, and novel machine learning algorithms and statistical methods is driving explosive interest in data science and its applicability to all fields. This convergence has already enabled the automation of some tasks at a level that surpasses human performance. The novel capabilities we derive from data science will drive our cars, treat disease, and keep us safe. At the same time, such capabilities risk leading to biased, inappropriate, or unintended actions. Designing data science solutions requires both excellence in the fundamentals of the field and the expertise to develop applications that meet human challenges without creating even greater risk...

Jeannette M. Wing, PhD
Avanessians Director of the Data Science Institute and Professor of Computer Science | Columbia University
09:30 - 10:00
ODSC Keynote Suchi Saria, PhD
Suchi Saria, PhD
Director, Machine Learning & Healthcare Lab | Johns Hopkins University
09:30 - 10:00
ODSC Keynote Zoubin Ghahramani, PhD
Zoubin Ghahramani, PhD
Distinguished Scientist and Sr Research Director | Professor of Information Engineering | ex-Chief Scientist and VP of AI | Google | University of Cambridge | Uber
09:30 - 09:55
ODSC Keynote: Our Applied AI Future
Ben Taylor, PhD
Chief AI Evangelist | DataRobot
10:00 - 10:30
ODSC Keynote: The Future of Computing is Distributed

Distributed applications are not new. The first distributed applications were developed over 50 years ago with the arrival of computer networks such as ARPANET. Since then, developers have leveraged distributed systems to scale out applications and services, including large-scale simulations, web serving, and big data processing. Until recently, however, distributed applications have been the exception rather than the norm. This is changing quickly.

Ion Stoica, PhD
Professor | Director | University of California, Berkeley | RISELab
10:00 - 10:25
ODSC Keynote: A Secure Collaborative Learning Platform

Multiple organizations often wish to aggregate their sensitive data and learn from it, but they cannot do so because they cannot share their data. For example, banks wish to train models jointly over their aggregate transaction data to detect money launderers, because criminals hide their traces across different banks. To address such problems, my students and I developed MC^2, a framework for secure collaborative computation. My talk will give an overview of the MC^2 platform, our technical approach, results, and adoption…

Raluca Ada Popa, PhD
Assistant Professor | Co-Founder | Berkeley | PreVeil
10:30 - 11:15
Predicting Model Failures in Production

Track Keynote | Machine Learning | All levels

ML models in production lose accuracy over time. A number of factors contribute to this decline, such as changes in demographic mix and in consumer behavior. Relying on these models' output then results in incorrect decisions that could lead to catastrophic failures for the organization. This gap calls for solutions that help data science teams predict such failures in advance...

Aravind Chandramouli, PhD
Head of Data Science | Tredence Inc.
10:30 - 11:15
Rainforest XPRIZE: Harnessing Data for Good

Talk | Machine Learning | AI for Good | All Levels

Rainforests are the most diverse, complex, and imperiled terrestrial ecosystems on Earth. Despite their critical importance to the livelihoods of local communities, harboring immense biodiversity, regulating climate, and being the origin of most modern medicines, rainforests are undervalued, understudied, and overexploited.

In addition, the rainforest is dense, vast, and home to notoriously harsh environmental conditions, all of which present barriers to current technology and research. The 5-year, $10M Rainforest XPRIZE aims to address this problem by improving our understanding of rainforest ecosystems. The competition challenges teams to integrate current and emerging technologies such as data science, artificial intelligence, machine learning, robotics, and remote sensing to survey biodiversity across multiple vertical layers (stories) of the rainforest, and to use the resulting data to deliver insights in near real time and in unprecedented detail. These insights are meant to reveal the true value of standing forests, ultimately leading to their sustainable use and preservation globally.

Tune in to our talk to learn more and see how you can get involved in this groundbreaking competition for the benefit of humanity.

Peter Houlihan
Technical Lead | Rainforest XPRIZE
10:30 - 11:15
Semantic Scholar and the Fight Against COVID-19

Track Keynote | AI for Good | Research Frontiers | All Levels

This talk will describe the dramatic creation of the COVID-19 Open Research Dataset (CORD-19) and the broad range of efforts, both inside and outside of the Semantic Scholar project, to garner insights into COVID-19 and its treatment based on this data...

Oren Etzioni, PhD
CEO | Allen Institute for AI
10:30 - 11:15
The Life of Scikit-learn

Track Keynote

Scikit-learn was born as a tool by machine-learning geeks for computer geeks. But it has grown into an industry standard, used by many people with varied impacts on the world. Growing up brings new challenges. How can an open source, community-driven project address the needs of a huge and diverse user base?

Gaël Varoquaux, PhD
Research Director | Director, Scikit-learn | INRIA
10:30 - 11:15
Topic-Adjusted Visibility Metric for Scientific Articles

Track Keynote | NLP | Intermediate

Measuring the impact of scientific articles is important for evaluating the research output of individual scientists, academic institutions, and journals. While citations are the raw data for constructing impact measures, biases and other issues arise if the factors affecting citation patterns are not properly accounted for. In this work, we address the problem of field variation and introduce an article-level metric for evaluating individual articles' visibility. This measure derives from joint probabilistic modeling of the content of the articles and the citations among them, using latent Dirichlet allocation (LDA) and the mixed membership stochastic blockmodel (MMSB). Our proposed model provides a visibility metric for individual articles adjusted for field variation in citation rates, a structural understanding of citation behavior in different fields, and article recommendations that take into account article visibility and citation patterns...

Tian Zheng, PhD
Chair, Department of Statistics | Associate Director | Columbia University | Data Science Institute
11:20 - 12:05
Codeless Reinforcement Learning: Building a Gaming AI

Talk | Machine Learning | Deep Learning | Beginner-Intermediate

This session will begin by introducing the concept of Reinforcement Learning, along with some common use cases. After this high-level introduction, a more formal mathematical framework will be presented. Finally, the session will review and demonstrate these ideas by building a Tic-Tac-Toe-playing AI, code-free, in the KNIME Analytics Platform.

Corey Weisinger
Data Scientist | KNIME
11:20 - 12:05
AlphaStar: Grandmaster Level in StarCraft II Using Multi-Agent Reinforcement Learning

Track Keynote | Deep Learning | All Levels

Games have been used for decades as an important way to test and evaluate the performance of artificial intelligence systems. As capabilities have increased, the research community has sought games with increasing complexity that capture different elements of intelligence required to solve scientific and real-world problems. In recent years, StarCraft, considered to be one of the most challenging Real-Time Strategy (RTS) games and one of the longest-played esports of all time, has emerged by consensus as a “grand challenge” for AI research.

In this talk, I will introduce our StarCraft II program AlphaStar, the first Artificial Intelligence to reach Grandmaster status without any game restrictions. The focus will be on the technical contributions that made this milestone in AI possible.

Oriol Vinyals, PhD
Principal Research Scientist | Google DeepMind