If you are an attendee, please refer to the most up-to-date schedule at live.staging6.odsc.com
Half-Day Training | R-Programming | Data Visualization | Beginner-Intermediate
Turning raw data into meaningful information and telling data-driven stories is one of the great challenges of data science. When you are not there to speak for your data in a live setting, your application needs to communicate your message clearly and provide simple interfaces and meaningful interactions to drive your message home to consumers.
In this session you will learn to use Shiny to build a dashboard from blank page to interactive application using the programming language R, the free R development environment RStudio, and Redis. We will use free public data and open source libraries as we sculpt our dashboard together...more details
Half-Day Training | Data Science Kick-Starter | Beginner
This tutorial offers a comprehensive introduction to the powerful pandas library for data analysis built on top of the Python programming language. Pandas represents a great step forward for graphical spreadsheet users looking to grow their data manipulation skills. I like to call it “Excel on steroids”. By completing this workshop, you’ll have a strong foundation for using Pandas in your day-to-day data analysis needs. We’ll start out with the basics — importing datasets, selecting rows and columns, filtering rows by criteria — and progress to advanced concepts like grouping values, joining multiple datasets together, and cleaning text…more details
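As a taste of the kind of operations the workshop covers, here is a minimal pandas sketch; the tiny inline table, column names, and values are illustrative stand-ins, not from the workshop materials:

```python
import pandas as pd

# Importing a dataset would normally be pd.read_csv("sales.csv");
# a tiny inline frame stands in for it here (illustrative data only).
sales = pd.DataFrame({
    "year":    [2019, 2021, 2021, 2022],
    "region":  ["east", "east", "west", "west"],
    "product": [" Widget ", "widget", "Gadget ", "gadget"],
    "revenue": [8_000, 12_000, 15_000, 9_000],
})

# Selecting rows and columns, and filtering rows by criteria
recent = sales.loc[sales["year"] >= 2020, ["region", "product", "revenue"]]
big = recent[recent["revenue"] > 10_000].copy()

# Cleaning text
big["product"] = big["product"].str.strip().str.lower()

# Grouping values
print(big.groupby("region")["revenue"].sum())

# Joining multiple datasets together
managers = pd.DataFrame({"region": ["east", "west"], "manager": ["Ana", "Bo"]})
print(big.merge(managers, on="region", how="left"))
```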
Half-Day Training | Deep Learning | Intermediate
This workshop hopes to convince participants that Keras is a worthwhile addition to their Machine Learning toolbelt. It teaches them how to build their own Keras models, initially using components already available in Keras, then to extend them by customizing some of these components, and finally to exploit the underlying TensorFlow platform for maximum flexibility and performance. They will also be able to work with the many cool (sometimes SOTA) models shared by the Keras community…more details
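A minimal sketch of those three levels (built-in components, a customized component, and dropping down to TensorFlow). The layer sizes and the re-weighted loss are placeholders chosen for illustration, not the workshop's actual models:

```python
import tensorflow as tf
from tensorflow import keras

# 1) Build a model from components already available in Keras
model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    keras.layers.Dense(1, activation="sigmoid"),
])

# 2) Customize a component, e.g. a loss written as a plain function
def weighted_bce(y_true, y_pred):
    bce = keras.losses.binary_crossentropy(y_true, y_pred)
    return 2.0 * bce  # illustrative re-weighting

model.compile(optimizer="adam", loss=weighted_bce, metrics=["accuracy"])

# 3) Drop down to the underlying TensorFlow platform when needed
@tf.function
def predict_batch(x):
    return model(x, training=False)

print(predict_batch(tf.random.normal((3, 20))))
```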
Half-Day Training | Machine Learning | Intermediate-Advanced
Ever wondered how quantum computers work, and how they do machine learning? With quantum computing technologies nearing the era of commercialization and quantum advantage, machine learning has been proposed as one of the most promising applications. One of the areas in which quantum computing is showing great potential is in generative models in unsupervised and semi-supervised learning. In this training you will develop a basic understanding of quantum computing and how it can be used in machine learning models, with special emphasis on generative models. We will focus on a particular architecture, the quantum circuit Born machine (QCBM), and use it to generate a simple dataset of bars and stripes…more details
Half-Day Training | Data Visualization | Intermediate
Data visualization is fundamental to the data science process. Using plots and graphs to convey a complex idea makes your data more accessible to everyone. In this session, you will learn the fundamentals of plotting with Pandas in Jupyter by building an interactive visualization prototype that can also run as a standalone web application/dashboard. This session is for anyone who wants hands-on experience with data visualization using Python, Pandas, Matplotlib, interactive widgets, and Flask...more details
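A minimal sketch of Pandas plotting on top of Matplotlib, as in a Jupyter notebook; the time series and labels are made up for illustration:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical daily time series
df = pd.DataFrame(
    {"visitors": [120, 98, 150, 170, 160]},
    index=pd.date_range("2021-01-01", periods=5, freq="D"),
)

# Pandas plotting is a thin, one-line layer over Matplotlib
ax = df.plot(title="Daily visitors")
ax.set_ylabel("count")
plt.show()
```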
Half-Day Training | Deep Learning | Advanced
Although supervised learning has dominated industry machine learning implementations, unsupervised and semi-supervised methods have started to be practically applied to real world problems (outside of playing video games). Generative Adversarial Networks (GANs) are being utilized to augment data and generate dialogue, and Reinforcement Learning (RL) is helping people plan marketing campaigns and control robots. In this training, you will develop a theoretical understanding of these and other related state-of-the-art AI methods along with the hands-on skills needed to train and utilize them. You will implement a variety of models in TensorFlow for tasks including object recognition, image generation and robotics…more details
Full-Day Training | NLP | Advanced
Natural Language Processing (NLP) has recently experienced its own “ImageNet” moment. Rapidly evolving language models have enabled practitioners to decipher long lost languages, translate speech in one language to speech in another language directly without converting to text, generate long form text that adapts to the style and content of human prompts, and translate between language pairs never seen explicitly by computer systems (among many other impressive results).
In this training, you will develop a theoretical understanding of modern NLP along with the hands-on skills needed to develop state-of-the-art models. You will implement a variety of recurrent layer and transformer based architectures in both TensorFlow and PyTorch for tasks including text classification, machine translation, and predictive text…more details
Full-Day Training | Deep Learning | Machine Learning | Beginner – Intermediate
Relatively obscure a few short years ago, Deep Learning is ubiquitous today across data-driven applications as diverse as machine vision, natural language processing, and super-human game-playing.
This Deep Learning primer brings the revolutionary machine-learning approach behind contemporary artificial intelligence to life with interactive demos featuring TensorFlow 2, the major, cutting-edge revision of the world’s most popular Deep Learning library. To facilitate an intuitive understanding of Deep Learning’s artificial-neural-network foundations, the essential theory will be introduced visually and pragmatically. Paired with tips for overcoming common pitfalls and hands-on Python code run-throughs provided in straightforward Jupyter notebooks, this foundational knowledge empowers you to build powerful state-of-the-art Deep Learning models…more details
Half-Day Training | Deep Learning | Machine Learning | Beginner-Intermediate
Reinforcement Learning has recently made great progress in industry as one of the best techniques for sequential decision making and control policies. DeepMind used RL to greatly reduce energy consumption in Google’s data centers. It has been used for text summarization, autonomous driving, dialog systems, and media advertisements, and in finance by JPMorgan Chase. We are at the very beginning of the adoption of these algorithms, as systems are required to operate more and more autonomously.
In this workshop we will explore Reinforcement Learning, starting from its fundamentals and ending with creating our own algorithms. We will use OpenAI Gym to try our RL algorithms. OpenAI is a non-profit organization committed to open-sourcing its research on Artificial Intelligence…more details
Half-Day Training | Kick-starter | Machine Learning | All Levels
The field of machine learning and data science has seen a sudden resurgence in the last few years. The contributions of machine learning to solving data-driven problems and creating intelligent applications cannot be overemphasized. This field, which intersects statistics, probability, mathematics, computer science, and algorithms, can be used to learn iteratively from complex data and find hidden insights. Understanding the mathematics behind machine learning allows us to choose the right algorithms for our problem, make good choices on parameter settings and validation strategies, recognize under- and over-fitting, troubleshoot ambiguous results, and put appropriate confidence bounds on results...more details
Half-Day Training | NLP | Machine Learning | Beginner-Intermediate
Learn why the truly open source HPCC Systems platform is Better at Big Data and learn how ECL can empower you to build powerful data queries with ease. HPCC Systems is a comprehensive, dedicated data lake platform that makes combining different types of data easier and faster than competing platforms — even data stored in massive, mixed-schema data lakes — and it scales very quickly as your data needs grow...more details
Half-Day Training | Deep Learning | Machine Learning | Beginner-Intermediate
We will use OpenAI Gym to try our RL algorithms. OpenAI is a non-profit organization committed to open-sourcing its research on Artificial Intelligence. To foster innovation, OpenAI created a virtual environment, OpenAI Gym, where it’s easy to test Reinforcement Learning algorithms. In particular, we will start with some popular techniques like the Multi-Armed Bandit, going through Markov Decision Processes and Dynamic Programming. We will then also explore other RL frameworks and more complex concepts like policy gradient methods and Deep Reinforcement Learning, which recently changed the field of Reinforcement Learning. In particular, we will see Actor-Critic models and Proximal Policy Optimization, which allowed OpenAI to beat some of the best Dota players. We will also provide the necessary Deep Learning concepts for the course…more details
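As a flavor of how Gym is used to try out algorithms, here is a minimal interaction loop with a random policy standing in for a real agent. It assumes the classic Gym API (reset returning an observation, step returning four values); newer Gym/Gymnasium releases differ slightly:

```python
import gym

# A simple environment to try an agent on; the random policy below
# is just a placeholder for an RL algorithm developed in the workshop.
env = gym.make("CartPole-v1")

for episode in range(5):
    obs = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = env.action_space.sample()        # replace with your policy
        obs, reward, done, info = env.step(action)
        total_reward += reward
    print(f"episode {episode}: return = {total_reward}")

env.close()
```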
Workshop | MLOps & Management | NLP | Intermediate-Advanced
During the workshop, the audience will gain not only a holistic overview of the TensorFlow ecosystem but will also learn the necessary steps to bring ML projects from experiments to production. With this knowledge, participants can translate their ML projects into TFX pipelines and simplify their ML model production processes…more details
Workshop | R Programming | Intermediate
Apache Arrow is a cross-language development platform for in-memory analytics. In this tutorial, I’ll show how you can use Arrow in Python and R, both separately and together, to speed up data analysis on datasets that are bigger than memory. We’ll cover the fundamentals of Arrow in Python and R, then explore in depth Arrow’s Dataset feature, which provides for fast, efficient querying of large, multi-file datasets. Finally, we’ll discuss Flight, an Arrow-native client-server framework for transporting data, and show how to set up a server and query against it...more details
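A minimal Python-side sketch of the Dataset feature using pyarrow; the file, column names, and values are invented so the example is self-contained, and in practice the dataset would span many large files:

```python
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.dataset as ds

# Write a small Parquet file so the example runs on its own
pq.write_table(
    pa.table({"fare_amount": [5.0, 0.0, 12.5], "passenger_count": [1, 2, 3]}),
    "rides.parquet",
)

# A Dataset scans files lazily: nothing is loaded until to_table()
dataset = ds.dataset("rides.parquet", format="parquet")

# Column selection and row filters are pushed down into the scan,
# so only the needed data is read into memory
table = dataset.to_table(
    columns=["passenger_count", "fare_amount"],
    filter=ds.field("fare_amount") > 0,
)
print(table.to_pandas())
```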
Workshop | Machine Learning | Open-source | Intermediate
Gradient boosting is a powerful machine-learning technique that achieves state-of-the-art results in a variety of practical tasks. For a number of years, it has remained the primary method for learning problems with heterogeneous features, noisy data, and complex dependencies: web search, recommendation systems, weather forecasting, and many others.
Some problems contain different types of data, including numerical, categorical, and text data. In this case the best solution is either building new numerical features to replace the text and categories and passing them to gradient boosting, or using out-of-the-box solutions for that…more details
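One open-source library that handles mixed feature types out of the box is CatBoost; the abstract does not name a specific tool, so treat this as an illustrative sketch with an invented toy dataset rather than the session's material:

```python
import pandas as pd
from catboost import CatBoostClassifier, Pool

# Tiny illustrative dataset with numerical, categorical, and text columns
df = pd.DataFrame({
    "price": [9.99, 4.50, 120.0, 35.0],
    "category": ["books", "books", "electronics", "toys"],
    "description": ["a mystery novel", "kids picture book",
                    "noise cancelling headphones", "wooden puzzle"],
    "label": [0, 0, 1, 1],
})

train_pool = Pool(
    data=df[["price", "category", "description"]],
    label=df["label"],
    cat_features=["category"],      # used directly, no manual encoding
    text_features=["description"],  # raw text handled by the library
)

model = CatBoostClassifier(iterations=50, verbose=0)
model.fit(train_pool)
```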
Workshop | Deep Learning | Beginner-Intermediate
Adoption of messaging communication and voice assistants has grown rapidly in recent years. This creates a huge demand for tools that speed up development of Conversational AI systems. The open-source DeepPavlov framework was created for the development of multi-skill conversational agents. It prioritizes efficiency, modularity, and extensibility with the goal of making it easier to develop dialogue systems from scratch and with limited data available. It also supports modular as well as end-to-end approaches to the implementation of conversational agents...more details
Workshop | Machine Learning | ML for Programmers | Beginner-Intermediate
The talk will emphasize both data preparation (ETL) and machine learning operations, with a hands-on demonstration of porting a typical workflow from CPU to GPU and measuring the speedup. We’ll go into more detail on real-world applications taking advantage of these speed improvements, including hyperparameter optimization for machine learning models, single cell genomics analysis, and applications in finance. For large-data users, we’ll discuss some of the options for scaling RAPIDS to multiple GPUs or multiple nodes, emphasizing the tight integration with the Dask ecosystem…more details
Tutorial | Machine Learning | Data Visualization | Beginner-Intermediate
Data scientists often get asked questions of the form “Does X drive Y”: (1) did recent PR coverage drive sign-ups, (2) does customer support increase sales, or (3) did improving the recommendation model drive revenue? Supporting company stakeholders requires every data scientist to learn techniques that can answer questions like these, which are centered around issues of causality. Often we cannot use A/B testing to answer these questions and must turn to causal inference techniques instead…more details
Workshop | Deep Learning | Data Visualization | Intermediate
In this hands-on tutorial, we will introduce attendees to the siVAE (scalable, interpretable VAE) model that infers a set of factor loadings that explicitly map latent dimensions to the input features that define them, during training of the VAE model. Using standard datasets from computer vision (MNIST, Fashion-MNIST and CIFAR-10), we will walk attendees through the process of training the siVAE model, visualizing the sample embeddings inferred by classic VAEs, and extracting and visualizing the features that contribute to individual latent dimensions. We will also teach attendees how to estimate and visualize feature awareness, a new metric for measuring the overall importance of individual features for embedding a sample in the latent space. At the end of the tutorial, attendees will be able to train an siVAE model on their own datasets and interpret and visualize the latent dimensions inferred…more details
Workshop | Machine Learning | Beginner-Intermediate
Most deployed Machine Learning models use Supervised Learning powered by human training data. Selecting the right data for human review is known as Active Learning. This talk will introduce a set of Active Learning methods that help you understand where your model is currently confused (Uncertainty Sampling) and identify gaps in your model’s knowledge (Diversity Sampling). We’ll cover techniques that are only a few lines of code through to techniques that build on recent advances in transfer learning. We’ll use code examples from my open source PyTorch Active Learning library...more details
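A generic least-confidence Uncertainty Sampling sketch in PyTorch; this is not taken from the speaker's library, and the random logits simply stand in for a real model's outputs over an unlabeled pool:

```python
import torch
import torch.nn.functional as F

def least_confidence_scores(logits):
    """Score each unlabeled item by how unsure the model is of its
    top prediction (higher score = more uncertain)."""
    probs = F.softmax(logits, dim=1)
    top_prob, _ = probs.max(dim=1)
    return 1.0 - top_prob

# Hypothetical model outputs for a pool of 1,000 unlabeled items, 5 classes
logits = torch.randn(1000, 5)
scores = least_confidence_scores(logits)

# Send the 20 most uncertain items for human review and labeling
to_label = torch.topk(scores, k=20).indices
print(to_label)
```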
Tutorial | NLP | Machine Learning | Intermediate
This is a hands-on tutorial on applying the latest advances in deep learning and transfer learning for common NLP tasks such as named entity recognition, document classification, spell checking, and sentiment analysis. Learn to build complete text analysis pipelines using the highly accurate, high-performance, open-source Spark NLP library in Python…more details
Workshop | NLP | Deep Learning | Intermediate-Advanced
NLP is one of the fastest-growing fields within AI. A wide variety of tasks can be tackled with NLP such as text classification, question-answering (e.g. chatbots), translation, topic modelling, sentiment analysis, summarization, and so on. In this workshop, we focus on text summarization, as it is not commonly showcased in tutorials despite being a powerful and challenging application of NLP. We see a trend towards pre-training Deep Learning models on a large text corpus and fine-tuning them for a specific downstream task (also known as transfer learning). In this hands-on workshop, you’ll get the opportunity to apply a state-of-the-art summarization model to generate news headlines. We finetuned this model on Reuters news data, which is professionally produced by journalists and strictly follows rules of integrity, independence and freedom from bias…more details
Workshop | Machine Learning
Bayesian statistical methods are becoming more common, but there are not many resources to help beginners get started. People who know Python can use their programming skills to get a head start. In this workshop, I introduce Bayesian methods using grid algorithms, which help develop understanding, and MCMC, which is a powerful algorithm for real-world problems.
As the primary example, we will estimate goal scoring rates in hockey and soccer. This example is meant to be fun, but it is also useful; the same methods apply to any system well-modeled by a Poisson process, including customers arriving at a business, requests coming in to a server, and many other applications…more details
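A minimal grid-algorithm sketch for the goal-scoring example: a discrete grid of candidate Poisson rates, a uniform prior, and a likelihood update from a few observed game scores (the numbers below are made up for illustration):

```python
import numpy as np
from scipy.stats import poisson

# Grid of possible goal-scoring rates (goals per game)
lams = np.linspace(0.01, 10, 1000)

# Uniform prior over the grid (any prior could be substituted)
prior = np.ones_like(lams)
prior /= prior.sum()

# Observed goals scored in a handful of games (illustrative numbers)
goals = [2, 0, 3, 1]

# Likelihood of the data at each candidate rate under a Poisson model
likelihood = np.ones_like(lams)
for k in goals:
    likelihood *= poisson.pmf(k, lams)

# Posterior = prior * likelihood, normalized over the grid
posterior = prior * likelihood
posterior /= posterior.sum()

print("posterior mean rate:", np.dot(lams, posterior))
```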
Workshop | Machine Learning | Intermediate
Effective predictive modeling projects follow the analytics life cycle, from data and discovery to deployment and decisions. Data scientists use a variety of tools, both commercial and open-source, to collaborate and develop enterprise applications of analytics and artificial intelligence. SAS Viya provides a unified platform to perform all of these from one graphical user interface or through programming APIs. In this workshop, you will load data into memory, prepare input variables for modeling, and build complex analytics pipelines to demonstrate powerful machine learning models. Need to integrate open source models? No problem. We’ll show you how to do that and deploy any model. Then you can save and package the best performing model for deployment while keeping the ability to retrain it on new data…more details
Workshop | Research Frontiers | Intermediate
Novel single-cell transcriptome sequencing assays allow researchers to measure gene expression levels at the resolution of single cells and offer the unprecedented opportunity to investigate fundamental biological questions at the cellular level, such as stem cell differentiation or the discovery and characterization of rare cell types. The majority of the computational methods to analyze single-cell RNA-Seq data are implemented in R, making it a natural tool to start working with single-cell transcriptomic data…more details
Workshop | Machine Learning | Research Frontiers | Intermediate-Advanced
The values of a categorical variable frequently have a structure that is not ordinal or linear in nature. For example, the months of the year have a circular structure, and the US States have a geographical structure. Standard approaches such as one-hot or numerical encoding are unable to effectively exploit the structural information of such variables. In this tutorial, we will introduce the StructureBoost gradient boosting package, wherein the structure of categorical variables can be represented by a graph, and exploited to improve predictive performance. Moreover, StructureBoost can make informed predictions on categorical values for which there is little or no data, by leveraging the knowledge of the structure. We will walk through examples of how to configure and train models using StructureBoost and demonstrate other features of the package…more details
Workshop | Machine Learning | Research Frontiers | Intermediate
Randomized Controlled Trial (RCT) has been the gold standard for determining the Average Treatment Effect of a program for marketing, medical, economic, political, and policy applications. Uplift modeling is an additional step to uncover individuals who are most positively influenced by treatment through predictive analytics and machine learning by identifying Heterogeneous Treatment Effects. This approach enables us to identify the likely “persuadables” in order to maximize treatment impact through optimal target selection. It has gained significant attention in recent years from multiple fields such as personalized medicine, political elections, personalized marketing, and personalized healthcare, with growing publications and presentations from industry and academic experts across the world...more details
Tutorial | Machine Learning | NLP | Intermediate-Advanced
NLP-driven AI is a fast-growing technology domain with diverse applications in customer engagement, employee collaboration, marketing, and social media. Does having an accurate model by itself mean that we have a successful product, service, or solution? No! The most difficult phase begins after that. We still have the following challenges to solve: How do we create a sellable service around this model? How do we scale this service to handle millions of inference requests in a cost-effective manner? How do we build automated deployment pipelines for software and models? How do we add security, privacy, manageability, and observability for the service? How do we track model drift and analyze model performance? In this session, we will discuss these questions and explore the answers. We will go through the unique challenges that NLP serving poses and the solutions and best practices to overcome them…more details
Workshop | Machine Learning | ML for Programmers | Intermediate-Advanced
A brief introduction to Reinforcement Learning (RL), and a walkthrough of using the Dopamine library for running RL experiments…more details
Workshop | R Programming | Machine Learning | Intermediate
With customer privacy laws, missing data is becoming more of a normal situation. Many approaches are either computationally expensive or throw out the baby with the bath water. There are certain models which allow missing data directly, and variants of common models which can be adapted to do so. Surprisingly, we can create effective predictability even in scenarios where data is missing not at random and at rates higher than 80%. Hands-on workshop using R or Python...more details
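As one illustration of a model that accepts missing values directly (not necessarily the specific methods covered in the workshop), scikit-learn's HistGradientBoostingClassifier routes NaNs natively during tree splits. The synthetic data and the 80% missing-not-at-random column below are invented for the sketch:

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Knock out ~80% of one informative column, preferentially where its value
# is large (missing not at random), then train directly on the NaNs.
mask = X[:, 0] > np.quantile(X[:, 0], 0.2)
X[mask, 0] = np.nan

model = HistGradientBoostingClassifier()   # handles NaN splits natively
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```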
Workshop | Machine Learning | Deep Learning | All Levels
Leveraging data to produce algorithmic rules is nothing new. However, we used to second-guess these rules, and now we don’t. Therefore, a human used to be accountable, and now it’s the algorithm… But can an algorithm be accountable? How do we solve this accountability gap and ethical quandary?…more details
Tutorial | Deep Learning | Machine Learning | Intermediate
Deep generative models have made great progress in synthesizing data realistically in various domains such as image generation and speech synthesis. Generative modeling brings a new paradigm shift in AI from content classification and regression to content analysis and creation. This tutorial aims at introducing the basics of deep generative models such as Variational AutoEncoders (VAEs) and Generative Adversarial Networks (GANs), as well as user interaction with the models for content manipulation and creation. By completing this workshop, you will have an understanding of deep generative models, their strengths and weaknesses, and their promising applications in content analytics and creation. The workshop will focus on the generative models used in image synthesis, but the introduced methodology can be extended to other domains…more details
Many problems in systems and chip design are in the form of combinatorial optimization on graph structured data. In this talk, I will motivate taking a learning based approach to combinatorial optimization problems with a focus on deep reinforcement learning (RL) agents that generalize. I will discuss our work on a new domain-transferable reinforcement learning methodology for optimizing chip placement, a long pole in hardware design…more details
We live in the era of Analytics Heterogeneity – where analytics is not limited to one single methodology, tool or algorithm, but is able to leverage the full potential of the fast-growing and rapid-changing ecosystem of analytical solutions and technologies available. However, one byproduct of this evolution has been increasing complexity of how to turn analytics into value. Which new capabilities are needed to be successful in this new era? What can we do to be champions and convert this complexity into a competitive advantage? In this session, I will walk you through the best practices to move from sandbox to production…more details
AI for Good | Machine Learning | All Levels
Every field has data. We use data to discover new knowledge, to interpret the world, to make decisions, and even to predict the future. The recent convergence of big data, cloud computing, and novel machine learning algorithms and statistical methods is causing an explosive interest in data science and its applicability to all fields. This convergence has already enabled the automation of some tasks that surpass human performance. The novel capabilities we derive from data science will drive our cars, treat disease, and keep us safe. At the same time, such capabilities risk leading to biased, inappropriate, or unintended action. The design of data science solutions requires both excellence in the fundamentals of the field and expertise to develop applications which meet human challenges without creating even greater risk...more details
Distributed applications are not new. The first distributed applications were developed over 50 years ago with the arrival of computer networks, such as ARPANET. Since then, developers have leveraged distributed systems to scale out applications and services, including large-scale simulations, web serving, and big data processing. Until recently, however, distributed applications have been the exception rather than the norm. But this is changing quickly…more details
Multiple organizations often wish to aggregate their sensitive data and learn from it, but they cannot do so because they cannot share their data. For example, banks wish to train models jointly over their aggregate transaction data to detect money launderers because criminals hide their traces across different banks. To address such problems, my students and I developed MC^2, a framework for secure collaborative computation. My talk will overview our MC^2 platform, our technical approach, results, and adoption…more details
Track Keynote | Machine Learning | All levels
ML Models in production lose accuracy over time. A number of factors contribute to this change, such as demographic mix, consumer behavior change, etc. Using these models’ output results in incorrect decisions that could lead to catastrophic failures for the organization. This existing whitespace calls for solutions to help Data Science teams predict such failures in advance...more details
Talk | Machine Learning | AI for Good | All Levels
Rainforests are the most diverse, complex, and imperiled terrestrial ecosystems on Earth. Despite their critical importance to the livelihoods of local communities, harboring immense biodiversity, regulating climate, and being the origin of most modern medicines, rainforests are undervalued, understudied, and overexploited.
In addition, the Rainforest is dense, vast, and home to notoriously harsh environmental conditions, all presenting barriers for current technology and research. The 5-year $10M Rainforest XPRIZE attempts to solve this problem by improving our understanding of rainforest ecosystems. The competition challenges teams to integrate current and emerging technology such as data science, artificial intelligence, machine learning, robotics, and remote sensing to survey biodiversity in multiple stories of the rainforest and to use data to deliver insights in near real-time and in unprecedented detail that will reveal the true value of standing forests, ultimately leading to their sustainable use and preservation globally.
Tune in to our talk to learn more and see how you can become involved in this groundbreaking competition for the benefit of humanity…more details
Track Keynote | AI for Good | Research Frontiers | All Levels
This talk will describe the dramatic creation of the COVID-19 Open Research Dataset (CORD-19) and the broad range of efforts, both inside and outside of the Semantic Scholar project, to garner insights into COVID-19 and its treatment based on this data...more details
Track Keynote
Scikit-learn was born as a tool from machine-learning geeks for computer geeks. But it has grown as an industry standard, used by many with various impacts on the world. Growing up brings new challenges. How can an open source, community-driven project address the needs of a diverse and huge user base?…more details
Track Keynote | NLP | Intermediate
Measuring the impact of scientific articles is important for evaluating the research output of individual scientists, academic institutions and journals. While citations are raw data for constructing impact measures, there exist biases and potential issues if factors affecting citation patterns are not properly accounted for. In this work, we address the problem of field variation and introduce an article level metric useful for evaluating individual articles’ visibility. This measure derives from joint probabilistic modeling of the content in the articles and the citations among them using latent Dirichlet allocation (LDA) and the mixed membership stochastic blockmodel (MMSB). Our proposed model provides a visibility metric for individual articles adjusted for field variation in citation rates, a structural understanding of citation behavior in different fields, and article recommendations which take into account article visibility and citation patterns...more details
Talk | Machine Learning | Deep Learning | Beginner-Intermediate
This session will begin by introducing the concept of Reinforcement Learning, as well as some common use cases. After the high-level introduction, a more formal mathematical framework will be introduced. Finally, the prior ideas will be reviewed and demonstrated to create a Tic-Tac-Toe-playing AI, code-free, in the KNIME Analytics Platform…more details
Track Keynote | Deep Learning | All Levels
Games have been used for decades as an important way to test and evaluate the performance of artificial intelligence systems. As capabilities have increased, the research community has sought games with increasing complexity that capture different elements of intelligence required to solve scientific and real-world problems. In recent years, StarCraft, considered to be one of the most challenging Real-Time Strategy (RTS) games and one of the longest-played esports of all time, has emerged by consensus as a “grand challenge” for AI research.
In this talk, I will introduce our StarCraft II program AlphaStar, the first Artificial Intelligence to reach Grandmaster status without any game restrictions. The focus will be on the technical contributions which made possible this milestone in AI…more details
Talk | Machine Learning | Beginner-Intermediate
In this talk, ZS will showcase:
How to use graph neural networks to learn rich, latent representations for customers (customer embeddings) to encode interactions and behaviors.
Graph neural networks’ superior performance on a variety of customer-level ML tasks, such as prediction of brand adoption (pre-launch, early launch), channel preferences, and engagement with digital channels (e.g. email), compared to existing approaches…more details
Track Keynote
As Deep Neural Nets have come forward to provide the best accuracy on an ever-increasing number of tasks in computer vision, audio processing, natural language processing, and recommendation systems, efficient training and inference of Deep Neural Nets have naturally emerged as critical computing challenges of our time. While these Deep Neural Net computations are hosted on a very diverse range of processors, from hyperscale systems in datacenters to microcontrollers at the edge, there is a common set of underlying principles for making Deep Neural Nets fast and energy-efficient on these hosts…more details