more sessions added weekly
The prerequisites to the workshop and training sessions are available HERE
Please review the final schedule:
– for in-person: download ODSC Events in the App Store (CrowdCompass)
– for virtual: live.odsc.com (agenda section)
Virtual | Keynote | Responsible AI and Social Good
AI is ever more ubiquitous in our lives, but not all countries are created equal in their access to or use of AI. Likewise, not all countries and businesses adhere to the same regulatory frameworks or opinions on governance. Yet all companies would benefit from knowing where they stand so that investment in technology is not ultimately wasted. Moreover, access to AI is being used as a geopolitical tool. What lessons are we able to draw and adopt now, and how might this thinking mature into the future…more details
Kay Firth-Butterfield is Head of Artificial Intelligence and a member of the Executive Committee at the World Economic Forum and is one of the foremost experts in the world on the governance of AI. She is a Barrister, former Judge and Professor, technologist and entrepreneur who has an abiding interest in how humanity can equitably benefit from new technologies, especially AI. Kay is an Associate Barrister (Doughty Street Chambers), Master of the Inner Temple, London and serves on the Lord Chief Justice’s Advisory Panel on AI and Law. She co-founded AI Global and was the world’s first Chief AI Ethics officer in 2014 and created the AIEthics twitter hashtag. Kay is Vice-Chair of The IEEE Global Initiative for Ethical Considerations in Artificial Intelligence and Autonomous Systems and was part of the group which met at Asilomar to create the Asilomar AI Ethical Principles. She is on the Polaris Council for the Government Accountability Office (USA), the Advisory Board for UNESCO International Research Centre on AI and AI4All. Kay has advanced degrees in Law and International Relations and regularly speaks to international audiences addressing many aspects of the beneficial and challenging technical, economic and social changes arising from the use of AI. She has been consistently recognized as a leading woman in AI since 2018 and was featured in the New York Times as one of 10 Women Changing the Landscape of Leadership.
In-person | Keynote | Machine Learning | Deep Learning | All Levels
Statistical decisions are often given meaning in the context of other decisions, particularly when there are scarce resources to be shared. Managing such sharing is one of the classical goals of microeconomics, and it is given new relevance in the modern setting of large, human-focused datasets, and in data-analytic contexts such as classifiers and recommendation systems…more details
Michael I. Jordan is the Pehong Chen Distinguished Professor in the Department of Electrical Engineering and Computer Science and the Department of Statistics at the University of California, Berkeley. His research interests bridge the computational, statistical, cognitive, and biological sciences; in recent years, he has focused on Bayesian nonparametric analysis, probabilistic graphical models, spectral methods, kernel machines, and applications to problems in distributed computing systems, natural language processing, signal processing, and statistical genetics. Previously, he was a professor at MIT. Michael is a member of the National Academy of Sciences, the National Academy of Engineering, and the American Academy of Arts and Sciences and a fellow of the American Association for the Advancement of Science, the AAAI, ACM, ASA, CSS, IEEE, IMS, ISBA, and SIAM. He has been named a Neyman Lecturer and a Medallion Lecturer by the Institute of Mathematical Statistics. He received the David E. Rumelhart Prize in 2015 and the ACM/AAAI Allen Newell Award in 2009. Michael holds a master’s degree in mathematics from Arizona State University and a PhD in cognitive science from the University of California, San Diego.
Virtual | Keynote
Session Abstract Coming Soon!
Jun Zeng is HP’s Distinguished Technologist and founding manager of the 3D Digital Twin group. Jun has 20 years of industrial experience in creating and commercializing software for improving cyber-physical systems. His publications include a co-edited book on computer-aided design, a co-authored book on the digital factory, and 50+ peer-reviewed papers. He has 58 U.S. patents granted and more pending. His academic training includes a Ph.D. in mechanical engineering and an M.S. in computer science, both from Johns Hopkins University. He is an ACM member and an IEEE senior member.
In-person | AiX Keynote | Cross Industry | All Levels
Younes Ben Brahim is a senior Product Marketing Manager at Red Hat and focuses on AI/ML, data analytics and HPC solutions on OpenShift. Previously, Younes worked as a product manager for NetApp and held various roles in sales and consulting at Cisco, Nokia and Perficient. Younes holds a Bachelor of Science from Colorado State University and an MBA from the University of Denver.
Scott McClellan is a senior director of product management at NVIDIA, focused on data science workflows. Before joining NVIDIA, Scott was the chief technology officer of PRGX Inc. He has been chief technologist and led engineering and product development at companies including Red Hat and HP, where he guided strategies across HPC, cloud, big data and AI solutions. Scott holds a Bachelor of Science from the University of Iowa.
Over the past decade, the computational demands of machine learning (ML) workloads have grown much faster than the capabilities of a single processor, including hardware accelerators such as GPUs and TPUs. As a result, researchers and practitioners have been left with no choice but to distribute these workloads. Unfortunately, developing distributed applications is very challenging. In this talk I will present two projects we developed at UC Berkeley, Ray (https://github.com/ray-project/ray) and Alpa (https://github.com/alpa-projects/alpa), that dramatically simplify scaling ML workloads…more details
Ion Stoica is a Professor in the EECS Department at University of California at Berkeley. He does research on cloud computing and networked computer systems. Past work includes Apache Spark, Apache Mesos, Tachyon, Chord DHT, and Dynamic Packet State (DPS). He is an ACM Fellow and has received numerous awards, including the SIGOPS Hall of Fame Award (2015), the SIGCOMM Test of Time Award (2011), and the ACM doctoral dissertation award (2001). In 2013, he co-founded Databricks, a startup to commercialize technologies for Big Data processing, and in 2006 he co-founded Conviva, a startup to commercialize technologies for large scale video distribution.
Virtual | Keynote | NLP | All Levels
As natural language processing now permeates many different applications, its practical use is unquestionable. However, at the same time NLP is still imperfect, and errors cause everything from minor inconveniences to major PR disasters. Better understanding when our NLP models work and when they fail is critical to the efficient and reliable use of NLP in real-world scenarios. So how can we do so? In this talk I will discuss two issues: automatic evaluation of generated text, and automatic fine-grained analysis of NLP system results, which are some first steps towards a science of NLP model evaluation…more details
Graham Neubig is an associate professor at the Language Technologies Institute of Carnegie Mellon University. His research focuses on multilingual natural language processing, natural language interfaces to computers, and machine learning methods for NLP, with the final goal of every person in the world being able to communicate with each other, and with computers, in their own language. He also contributes to making NLP research more accessible through open publishing of research papers, advanced NLP course materials and video lectures, and open-source software, all of which are available on his web site.
In this session, we will dive deep into Feathr, taking you on a journey into this scalable open-source feature store which has now joined the Linux Foundation AI and Data ecosystem. Feathr has been battle-tested at LinkedIn, powering high-scale ML applications and supporting 100s of training and inferencing pipelines. This enables feature sharing among teams, leading to significant gains in business metrics…more details
Dr. Inchiosa’s passion for AI drives his work as Principal Data Scientist Manager in Azure Data’s Advanced Workload Engineering team, where he leads a team of data scientists focused on AI-led co-innovation engagements with strategic customers and partners. Previously, Mario served as Revolution Analytics’ Chief Scientist and as Analytics Architect in IBM’s Big Data organization, where he worked on advanced analytics in Hadoop, Teradata, and R. Prior to that, Mario was US Chief Scientist in Netezza Labs, bringing advanced analytics and R integration to Netezza’s SQL-based data warehouse appliances. He also served as US Chief Science Officer at NuTech Solutions, a computer science consultancy specializing in simulation, optimization, and data mining, and Senior Scientist at BiosGroup, a complexity science spin-off of the Santa Fe Institute. Mario holds Bachelor’s, Master’s, and PhD degrees in Physics from Harvard University. He has been awarded four patents and has published over 30 research papers, earning Publication of the Year and Open Literature Publication Excellence awards.
Virtual | Keynote | Machine Learning | Deep Learning | All Levels
In-person | Business Talk | Cross Industry | Beginner – Intermediate
I will talk about practical methods to model survey data effectively to avoid cheaters and self-attestation tendencies to lie, using statistics, analytics, and ML/AI. In my talk, I will explain how to treat surveys as big data and combine them with additional data sources for effective modeling. Using benchmarking techniques and novel modeling approaches, we may find the underlying gems in this somewhat old-fashioned yet still relevant information source…more details
Anna Litvak-Hinenzon is SVP, Global Head of Data Science at The RepTrak Company. She leads RepTrak’s global international data organization, providing clients with actionable data insights on Reputation, Brand, and ESG. As a data and technology leader for over 15 years, Anna helps organizations to achieve goals with data products powered by cutting-edge machine learning and AI models leveraging multiple data sources. Anna is a passionate digital transformation leader, an author of numerous patents and papers, with a Ph.D. in Applied Mathematics in her background.
Virtual | Talk
The session will cover specific actions leaders can take and offer real life examples and use cases. Attendees will walk away with a deeper understanding of how to avoid common pitfalls, how to improve team collaboration and reproducibility in data workflows…more details
Anna Filippova tends to the dbt Community garden of over 25,000 at dbt Labs as the Director of Community. Prior to dbt Labs, Anna built the first Analytics Engineering team at GitHub. Today, she writes about the intersection of modern data tools and open source in the Analytics Engineering Roundup.
In her past life, Anna published research on building, maintaining and sustaining open source communities. She has also studied how distributed and open source communities worked, fought and learned in a Postdoc at Carnegie Mellon, and acquired a PhD in Communication and Media from the National University of Singapore. From time to time you can find Anna traveling the coast of California and working from her campervan and she is always open to an AMA session.
Virtual | Talk | Machine Learning Safety and Security | Research Frontiers | Beginner-Intermediate
In this lecture, we will describe a new technique to address both these problems: a way to produce prediction sets for arbitrary black-box prediction methods that have correct empirical coverage even when the data distribution might change in arbitrary, unanticipated ways and such that we have correct coverage even when we zoom in to focus on demographic groups that can be arbitrary and intersecting…more details
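For readers new to prediction sets, here is a minimal sketch of standard split conformal prediction for a black-box regressor. It is not the technique presented in the lecture: this baseline assumes the calibration and test data are exchangeable, which is exactly the assumption the lecture aims to drop. The `predict` function and the calibration data are made up for illustration.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed rank into sorted scores
    if rank > n:
        return float("inf")  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]

def prediction_interval(predict, x, q):
    """Symmetric interval around the black-box prediction."""
    center = predict(x)
    return (center - q, center + q)

# Hypothetical black-box model and made-up calibration pairs (x, y)
predict = lambda x: 2.0 * x
calibration = [(1, 2.5), (2, 3.0), (3, 6.2), (4, 8.1), (5, 9.0),
               (6, 12.4), (7, 13.5), (8, 16.3), (9, 18.2)]

# Nonconformity score: absolute residual on held-out calibration data
scores = [abs(y - predict(x)) for x, y in calibration]
q = conformal_quantile(scores, alpha=0.2)
low, high = prediction_interval(predict, 10, q)
```

Under exchangeability, intervals built this way cover the true value with probability at least 1 - alpha on average; the guarantee does not extend to shifted distributions or to individual subgroups.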
Aaron Roth is the Henry Salvatori Professor of Computer and Cognitive Science, in the Computer and Information Sciences department at the University of Pennsylvania, with a secondary appointment in the Wharton statistics department. He is affiliated with the Warren Center for Network and Data Science, and co-director of the Networked and Social Systems Engineering (NETS) program. He is also an Amazon Scholar at Amazon AWS. He is the recipient of a Presidential Early Career Award for Scientists and Engineers (PECASE) awarded by President Obama in 2016, an Alfred P. Sloan Research Fellowship, an NSF CAREER award, and research awards from Yahoo, Amazon, and Google. His research focuses on the algorithmic foundations of data privacy, algorithmic fairness, game theory, learning theory, and machine learning. Together with Cynthia Dwork, he is the author of the book “The Algorithmic Foundations of Differential Privacy.” Together with Michael Kearns, he is the author of “The Ethical Algorithm”.
Virtual | Talk | Deep Learning | Machine Learning | All Levels
In this session, you will learn what vector search is, why and when you would need it and you will see vector search in action during live demos…more details
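As a rough illustration of the idea, vector search ranks stored items by how close their embedding vectors are to a query vector. The sketch below is a brute-force version in plain Python with made-up 3-dimensional vectors; it does not use Weaviate or any real embedding model, and production engines rely on approximate indexes rather than scanning every vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def vector_search(query, index, k=2):
    """Return the ids of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id)
              for doc_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]

# Toy "embeddings", invented for illustration only
index = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
results = vector_search([1.0, 0.0, 0.0], index, k=2)
```

Here the query vector points in the same direction as the "cat" and "kitten" vectors, so those two ids are returned ahead of "car".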
Zain Hasan is a Senior Developer Advocate at SeMI Technologies – the company behind the Weaviate vector search engine. He is an engineer and data scientist by training, who pursued his undergraduate and graduate work at the University of Toronto building artificially intelligent assistive technologies for elderly patients. He then founded his company developing a digital health platform that leveraged machine learning to remotely monitor chronically ill patients using data from their medical devices. More recently he practiced as a consultant senior data scientist in Toronto. He is passionate about the field of data science and machine learning and loves to share his love for the field with anyone interested in the domain.
In-person | Keynote | NLP | Machine Learning | Deep Learning | All Levels
What if computers can truly converse with us in our native tongue? Computers will transform into effective, personalized assistants for everybody. Commercial chatbots today are notoriously brittle as they are hardcoded to handle a few possible choices of user inputs. Recently introduced large language neural models, such as GPT-3, are remarkably fluent, but they are prone to hallucinations, often producing incorrect statements. This talk describes how we can tame these neural models into robust, trustworthy, and cost-effective conversational agents across all industries and languages…more details
Monica Lam has been a Professor in the Computer Science Department at Stanford University since 1988. She is the faculty director of the Open Virtual Assistant Lab (OVAL). She received a B.Sc. from the University of British Columbia in 1980 and a Ph.D. in Computer Science from Carnegie Mellon University in 1987. Monica is a Member of the National Academy of Engineering and an ACM Fellow. She is a co-author of the popular text Compilers, Principles, Techniques, and Tools (2nd Edition), also known as the Dragon book. Professor Lam’s current research is on conversational virtual assistants with an emphasis on privacy protection. Her research uses deep learning to map task-oriented natural language dialogues into formal semantics, represented by a new executable programming language called ThingTalk. Her Almond virtual assistant, trained on open knowledge graphs and IoT API standards, can be easily customized to perform new tasks. She is leading an Open Virtual Assistant Initiative to create the largest, open, crowdsourced language semantics model to promote open access in all languages. Her decentralized Almond virtual assistant, which supports fine-grained sharing with privacy, received Popular Science’s Best of What’s New Award in Security in 2019.
Prof. Lam is also an expert in compilers for high-performance machines. Her pioneering work of affine partitioning provides a unifying theory to the field of loop transformations for parallelism and locality. Her software pipelining algorithm is used in commercial systems for instruction level parallelism. Her research team created the first, widely adopted research compiler, SUIF. Her contributions in computer architecture include the CMU Warp Systolic Array and the Stanford DASH Distributed Memory Multiprocessor. She was on the founding team of Tensilica, now a part of Cadence.
She received an NSF Young Investigator award in 1992, the ACM Most Influential Programming Language Design and Implementation Paper Award in 2001, an ACM SIGSOFT Distinguished Paper Award in 2002, the ACM Programming Language Design and Implementation Best Paper Award in 2004, and the ACM SIGARCH/SIGPLAN/SIGOPS ASPLOS Influential Paper Awards in two consecutive years, 2021 and 2022. She was the author of two of the papers in “20 Years of PLDI–a Selection (1979-1999)”, and one paper in the “25 Years of the International Symposia on Computer Architecture”. She received the University of British Columbia Computer Science 50th Anniversary Research Award in 2018.
Greg Michaelson is Cofounder and Chief Product Officer at Zerve, a young, stealthy startup that’s rethinking the data science development experience. Previously, Greg was an early joiner at DataRobot where he played many roles, including Chief Customer Officer. Prior to that, he worked as a data scientist in the financial sector after earning a PhD in Applied Statistics from the University of Alabama. In his spare time, Greg manufactures a line of flavored breakfast cereal toppings called Cerup. He lives in Spring Creek, Nevada with his wife, four children, and two Clumber Spaniels.
In-person | Keynote
In-person | Business Talk | Cross Industry | Responsible AI and Social Good | All | Beginner
We are at a pivotal time in our AI development and adoption where we still have the ability to create a world where AI is a force for good, instead of a world where AI is used to deepen inequalities and divides. To do this, we must create an ethical AI environment to move forward…more details
Sadie St Lawrence is the Founder and CEO of Women in Data, a community of 30,000+ data leaders, practitioners, and citizens whose mission is to increase diversity in data careers. Women in Data has been named a Top 50 Leading Company of The Year, and has been rated as the #1 community for Women in AI and Tech. Sadie has trained over 400,000 people in data science and has developed multiple programs in machine learning and career development. Sadie has been named one of the Top 30 Most Inspiring Women in AI, Top 10 Most Admired Businesswomen to Watch in 2021, and Top 21 Influencer in Data, and is the recipient of the Outstanding Service Award from UC Davis. In addition, she serves on boards, and is the host of the Data Bytes podcast.
Virtual | Talk | Responsible AI | Research Frontier | Intermediate
Our world faces increasingly complex challenges: we destabilized the climate, haven’t beaten all diseases, and haven’t spread the values of democracy and freedom to large parts of the globe, where violence and riots reign supreme. The world must be fixed in our generation – everyone would agree. But in order to take action, build a plan, we need to see the complete picture, and empower decision makers with tools to make those changes. This decade, we have finally reached a critical amount of data to facilitate the creation of such tools…more details
Dr. Kira Radinsky is the CEO and CTO of Diagnostic Robotics, where the most advanced technologies in the field of artificial intelligence are harnessed to make healthcare better, cheaper, and more widely available. In the past, she co-founded SalesPredict, acquired by eBay in 2016, and served as eBay director of data science and IL chief scientist. One of the up-and-coming voices in the data science community, she is pioneering the field of medical data mining. Dr. Radinsky gained international recognition for her work at Microsoft Research, where she developed predictive algorithms that recognized the early warning signs of globally impactful events, including political riots and disease epidemics. In 2013, she was named to the MIT Technology Review’s 35 Young Innovators Under 35; in 2015, she was named to the Forbes 30 Under 30 list of rising stars in enterprise technology; and in 2016, she was selected as “woman of the year” by Globes. She is a frequent presenter at global tech events, including TEDx, Wired, Strata Data Science, TechCrunch and academic conferences, and she publishes in the Harvard Business Review. Dr. Radinsky serves as a board member of the Israel Securities Authority and the Maccabi Research Institute, and on the technology board of HSBC bank. She also serves as visiting professor at the Technion, Israel’s leading science and technology institute, where she focuses on the application of predictive data mining in medicine.
Guy is currently the VP of Data Science at Diagnostic Robotics. He holds an M.Sc. in computer science from the Technion, under the supervision of Dr. Kira Radinsky. Formerly he worked as a data scientist at Mellanox (acquired by Nvidia).
Virtual | Talk | Machine Learning Safety and Security | Machine Learning | Beginner-Intermediate
In this talk, we will present the challenges of learning from dirty data, give an overview of data poisoning attacks on different systems such as spam detection, image classification and rating systems, discuss the problem of learning from web traffic – probably the dirtiest data in the world, and explain different approaches for learning from dirty data and poisoned data. We will focus on threshold-learning mitigation for data poisoning, aiming to reduce the impact of any single data source, and discuss a mundane but crucial aspect of threshold learning – memory complexity. We will present a robust learning scheme optimized to work efficiently on streamed data with bounded memory consumption. We will give examples from the web security arena with robust learning of URLs, parameters, character sets, cookies and more…more details
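One way to picture threshold learning is a learner that trusts a value only after enough distinct sources have reported it, so that a single source repeating itself cannot poison the model. The sketch below is a hypothetical illustration under that assumption, not the scheme presented in the talk; the class name `ThresholdLearner` and its parameters are invented here. Candidate state is capped, though the set of learned values still grows with legitimate traffic.

```python
class ThresholdLearner:
    """Accept a value only after `threshold` distinct sources report it."""

    def __init__(self, threshold=3, max_candidates=1000):
        self.threshold = threshold
        self.max_candidates = max_candidates
        self.candidates = {}  # value -> set of distinct sources seen so far
        self.learned = set()

    def observe(self, source, value):
        if value in self.learned:
            return
        if value not in self.candidates:
            if len(self.candidates) >= self.max_candidates:
                return  # candidate memory is full: ignore new values
            self.candidates[value] = set()
        sources = self.candidates[value]
        sources.add(source)  # repeats from the same source add nothing
        if len(sources) >= self.threshold:
            self.learned.add(value)
            del self.candidates[value]  # free candidate memory once learned

learner = ThresholdLearner(threshold=3)
for _ in range(100):
    learner.observe("attacker-ip", "evil_param")  # one source, many repeats
for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.3"]:
    learner.observe(ip, "session_id")  # three independent sources
```

After this stream, only `session_id` is learned; `evil_param` remains a candidate with a single recorded source no matter how often that source repeats it.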
Experienced Data Scientist and Tech Lead at Imperva’s threat research group where I work on creating machine learning algorithms to help protect our customers against web app and DDoS attacks. Before joining Imperva, I obtained a B.Sc and M.Sc in Bioinformatics from Bar Ilan University.
Virtual | Talk | MLOps and Data Engineering
In this talk, I discuss four tactics that enable successful enterprise analytics efforts. The first concerns data integration. Because essentially all enterprise data resides in data silos, an integration effort is required before meaningful cross-silo analysis is possible. Data science practitioners routinely report spending at least 80% of their time doing “data preparation” (aka data munging)…more details
Dr. Stonebraker has been a pioneer of database research and technology for more than forty years. He was the main architect of the INGRES relational DBMS, and the object-relational DBMS, POSTGRES. These prototypes were developed at the University of California at Berkeley where Stonebraker was a Professor of Computer Science for twenty-five years. More recently at M.I.T. he was a co-architect of the Aurora/Borealis stream processing engine, the C-Store column-oriented DBMS, the H-Store transaction processing engine, the SciDB array DBMS, and the Data Tamer data curation system. Presently he serves as Chief Technology Officer of Hopara and Tamr, Inc.
Professor Stonebraker was awarded the ACM System Software Award in 1992 for his work on INGRES. Additionally, he was awarded the first annual SIGMOD Innovation award in 1994, and was elected to the National Academy of Engineering in 1997. He was awarded the IEEE John von Neumann Medal in 2005 and the 2014 Turing Award, and is presently an Adjunct Professor of Computer Science at M.I.T.
Virtual | Talk | Machine Learning | Deep Learning | Intermediate
Federated learning is a growing field that attempts to address this challenge by distributing learning and analytics tasks to end-user devices. Although theoretical federated learning research is growing exponentially to meet these challenges, we are far from putting those theories into practice. In this talk, I will introduce FedScale, a scalable and extensible open-source federated learning and analytics platform. It provides high-level APIs to implement algorithms, a modular design to customize implementations for diverse hardware and software backends, and the ease of deploying the same code at many scales. FedScale also includes a comprehensive benchmark that allows data scientists to evaluate their ideas in realistic, large-scale settings. I will highlight a select few systems successfully built using FedScale and share insights from benchmarking recent algorithms using FedScale…more details
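To make the idea of distributing learning to end-user devices concrete, here is a minimal sketch of federated averaging on a one-parameter least-squares model. It is written in plain Python and does not use FedScale or its APIs; the client data, learning rate, and function names are assumptions made for illustration.

```python
def local_update(w, data, lr):
    """One gradient step of least squares y ~ w * x on a client's own data."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def federated_average(global_w, clients, lr):
    """One round: each client trains locally, the server averages the results."""
    updates = [(local_update(global_w, data, lr), len(data)) for data in clients]
    total = sum(n for _, n in updates)
    # Weight each client's model by how many examples it holds
    return sum(w * n for w, n in updates) / total

# Made-up client datasets, all drawn from y = 3 * x; raw data never leaves a client
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(1.0, 3.0), (4.0, 12.0), (2.0, 6.0)],
]

w = 0.0
for _ in range(50):
    w = federated_average(w, clients, lr=0.02)
```

Only model parameters cross the network in each round, and after 50 rounds `w` is close to the true slope of 3.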
Mosharaf Chowdhury is a Morris Wellman associate professor of CSE at the University of Michigan, Ann Arbor, where he leads the SymbioticLab. His work improves application performance and system efficiency of machine learning and big data workloads. He is also building software solutions to monitor and optimize the impact of machine learning systems on energy consumption and data privacy. His group developed Infiniswap, the first scalable software solution for memory disaggregation; Salus, the first software-only GPU sharing system for deep learning; FedScale, the largest federated learning benchmark and a scalable and extensible federated learning engine; and Zeus, the first GPU energy-vs-training performance tradeoff optimizer for DNN training. In the past, Mosharaf did seminal work on coflows and virtual network embedding, and he was a co-creator of Apache Spark. He has received many individual awards and fellowships, thanks to his stellar students and collaborators. His work has received seven paper awards from top venues, including NSDI, OSDI, and ATC, and over 22,000 citations. Mosharaf received his Ph.D. from UC Berkeley in 2015.
Virtual | Talk | Machine Learning Safety and Security | Responsible AI | Beginner
There are many types of users and stakeholders that require Explainable AI. Without explanations, end-users are less likely to trust and adopt ML-based technologies. Without a means of understanding model decision-making, business stakeholders have a difficult time assessing the value and risks associated with launching a new ML-based product. And without insights into why an ML application is behaving in a certain way, application developers have a harder time troubleshooting issues, and ML scientists have a more difficult time assessing their models for fairness and bias. To further complicate an already challenging problem, the audiences for ML model explanations come from varied backgrounds, have different levels of experience with statistics and mathematical reasoning, and are subject to cognitive biases…more details
Meg is currently a UX Researcher for Google Cloud AI and Industry Solutions, where she focuses her research on Explainable AI and Model Understanding. She has had a varied career working for start-ups and large corporations alike across fields such as EdTech, weather forecasting, and commercial robotics. She has published articles on topics such as information visualization, educational-technology design, human-robot interaction (HRI), and voice user interface (VUI) design. Meg is also a proud alumnus of Virginia Tech, where she received her Ph.D. in Human-Computer Interaction (HCI).
In-person | Talk | Machine Learning Safety and Security | Responsible AI | All Levels
Dawn Song is a Professor in the Department of Electrical Engineering and Computer Science at UC Berkeley. Her research interest lies in deep learning, security, and blockchain. She has studied diverse security and privacy issues in computer systems and networks, including areas ranging from software security, networking security, distributed systems security, applied cryptography, blockchain and smart contracts, to the intersection of machine learning and security. She is the recipient of various awards including the MacArthur Fellowship, the Guggenheim Fellowship, the NSF CAREER Award, the Alfred P. Sloan Research Fellowship, the MIT Technology Review TR-35 Award, the Faculty Research Award from IBM, Google and other major tech companies, and Best Paper Awards from top conferences in Computer Security and Deep Learning. She is an IEEE Fellow. She is ranked the most cited scholar in computer security (AMiner Award). She obtained her Ph.D. degree from UC Berkeley. Prior to joining the UC Berkeley faculty, she was a faculty member at Carnegie Mellon University from 2002 to 2007. She is also a serial entrepreneur.
Jon Krohn is Chief Data Scientist at the machine learning company untapt. He authored the book Deep Learning Illustrated, which was released by Addison-Wesley in 2019 and became an instant #1 bestseller that was translated into six languages. Jon is renowned for his compelling lectures, which he offers in-person at Columbia University, New York University, and the NYC Data Science Academy, as well as online via O’Reilly, YouTube, and his A4N podcast on A.I. news. Jon holds a doctorate in neuroscience from Oxford and has been publishing on machine learning in leading academic journals since 2010.
In-person | Talk | Research Frontiers | Machine Learning | All Levels
Heart rate variability biofeedback (HRV-B) is a clinically effective therapy in which patients can improve their mental and physical well-being through real-time monitoring of the heart-rate and specialized breathing techniques. HRV-B can improve health outcomes in a number of medical or wellness-related conditions, ranging from depression and anxiety, to cardiovascular disease, asthma, cancer fatigue, women’s health, better sleep, peak athletic performance, and stress resilience…more details
Kirstin Aschbacher is a Data Scientist, with a background in PsychoNeuroImmunology Research from her days as an Associate Professor at the University of California, San Francisco (UCSF), Department of Psychology, Weill Institute for Neurosciences, and the Division of Cardiology. She has a PhD in Clinical Psychology and is also a licensed Psychologist with a certificate in HRV Biofeedback. She uses her cross-functional skill-sets to drive innovative, AI-based products that enhance user well-being and stress-resilience. In her current role as Senior Director of Health Data Science at Meru Health, she has focused on HRV Biofeedback and Precision Care algorithms.
In-person | Track Keynote | Machine Learning | MLOps and Data Engineering | All Levels
In this session, we will describe the challenges in operationalizing machine and deep learning. We’ll explain the production-first approach to MLOps pipelines – using a modular strategy, where the different components provide a continuous, automated, and far simpler way to move from research and development to scalable production pipelines, without the need to refactor code, add glue logic, or spend significant effort on data and ML engineering…more details
Yaron Haviv is a serial entrepreneur who has been applying his deep technological experience in data, cloud, AI and networking to leading startups and enterprise companies since the late 1990s. As the co-founder and CTO of Iguazio, Yaron drives the strategy for the company’s MLOps platform and led the shift towards the production-first approach to data science and catering to real-time AI use cases. He also initiated and built Nuclio, a leading open source serverless platform with over 4,000 GitHub stars, and MLRun, Iguazio’s open source MLOps orchestration framework. Prior to co-founding Iguazio in 2014, Yaron was the Vice President of Datacenter Solutions at Mellanox (now NVIDIA), where he led technology innovation, software development and solution integrations. He was also the CTO and Vice President of R&D at Voltaire, a high-performance computing, IO and networking company which floated on the NYSE in 2007. Yaron is an active contributor to the CNCF Working Group and was one of the foundation’s first members. He presents at major industry events and writes tech content for leading publications including TheNewStack, Hackernoon, DZone, Towards Data Science and more.
In-person | Talk | NLP | Intermediate
In this talk, we navigate through the latest buzz around semantic search and separate the noise from the meaningful advancements. Is dense retrieval better than BM25’s keyword search? Do large language models outperform smaller transformers? How well do the models generalize to industry corpora? How can we leverage Question Answering?…more details
Malte Pietsch is CTO & Co-Founder at deepset. His current focus is on building deepset Cloud – a SaaS platform for developers to build, deploy and operate modern NLP pipelines. He holds an M.Sc. with honors from TU Munich and conducted research at Carnegie Mellon University. Before founding deepset he worked as a data scientist for multiple startups. He is an active open-source contributor and author of the NLP framework Haystack.
In-person | Talk | MLOps and Data Engineering | All Levels
MLOps has emerged as a key focus area for enterprises. But why? The answer is simple. To remain competitive in this era of digital transformation, it has become a business imperative to establish a competency around machine learning and deep learning application delivery. Now, enterprises are starting to take the next step in making the MLOps process repeatable, scalable and reproducible, so they can continuously infuse the business with innovation…more details
David has over 20 years of experience in the fields of data, AI and enterprise cloud. He has led teams for EMC Dell, Hitachi and Cisco, working with some of the most innovative companies in the world in both classified and commercial environments. Today, David acts as the Western Regional Director at Iguazio, working with Enterprise customers to help them bring their data science initiatives to life. David is passionate about applying MLOps principles to real-world AI projects, on-premise, in multi-cloud environments, on a SCIF or all of the above. When he’s not working with customers on AI projects, he volunteers at the Salvation Army and Rotary International. He and his wife have twins – a boy and a girl, as well as a 94lb/43kg Labrador that eats everything.
In-person | Talk | Responsible AI and Social Good | Machine Learning Safety and Security | Beginner-Intermediate
In this talk, I will overview state-of-the-art techniques for protecting confidential data while _in use_. These methods encrypt the data while enabling data scientists to train models and run analytics queries on encrypted data, essentially “sharing without showing”. I will then discuss our research and open source project called Opaque, which enables confidential analytics, learning and collaboration in an easy-to-use way. Link to open source project: https://github.com/mc2-project/mc2…more details
Raluca Ada Popa is the Robert E. and Beverly A. Brooks associate professor of computer science at UC Berkeley working in computer security, systems, and applied cryptography. She is a co-founder and co-director of the RISELab and SkyLab at UC Berkeley, as well as a co-founder of Opaque Systems and PreVeil, two cybersecurity companies. Raluca received her PhD in computer science as well as her Masters and two BS degrees, in computer science and in mathematics, from MIT. She is the recipient of the 2021 ACM Grace Murray Hopper Award, a Sloan Foundation Fellowship award, Jay Lepreau Best Paper Award at OSDI 2021, Distinguished Paper Award at IEEE Euro S&P 2022, Jim and Donna Gray Excellence in Undergraduate Teaching Award, NSF Career Award, Technology Review 35 Innovators under 35, Microsoft Faculty Fellowship, and a George M. Sprowls Award for best MIT CS doctoral thesis.
In-person | Talk | Machine Learning | Deep Learning | NLP
Despite the rapid evolution of AI, projects still fail at a disappointingly high rate. In the past, capturing data at scale and building models was the challenge, but today we’re confronted with the issue of making AI more robust while avoiding the risk of unintended consequences. While the tools are new, many challenges remain the same. In this talk, I will share real-world stories and applied examples that demonstrate:
* How to build the business case for an AI project (and get buy-in)
* Navigating AI project management to prevent failure
* How to mitigate the risks of unintended consequences from using AI…more details
Cal Al-Dhubaib is a data scientist, entrepreneur, and professional speaker on Artificial Intelligence. He founded Pandata to help organizations plan, design, and scale human-centered AI solutions. Pandata has overseen 80+ transformative projects with leading global brands including Parker Hannifin, the Cleveland Museum of Art, FirstEnergy, and Penn State University. Cal is especially passionate about orchestrating inclusive teams that are empowered to build Trusted AI solutions. He has been recognized as a Notable Immigrant Entrepreneur, Crain’s Cleveland 20 in their 20s, and two-time Cleveland Smart 50 recipient. In addition to becoming the first data science graduate from Case Western Reserve University, Cal is also known for his role in advocating for careers and educational pathways in Data Science through workforce development initiatives.
Cloud computing promises to simplify infrastructure, but somehow MLOps remains deeply technical, even in the cloud. The complexity of MLOps tends to lead to an organizational antipattern: data scientists who know the data and models best have to mind-meld with data engineers who know the infrastructure best. This is particularly problematic in the highest-value stage of the ML lifecycle — managing models in production…more details
Joseph M. Hellerstein is the Jim Gray Professor of Computer Science at the University of California, Berkeley, whose work focuses on data-centric systems and the way they drive computing. He is an ACM Fellow, an Alfred P. Sloan Research Fellow and the recipient of three ACM-SIGMOD “Test of Time” awards for his research. Fortune Magazine has included him in their list of 50 smartest people in technology, and MIT’s Technology Review magazine included his work on their TR10 list of the 10 technologies “most likely to change our world”.
Hellerstein is a co-founder of Aqueduct, which is bringing new open source technology for Prediction Infrastructure to market. Previously he co-founded Trifacta, the pioneering company in Data Preparation, where he served as founding CEO and Chief Strategy Officer. Hellerstein has served on the technical advisory boards of a number of computing and Internet companies including Dell EMC, SurveyMonkey, Datometry and Acryl Data.
Virtual | Talk | NLP
In this talk, I will present work on enhancing the important aspects of unification, generalization, and efficiency in large-scale pretrained models across vision and language modalities, via different methods and directions of visual grounding for improving both multimodal and text-only NLU tasks. We will start by discussing joint vision and language pretraining models such as LXMERT (large-scale cross-modal pretraining). Next, we will present VL-T5 to unify several multimodal tasks (such as visual question answering, referring expression comprehension, visual reasoning/entailment, visual commonsense reasoning, captioning, and multimodal machine translation) by treating all these tasks as text generation…more details
Dr. Mohit Bansal is the John R. & Louise S. Parker Professor and the Director of the MURGe-Lab in the Computer Science department at University of North Carolina (UNC) Chapel Hill. He received his PhD from UC Berkeley and his BTech from IIT Kanpur. His research expertise is in natural language processing and multimodal machine learning, with a particular focus on grounded and embodied semantics, human-like language generation, and interpretable and generalizable deep learning. He is a recipient of DARPA Director’s Fellowship, NSF CAREER Award, Army Young Investigator Award, Google Focused Research Award, Microsoft Investigator Fellowship, and outstanding paper awards at ACL, CVPR, EACL, COLING, and CoNLL. His service includes ACL Executive Committee, ACM Doctoral Dissertation Award Committee, Program Co-Chair for CoNLL 2019, ACL Americas Sponsorship Co-Chair, and Associate/Action Editor for TACL, CL, IEEE/ACM TASLP, and CSL journals. Webpage: https://www.cs.unc.edu/~mbansal/
Virtual | Talk | Deep Learning | Research Frontiers | Beginner
Attendees will also see a comparison of greedy versus epsilon-greedy action selection, and why epsilon greedy can solve tasks that cannot be solved using a greedy approach. Some of the preceding concepts will be illustrated during the presentation of the n-chain task in RL, whose solution clearly requires an epsilon-greedy algorithm. This session is aimed at beginners who have no experience with RL…more details
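The greedy versus epsilon-greedy distinction mentioned above can be sketched in a few lines. This is a minimal illustration and not material from the session: the function name `epsilon_greedy` and the example action-value estimates are hypothetical. With `epsilon = 0` the rule is purely greedy; any positive `epsilon` occasionally tries an action that currently looks worse, which is what lets an agent discover a delayed reward such as the one at the end of a chain.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Return an action index for one state.

    q_values: current action-value estimates for that state (hypothetical numbers).
    epsilon:  probability of exploring, i.e. picking an action uniformly at random.
    rng:      a random.Random instance, passed in so runs are reproducible.
    """
    if rng.random() < epsilon:
        # Explore: ignore the estimates and pick any action.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimate (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
estimates = [0.1, 0.9]  # hypothetical estimates for two actions

greedy_choice = epsilon_greedy(estimates, 0.0, rng)  # never explores
explored = {epsilon_greedy(estimates, 1.0, rng) for _ in range(200)}  # always explores
```

A purely greedy agent keeps returning the action its current estimates favour, so an early small reward can lock it into that choice; the exploring variant visits both actions.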
Bio Coming Soon!
Virtual | Talk | Machine Learning Safety and Security | Deep Learning | NLP | Intermediate
Most deep learning (DL) models assume ideal conditions and rely on the assumption that test/production data comes from the in-distribution samples from the training data. However, this assumption is not satisfied in most real-world applications. Test data could differ from the training data either due to adversarial perturbations, new classes, generated content, noise, or other distribution changes. These shifts in the input data can lead to classifying unknown types, classes that do not appear during training, as known with high confidence. On the other hand, adversarial perturbations in the input data can cause a sample to be incorrectly classified. In this talk, we will discuss approaches based on group-based and individual subset scanning methods from the anomalous pattern detection domain and how they can be applied over off-the-shelf DL models.
Celia Cintas is a Research Scientist at IBM Research Africa – Nairobi. She is a member of the AI Science team at the Kenya Lab. Her current research explores subset scanning for anomalous pattern detection under generative models and the improvement of ML techniques to address challenges in Global Health. Previously, she was a grantee of the National Scientific and Technical Research Council at LCI-UNS and IPCSH-CONICET. She holds a Ph.D. in Computer Science from Universidad del Sur (Argentina). More info: https://celiacintas.github.io/about/
Diffusion-based generative models such as DALL·E 2 have achieved exceptional image generation quality. Unlike other generative models based on explicit representations of probability distributions (e.g., autoregressive) or implicit sampling procedures (e.g., GANs), diffusion models learn directly the vector field of gradients of the data distribution (scores). This framework allows flexible architectures, requires no sampling during training or the use of adversarial training methods. These score-based generative models enable exact likelihood evaluation, achieve state-of-the-art sample quality, and can be used to improve performance in a variety of inverse problems, including medical imaging…more details
Stefano Ermon is an Associate Professor of Computer Science in the CS Department at Stanford University, where he is affiliated with the Artificial Intelligence Laboratory, and a fellow of the Woods Institute for the Environment. His research is centered on techniques for probabilistic modeling of data and is motivated by applications in the emerging field of computational sustainability. He has won several awards, including Best Paper Awards (ICLR, AAAI, UAI and CP), a NSF Career Award, ONR and AFOSR Young Investigator Awards, Microsoft Research Fellowship, Sloan Fellowship, and the IJCAI Computers and Thought Award. Stefano earned his Ph.D. in Computer Science at Cornell University in 2015.
In-person | Talk | Responsible AI | ML Safety and Security | Intermediate
In his talk “Responsible AI Is Not an Option,” Dr. Scott Zoldi brings to bear his decades of experience in delivering analytic innovation in a highly regulated environment, to underscore the urgency with which the topics of AI fairness and bias must be ushered onto Boards of Directors’ agendas. In his dynamic presentation, Dr. Zoldi spells out the “why” and “how” of fulfilling the social covenant and soon, regulatory requirements of enterprises using AI ethically, transparently, securely and in their customers’ best interests…more details
Scott Zoldi is chief analytics officer at FICO responsible for advancing the company's leadership in artificial intelligence (AI) and analytics in its product and technology solutions. At FICO Scott has authored more than 120 analytic patents, with 71 granted and 49 pending. Scott is actively involved in the development of analytics applications, Responsible AI technologies and AI governance frameworks, the latter including FICO's blockchain-based model development governance methodology. Scott is a member of the Board of Advisors of FinRegLab, a Cybersecurity Advisory Board Member of the California Technology Council, and a Board Member of Tech San Diego and the San Diego Cyber Center of Excellence. He is also a member of the CNBC Technology Executive Council. Scott received his Ph.D. in theoretical and computational physics from Duke University.
In-person | Business Talk
Taylor will share some of the biggest regrets and lessons he has learned after a decade of building and selling AI startups, from Sequoia’s HireVue to his own deep-learning startup Zeff, which was acquired by DataRobot. Taylor will discuss key competencies and lessons needed when it comes to selling AI software in the market…more details
Ben Taylor has over 17 years of machine-learning experience. After studying chemical engineering, Taylor joined Intel and Micron and worked in their photolithography, process control, and yield prediction groups. Pursuing his love for high-performance computing (HPC) and predictive modeling, Taylor joined an artificial intelligence hedge fund (AIQ) as their AI expert. Taylor then joined a young HR startup called HireVue, built out their data science group, and helped launch HireVue’s AI insights product using video/audio from candidate interviews. In 2017 Taylor co-founded Zeff.ai to pursue deep learning for image, audio, video, and text for the enterprise.
In-person | Talk | Machine Learning | Research Frontiers
This talk will provide a brief overview of the following topics:
* The broad application of machine learning in finance: opportunities and challenges.
* Machine learning techniques for asset pricing: enhancing complex quant models (i.e., PDE, Monte Carlo) for efficient pricing of derivative securities, and pricing illiquid securities using data-driven methods.
* Collaborative filtering techniques for illiquid instrument pricing: using data-driven cohorts/drivers to inform price movements in a target instrument based on observations of other instruments in the market…more details
Arun heads the Bloomberg Quantitative Research Solutions Team. Arun’s work initially focused on Stochastic Volatility Models for Derivatives & Exotics pricing/hedging and more generally around asset pricing using traditional quantitative finance methods. More recently, he has enjoyed working at the intersection of diverse areas such as data science and innovative quantitative finance models, using AI/Machine Learning methods to help reveal embedded signals in traditional & alternative data such as Company Financials, ESG, News/Social, Supply Chain, Geolocational & Extreme Weather and their potential impact on capital markets. Most recently, in an attempt to come full circle, he has been exploring the use of ML methods in asset pricing, e.g. derivatives pricing and illiquid instrument pricing.
Prior to joining Bloomberg, he earned his Ph.D. from Cornell University in the areas of computer science and applied mathematics and a B. Tech in Computer Science from IIT Delhi, India. Arun is also an editorial board member of The Journal of Financial Data Science.
In-person | Talk
Consulting leader who has advised and worked closely with Fortune 500 enterprises to implement and adopt cloud, analytics/AI and IoT solutions. He is a consulting leader at Tiger Analytics, solving complex problems for supply chain, manufacturing, and sustainability while creating business value. In addition, he has served as an innovation thought leader with US federal agencies – DoE, NSF, and Materials Genome Initiative – to drive usage of Cloud/AI in R&D and manufacturing to launch consumer products.
In-person | Talk | Machine Learning | Beginner-Intermediate
Did my data change after a certain intervention? This is a common question with data observed over time. Classical statistical and engineering approaches include control charts to see if the series falls outside of the normal boundaries of expected data. A Bayesian approach to this problem calculates the probability that the data series changes at every point along the series. Bayesian change point analysis allows the analyst to evaluate a whole series and look where the highest probability of change occurred. Has the financial asset lost value after the recent financial report? Are the healthcare outcomes at this hospital better after our new process to help patients? Did the manufacturing process improve after upgrading the machinery? All these questions and more can be answered with these techniques which will be shown in R…more details
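The idea of assigning a probability of change to every point in a series can be sketched as follows. The session itself uses R; this Python sketch is only an illustration under strong, stated assumptions: exactly one change point, Gaussian observations with known standard deviation `sigma`, an independent Normal(0, `tau`²) prior on the mean of each segment, and a uniform prior over where the change occurs. The function names and the toy series are hypothetical.

```python
import math


def log_marginal(segment, sigma=1.0, tau=10.0):
    """Log marginal likelihood of one segment with its unknown mean integrated out.

    Assumes x_i ~ Normal(mu, sigma^2) with prior mu ~ Normal(0, tau^2).
    """
    n = len(segment)
    total = sum(segment)
    total_sq = sum(v * v for v in segment)
    # Quadratic form x^T C^{-1} x for covariance C = sigma^2 I + tau^2 1 1^T.
    quad = (total_sq - tau**2 * total**2 / (sigma**2 + n * tau**2)) / sigma**2
    return (-0.5 * n * math.log(2 * math.pi * sigma**2)
            - 0.5 * math.log1p(n * tau**2 / sigma**2)
            - 0.5 * quad)


def change_point_posterior(series, sigma=1.0, tau=10.0):
    """Posterior probability that the single change occurs just before index k.

    Returns a list whose entry i corresponds to k = i + 1, for k = 1 .. len(series) - 1.
    """
    log_post = [
        log_marginal(series[:k], sigma, tau) + log_marginal(series[k:], sigma, tau)
        for k in range(1, len(series))
    ]
    peak = max(log_post)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(v - peak) for v in log_post]
    norm = sum(weights)
    return [w / norm for w in weights]


# Toy series: the level shifts from 0 to 5 at index 20.
series = [0.0] * 20 + [5.0] * 20
posterior = change_point_posterior(series)
most_likely_k = max(range(len(posterior)), key=posterior.__getitem__) + 1
```

Plotting `posterior` against the index gives the "where did the change most likely occur" view described above; handling several change points or unknown variance needs a richer model than this sketch.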
A Teaching Associate Professor at the Institute for Advanced Analytics, Dr. Aric LaBarr is passionate about helping people solve challenges using their data. There he helps design the innovative program to prepare a modern workforce to wisely communicate and handle a data-driven future at the nation’s first Master of Science in Analytics degree program. He teaches courses in predictive modeling, forecasting, simulation, financial analytics, and risk management.
Previously, he was Director and Senior Scientist at Elder Research, where he mentored and led a team of data scientists and software engineers. As director of the Raleigh, NC office he worked closely with clients and partners to solve problems in the fields of banking, consumer product goods, healthcare, and government.
Dr. LaBarr holds a B.S. in economics, as well as a B.S., M.S., and Ph.D. in statistics — all from NC State University.
In-person | Talk | Deep Learning | Machine Learning | All Levels
Is your generative adversarial network (GAN) not producing representative synthetic data? If so, that is no surprise, because training a GAN to produce quality data representative of the natural distributions is more complex than traditional predictive modeling. Ensuring the data is representative often requires an analysis of the covariate relationships and a comparison of the moments in the synthetic and natural (actual) distributions. This presentation will detail how a genetic algorithm can be combined with a set of pseudo discriminators to automate constructing a better GAN…more details
Robert teaches machine learning for SAS and specializes in neural networks. Before joining SAS, Robert worked under the Senior Vice Provost at North Carolina State University where he built models pertaining to student success, faculty development and resource management. Prior to working in academia, Robert was a member of the research and development group on the Workforce Optimization team at Travelers Insurance. His models at Travelers focused on forecasting and optimizing resources. Robert graduated with a master’s degree in Business Analytics and Project Management from the University of Connecticut and a master’s degree in Applied and Resource Economics from East Carolina University.
In-person | Talk | Deep Learning
Mohamed is the Co-founder & CEO of Kolena and the author of Manning’s book: “Deep Learning for Vision Systems”. Previously, he built and managed AI/ML organizations at Amazon, Twilio, Rakuten, and Synapse (acq. by Palantir). Mohamed regularly speaks at AI conferences like Amazon’s DevCon, O’Reilly’s AI conference, and Google’s I/O.
In this talk, we’ll see how the dataset on-boarding process for machine learning can be greatly simplified by using the dabl library in Python, which provides interactive suggestions for data cleaning and…more details
Andreas Mueller is a Principal Research SDE at Microsoft (previously Columbia, NYU, Amazon), and author of the O’Reilly book “Introduction to Machine Learning with Python”, describing a practical approach to machine learning with Python and scikit-learn. He is one of the core developers of the scikit-learn machine learning library, and has been co-maintaining it for several years. Andreas is also a Software Carpentry instructor.
In-person | Talk | Machine Learning | All Levels
Data Lakehouses combine the best of both worlds for databases and data lakes. Databases provide relative simplicity and ACID transactional protection for your data, while data lakes provide flexibility, scalability, and support for non-structured data on cheap object stores. In this session, we describe Delta Lake, an open-source storage framework that brings reliability by providing a transactional layer on top of data lakes. We will talk about key features of Delta Lake that enable the Lakehouse Architecture that ensures reliability around your Machine Learning pipelines. Finally, we will talk about the work we are doing to build the ecosystem around Delta Lake, including supporting multiple languages (Python, Rust, Java, etc) as well as data processing systems (Apache Pulsar, Apache Flink, Apache Hive, PrestoDB, TrinoDB, Apache Spark™, etc)…more details
Allison Portis is a software engineer at Databricks working on Delta Lake. She recently graduated from Cornell University where she studied computer science. Allison previously worked on open source feature engineering projects as an intern at Feature Labs and is excited to now be a part of the Delta Lake community.
Vini Jaiswal is a Developer Advocate at Databricks. She co-leads the advocacy for the open-source project Delta Lake. She helped advance data science and AI uses for over a decade with companies of different sizes. She loves to help with social causes through data and AI skills, and actively contributes to modern Data Science and Eng.
In-person | Talk
In this talk, I’ll discuss how to create ad hoc data science teams to help with local elections. We’ll cover how to structure projects to collect data, continually inform your campaign strategy, and communicate with people from a variety of backgrounds…more details
Kyle is the Chief Architect at Noteable and a core developer of the IPython/Jupyter project. He wants to help build great environments for collaborative analysis, development, and production workloads for everyone; from small teams to massive scale. His passion for open source has enabled him to build better systems with staying power, enable peers, support companies he’s worked for, and drive growth. As an active member of local politics, Kyle has focused on schools, active transportation, transit, and housing all to have a good impact on climate change and equity.
Virtual | Talk | MLOps and Data Engineering | All Levels
Traditional orchestrators think in terms of “tasks”. This talk discusses an alternative, declarative approach to data orchestration that puts data assets at the center. This approach, called “software-defined assets”, is implemented in Dagster, an open source data orchestrator…more details
Sandy works at Elementl as the lead engineer for the Dagster project. Previously, he led machine learning and data science teams at KeepTruckin and Clover Health. He’s a committer on Spark and Hadoop, and co-authored O’Reilly’s Advanced Analytics with Spark.
Virtual | Talk | Machine Learning | Intermediate
With breakthroughs in areas such as image recognition, natural language understanding and board games, AI and machine learning are revolutionizing various industries such as healthcare, manufacturing and finance. As complex machine learning models are being deployed into production, the understanding of them is becoming very important. The lack of a deep understanding can result in models propagating bias and we’ve seen examples of this in criminal justice, politics, retail, facial recognition and language understanding. Explaining or interpreting AI is a hot topic in research and the industry, as modern machine learning algorithms are black boxes and nobody really understands how they work. Moreover, there is EU regulation now to explain AI under the GDPR “right to explanation”. Interpretable AI is therefore a very important topic for AI practitioners. In this talk, I will give an overview of a few state-of-the-art interpretability techniques and how you could build explainable AI systems…more details
Ajay Thampi is a machine learning engineer at Meta where he works on large recommender systems, responsible AI and fairness. He holds a PhD and his research was focused on signal processing and machine learning. He has published papers at leading conferences and journals on reinforcement learning, convex optimization, and classical machine learning techniques applied to 5G cellular networks.
Virtual | Talk | All Levels
CML is a project to help ML and data science practitioners automate their ML model training and model evaluation using best practices and tools from software engineering, such as GitLab CI/CD (as well as GitHub Actions and Bitbucket Pipelines). The idea is to automatically train your model and test it in a production-like environment every time your data or code changes…more details
Alex Kim is a Solutions Engineer at Iterative. His background is in physics, software engineering, and machine learning. In the last couple of years, he became increasingly interested in the engineering side of ML projects: processes and tools needed to go from an idea to a production solution.
Data Lake Technology provides a powerful way to process, refine, and present huge volumes of diverse data. But this comes at a cost. As a Data Lake evolves, it grows in size and complexity. If not properly managed, a Data Lake can outgrow the abilities and resources of the team that manages it, negatively impacting the usefulness of an organization’s data and slowing or halting the team’s implementation of new analytics and applications. In this talk, Roger Dev showcases how the open source HPCC Systems platform has developed an open-source data curation and governance system called Tombolo to complement the powerful storage and compute capabilities of the HPCC Systems Data Lake operating system…more details
Roger is a Senior Architect leading the Machine Learning and Analytics Library team at LexisNexis Risk Solutions. Roger has been involved in the implementation and utilization of machine learning and AI techniques for many years, and he has more than 20 patents in diverse areas of software technology.
This talk will go through high-level details of how the model was built and trained, and some of the precursor models that it built upon. Then it will move to some comments on the ethical implications of AI art models generally and ethical considerations for those using Stable Diffusion and models like it…more details
Hunter Kempf is a Data Scientist working in the cybersecurity industry and a Z by HP Global Data Science Ambassador. In his free time he works on various side projects relating to Data Science, and some of those projects end up as articles for his Medium blog. Previously, Hunter worked as a Data Scientist at AT&T on preventing fraud and security incidents. He graduated from the Georgia Institute of Technology (Georgia Tech) with a master’s in Cybersecurity and from the University of Notre Dame with a master’s in Applied and Computational Mathematics and Statistics.
Virtual | Talk | NLP | All Levels
This talk presents my lab’s work toward building general-purpose models in NLP and how to systematically evaluate them. I present a new meta-dataset – called super-Natural Instructions – that includes a variety of NLP tasks and their descriptions to evaluate cross-task generalization. Then, I introduce a new meta training approach that can solve more than 1600 NLP tasks only from their descriptions and a few examples…more details
Hanna Hajishirzi is an Associate Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington and a Senior Research Manager at the Allen Institute for AI. Her research spans different areas in NLP and AI, focusing on developing general-purpose machine learning algorithms that can solve diverse NLP tasks. Applications for these algorithms include question answering, representation learning, green AI, knowledge extraction, and conversational dialogue. Honors include the NSF CAREER Award, Sloan Fellowship, Allen Distinguished Investigator Award, Intel rising star award, best paper and honorable mention awards, and several industry research faculty awards. Hanna received her PhD from University of Illinois and spent a year as a postdoc at Disney Research and CMU.
In-person | Business Talk | AI for Cross Industry | All Levels
Global spend on AI continues to explode, even during the current economic climate, as C-Suite and senior executives realize the importance AI has and will have on their business models. Despite these massive investments, enterprises are challenged to show ROI from their AI initiatives due to complexity and lack of business adoption…more details
Gaurav is currently the Executive Vice President and General Manager of Machine Learning and AI at AtScale. He is responsible for defining and leading the business that extends the company’s semantic layer platform to address the rapidly expanding set of Enterprise AI and machine learning applications.
Most recently, Gaurav served as VP of Product at Neural Magic – innovators in software acceleration for deep learning utilizing sparse model architectures. Previously, he served in a number of executive roles at IBM spanning product, engineering, and sales that were focused on taking cutting-edge data science, machine learning, and AI products and solutions to market, specializing in model training, serving, MLOps, and trusted AI in the context of driving business outcomes for enterprise applications. He is also an advisor to data and AI companies.
In-person | Career Talk | Beginner
Want to land your dream job in data? Learn what makes a data resume stand out, how a portfolio project is a job hunting cheat code when you avoid these 6 mistakes, why cold email is a networking super-power, and how to craft a winning personal story for the behavioral interview. These tips led Nick Singh, best-selling author of Ace the Data Science Interview, to work at Facebook & Google, and helped 200+ of his coaching clients land top jobs in tech…more details
Nick Singh is an Ex-Facebook & Google Engineer turned best-selling author of Ace the Data Science Interview, and founder of SQL Interview Platform DataLemur.com. His career advice on LinkedIn has earned him 100,000 followers, and he’s successfully career coached 578 people to land their dream job in data!
In-person | Women’s Ignite | All Levels
Vishakha Gupta-Cledat is Co-founder and CEO of ApertureData. Prior to that, she worked at Intel Labs for over 7 years where she led the design and development of VDMS (the Visual Data Management System) which forms the core of ApertureData’s product, ApertureDB. Vishakha holds a Ph.D in Computer Science from the Georgia Institute of Technology and a M.S. in Information Networking from Carnegie Mellon University. She has worked on scheduling in heterogeneous multi-core environments, graph based storage and applications on non volatile memory systems, and visual data management challenges for analytics use cases.
Deepti Chafekar, PhD did her Masters from University of Georgia and finished her PhD from Virginia Tech in C.S, where she focused on ML techniques and complex algorithmic and optimization techniques. She received the prestigious Outstanding Dissertation award at Virginia Tech for her doctoral research. Deepti has been working in the field of AI and ML for more than 15 years, has published several research papers in prestigious conferences and journals and has multiple patents . She has worked as a scientist at Microsoft and Nokia research labs and has taken leadership roles in many successful AI start-ups. She is currently a co-founder at Weav.ai, where she is working on building AI solutions that can be easily available for solving multiple business problems.
Causal inference is increasingly an indispensable tool of data science, machine learning, and data-driven decision-making. In this talk I will present the state-of-play in causal machine learning. I cover the problems that matter in practice, with emphasis on the tech and retail industries. I will also talk about trends in open-source tools for causal inference. Finally, I’ll show examples from DoWhy and its sister package EconML, which together form the PyTorch of causal inference…more details
Robert Osazuwa Ness is a researcher at Microsoft Research and author of the book Causal Machine Learning. He leads the development of MSR’s causal machine learning platform and conducts research into probabilistic models for advanced causal reasoning. He has worked as a machine learning engineer in various machine learning startups. He attended graduate school at both Johns Hopkins SAIS (Hopkins-Nanjing Center) and Purdue University. He received his Ph.D. in Statistics from Purdue, where his dissertation research focused on Bayesian active learning models for causal discovery.
Virtual | Talk | Machine Learning | Responsible AI | All Levels
Data-centric AI broadly describes the idea that *data*, rather than models, is increasingly the crux of success or failure in AI for many settings and use cases. More specifically, data-centric AI defines ML development workflows that center around principally iterating on the *training data*–e.g. labeling, sampling, slicing, augmenting, etc.–rather than the model architecture. In this talk, I’ll describe how programmatic or weak supervision can facilitate these data-centric workflows (in ways that manual labeling cannot) and, more importantly, present an overview of how it can serve as an API for rich organizational knowledge sources, along with recent technical results and user case studies…more details
Alex Ratner has a Ph.D. in computer science from Stanford, where he was advised by Chris Re. His research focuses on weak supervision: the idea of using higher-level, noisier input from domain experts to train complex state-of-the-art models where limited or no hand-labeled training data is available. He leads the development of the Snorkel framework (snorkel.stanford.edu) for weakly supervised ML, which has been applied to machine learning problems in domains like genomics, radiology, and political science. He is supported by a Stanford Bio-X SIGF fellowship.
Virtual | Talk | Machine Learning | All Levels
Data-centric AI is bridging the gap between research and practice. Instead of optimizing our algorithms and architectures, pivoting to focus on data as the primary way to improve our machine learning models is yielding tremendous results. But this shift to data has left some gaps in our development process, and with this shift, we need to rethink how we develop AI from tooling to processes…more details
Jimmy Whitaker is the Chief Scientist of AI at Pachyderm. He focuses on creating a great data science experience and sharing best practices for how to use Pachyderm. When he isn’t at work, he’s either playing music or trying to learn something new, because “You suddenly understand something you’ve understood all your life, but in a new way.”
Virtual | Talk | Cybersecurity
Cybersecurity and policing in the metaverse. You can buy virtual assets in the Metaverse: real estate, investment commodities, stock. This, of course, means that the Metaverse will need to have security and policing…more details
Jack McCauley is an Innovator in Residence at the Jacobs Institute for Design Innovation at UC Berkeley, Professor at UC Berkeley, Co-Founder of Oculus, and an American engineer, hardware designer, inventor, video game developer and philanthropist. Jack is best known for designing the guitars and drums for the Guitar Hero video game series, and as a co-founder and former chief engineer at Oculus VR. At Oculus, Jack designed and built the Oculus DK1 and DK2 virtual reality headsets. Oculus was acquired by Facebook for $2 billion. McCauley holds numerous U.S. patents for inventions in software, audio effects, virtual reality, motion control, computer peripherals, and video game hardware and controllers. Jack was awarded a full scholarship to attend the University of California, Berkeley, where he earned a B.Sc. in Electrical Engineering and Computer Science (EECS) in 1986. Jack has authored numerous research papers in the field of artificial intelligence (AI) and mathematical modeling of AI-based systems and is currently pursuing new projects at his private R&D facility and hardware incubator in Livermore, California.
Virtual | Talk | Machine Learning | Deep Learning | All Levels
In this talk, we will explore production-quality, real-time data science using the current leading open, real-time technologies: Kafka, Redpanda, ksqlDB, Materialize, and Deephaven Community Core…more details
Chip Kent is the chief data scientist at Deephaven Data Labs. He holds a Ph.D. from Caltech and has decades of quantitative, mathematical, and computer science experience. Chip comes from a background in quantitative private investment, using data to make investments at Walleye Capital.
Virtual | Talk | Responsible AI | All Levels
Climate change is one of the greatest challenges that society faces today, requiring rapid action from all corners. In this talk, I will describe how machine learning can be a potentially powerful tool for addressing climate change, when applied in coordination with policy, engineering, and other areas of action. From energy to agriculture to disaster response, I will describe high impact problems where machine learning can help through avenues such as distilling decision-relevant information, optimizing complex systems, and accelerating scientific experimentation…more details
Priya Donti is a Co-founder and Chair of Climate Change AI, a non-profit initiative to catalyze impactful work at the intersection of climate change and machine learning, which she is currently running through the Cornell Tech Runway Startup Postdoc Program. She will also join MIT EECS as an Assistant Professor in Fall 2023. Her research focuses on machine learning for forecasting, optimization, and control in high-renewables power grids. Specifically, her work explores methods to incorporate the physics and hard constraints associated with electric power systems into deep learning models. Priya received her Ph.D. in Computer Science and Public Policy from Carnegie Mellon University, and is a recipient of the MIT Technology Review’s 2021 “35 Innovators Under 35” award, the Siebel Scholarship, the U.S. Department of Energy Computational Science Graduate Fellowship, and best paper awards at ICML (honorable mention), ACM e-Energy (runner-up), PECI, the Duke Energy Data Analytics Symposium, and the NeurIPS workshop on AI for Social Good.
As the amount and complexity of data rapidly increase, machine learning tools are being used for a wide array of analytical tasks. These tasks include supervised and unsupervised prediction and forecasting as well as sophisticated normalization and integration of heterogeneous data sets. Although machine learning has shown great promise in almost every area it has been applied to, mistaken assumptions about the data being used to train such models can lead to erroneous evaluations and to models that do not actually work as well (or at all) in practice. In this session, we will talk concretely about five interrelated pitfalls that one might encounter when using supervised machine learning and how to avoid them. Importantly, these pitfalls are not domain-specific: they can, and do, occur in every industry, and failing to appreciate their significance can cause projects to fail that would otherwise succeed…more details
Jacob Schreiber is a fifth-year Ph.D. student and NSF IGERT big data fellow in the Computer Science and Engineering department at the University of Washington. His primary research focus is on the application of machine learning methods, primarily deep learning ones, to the massive amount of data being generated in the field of genome science. His research projects have involved using convolutional neural networks to predict the three-dimensional structure of the genome and using deep tensor factorization to learn a latent representation of the human epigenome. He routinely contributes to the Python open source community, currently as the core developer of the pomegranate package for flexible probabilistic modeling, and in the past as a developer for the scikit-learn project. Future projects include graduating.
In-person | Talk | All Levels
Dr. Bryan Bischof is the Head of Data Science at Weights and Biases, and adjunct professor of Data Science at Rutgers University. He’s previously worked in Time Series Signal Processing at Scale, Demand Forecasting, Global Optimization and Logistics, and Personalized Recommendations. He’s obsessed with math, and has a dog named Ravioli.
In-person | Talk | Machine Learning | Deep Learning | All | All Levels
AI projects can start small, from pretty much anywhere. Data scientists work from their laptop, workstation, cloud resources, or from resources within powerful servers and storage in a data center. Fast and reliable results will vary depending on the data, models and the infrastructure resources powering your data ingestion, analysis, model building, training, and optimization. Learn about the TCO considerations as you scale AI from exploratory pilot phases to production…more details
Justin Emerson is a Principal Technology Evangelist at Pure Storage focused on the FlashBlade product portfolio. He joined Pure in 2020 as a FlashBlade Data Architect for the San Francisco Bay Area. Prior to that, he worked at storage-focused reseller partners for more than a decade.
In-person | Talk | MLOps and Data Engineering
Jennifer Prendki is currently the VP of Machine Learning at Figure Eight, the essential human-in-the-loop AI platform for data science and machine learning teams. She has spent most of her career creating a data-driven culture wherever she went, succeeding in sometimes highly skeptical environments. She is particularly skilled at building and scaling high-performance Machine Learning teams, and is known for enjoying a good challenge. Trained as a particle physicist (she holds a PhD in Particle Physics from Sorbonne University), she likes to use her analytical mind not only when building complex models, but also as part of her leadership philosophy. She is pragmatic yet detail-oriented. Jennifer also takes great pleasure in addressing both technical and non-technical audiences at conferences and seminars, and is passionate about attracting more women to careers in STEM.