Abstract: Machine learning has traditionally relied on building models from data that can be represented in tabular form, such as SQL tables and pandas DataFrames. Inherent in this representation is the assumption that the entries (rows) of the data are unrelated to one another. In some cases this assumption holds. However, in many common machine learning use cases it does not. In these cases, considering the relationships among individual data points can significantly enhance models and yield measurable improvements in the relevant model metrics. Such use cases include common data science and machine learning tasks such as churn prediction and automated recommendation engines.
In this talk, we will compare and contrast models built from individual data points with those built entirely from graphs and with hybrids of the two. We will explore a variety of techniques for creating graph embeddings: vectors that represent graphs and that are produced in much the same way as the feature engineering and vector embeddings associated with traditional machine learning. We will focus on optimizing graph embeddings and explore some real-world examples of their use, both on their own and in conjunction with traditional machine learning embeddings. Special emphasis will be placed on the benefits of using graph embeddings when classes are significantly imbalanced. We will also discuss the use of these embeddings with traditional machine learning packages and workflows, such as scikit-learn and TensorFlow.
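As a rough illustration of the hybrid idea described above, the sketch below appends two simple graph-derived signals (node degree and the mean of each neighbor's tabular features) to every row of a small table. This is a hand-rolled stand-in for a learned graph embedding, not a method taken from the talk; the function name `hybrid_features`, the customer names, and the feature columns are all hypothetical.

```python
from collections import defaultdict


def hybrid_features(tabular, edges):
    """Return {node: tabular features + [degree] + mean of neighbors' features}.

    Assumption: this neighborhood average is only a simple stand-in for a
    learned graph embedding; it is not the technique presented in the talk.
    """
    neighbors = defaultdict(set)
    for u, v in edges:  # treat the graph as undirected
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Number of tabular columns (0 if the table is empty).
    width = len(next(iter(tabular.values()), []))
    hybrid = {}
    for node, feats in tabular.items():
        # Only neighbors that also have a tabular row contribute.
        nbrs = [n for n in neighbors[node] if n in tabular]
        if nbrs:
            mean = [sum(tabular[n][i] for n in nbrs) / len(nbrs) for i in range(width)]
        else:
            mean = [0.0] * width  # isolated node: no neighborhood signal
        hybrid[node] = list(feats) + [float(len(nbrs))] + mean
    return hybrid


# Hypothetical churn data: [monthly_spend, support_tickets] per customer.
tabular = {
    "ann": [50.0, 0.0],
    "bob": [20.0, 4.0],
    "cy": [30.0, 2.0],
    "dee": [80.0, 1.0],
}
# Hypothetical relationships between customers (for example, shared accounts).
edges = [("ann", "bob"), ("ann", "cy"), ("bob", "cy")]

features = hybrid_features(tabular, edges)
```

Each resulting row is a plain list of numbers, so the rows could be stacked into a feature matrix and passed to a conventional estimator in the usual way. A real workflow would more likely replace the averaged columns with embeddings learned by a dedicated graph algorithm.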
Bio: Dr. Clair Sullivan is currently a graph data science advocate at Neo4j, working to expand the community of data scientists and machine learning engineers using graphs to solve challenging problems. She received her doctorate in nuclear engineering from the University of Michigan in 2002. She then began her career in nuclear emergency response at Los Alamos National Laboratory, where her research involved signal processing of spectroscopic data. She spent four years working in the federal government on related subjects and returned to academic research in 2012 as an assistant professor in the Department of Nuclear, Plasma, and Radiological Engineering at the University of Illinois at Urbana-Champaign. While there, her research focused on using machine learning to analyze data from large sensor networks. Deciding to focus more on machine learning, she accepted a job at GitHub as a machine learning engineer while maintaining adjunct assistant professor status at the University of Illinois. In 2021 she joined Neo4j as a Graph Data Science Advocate. Additionally, she founded a company, La Neige Analytics, whose purpose is to provide data science expertise to the ski industry. She has authored four book chapters, over 20 peer-reviewed papers, and more than 30 conference papers. Dr. Sullivan was the recipient of the DARPA Young Faculty Award in 2014 and the American Nuclear Society's Mary J. Oestmann Professional Women's Achievement Award in 2015.