Abstract: Geometry underlies data structures and algorithms, and understanding the geometry of your data and algorithms can help you wrangle data and get better results from it. This talk will tour 'The Shape of Data' and show how geometry influences data visualization, network analytics, text analytics, image analytics, and small data problems. The emphasis will be on familiarizing the audience with existing tools and on thinking geometrically about data, rather than diving into code or the mathematics of the methods. Example problems include misinformation spread on social networks, supervised models for image analytics, embeddings and models for small volumes of text data, and visualizing education data.
The talk will give an overview of the general influence of geometry on data and data science, starting with metric geometry and embeddings of text and image data. We'll then dive into network geometry, survey data analytics, and the geometry of supervised learning algorithms for small data problems. We'll end with some exciting geometry tools in quantum computing for network science and image analytics.
Some familiarity with wrangling data, creating models in Python, and visualizing data will be helpful, but it isn't required. We'll build up the tools and keep the talk to general overviews of methods and problems.
Bio: Colleen M. Farrelly is a lead data scientist whose expertise spans generative AI, topological data analysis, network science, and NLP, among others. She's recently focused her research on the geometry of generative AI models and how this impacts their performance on tasks such as bias detection, and her volunteer work includes mentoring African machine learning students. She and Dr. Yae Gaba are the authors of The Shape of Data, an overview of machine learning from a geometric perspective.