
Abstract: Statistics do not come intuitively to humans, who tend to look for simple ways to describe complex things. Given a complex dataset, people may feel tempted to use simple summary statistics like the mean, median, or standard deviation to describe it. However, these numbers are not a replacement for visualizing the distribution.
To illustrate this fact, researchers have generated many datasets that look very different but share the same summary statistics. In this talk, I will discuss "Data Morph" (https://github.com/stefmolin/data-morph), an open source package that builds on previous research from Autodesk (the "Datasaurus Dozen" (https://damassets.autodesk.net/content/dam/autodesk/research/publications-assets/pdf/same-stats-different-graphs.pdf)) and uses simulated annealing to perturb an arbitrary input dataset into a variety of shapes, while preserving the mean, standard deviation, and correlation to multiple decimal places. I will showcase how it works, discuss the challenges faced during development, and explore the limitations of this approach.
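The idea of perturbing points toward a shape while holding the summary statistics fixed can be illustrated with a small, standard-library-only sketch. This is not Data Morph's API or implementation: the function names (summary, same_stats, morph_to_circle), the circular target, the linear cooling schedule, the step size, and the two-decimal truncation check are all assumptions chosen for illustration.

```python
import math
import random


def summary(xs, ys):
    """Return (mean x, mean y, std x, std y, Pearson correlation)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return (mx, my, sx, sy, cov / (sx * sy))


def same_stats(a, b, decimals=2):
    """True if every statistic agrees once truncated to `decimals` places."""
    scale = 10 ** decimals
    return all(math.floor(p * scale) == math.floor(q * scale) for p, q in zip(a, b))


def morph_to_circle(xs, ys, center, radius, iterations=20000, step_size=0.3, seed=0):
    """Nudge points toward a circle without changing the truncated statistics."""
    rng = random.Random(seed)
    xs, ys = list(xs), list(ys)
    target = summary(xs, ys)  # statistics of the starting dataset

    def distance(x, y):
        # How far a point is from the circle's outline.
        return abs(math.hypot(x - center[0], y - center[1]) - radius)

    for step in range(iterations):
        # Assumed cooling schedule: linear from 0.4 down to 0.
        temperature = 0.4 * (1 - step / iterations)
        i = rng.randrange(len(xs))
        old = (xs[i], ys[i])
        new = (old[0] + rng.gauss(0, step_size), old[1] + rng.gauss(0, step_size))
        # Accept moves that get closer to the shape; occasionally accept
        # worse moves while the temperature is high to avoid getting stuck.
        if distance(*new) < distance(*old) or rng.random() < temperature:
            xs[i], ys[i] = new
            # Undo the move if it would change the summary statistics.
            if not same_stats(summary(xs, ys), target):
                xs[i], ys[i] = old
    return xs, ys


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [rng.gauss(50, 15) for _ in range(100)]
    ys = [0.5 * x + rng.gauss(0, 10) for x in xs]
    new_xs, new_ys = morph_to_circle(xs, ys, center=(50, 25), radius=20)
    print("before:", summary(xs, ys))
    print("after: ", summary(new_xs, new_ys))
```

Because every accepted move is checked against the starting statistics and reverted on a mismatch, the output agrees with the input to the chosen number of decimal places by construction. How closely the points end up tracing the target shape depends on the step size, iteration count, and cooling schedule.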
Session Outline:
Learning objectives: understand why data visualization is crucial and why numerical summaries alone are not enough; see a practical example of building upon existing research; appreciate the importance of contributing to open source; get a high-level introduction to building a Python package and the considerations that go into it.
Bio: Stefanie Molin is a software engineer and data scientist at Bloomberg in New York City, where she tackles tough problems in information security, particularly those revolving around data wrangling/visualization, building tools for gathering data, and knowledge sharing. She is also the author of "Hands-On Data Analysis with Pandas," which is currently in its second edition. She holds a Bachelor of Science degree in operations research from Columbia University's Fu Foundation School of Engineering and Applied Science, as well as a master’s degree in computer science, with a specialization in machine learning, from Georgia Tech. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages, both human and computer.

Stefanie Molin
Title: Software Engineer, Data Scientist, Chief Information Security Office, Author of Hands-On Data Analysis with Pandas | Bloomberg
