Stefanie Molin

Data Scientist and Software Engineer at Bloomberg, Author of Hands-On Data Analysis with Pandas

    Stefanie Molin is a software engineer and data scientist at Bloomberg in New York City, where she tackles tough problems in information security, particularly those revolving around data wrangling and visualization, building tools for gathering data, and knowledge sharing. She is also the author of "Hands-On Data Analysis with Pandas," which is currently in its second edition. She holds a Bachelor of Science degree in operations research from Columbia University's Fu Foundation School of Engineering and Applied Science, as well as a master’s degree in computer science, with a specialization in machine learning, from Georgia Tech. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers.

    All Sessions by Stefanie Molin

    Day 2 extra event 04/24/2024
    3:00 pm - 3:30 pm

    Book Author: Hands-On Data Analysis with Pandas - Second Edition: A Python Data Science Handbook for Data Collection, Wrangling, Analysis, and Visualization

    Day 3 04/25/2024
    11:00 am - 11:30 am

    Data Morph: A Cautionary Tale of Summary Statistics

    Data Visualization

    Statistics do not come intuitively to humans; we tend to look for simple ways to describe complex things. Given a complex dataset, it is tempting to describe it with simple summary statistics like the mean, median, or standard deviation. However, these numbers are not a replacement for visualizing the distribution. To illustrate this, researchers have generated many datasets that look very different but share the same summary statistics. In this talk, I will discuss "Data Morph" (https://github.com/stefmolin/data-morph), an open source package that builds on previous research from Autodesk (the "Datasaurus Dozen" (https://damassets.autodesk.net/content/dam/autodesk/research/publications-assets/pdf/same-stats-different-graphs.pdf)) and uses simulated annealing to perturb an arbitrary input dataset into a variety of shapes while preserving the mean, standard deviation, and correlation to multiple decimal places. I will showcase how it works, discuss the challenges faced during development, and explore the limitations of this approach.
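
To make the idea in the abstract concrete, here is a minimal, standard-library-only sketch of statistic-preserving perturbation with a simulated-annealing acceptance rule. It is not Data Morph's actual implementation or API: the function names, the circle target, the linear cooling schedule, the step size, and the two-decimal rounding check are all assumptions chosen for illustration.

```python
import math
import random


def summary(points):
    """Return (mean_x, mean_y, std_x, std_y, pearson_r) for a list of (x, y) pairs."""
    n = len(points)
    mean_x = math.fsum(x for x, _ in points) / n
    mean_y = math.fsum(y for _, y in points) / n
    var_x = math.fsum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    var_y = math.fsum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    cov = math.fsum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    std_x, std_y = math.sqrt(var_x), math.sqrt(var_y)
    return (mean_x, mean_y, std_x, std_y, cov / (std_x * std_y))


def same_stats(a, b, decimals=2):
    """True when every statistic agrees after rounding to `decimals` places."""
    return all(round(u, decimals) == round(v, decimals) for u, v in zip(a, b))


def circle_distance(point, center, radius):
    """Distance from a point to the nearest point on the target circle."""
    return abs(math.hypot(point[0] - center[0], point[1] - center[1]) - radius)


def mean_circle_distance(points, center, radius):
    """Average distance of all points to the target circle."""
    return math.fsum(circle_distance(p, center, radius) for p in points) / len(points)


def morph_toward_circle(points, center, radius, iterations=3000, step=0.1, seed=0):
    """Nudge points toward a circle while keeping the rounded summary statistics fixed.

    Hypothetical helper for illustration only; the schedule and step size are guesses.
    """
    rng = random.Random(seed)
    current = list(points)
    target_stats = summary(current)
    for i in range(iterations):
        # Assumed linear cooling: worse moves are accepted less often over time.
        temperature = 0.4 * (1 - i / iterations)
        idx = rng.randrange(len(current))
        old = current[idx]
        new = (old[0] + rng.gauss(0, step), old[1] + rng.gauss(0, step))
        closer = circle_distance(new, center, radius) < circle_distance(old, center, radius)
        if not (closer or rng.random() < temperature):
            continue
        current[idx] = new
        # Reject any move that changes a rounded summary statistic.
        if not same_stats(summary(current), target_stats):
            current[idx] = old
    return current


if __name__ == "__main__":
    rng = random.Random(42)
    start = [(rng.gauss(50, 10), rng.gauss(50, 10)) for _ in range(60)]
    stats = summary(start)
    center = (stats[0], stats[1])
    # A circle of radius sqrt(std_x^2 + std_y^2) has a spread compatible with the data.
    radius = math.hypot(stats[2], stats[3])
    end = morph_toward_circle(start, center, radius)
    print("stats preserved:", same_stats(summary(end), stats))
    print("mean distance to circle before:", mean_circle_distance(start, center, radius))
    print("mean distance to circle after: ", mean_circle_distance(end, center, radius))
```

Every accepted move is checked against the statistics of the starting dataset rather than the previous step, so small changes cannot accumulate into drift. A sketch like this only moves points a short way toward the target; a real tool would need many more iterations and a wider range of target shapes.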

    Open Data Science
    One Broadway
    Cambridge, MA 02142
    info@odsc.com
