Abstract: By 2018, the world will be producing 50TB of data per second. This explosion of data threatens the privacy of individuals, and the public is increasingly demanding that technology companies be held responsible for safeguarding individual data. In response, governments have enacted laws governing individuals’ data rights, such as the European Union’s General Data Protection Regulation (GDPR) and the Judicial Redress Act in the US. These laws make it even more important that data scientists ensure their techniques maintain privacy and avoid substantial fines. This presentation examines key points of the data privacy regulations currently in place and how they affect the way data analysis is performed. It will also cover the fine-grained details, benefits, pitfalls, and assumptions of techniques designed to maintain privacy, such as k-anonymization and differential privacy. Some assume that the nonlinear nature of AI protects it from privacy violations; this talk will show how machine learning (ML) algorithms remain vulnerable to such violations and how privacy-preserving techniques affect those algorithms. It will conclude with a case study showcasing how a differentially private neural network outperformed a traditionally trained network.
Bio: Jim Klucar is the Director of Data Science at Immuta, a unified data platform for the world’s most secure organizations. After a dozen years of developing high-performance radar processing techniques, he switched in 2010 to developing Hadoop-based data warehouse and analysis systems. Jim has contributed to many open source projects, including Apache Accumulo, Mesos, and Myriad. He holds a BS in Electrical Engineering from Pennsylvania State University and an MS in Applied and Computational Mathematics from Johns Hopkins University.