Abstract: Data science relies on machine learning and statistics and is a pillar of A.I. Human judgment calls are ubiquitous at every step of the data science life cycle, and they are often responsible for the "dangers" of A.I. To mitigate these dangers, we introduce a framework based on three core principles: Predictability, Computability, and Stability (PCS), for a veridical (truthful) data science process. The PCS framework unifies, streamlines, and expands on the ideas and best practices of machine learning and statistics. PCS emphasizes reality checks through predictability and takes full account of the sources of uncertainty in the data science life cycle, including human judgment calls such as those made in data curation and cleaning and in the choice of data and algorithms. Consisting of a workflow and documentation, PCS is supported by our software package v-flow (https://github.com/Yu-Group/veridical-flow). As an illustration, we unpack PCS in the context of developing clinical decision rules.
Bio: Bin Yu is Chancellor’s Professor in the Departments of Statistics and of Electrical Engineering & Computer Sciences at the University of California at Berkeley. Her current research interests focus on statistics and machine learning theory, methodologies, and algorithms for solving high-dimensional data problems. Her group is engaged in interdisciplinary research with scientists from genomics, neuroscience, and remote sensing.
She obtained her B.S. degree in Mathematics from Peking University in 1984, and her M.A. and Ph.D. degrees in Statistics from the University of California at Berkeley in 1987 and 1990, respectively. She held faculty positions at the University of Wisconsin-Madison and Yale University and was a Member of Technical Staff at Bell Labs, Lucent. She was Chair of the Department of Statistics at UC Berkeley from 2009 to 2012, and is a founding co-director of the Microsoft Lab on Statistics and Information Technology at Peking University, China, and Chair of the Scientific Advisory Committee of the Statistical Science Center at Peking University.
She is a Member of the U.S. National Academy of Sciences and a Fellow of the American Academy of Arts and Sciences. She was a Guggenheim Fellow in 2006, an Invited Speaker at ICIAM in 2011, and the Tukey Memorial Lecturer of the Bernoulli Society in 2012. She was President of IMS (Institute of Mathematical Statistics) in 2013-2014, and will be the Rietz Lecturer of IMS in 2016. She is a Fellow of IMS, ASA, AAAS, and IEEE.
She served on the Board on Mathematical Sciences and Their Applications (BMSA) of NAS and as co-chair of the SAMSI advisory committee. She is serving on the Board of Trustees at ICERM and the Scientific Advisory Board of IPAM. She has served or is serving on numerous editorial boards, including those of the Journal of Machine Learning Research (JMLR), the Annals of Statistics, and the Journal of the American Statistical Association (JASA).