Abstract: Data Cards are transparency artifacts that provide structured summaries of ML datasets with explanations of the processes and rationale that shape the data. They also describe how the data may be used to train or evaluate ML models. In practice, two critical factors determine the success of a transparency artifact: (1) the ability to identify the information decision-makers use and (2) the establishment of processes and guidance needed to acquire that information. To initiate practice-oriented foundations in transparency that support responsible AI development in cross-functional groups and organizations, we created the Data Cards Playbook — an open-source, self-service, comprehensive toolkit consisting of participatory activities, frameworks, and guidance designed to address specific challenges faced by teams, product areas, and companies when setting up an AI dataset transparency effort. Through a multi-pronged approach that included surveys, artifact analysis, interviews, and workshops, as well as insights from design, policy, and technology experts across industry and academia, we assembled and organized the Playbook into four modules: (1) Ask, (2) Inspect, (3) Answer, and (4) Audit. The Ask and Inspect modules help create and evaluate Data Card templates for organizational needs and principles. The Answer and Audit modules help data teams complete the templates and evaluate the resulting Data Cards. In this tutorial, to help others foster transparent, purposeful, and human-centered documentation of datasets within the practical contexts of industry and research, we will walk through each of these modules, highlighting scalable concepts and frameworks that explore foundational aspects of transparency. We will also provide evidence-based patterns to help anticipate challenges faced when producing transparent documentation.
This proposed tutorial is completely agnostic to data science tools and languages. Practitioners at any level should be able to apply insights from this tutorial to their respective workflows.
Bio: Andrew Zaldivar is a member of the Responsible AI & Human-Centered Technology organization in Google Research. His role is to advocate for the responsible development and use of AI by disseminating and democratizing research findings from his organization. Andrew works with researchers and designers who are examining and shaping the socio-technical processes underpinning AI technologies through participatory, culturally inclusive, and intersectional equity-oriented approaches. Before joining Google Research, Andrew was a Senior Strategist on Google’s Trust and Safety team, protecting the integrity of some of Google’s key products by using machine learning to scale, optimize, and automate abuse-fighting efforts. Andrew also holds a doctorate in cognitive neuroscience from the University of California, Irvine, and was an Insight Data Science fellow.