The Data Cards Playbook: A Toolkit for Transparency in Dataset Documentation

Abstract: 

Data Cards are transparency artifacts that provide structured summaries of ML datasets, with explanations of the processes and rationale that shape the data. They also describe how the data may be used to train or evaluate ML models. In practice, two critical factors determine the success of a transparency artifact: (1) the ability to identify the information decision-makers use and (2) the establishment of processes and guidance needed to acquire that information. To initiate practice-oriented foundations in transparency that support responsible AI development in cross-functional groups and organizations, we created the Data Cards Playbook: an open-source, self-service, comprehensive toolkit of participatory activities, frameworks, and guidance designed to address specific challenges faced by teams, product areas, and companies when setting up an AI dataset transparency effort. Drawing on a multi-pronged approach that included surveys, artifact analysis, interviews, and workshops, as well as insights from design, policy, and technology experts across industry and academia, we assembled and organized the Playbook into four modules: (1) Ask, (2) Inspect, (3) Answer, and (4) Audit. The Ask and Inspect modules help create and evaluate Data Card templates for organizational needs and principles. The Answer and Audit modules help data teams complete the templates and evaluate the resulting Data Cards. In this tutorial, we will walk through each of these modules, highlighting scalable concepts and frameworks that explore foundational aspects of transparency, to help others foster transparent, purposeful, and human-centered documentation of datasets within the practical contexts of industry and research. We will also provide evidence-based patterns to help anticipate challenges faced when producing transparent documentation.

Background Knowledge:

This proposed tutorial is completely agnostic to data science tools and languages. Practitioners at any level should be able to apply insights from this tutorial to their respective workflows.

Bio: 

Mahima Pushkarna is a design lead at the People + AI Research Initiative at Google. She brings design thinking and human-centered design into Human-AI Research. Her work explores advanced technologies, including generative AI, and draws from a mix of human-centered, participatory, and speculative design practices to bridge the gap between upstream developer practices and their impact on end user experiences and society. Mahima has designed tools and frameworks for explainability and interpretability that are widely used across industries and academia. She believes design can be a powerful tool for understanding and addressing the needs of people impacted by technology. Mahima is also interested in exploring the intersection of design, technology, and society, and is always looking for new ways to use design to make the world a better place.

Mahima holds a master's degree in Information Design and Data Visualization from Northeastern University, Boston, MA. She has published in leading academic journals and conferences, including IEEE VIS, FAccT, and workshops at NeurIPS. Prior to Google, Mahima worked as a product designer at Innovation by Design, a global think-tank, consulted at MIT's Design Lab, and designed visualization tools at Ion Interactive. This bio was written with assistance from a language-driven model.

Open Data Science