Review article

Datasheets for datasets

Published: 19 November 2021

Abstract

Documentation to facilitate communication between dataset creators and consumers.

Published in: Communications of the ACM, Volume 64, Issue 12, December 2021
Issue length: 101 pages
ISSN: 0001-0782
EISSN: 1557-7317
Issue DOI: 10.1145/3502158

Copyright © 2021 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
