Abstract
Documentation to facilitate communication between dataset creators and consumers.
Index Terms
- Datasheets for datasets