ABSTRACT
Computers are increasingly used to make decisions that have significant impact on people's lives. Often, these predictions affect different population subgroups disproportionately. As a result, the issue of fairness has received much recent interest, and a number of fairness-enhanced classifiers have appeared in the literature. This paper studies the following questions: how do these different techniques fundamentally compare to one another, and what accounts for the differences? Specifically, we aim to bring attention to many under-appreciated aspects of such fairness-enhancing interventions that require investigation before these algorithms can receive broad adoption.
We present the results of an open benchmark we have developed that lets us compare a number of different algorithms under a variety of fairness measures and existing datasets. We find that although different algorithms tend to prefer specific formulations of fairness preservation, many of these measures strongly correlate with one another. In addition, we find that fairness-preserving algorithms tend to be sensitive to fluctuations in dataset composition (simulated in our benchmark by varying training-test splits) and to different forms of preprocessing, indicating that fairness interventions might be more brittle than previously thought.
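The sensitivity to training-test splits described above can be illustrated with a small sketch: compute a standard fairness measure (here, the disparate impact ratio, i.e., the positive-prediction rate of the unprivileged group divided by that of the privileged group) for the same learner on several random splits of the same data, and observe how much the measure fluctuates. Everything below is a hypothetical stand-in, not the paper's benchmark: the synthetic data generator, the single-threshold classifier, and all function names are assumptions made for illustration.

```python
import random

def disparate_impact(preds, groups):
    """Ratio of positive-prediction rates: unprivileged (g=0) over
    privileged (g=1). Values near 1.0 indicate statistical parity;
    the '80% rule' flags values below 0.8."""
    pos = {0: 0, 1: 0}
    tot = {0: 0, 1: 0}
    for p, g in zip(preds, groups):
        tot[g] += 1
        pos[g] += p
    return (pos[0] / tot[0]) / (pos[1] / tot[1])

def make_data(n, seed):
    """Synthetic data: one feature x whose mean is shifted by group
    membership g, and a noisy threshold label y."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        g = rng.randint(0, 1)           # protected attribute (1 = privileged)
        x = rng.gauss(0.5 * g, 1.0)     # feature correlated with group
        y = 1 if x + rng.gauss(0, 0.5) > 0.25 else 0
        data.append((x, g, y))
    return data

def threshold_classifier(train):
    """Pick the single threshold on x with lowest training error
    (a toy stand-in for a fairness-unaware learner)."""
    xs = sorted(x for x, _, _ in train)
    best_t, best_err = 0.0, len(train) + 1
    for t in xs[::25]:                  # coarse grid of candidate thresholds
        err = sum((x > t) != bool(y) for x, _, y in train)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def run_splits(n_splits=10):
    """Re-split the same dataset n_splits times and collect the
    disparate impact of the resulting classifier on each test set."""
    data = make_data(2000, seed=0)
    dis = []
    for s in range(n_splits):
        rng = random.Random(s)
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = int(0.7 * len(shuffled))
        train, test = shuffled[:cut], shuffled[cut:]
        t = threshold_classifier(train)
        preds = [1 if x > t else 0 for x, _, _ in test]
        groups = [g for _, g, _ in test]
        dis.append(disparate_impact(preds, groups))
    return dis

if __name__ == "__main__":
    dis = run_splits()
    print(f"disparate impact across splits: "
          f"min={min(dis):.3f} max={max(dis):.3f}")
```

Even with the underlying dataset fixed, the measured disparate impact varies from split to split; the paper's observation is that fairness-preserving interventions can be sensitive to exactly this kind of fluctuation, so reporting a fairness measure from a single split may overstate its stability.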
A comparative study of fairness-enhancing interventions in machine learning