ABSTRACT
Computers are increasingly used to make decisions that have significant impact on people's lives. Often, these predictions affect different population subgroups disproportionately. As a result, the issue of fairness has received much recent interest, and a number of fairness-enhanced classifiers have appeared in the literature. This paper studies the following questions: how do these different techniques fundamentally compare to one another, and what accounts for the differences? Specifically, we aim to bring attention to many under-appreciated aspects of such fairness-enhancing interventions that require investigation before these algorithms can receive broad adoption.
We present the results of an open benchmark we have developed that lets us compare a number of different algorithms under a variety of fairness measures and existing datasets. We find that although different algorithms tend to prefer specific formulations of fairness preservation, many of these measures strongly correlate with one another. In addition, we find that fairness-preserving algorithms tend to be sensitive to fluctuations in dataset composition (simulated in our benchmark by varying training-test splits) and to different forms of preprocessing, indicating that fairness interventions might be more brittle than previously thought.
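The sensitivity to training-test splits described above can be illustrated with a small sketch: compute a standard fairness measure (here, the disparate impact ratio, i.e., the positive-prediction rate of the unprivileged group divided by that of the privileged group) for the same learner on several random splits of the same data, and observe how much the measure fluctuates. Everything below is a hypothetical stand-in, not the paper's benchmark: the synthetic data generator, the single-threshold classifier, and all function names are assumptions made for illustration.

```python
import random

def disparate_impact(preds, groups):
    """Ratio of positive-prediction rates: unprivileged (g=0) over
    privileged (g=1). Values near 1.0 indicate statistical parity;
    the '80% rule' flags values below 0.8."""
    pos = {0: 0, 1: 0}
    tot = {0: 0, 1: 0}
    for p, g in zip(preds, groups):
        tot[g] += 1
        pos[g] += p
    return (pos[0] / tot[0]) / (pos[1] / tot[1])

def make_data(n, seed):
    """Synthetic data: one feature x whose mean is shifted by group
    membership g, and a noisy threshold label y."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        g = rng.randint(0, 1)           # protected attribute (1 = privileged)
        x = rng.gauss(0.5 * g, 1.0)     # feature correlated with group
        y = 1 if x + rng.gauss(0, 0.5) > 0.25 else 0
        data.append((x, g, y))
    return data

def threshold_classifier(train):
    """Pick the single threshold on x with lowest training error
    (a toy stand-in for a fairness-unaware learner)."""
    xs = sorted(x for x, _, _ in train)
    best_t, best_err = 0.0, len(train) + 1
    for t in xs[::25]:                  # coarse grid of candidate thresholds
        err = sum((x > t) != bool(y) for x, _, y in train)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def run_splits(n_splits=10):
    """Re-split the same dataset n_splits times and collect the
    disparate impact of the resulting classifier on each test set."""
    data = make_data(2000, seed=0)
    dis = []
    for s in range(n_splits):
        rng = random.Random(s)
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = int(0.7 * len(shuffled))
        train, test = shuffled[:cut], shuffled[cut:]
        t = threshold_classifier(train)
        preds = [1 if x > t else 0 for x, _, _ in test]
        groups = [g for _, g, _ in test]
        dis.append(disparate_impact(preds, groups))
    return dis

if __name__ == "__main__":
    dis = run_splits()
    print(f"disparate impact across splits: "
          f"min={min(dis):.3f} max={max(dis):.3f}")
```

Even with the underlying dataset fixed, the measured disparate impact varies from split to split; the paper's observation is that fairness-preserving interventions can be sensitive to exactly this kind of fluctuation, so reporting a fairness measure from a single split may overstate its stability.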
A comparative study of fairness-enhancing interventions in machine learning