Data-Centric Factors in Algorithmic Fairness
OPEN ACCESS
Date
2022-07
Publication Type
Conference Paper
ETH Bibliography
yes
Abstract
Although data generation and data curation processes are widely regarded as prominent sources of bias in machine learning algorithms, little empirical research documents which specific dimensions of the data affect algorithmic unfairness. In contrast to previous work, which has focused on modeling using simple, small-scale benchmark datasets, we hold the model constant and methodically intervene on relevant dimensions of a much larger, more diverse dataset. For this purpose, we introduce a new dataset on recidivism covering 1.5 million criminal cases from courts in the U.S. state of Wisconsin, 2000-2018. From this main dataset, we generate multiple auxiliary datasets to simulate different kinds of bias in the data. Focusing on algorithmic bias toward different race/ethnicity groups, we assess the relevance of the following factors: training data size; base rate differences between groups; representation of groups in the training data; temporal aspects of data curation; the inclusion of race/ethnicity or neighborhood characteristics as features; and the training of separate classifiers by race/ethnicity or crime type. We find that, holding the classifier specification constant, these factors often influence fairness metrics without having a corresponding effect on accuracy metrics. The methodology and results in the paper provide a useful reference point for a data-centric approach to studying algorithmic fairness in recidivism prediction and beyond.
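The data-centric setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `false_positive_rate_by_group` and `downsample_group` are hypothetical, the toy labels are invented for illustration, and the false positive rate is only one possible fairness metric (the abstract does not say which metrics the paper uses). The idea is to alter one data dimension, here the representation of a group in the training data, and then compare a per-group error rate while the classifier itself stays unchanged.

```python
import random
from collections import defaultdict


def false_positive_rate_by_group(y_true, y_pred, groups):
    """Return {group: FPR}, where FPR = false positives / true negatives seen."""
    false_pos = defaultdict(int)
    negatives = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 0:  # only true negatives can become false positives
            negatives[group] += 1
            if pred == 1:
                false_pos[group] += 1
    return {g: false_pos[g] / negatives[g] for g in negatives}


def downsample_group(rows, group_key, group, keep_fraction, seed=0):
    """Keep only `keep_fraction` of the rows in `group`; leave other rows intact.

    This simulates under-representation of one group in the training data.
    """
    rng = random.Random(seed)  # fixed seed so the auxiliary dataset is reproducible
    in_group = [r for r in rows if r[group_key] == group]
    others = [r for r in rows if r[group_key] != group]
    k = round(len(in_group) * keep_fraction)
    return others + rng.sample(in_group, k)


# Toy example (invented data): labels, predictions, and group membership.
y_true = [0, 0, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 1]
groups = ["a", "a", "a", "a", "a", "b", "b", "b"]
fpr = false_positive_rate_by_group(y_true, y_pred, groups)
fpr_gap = abs(fpr["a"] - fpr["b"])

# Build an auxiliary training set in which group "b" is half as represented.
rows = [{"group": "a"} for _ in range(4)] + [{"group": "b"} for _ in range(4)]
auxiliary = downsample_group(rows, "group", "b", keep_fraction=0.5)
```

In a full experiment along these lines, the same classifier specification would be retrained on each auxiliary dataset and the group-wise gap compared against the accuracy on a common test set.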
Publication status
published
Book title
AIES '22: Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society
Pages / Article No.
396 - 410
Publisher
Association for Computing Machinery
Event
AAAI/ACM Conference on AI, Ethics, and Society (AIES 2022)
Subject
Algorithmic Fairness; Datasets; Recidivism Prediction; Machine Learning
Organisational unit
09627 - Ash, Elliott / Ash, Elliott