Who Decides if AI is Fair? The Labels Problem in Algorithmic Auditing. (arXiv:2111.08723v1 [cs.CL])

Labelled “ground truth” datasets are routinely used to evaluate and audit AI
algorithms applied in high-stakes settings. However, there do not exist widely
accepted benchmarks for the quality of labels in these datasets. We provide
empirical evidence that the quality of labels can significantly distort the results
of algorithmic audits in real-world settings. Working with data annotators of the
kind typically hired by AI firms in India, we show that low-fidelity ground-truth
data can produce spurious differences in the measured performance of automatic
speech recognition (ASR) systems between urban and rural populations. After a
rigorous, albeit expensive, label-cleaning process, these
disparities between groups disappear. Our findings highlight how trade-offs
between label quality and data annotation costs can complicate algorithmic
audits in practice. They also underscore the need to develop consensus-driven,
widely accepted benchmarks for label quality.

Source: https://arxiv.org/abs/2111.08723
