Understanding Stereotypes in Language Models: Towards Robust Measurement and Zero-Shot Debiasing
Open access
Date
2022-12-22
Type
Working Paper
ETH Bibliography
yes
Abstract
Generated texts from large pretrained language models have been shown to exhibit a variety of harmful, human-like biases about various demographics. These findings prompted large-scale efforts to understand and measure such effects, with the goal of providing benchmarks that can guide the development of techniques for mitigating these stereotypical associations. However, as recent research has pointed out, the current benchmarks lack a robust experimental setup, hindering the inference of meaningful conclusions from their evaluation metrics. In this paper, we extend these arguments and demonstrate that existing techniques and benchmarks aiming to measure stereotypes tend to be inaccurate and contain a high degree of experimental noise that severely limits the knowledge we can gain from benchmarking language models based on them. Accordingly, we propose a new framework for robustly measuring and quantifying biases exhibited by generative language models. Finally, we use this framework to investigate GPT-3's occupational gender bias and propose prompting techniques for mitigating these biases without the need for fine-tuning.
Permanent link
https://doi.org/10.3929/ethz-b-000653515
Publication status
published
Journal / series
arXiv
Pages / Article No.
Publisher
Cornell University
Edition / version
v1
Subject
Computation and Language (cs.CL); Machine Learning (cs.LG); FOS: Computer and information sciences
Organisational unit
09684 - Sachan, Mrinmaya / Sachan, Mrinmaya