Reinforcement Learning from Human Feedback in LLMs: Whose Culture, Whose Values, Whose Perspectives?

Philosophy and Technology 38 (2):1-26 (2025)

Abstract

We argue for the epistemic and ethical advantages of pluralism in Reinforcement Learning from Human Feedback (RLHF) in the context of Large Language Models (LLMs). Drawing on social epistemology and pluralist philosophy of science, we suggest ways in which RLHF can be made more responsive to human needs and how the challenges along the way can be addressed. The paper concludes with an agenda for change, i.e., concrete, actionable steps to improve LLM development.
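
The abstract argues at the level of epistemology rather than implementation, so the following is only a minimal sketch of what pluralism in RLHF could look like technically: a separate Bradley–Terry reward model trained per annotator group, with their scores aggregated afterwards, rather than one model fitted to pooled preferences. All names and choices below (the group labels, the toy embeddings, the mean aggregation in pluralist_reward) are illustrative assumptions, not the authors' proposal.

```python
# Hypothetical sketch of pluralist RLHF reward modelling.
# Nothing here is from the paper; groups, data, and aggregation are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Toy scalar reward head over fixed-size response embeddings."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.head = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(x).squeeze(-1)

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Standard Bradley–Terry objective used in RLHF reward modelling:
    # maximise log sigma(r_chosen - r_rejected) over preference pairs.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

torch.manual_seed(0)
groups = ["group_a", "group_b", "group_c"]  # hypothetical annotator communities
# Toy preference data: (chosen, rejected) embedding pairs per group.
data = {g: (torch.randn(64, 16), torch.randn(64, 16)) for g in groups}

# One reward model per group instead of a single model on pooled labels.
models = {g: RewardModel() for g in groups}
for g, model in models.items():
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    chosen, rejected = data[g]
    for _ in range(200):
        opt.zero_grad()
        preference_loss(model(chosen), model(rejected)).backward()
        opt.step()

def pluralist_reward(x: torch.Tensor) -> torch.Tensor:
    # One possible aggregation: average the group rewards. Taking the
    # minimum instead would privilege the worst-off group's judgement.
    with torch.no_grad():
        return torch.stack([m(x) for m in models.values()]).mean(dim=0)

print(pluralist_reward(torch.randn(4, 16)))
```

The aggregation step is where the pluralism lands: a mean treats the groups' values as commensurable and equally weighted, while a minimum would let the most disapproving group veto a response; which choice is defensible is exactly the kind of question the paper raises.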

Similar books and articles

Addressing Social Misattributions of Large Language Models: An HCXAI-based Approach. Andrea Ferrario, Alberto Termine & Alessandro Facchini - forthcoming - available at https://arxiv.org/abs/2403.17873 (extended version of the manuscript accepted for the ACM CHI Workshop on Human-Centered Explainable AI 2024, HCXAI24).

Author Profiles

Simon Lohse
Radboud University Nijmegen
Henk W. de Regt
Radboud University
