Possibilities and challenges in the moral growth of large language models: a philosophical perspective

Ethics and Information Technology 27 (1):1-11 (2025)

Abstract

With the rapid expansion of parameters in large language models (LLMs) and the application of Reinforcement Learning from Human Feedback (RLHF), the moral competence of LLMs has grown noticeably. However, several questions warrant further exploration: Is it really possible for LLMs to fully align with human values through RLHF? How can their current moral growth be philosophically contextualized? We identify similarities between the moral growth of LLMs and Deweyan ethics with respect to the discourse of human moral development. We then use Dewey’s theory, on an experimental basis, to examine and further explain the extent to which the current alignment pathway enables the development of LLMs. A beating experiment serves as the foundational case for analyzing the moral competence of LLMs across parameter scales and across stages including basic moral cognition, moral dilemma judgment, and moral behavior. The results demonstrate that the moral competence of the GPT series has improved significantly, and Dewey’s Impulse-Habit-Character theory of moral development can explain why: the moral competence of LLMs has been enhanced through experience-based learning supported by human feedback. Nevertheless, the moral development of LLMs through RLHF remains constrained and does not reach the character stage described by Dewey, possibly because they lack self-consciousness. This fundamental difference between humans and LLMs underscores both the limits of LLMs’ moral growth and the challenges of applying RLHF for AI alignment, and it highlights the need for external societal governance and legal regulation.
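Because the abstract’s argument turns on what RLHF actually does to a model, it may help to recall the standard formulation of that alignment pathway. The sketch below assumes the widely cited InstructGPT-style setup; the paper’s own notation and experimental prompts are not reproduced here. A reward model r_φ is first fit to human pairwise preferences, and the policy π_θ is then optimized against that learned reward under a KL penalty that keeps it close to the pre-RLHF reference model:

```latex
% Hedged sketch of the standard RLHF objectives (InstructGPT-style assumption;
% this is not the paper's own notation).
% Step 1: reward model r_phi fit to human preferences, where y_w is the response
% annotators preferred over y_l for prompt x:
\mathcal{L}_{\mathrm{RM}}(\phi) =
  -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
  \left[\log \sigma\!\left(r_\phi(x, y_w) - r_\phi(x, y_l)\right)\right]

% Step 2: the policy \pi_\theta is tuned to maximize the learned reward, while a
% KL penalty (weight \beta) keeps it near the supervised reference model \pi_{\mathrm{ref}}:
\max_{\theta}\;
  \mathbb{E}_{x\sim\mathcal{D},\; y\sim\pi_\theta(\cdot\mid x)}
  \left[r_\phi(x, y)\right]
  \;-\;\beta\,
  \mathbb{E}_{x\sim\mathcal{D}}
  \left[\mathrm{KL}\!\left(\pi_\theta(\cdot\mid x)\,\|\,\pi_{\mathrm{ref}}(\cdot\mid x)\right)\right]
```

Read against Dewey’s Impulse-Habit-Character scheme, this mechanism amounts to habit formation driven by external preference signals rather than self-directed character development, which is consistent with the limitation the abstract identifies.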

