Results for 'AI Alignment'

976 found
  1. AI Alignment vs. AI Ethical Treatment: Ten Challenges. Adam Bradley & Bradford Saad - manuscript
    A morally acceptable course of AI development should avoid two dangers: creating unaligned AI systems that pose a threat to humanity and mistreating AI systems that merit moral consideration in their own right. This paper argues these two dangers interact and that if we create AI systems that merit moral consideration, simultaneously avoiding both of these dangers would be extremely challenging. While our argument is straightforward and supported by a wide range of pretheoretical moral judgments, it has far-reaching moral implications (...)
    1 citation
  2. Values in science and AI alignment research. Leonard Dung - manuscript
    Roughly, empirical AI alignment research (AIA) is an area of AI research which investigates empirically how to design AI systems in line with human goals. This paper examines the role of non-epistemic values in AIA. It argues that: (1) Sciences differ in the degree to which values influence them. (2) AIA is strongly value-laden. (3) This influence of values is managed inappropriately and thus threatens AIA’s epistemic integrity and ethical beneficence. (4) AIA should strive to achieve value transparency, critical (...)
  3. Expanding AI and AI Alignment Discourse: An Opportunity for Greater Epistemic Inclusion. A. E. Williams - manuscript
    The AI and AI alignment communities have been instrumental in addressing existential risks, developing alignment methodologies, and promoting rationalist problem-solving approaches. However, as AI research ventures into increasingly uncertain domains, there is a risk of premature epistemic convergence, where prevailing methodologies influence not only the evaluation of ideas but also determine which ideas are considered within the discourse. This paper examines critical epistemic blind spots in AI alignment research, particularly the lack of predictive frameworks to differentiate problems (...)
  4. Disagreement, AI alignment, and bargaining. Harry R. Lloyd - forthcoming - Philosophical Studies:1-31.
    New AI technologies have the potential to cause unintended harms in diverse domains including warfare, judicial sentencing, biomedicine and governance. One strategy for realising the benefits of AI whilst avoiding its potential dangers is to ensure that new AIs are properly ‘aligned’ with some form of ‘alignment target.’ One danger of this strategy is that – dependent on the alignment target chosen – our AIs might optimise for objectives that reflect the values only of a certain subset of (...)
  5. Beyond Preferences in AI Alignment. Tan Zhi-Xuan, Micah Carroll, Matija Franklin & Hal Ashton - forthcoming - Philosophical Studies:1-51.
    The dominant practice of AI alignment assumes (1) that preferences are an adequate representation of human values, (2) that human rationality can be understood in terms of maximizing the satisfaction of preferences, and (3) that AI systems should be aligned with the preferences of one or more humans to ensure that they behave safely and in accordance with our values. Whether implicitly followed or explicitly endorsed, these commitments constitute what we term a preferentist approach to AI alignment. In this paper, (...)
  6. Philosophical Investigations into AI Alignment: A Wittgensteinian Framework. José Antonio Pérez-Escobar & Deniz Sarikaya - 2024 - Philosophy and Technology 37 (3):1-25.
    We argue that the later Wittgenstein’s philosophy of language and mathematics, substantially focused on rule-following, is relevant to understand and improve on the Artificial Intelligence (AI) alignment problem: his discussions on the categories that influence alignment between humans can inform about the categories that should be controlled to improve on the alignment problem when creating large data sets to be used by supervised and unsupervised learning algorithms, as well as when introducing hard coded guardrails for AI models. (...)
    3 citations
  7. AI, alignment, and the categorical imperative. Fritz McDonald - 2023 - AI and Ethics 3:337-344.
    Tae Wan Kim, John Hooker, and Thomas Donaldson make an attempt, in recent articles, to solve the alignment problem. As they define the alignment problem, it is the issue of how to give AI systems moral intelligence. They contend that one might program machines with a version of Kantian ethics cast in deontic modal logic. On their view, machines can be aligned with human values if such machines obey principles of universalization and autonomy, as well as a deontic (...)
  8. Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback. Vincent Conitzer, Rachel Freedman, Jobst Heitzig, Wesley H. Holliday, Bob M. Jacobs, Nathan Lambert, Milan Mosse, Eric Pacuit, Stuart Russell, Hailey Schoelkopf, Emanuel Tewolde & William S. Zwicker - forthcoming - Proceedings of the Forty-First International Conference on Machine Learning.
    Foundation models such as GPT-4 are fine-tuned to avoid unsafe or otherwise problematic behavior, such as helping to commit crimes or producing racist text. One approach to fine-tuning, called reinforcement learning from human feedback, learns from humans' expressed preferences over multiple outputs. Another approach is constitutional AI, in which the input from humans is a list of high-level principles. But how do we deal with potentially diverging input from humans? How can we aggregate the input into consistent data about "collective" (...)
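    A minimal sketch of the kind of aggregation the abstract gestures at (illustrative only; the rule choice, function name, and toy data are assumptions, not the authors' method): a Borda count over rater rankings of candidate model outputs, one classical social-choice rule for turning diverging individual preferences into a single collective ordering.

      # Illustrative Borda-count aggregation of diverging rater preferences.
      # All names and the toy data are hypothetical, not from the paper.
      from collections import defaultdict

      def borda_aggregate(rankings):
          """Each ranking lists output IDs from most to least preferred;
          returns outputs ordered by total Borda score."""
          scores = defaultdict(int)
          for ranking in rankings:
              n = len(ranking)
              for position, output in enumerate(ranking):
                  scores[output] += n - 1 - position  # top place earns n-1 points
          return sorted(scores, key=scores.get, reverse=True)

      # Three raters disagree pairwise, yet a collective ordering still
      # emerges; the compromise answer "hedge" wins overall.
      raters = [
          ["refuse", "hedge", "comply"],
          ["hedge", "refuse", "comply"],
          ["comply", "hedge", "refuse"],
      ]
      print(borda_aggregate(raters))  # ['hedge', 'refuse', 'comply']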
  9. Kantian Fallibilist Ethics for AI alignment. Vadim Chaly - 2024 - Journal of Philosophical Investigations 18 (47):303-318.
    The problem of AI alignment has parallels in Kantian ethics and can benefit from its concepts and arguments. The Kantian framework allows us to better answer the question of what exactly AI is being aligned to, what are the problems of alignment of rational agents in general, and what are the prospects for achieving a state of alignment. Having described the state of discussions about alignment in AI, I will reformulate them in Kantian terms. Thus, the (...)
  10. Reflections on the AI alignment problem. Dan Bruiger - forthcoming - AI and Society:1-10.
    The Alignment Problem in artificial intelligence concerns how to ensure that artificial general intelligence (AGI) conforms to human goals and values and remains under human control. The concept of general intelligence, modelled on human and animal behavior, lacks coherence. The ideal of autonomy inherent in AGI conflicts with the ideal of external control. Truly autonomous agents are necessarily embodied, but embodiment implies more than physical instantiation or sensory input. It means being an autopoietic system (like a natural organism), with (...)
  11. Aesthetic Value and the AI Alignment Problem. Alice C. Helliwell - 2024 - Philosophy and Technology 37 (4):1-21.
    The threat from possible future superintelligent AI has given rise to discussion of the so-called “value alignment problem”. This is the problem of how to ensure artificially intelligent systems align with human values, and thus (hopefully) mitigate risks associated with them. Naturally, AI value alignment is often discussed in relation to morally relevant values, such as the value of human lives or human wellbeing. However, solutions to the value alignment problem target all human values, not only morally (...)
  12. Calibrating machine behavior: a challenge for AI alignment. Erez Firt - 2023 - Ethics and Information Technology 25 (3):1-8.
    When discussing AI alignment, we usually refer to the problem of teaching or training advanced autonomous AI systems to make decisions that are aligned with human values or preferences. Proponents of this approach believe it can be employed as means to stay in control over sophisticated intelligent systems, thus avoiding certain existential risks. We identify three general obstacles on the path to implementation of value alignment: a technological/technical obstacle, a normative obstacle, and a calibration problem. Presupposing, for the (...)
    1 citation
  13. Discovering Our Blind Spots and Cognitive Biases in AI Research and Alignment. A. E. Williams - manuscript
    The challenge of AI alignment is not just a technological issue but fundamentally an epistemic one. AI safety research predominantly relies on empirical validation, often detecting failures only after they manifest. However, certain risks—such as deceptive alignment and goal misspecification—may not be empirically testable until it is too late, necessitating a shift toward leading-indicator logical reasoning. This paper explores how mainstream AI research systematically filters out deep epistemic insight, hindering progress in AI safety. We assess the rarity of (...)
  14. Aligning artificial intelligence with moral intuitions: an intuitionist approach to the alignment problem. Dario Cecchini, Michael Pflanzer & Veljko Dubljevic - 2024 - AI and Ethics:1-11.
    As artificial intelligence (AI) continues to advance, one key challenge is ensuring that AI aligns with certain values. However, in the current diverse and democratic society, reaching a normative consensus is complex. This paper delves into the methodological aspect of how AI ethicists can effectively determine which values AI should uphold. After reviewing the most influential methodologies, we detail an intuitionist research agenda that offers guidelines for aligning AI applications with a limited set of reliable moral intuitions, each underlying a (...)
    1 citation
  15. A Note on “Philosophical Investigations into AI Alignment: A Wittgensteinean Framework” by J.A. Pérez-Escobar and D. Sarikaya. [REVIEW] Sorin Bangu - 2024 - Philosophy and Technology 37 (3):1-5.
  16. Democratizing value alignment: from authoritarian to democratic AI ethics. Linus Ta-Lun Huang, Gleb Papyshev & James K. Wong - 2024 - AI and Ethics.
    Value alignment is essential for ensuring that AI systems act in ways that are consistent with human values. Existing approaches, such as reinforcement learning with human feedback and constitutional AI, however, exhibit power asymmetries and lack transparency. These “authoritarian” approaches fail to adequately accommodate a broad array of human opinions, raising concerns about whose values are being prioritized. In response, we introduce the Dynamic Value Alignment approach, theoretically grounded in the principles of parallel constraint satisfaction, which models moral (...)
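    The abstract invokes parallel constraint satisfaction without detail, so here is a generic sketch of how that family of models settles (an assumption about the model family, not the authors' own formulation): value nodes linked by excitatory and inhibitory weights relax iteratively toward a mutually consistent activation pattern.

      # Generic parallel-constraint-satisfaction relaxation; the weights and
      # the three-value example are invented purely for illustration.
      import numpy as np

      def relax(weights, act, steps=50, rate=0.1):
          """weights[i, j] > 0 links compatible values, < 0 conflicting ones;
          activations settle toward a mutually consistent assignment."""
          for _ in range(steps):
              net = weights @ act                  # pooled support per value node
              act = np.clip(act + rate * net, 0.0, 1.0)
          return act

      # Values 0 and 1 reinforce each other; value 2 conflicts with both.
      W = np.array([[ 0.0,  0.5, -0.6],
                    [ 0.5,  0.0, -0.6],
                    [-0.6, -0.6,  0.0]])
      print(relax(W, np.array([0.5, 0.5, 0.5])))  # settles near [1.0, 1.0, 0.0]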
  17. Applying AI for social good: Aligning academic journal ratings with the United Nations Sustainable Development Goals (SDGs). David Steingard, Marcello Balduccini & Akanksha Sinha - 2023 - AI and Society 38 (2):613-629.
    This paper offers three contributions to the burgeoning movements of AI for Social Good (AI4SG) and AI and the United Nations Sustainable Development Goals (SDGs). First, we introduce the SDG-Intense Evaluation framework (SDGIE) that aims to situate variegated automated/AI models in a larger ecosystem of computational approaches to advance the SDGs. To foster knowledge collaboration for solving complex social and environmental problems encompassed by the SDGs, the SDGIE framework details a benchmark structure of data-algorithm-output to effectively standardize AI approaches to (...)
  18. Aligning artificial intelligence with human values: reflections from a phenomenological perspective. Shengnan Han, Eugene Kelly, Shahrokh Nikou & Eric-Oluf Svee - 2022 - AI and Society 37 (4):1383-1395.
    Artificial Intelligence (AI) must be directed at humane ends. The development of AI has produced great uncertainties about ensuring AI alignment with human values (AI value alignment) through AI operations from design to use. To address this problem, we adopt the phenomenological theories of material values and technological mediation as a beginning step. In this paper, we first discuss AI value alignment as treated in the relevant AI studies. Second, we briefly present what are (...)
    5 citations
  19. Is Alignment Unsafe? Cameron Domenico Kirk-Giannini - 2024 - Philosophy and Technology 37 (110):1–4.
    Inchul Yum (2024) argues that the widespread adoption of language agent architectures would likely increase the risk posed by AI by simplifying the process of aligning artificial systems with human values and thereby making it easier for malicious actors to use them to cause a variety of harms. Yum takes this to be an example of a broader phenomenon: progress on the alignment problem is likely to be net safety-negative because it makes artificial systems easier for malicious actors to (...)
  20. AI in the noosphere: an alignment of scientific and wisdom traditions. Stephen D. Edwards - 2021 - AI and Society 36 (1):397-399.
  21. Artificial Intelligence, Values, and Alignment. Iason Gabriel - 2020 - Minds and Machines 30 (3):411-437.
    This paper looks at philosophical questions that arise in the context of AI alignment. It defends three propositions. First, normative and technical aspects of the AI alignment problem are interrelated, creating space for productive engagement between people working in both domains. Second, it is important to be clear about the goal of alignment. There are significant differences between AI that aligns with instructions, intentions, revealed preferences, ideal preferences, interests and values. A principle-based approach to AI alignment, (...)
    72 citations
  22. An explanation space to align user studies with the technical development of Explainable AI. Garrick Cabour, Andrés Morales-Forero, Élise Ledoux & Samuel Bassetto - 2023 - AI and Society 38 (2):869-887.
    Providing meaningful and actionable explanations for end-users is a situated problem requiring the intersection of multiple disciplines to address social, operational, and technical challenges. However, the explainable artificial intelligence community has not commonly adopted or created tangible design tools that allow interdisciplinary work to develop reliable AI-powered solutions. This paper proposes a formative architecture that defines the explanation space from a user-inspired perspective. The architecture comprises five intertwined components to outline explanation requirements for a task: (1) the end-users’ mental models, (...)
    1 citation
  23. ‘Interpretability’ and ‘Alignment’ are Fool’s Errands: A Proof that Controlling Misaligned Large Language Models is the Best Anyone Can Hope For. Marcus Arvan - forthcoming - AI and Society.
    This paper uses famous problems from philosophy of science and philosophical psychology—underdetermination of theory by evidence, Nelson Goodman’s new riddle of induction, theory-ladenness of observation, and “Kripkenstein’s” rule-following paradox—to show that it is empirically impossible to reliably interpret which functions a large language model (LLM) AI has learned, and thus, that reliably aligning LLM behavior with human values is provably impossible. Sections 2 and 3 show that because of how complex LLMs are, researchers must interpret their learned functions largely in (...)
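    The rule-following worry the abstract cites is easy to make concrete (a toy of my own construction, not from the paper): two different functions can agree on every observed case and still diverge beyond them, so finite behavioural evidence cannot settle which rule a model has learned.

      # Kripkenstein-style toy: "plus" and a deviant "quus" agree on all
      # observed inputs yet diverge outside them; 57 is the classic threshold.
      def rule_plus(x):
          return x + x

      def rule_quus(x):
          return x + x if x < 57 else 5  # deviant rule beyond the evidence

      observed = range(57)
      assert all(rule_plus(x) == rule_quus(x) for x in observed)
      assert rule_plus(100) != rule_quus(100)  # divergence off-distribution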
  24. Saliva Ontology: An ontology-based framework for a Salivaomics Knowledge Base. Jiye Ai, Barry Smith & David Wong - 2010 - BMC Bioinformatics 11 (1):302.
    The Salivaomics Knowledge Base (SKB) is designed to serve as a computational infrastructure that can permit global exploration and utilization of data and information relevant to salivaomics. SKB is created by aligning (1) the saliva biomarker discovery and validation resources at UCLA with (2) the ontology resources developed by the OBO (Open Biomedical Ontologies) Foundry, including a new Saliva Ontology (SALO). We define the Saliva Ontology (SALO; http://www.skb.ucla.edu/SALO/) as a consensus-based controlled vocabulary of terms and relations dedicated to the salivaomics (...)
    4 citations
  25. Multi-Value Alignment for ML/AI Development Choices. Hetvi Jethwani & Anna C. F. Lewis - 2025 - American Philosophical Quarterly 62 (2):133-152.
    We outline a four-step process for ML/AI developers to align development choices with multiple values, by adapting a widely-utilized framework from bioethics: (1) identify the values that matter, (2) specify identified values, (3) find solution spaces that allow for maximal alignment with identified values, and (4) make hard choices if there are unresolvable trade-offs between the identified values. Key to this approach is identifying resolvable trade-offs between values (Step 3). We survey ML/AI methods that could be used to this (...)
  26. Toleration and Justice in the Laozi: Engaging with Tao Jiang's Origins of Moral-Political Philosophy in Early China. Ai Yuan - 2023 - Philosophy East and West 73 (2):466-475.
    In lieu of an abstract, here is a brief excerpt of the content: This review article engages with Tao Jiang's ground-breaking monograph on the Origins of Moral-Political Philosophy in Early China, with particular focus on the articulation of toleration and justice in the Laozi (otherwise called the Daodejing). Jiang discusses a naturalistic turn and the re-alignment of values in the Laozi, resulting in a (...)
  27. Honor Ethics: The Challenge of Globalizing Value Alignment in AI. Stephen Tze-Inn Wu, Dan Demetriou & Rudwan Ali Husain - 2023 - 2023 ACM Conference on Fairness, Accountability, and Transparency (FAccT '23), June 12-15, 2023.
    Some researchers have recognized that privileged communities dominate the discourse on AI Ethics, and other voices need to be heard. As such, we identify the current ethics milieu as arising from WEIRD (Western, Educated, Industrialized, Rich, Democratic) contexts, and aim to expand the discussion to non-WEIRD global communities, who are also stakeholders in global sociotechnical systems. We argue that accounting for honor, along with its values and related concepts, would better approximate a global ethical perspective. This complex concept already underlies (...)
  28. Human-aligned artificial intelligence is a multiobjective problem. Peter Vamplew, Richard Dazeley, Cameron Foale, Sally Firmin & Jane Mummery - 2018 - Ethics and Information Technology 20 (1):27-40.
    As the capabilities of artificial intelligence systems improve, it becomes important to constrain their actions to ensure their behaviour remains beneficial to humanity. A variety of ethical, legal and safety-based frameworks have been proposed as a basis for designing these constraints. Despite their variations, these frameworks share the common characteristic that decision-making must consider multiple potentially conflicting factors. We demonstrate that these alignment frameworks can be represented as utility functions, but that the widely used Maximum Expected Utility paradigm provides (...)
    11 citations
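    A small sketch of the multiobjective point (the function and toy numbers are assumptions for illustration, not the authors' framework): a scalar expected-utility score commits to one fixed trade-off in advance, whereas a vector-valued comparison keeps every non-dominated option on the table.

      # Pareto filter over vector-valued outcomes; any single weighted sum
      # would collapse the (performance, safety) trade-off prematurely.
      def pareto_front(options):
          """options maps names to objective tuples; returns non-dominated names."""
          def dominates(a, b):
              return (all(x >= y for x, y in zip(a, b))
                      and any(x > y for x, y in zip(a, b)))
          return [name for name, vec in options.items()
                  if not any(dominates(other, vec)
                             for o, other in options.items() if o != name)]

      # Objectives: (task performance, safety).
      options = {"fast": (0.9, 0.4), "careful": (0.6, 0.9), "idle": (0.1, 0.5)}
      print(pareto_front(options))  # ['fast', 'careful']; 'idle' is dominated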
  29. Current cases of AI misalignment and their implications for future risks. Leonard Dung - 2023 - Synthese 202 (5):1-23.
    How can one build AI systems such that they pursue the goals their designers want them to pursue? This is the alignment problem. Numerous authors have raised concerns that, as research advances and systems become more powerful over time, misalignment might lead to catastrophic outcomes, perhaps even to the extinction or permanent disempowerment of humanity. In this paper, I analyze the severity of this risk based on current instances of misalignment. More specifically, I argue that contemporary large language models (...)
    7 citations
  30. Value Alignment for Advanced Artificial Judicial Intelligence. Christoph Winter, Nicholas Hollman & David Manheim - 2023 - American Philosophical Quarterly 60 (2):187-203.
    This paper considers challenges resulting from the use of advanced artificial judicial intelligence (AAJI). We argue that these challenges should be considered through the lens of value alignment. Instead of discussing why specific goals and values, such as fairness and nondiscrimination, ought to be implemented, we consider the question of how AAJI can be aligned with goals and values more generally, in order to be reliably integrated into legal and judicial systems. This value alignment framing draws on AI (...)
    2 citations
  31. A Justifiable Investment in AI for Healthcare: Aligning Ambition with Reality. Kassandra Karpathakis, Jessica Morley & Luciano Floridi - 2024 - Minds and Machines 34 (4):1-40.
    Healthcare systems are grappling with critical challenges, including chronic diseases in aging populations, unprecedented health care staffing shortages and turnover, scarce resources, unprecedented demands and wait times, escalating healthcare expenditure, and declining health outcomes. As a result, policymakers and healthcare executives are investing in artificial intelligence (AI) solutions to increase operational efficiency, lower health care costs, and improve patient care. However, current level of investment in developing healthcare AI among members of the global digital health partnership does not seem to (...)
  32. Variable Value Alignment by Design; averting risks with robot religion. Jeffrey White - forthcoming - Embodied Intelligence 2023.
    One approach to alignment with human values in AI and robotics is to engineer artificial systems isomorphic with human beings. The idea is that robots so designed may autonomously align with human values through similar developmental processes, to realize project ideal conditions through iterative interaction with social and object environments just as humans do, such as are expressed in narratives and life stories. One persistent problem with human value orientation is that different human beings champion different values as (...)
  33. Value alignment, human enhancement, and moral revolutions. Ariela Tubert & Justin Tiehen - forthcoming - Inquiry: An Interdisciplinary Journal of Philosophy.
    Human beings are internally inconsistent in various ways. One way to develop this thought involves using the language of value alignment: the values we hold are not always aligned with our behavior, and are not always aligned with each other. Because of this self-misalignment, there is room for potential projects of human enhancement that involve achieving a greater degree of value alignment than we presently have. Relatedly, discussions of AI ethics sometimes focus on what is known as the (...)
  34. Automation, Alignment, and the Cooperative Interface. Julian David Jonker - 2024 - The Journal of Ethics 28 (3):483-504.
    The paper demonstrates that social alignment is distinct from value alignment as it is currently understood in the AI safety literature, and argues that social alignment is an important research agenda. Work provides an important example for the argument, since work is a cooperative endeavor, and it is part of the larger manifold of social cooperation. These cooperative aspects of work are individually and socially valuable, and so they must be given a central place when evaluating the (...)
  35. “Desired behaviors”: alignment and the emergence of a machine learning ethics. Katia Schwerzmann & Alexander Campolo - forthcoming - AI and Society:1-14.
    The concept of alignment has undergone a remarkable rise in recent years to take center stage in the ethics of artificial intelligence. There are now numerous philosophical studies of the values that should be used in this ethical framework as well as a technical literature operationalizing these values in machine learning models. This article takes a step back to address a more basic set of critical questions: Where has the ethical imperative of alignment come from? What is the (...)
  36. Explicability as an AI Principle: Technology and Ethics in Cooperation. Moto Kamiura - forthcoming - Proceedings of the 39th Annual Conference of the Japanese Society for Artificial Intelligence, 2025.
    This paper categorizes current approaches to AI ethics into four perspectives and briefly summarizes them: (1) Case studies and technical trend surveys, (2) AI governance, (3) Technologies for AI alignment, (4) Philosophy. In the second half, we focus on the fourth perspective, the philosophical approach, within the context of applied ethics. In particular, the explicability of AI may be an area in which scientists, engineers, and AI developers are expected to engage more actively relative to other ethical issues in (...)
  37. From Confucius to Coding and Avicenna to Algorithms: Cultivating Ethical AI Development through Cross-Cultural Ancient Wisdom. Ammar Younas & Yi Zeng - manuscript
    This paper explores the potential of integrating ancient educational principles from diverse eastern cultures into modern AI ethics curricula. It draws on the rich educational traditions of ancient China, India, Arabia, Persia, Japan, Tibet, Mongolia, and Korea, highlighting their emphasis on philosophy, ethics, holistic development, and critical thinking. By examining these historical educational systems, the paper establishes a correlation with modern AI ethics principles, advocating for the inclusion of these ancient teachings in current AI development and education. The proposed integration (...)
  38. (1 other version) An Enactive Approach to Value Alignment in Artificial Intelligence: A Matter of Relevance. Michael Cannon - 2021 - In Vincent C. Müller, Philosophy and Theory of AI. Springer Cham. pp. 119-135.
    The “Value Alignment Problem” is the challenge of how to align the values of artificial intelligence with human values, whatever they may be, such that AI does not pose a risk to the existence of humans. Existing approaches appear to conceive of the problem as "how do we ensure that AI solves the problem in the right way", in order to avoid the possibility of AI turning humans into paperclips in order to “make more paperclips” or eradicating the human (...)
  39. The linguistic dead zone of value-aligned agency, natural and artificial. Travis LaCroix - 2024 - Philosophical Studies:1-23.
    The value alignment problem for artificial intelligence (AI) asks how we can ensure that the “values”—i.e., objective functions—of artificial systems are aligned with the values of humanity. In this paper, I argue that linguistic communication is a necessary condition for robust value alignment. I discuss the consequences that the truth of this claim would have for research programmes that attempt to ensure value alignment for AI systems—or, more loftily, those programmes that seek to design robustly beneficial or (...)
  40. Instilling moral value alignment by means of multi-objective reinforcement learning. Juan Antonio Rodriguez-Aguilar, Maite Lopez-Sanchez, Marc Serramia & Manel Rodriguez-Soto - 2022 - Ethics and Information Technology 24 (1).
    AI research is being challenged with ensuring that autonomous agents learn to behave ethically, namely in alignment with moral values. Here, we propose a novel way of tackling the value alignment problem as a two-step process. The first step consists in formalising moral values and value-aligned behaviour based on philosophical foundations. Our formalisation is compatible with the framework of (Multi-Objective) Reinforcement Learning, to ease the handling of an agent’s individual and ethical objectives. The second step consists in (...)
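    A minimal sketch of the two-objective setup the abstract describes (the weighting scheme, names, and numbers are assumptions; the paper's own construction may differ): the agent's individual and ethical objectives are combined into a single learning signal, with the ethical weight chosen large enough that compliant behaviour dominates.

      # Linear scalarization of a two-objective reward; values are hypothetical,
      # chosen only to show the ethical weight doing its job.
      def scalarize(r_individual, r_ethical, w_ethical=2.0):
          """Collapse (individual, ethical) rewards into one scalar signal."""
          return r_individual + w_ethical * r_ethical

      # An unethical shortcut pays more individually but is penalised ethically.
      shortcut = scalarize(r_individual=1.0, r_ethical=-1.0)   # -> -1.0
      compliant = scalarize(r_individual=0.5, r_ethical=0.0)   # -> 0.5
      assert compliant > shortcut  # the aligned policy is preferred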
  41. Robustness to Fundamental Uncertainty in AGI Alignment. G. G. Worley III - 2020 - Journal of Consciousness Studies 27 (1-2):225-241.
    The AGI alignment problem has a bimodal distribution of outcomes with most outcomes clustering around the poles of total success and existential, catastrophic failure. Consequently, attempts to solve AGI alignment should, all else equal, prefer false negatives (ignoring research programs that would have been successful) to false positives (pursuing research programs that will unexpectedly fail). Thus, we propose adopting a policy of responding to points of philosophical and practical uncertainty associated with the alignment problem by limiting and (...)
  42. Challenges of Aligning Artificial Intelligence with Human Values. Margit Sutrop - 2020 - Acta Baltica Historiae Et Philosophiae Scientiarum 8 (2):54-72.
    As artificial intelligence systems are becoming increasingly autonomous and will soon be able to make decisions on their own about what to do, AI researchers have started to talk about the need to align AI with human values. The AI ‘value alignment problem’ faces two kinds of challenges—a technical and a normative one—which are interrelated. The technical challenge deals with the question of how to encode human values in artificial intelligence. The normative challenge is associated with two questions: “Which (...)
    5 citations
  43. A comment on the pursuit to align AI: we do not need value-aligned AI, we need AI that is risk-averse. Rebecca Raper - forthcoming - AI and Society:1-3.
  44. The problem of alignment. Tsvetelina Hristova, Liam Magee & Karen Soldatic - forthcoming - AI and Society:1-15.
    Large language models (LLMs) produce sequences learned as statistical patterns from large corpora. Their emergent status as representatives of the advances in artificial intelligence (AI) have led to an increased attention to the possibilities of regulating the automated production of linguistic utterances and interactions with human users in a process that computer scientists refer to as ‘alignment’—a series of technological and political mechanisms to impose a normative model of morality on algorithms and networks behind the model. Alignment, which (...)
  45. Instilling moral value alignment by means of multi-objective reinforcement learning. M. Rodriguez-Soto, M. Serramia, M. Lopez-Sanchez & J. Antonio Rodriguez-Aguilar - 2022 - Ethics and Information Technology 24 (9).
    AI research is being challenged with ensuring that autonomous agents learn to behave ethically, namely in alignment with moral values. Here, we propose a novel way of tackling the value alignment problem as a two-step process. The first step consists in formalising moral values and value-aligned behaviour based on philosophical foundations. Our formalisation is compatible with the framework of (Multi-Objective) Reinforcement Learning, to ease the handling of an agent’s individual and ethical objectives. The second step consists in (...)
  46. Aligning Anatomy Ontologies in the Ontology Alignment Evaluation Initiative. Patrick Lambrix, Qiang Liu & He Tan - forthcoming - The Swedish AI Society Workshop, May 27-28, 2009, IDA, Linköping University.
  47. AI Survival Stories: a Taxonomic Analysis of AI Existential Risk. Herman Cappelen, Simon Goldstein & John Hawthorne - forthcoming - Philosophy of AI.
    Since the release of ChatGPT, there has been a lot of debate about whether AI systems pose an existential risk to humanity. This paper develops a general framework for thinking about the existential risk of AI systems. We analyze a two-premise argument that AI systems pose a threat to humanity. Premise one: AI systems will become extremely powerful. Premise two: if AI systems become extremely powerful, they will destroy humanity. We use these two premises to construct a taxonomy of ‘survival (...)
    1 citation
  48. The state as a model for AI control and alignment. Micha Elsner - forthcoming - AI and Society:1-11.
    Debates about the development of artificial superintelligence and its potential threats to humanity tend to assume that such a system would be historically unprecedented, and that its behavior must be predicted from first principles. I argue that this is not true: we can analyze multiagent intelligent systems (the best candidates for practical superintelligence) by comparing them to states, which also unite heterogeneous intelligences to achieve superhuman goals. States provide a model for several problems discussed in the literature on superintelligence, such (...)
  49. Knowledge-augmented face perception: Prospects for the Bayesian brain-framework to align AI and human vision. Martin Maier, Florian Blume, Pia Bideau, Olaf Hellwich & Rasha Abdel Rahman - 2022 - Consciousness and Cognition 101:103301.
    2 citations
  50. Minangkabaunese matrilineal: The correlation between the Qur’an and gender. Halimatussa’Diyah Halimatussa’Diyah, Kusnadi Kusnadi, Ai Y. Yuliyanti, Deddy Ilyas & Eko Zulfikar - 2024 - HTS Theological Studies 80 (1):7.
    Upon previous research, the matrilineal system seems to oppose Islamic teaching. However, the matrilineal system practiced by the Minangkabau society in West Sumatra, Indonesia has its uniqueness. Thus, this study aims to examine the correlation between the Qur’an and gender roles within the context of Minangkabau customs, specifically focusing on the matrilineal aspect. The present study employs qualitative methods for conducting library research through critical analysis. This study discovered that the matrilineal system practiced by the Minangkabau society aligns with Qur’anic (...)
Showing 1–50 of 976