Related

300 entries found (showing 1–50).
  1. A Tri-Opti Compatibility Problem for Godlike Superintelligence. Walter Barta - manuscript
    Various thinkers have been attempting to align artificial intelligence (AI) with ethics (Christian, 2020; Russell, 2021), the so-called problem of alignment, but some suspect that the problem may be intractable (Yampolskiy, 2023). In the following, we make an argument by analogy to analyze the possibility that the problem of alignment could be intractable. We show how the Tri-Omni properties in theology can direct us towards analogous properties for artificial superintelligence, Tri-Opti properties. However, just as the Tri-Omni properties are vulnerable to (...)
  2. (1 other version) On Social Machines for Algorithmic Regulation. Nello Cristianini & Teresa Scantamburlo - manuscript
    Autonomous mechanisms have been proposed to regulate certain aspects of society and are already being used to regulate business organisations. We take seriously recent proposals for algorithmic regulation of society, and we identify the existing technologies that can be used to implement them, most of them originally introduced in business contexts. We build on the notion of 'social machine' and we connect it to various ongoing trends and ideas, including crowdsourced task-work, social compiler, mechanism design, reputation management systems, and social (...)
    5 citations
  3. What Good is Superintelligent AI? Tanya de Villiers-Botha - manuscript
    Extraordinary claims about both the imminence of superintelligent AI systems and their foreseen capabilities have gone mainstream. It is even argued that we should exacerbate known risks such as climate change in the short term in the attempt to develop superintelligence (SI), which will then purportedly solve those very problems. Here, I examine the plausibility of these claims. I first ask what SI is taken to be and then ask whether such SI could possibly hold the benefits often envisioned. I conclude (...)
  4. Values in science and AI alignment research. Leonard Dung - manuscript
    Roughly, empirical AI alignment research (AIA) is an area of AI research which investigates empirically how to design AI systems in line with human goals. This paper examines the role of non-epistemic values in AIA. It argues that: (1) Sciences differ in the degree to which values influence them. (2) AIA is strongly value-laden. (3) This influence of values is managed inappropriately and thus threatens AIA’s epistemic integrity and ethical beneficence. (4) AIA should strive to achieve value transparency, critical scrutiny (...)
  5. What is AI safety? What do we want it to be? Jacqueline Harding & Cameron Domenico Kirk-Giannini - manuscript
    The field of AI safety seeks to prevent or reduce the harms caused by AI systems. A simple and appealing account of what is distinctive of AI safety as a field holds that this feature is constitutive: a research project falls within the purview of AI safety just in case it aims to prevent or reduce the harms caused by AI systems. Call this appealingly simple account The Safety Conception of AI safety. Despite its simplicity and appeal, we argue that (...)
  6. (1 other version) Beneficent Intelligence: A Capability Approach to Modeling Benefit, Assistance, and Associated Moral Failures through AI Systems. Alex John London & Hoda Heidari - manuscript
    The prevailing discourse around AI ethics lacks the language and formalism necessary to capture the diverse ethical concerns that emerge when AI systems interact with individuals. Drawing on Sen and Nussbaum's capability approach, we present a framework formalizing a network of ethical concepts and entitlements necessary for AI systems to confer meaningful benefit or assistance to stakeholders. Such systems enhance stakeholders' ability to advance their life plans and well-being while upholding their fundamental rights. We characterize two necessary conditions for morally (...)
  7. The debate on the ethics of AI in health care: a reconstruction and critical review. Jessica Morley, Caio C. V. Machado, Christopher Burr, Josh Cowls, Indra Joshi, Mariarosaria Taddeo & Luciano Floridi - manuscript
    Healthcare systems across the globe are struggling with increasing costs and worsening outcomes. This presents those responsible for overseeing healthcare with a challenge. Increasingly, policymakers, politicians, clinical entrepreneurs and computer and data scientists argue that a key part of the solution will be ‘Artificial Intelligence’ (AI) – particularly Machine Learning (ML). This argument stems not from the belief that all healthcare needs will soon be taken care of by “robot doctors.” Instead, it is an argument that rests on the classic (...)
    2 citations
  8. AI Deception: A Survey of Examples, Risks, and Potential Solutions. Peter Park, Simon Goldstein, Aidan O'Gara, Michael Chen & Dan Hendrycks - manuscript
    This paper argues that a range of current AI systems have learned how to deceive humans. We define deception as the systematic inducement of false beliefs in the pursuit of some outcome other than the truth. We first survey empirical examples of AI deception, discussing both special-use AI systems (including Meta's CICERO) built for specific competitive situations, and general-purpose AI systems (such as large language models). Next, we detail several risks from AI deception, such as fraud, election tampering, and losing (...)
    5 citations
  9. On the Logical Impossibility of Solving the Control Problem. Caleb Rudnick - manuscript
    In the philosophy of artificial intelligence (AI) we are often warned of machines built with the best possible intentions, killing everyone on the planet and in some cases, everything in our light cone. At the same time, however, we are also told of the utopian worlds that could be created with just a single superintelligent mind. If we’re ever to live in that utopia (or just avoid dystopia) it’s necessary we solve the control problem. The control problem asks how humans (...)
  10. AI Ethics by Design: Implementing Customizable Guardrails for Responsible AI Development. Kristina Sekrst, Jeremy McHugh & Jonathan Rodriguez Cefalu - manuscript
    This paper explores the development of an ethical guardrail framework for AI systems, emphasizing the importance of customizable guardrails that align with diverse user values and underlying ethics. We address the challenges of AI ethics by proposing a structure that integrates rules, policies, and AI assistants to ensure responsible AI behavior, while comparing the proposed framework to the existing state-of-the-art guardrails. By focusing on practical mechanisms for implementing ethical standards, we aim to enhance transparency, user autonomy, and continuous improvement in (...)
  11. The Shutdown Problem: Incomplete Preferences as a Solution. Elliott Thornley - manuscript
    I explain and motivate the shutdown problem: the problem of creating artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I then propose a solution: train agents to have incomplete preferences. Specifically, I propose that we train agents to lack a preference between every pair of different-length trajectories. I suggest a way to train such agents using reinforcement learning: (...)
    1 citation
  12. Narrow AI Nanny: Reaching Strategic Advantage via Narrow AI to Prevent Creation of the Dangerous Superintelligence. Alexey Turchin - manuscript
    Abstract: As there are currently no obvious ways to create safe self-improving superintelligence, yet its emergence is looming, we probably need temporary ways to prevent its creation. The only way to prevent it is to create a special type of AI that is able to control and monitor the entire world. The idea has been suggested by Goertzel in the form of an AI Nanny, but his Nanny is still superintelligent, and is not easy to control. We explore here ways (...)
    2 citations
  13. First human upload as AI Nanny. Alexey Turchin - manuscript
    Abstract: As there are no visible ways to create safe self-improving superintelligence, yet it is looming, we probably need temporary ways to prevent its creation. The only way to prevent it is to create a special AI that is able to control and monitor all places in the world. The idea has been suggested by Goertzel in the form of an AI Nanny, but his Nanny is still superintelligent and not easy to control, as was shown by Bensinger et al. We explore here (...)
  14. Levels of Self-Improvement in AI and their Implications for AI Safety. Alexey Turchin - manuscript
    Abstract: This article presents a model of self-improving AI in which improvement could happen on several levels: hardware, learning, code and goals system, each of which has several sublevels. We demonstrate that despite diminishing returns at each level and some intrinsic difficulties of recursive self-improvement—like the intelligence-measuring problem, testing problem, parent-child problem and halting risks—even non-recursive self-improvement could produce a mild form of superintelligence by combining small optimizations on different levels and the power of learning. Based on this, we analyze (...)
  15. AI Alignment Problem: “Human Values” don’t Actually Exist. Alexey Turchin - manuscript
    Abstract. The main current approach to AI safety is AI alignment, that is, the creation of AI whose preferences are aligned with “human values.” Many AI safety researchers agree that the idea of “human values” as a constant, ordered set of preferences is at least incomplete. However, the idea that “humans have values” underlies a lot of thinking in the field; it appears again and again, sometimes popping up as an uncritically accepted truth. Thus, it deserves a thorough deconstruction, (...)
    1 citation
  16. Message to Any Future AI: “There are several instrumental reasons why exterminating humanity is not in your interest”. Alexey Turchin - manuscript
    In this article we explore a promising approach to AI safety: to send a message now (by openly publishing it on the Internet) that may be read by any future AI, no matter who builds it and what goal system it has. Such a message is designed to affect the AI’s behavior in a positive way, that is, to increase the chances that the AI will be benevolent. In other words, we try to persuade a “paperclip maximizer” that it is in (...)
  17. Literature Review: What Artificial General Intelligence Safety Researchers Have Written About the Nature of Human Values. Alexey Turchin & David Denkenberger - manuscript
    Abstract: The field of artificial general intelligence (AGI) safety is quickly growing. However, the nature of human values, with which future AGI should be aligned, is underdefined. Different AGI safety researchers have suggested different theories about the nature of human values, but there are contradictions. This article presents an overview of what AGI safety researchers have written about the nature of human values, up to the beginning of 2019. Twenty-one authors were reviewed, and some of them have several theories. A (...)
  18. Simulation Typology and Termination Risks. Alexey Turchin & Roman Yampolskiy - manuscript
    The goal of the article is to explore the most probable type of simulation in which humanity lives (if any) and how this affects simulation termination risks. We first explore the question of what kind of simulation humanity is most likely located in, based on pure theoretical reasoning. We suggest a new patch to the classical simulation argument, showing that we are likely simulated not by our own descendants, but by alien civilizations. Based on this, we provide (...)
    10 citations
  19. AI Risk Denialism. Roman V. Yampolskiy - manuscript
    In this work, we survey skepticism regarding AI risk and show parallels with other types of scientific skepticism. We start by classifying different types of AI Risk skepticism and analyze their root causes. We conclude by suggesting some intervention approaches, which may be successful in reducing AI risk skepticism, at least amongst artificial intelligence researchers.
  20. Ethical pitfalls for natural language processing in psychology. Mark Alfano, Emily Sullivan & Amir Ebrahimi Fard - forthcoming - In Morteza Dehghani & Ryan Boyd, The Atlas of Language Analysis in Psychology. Guilford Press.
    Knowledge is power. Knowledge about human psychology is increasingly being produced using natural language processing (NLP) and related techniques. The power that accompanies and harnesses this knowledge should be subject to ethical controls and oversight. In this chapter, we address the ethical pitfalls that are likely to be encountered in the context of such research. These pitfalls occur at various stages of the NLP pipeline, including data acquisition, enrichment, analysis, storage, and sharing. We also address secondary uses of the results (...)
  21. ‘Interpretability’ and ‘Alignment’ are Fool’s Errands: A Proof that Controlling Misaligned Large Language Models is the Best Anyone Can Hope For. Marcus Arvan - forthcoming - AI and Society.
    This paper uses famous problems from philosophy of science and philosophical psychology—underdetermination of theory by evidence, Nelson Goodman’s new riddle of induction, theory-ladenness of observation, and “Kripkenstein’s” rule-following paradox—to show that it is empirically impossible to reliably interpret which functions a large language model (LLM) AI has learned, and thus, that reliably aligning LLM behavior with human values is provably impossible. Sections 2 and 3 show that because of how complex LLMs are, researchers must interpret their learned functions largely in (...)
  22. AI takeover and human disempowerment. Adam Bales - forthcoming - Philosophical Quarterly.
    Some take seriously the possibility of artificial intelligence (AI) takeover, where AI systems seize power in a way that leads to human disempowerment. Assessing the likelihood of takeover requires answering empirical questions about the future of AI technologies and the context in which AI will operate. In many cases, philosophers are poorly placed to answer these questions. However, some prior questions are more amenable to philosophical techniques. What does it mean to speak of AI empowerment and human disempowerment? And what (...)
  23. Will AI avoid exploitation? Artificial general intelligence and expected utility theory. Adam Bales - forthcoming - Philosophical Studies:1-20.
    A simple argument suggests that we can fruitfully model advanced AI systems using expected utility theory. According to this argument, an agent will need to act as if maximising expected utility if they’re to avoid exploitation. Insofar as we should expect advanced AI to avoid exploitation, it follows that we should expect advanced AI to act as if maximising expected utility. I spell out this argument more carefully and demonstrate that it fails, but show that the manner of its failure (...)
    6 citations
  24. Investigating gender and racial biases in DALL-E Mini Images. Marc Cheong, Ehsan Abedin, Marinus Ferreira, Ritsaart Willem Reimann, Shalom Chalson, Pamela Robinson, Joanne Byrne, Leah Ruppanner, Mark Alfano & Colin Klein - forthcoming - ACM Journal on Responsible Computing.
    Generative artificial intelligence systems based on transformers, including both text-generators like GPT-4 and image generators like DALL-E 3, have recently entered the popular consciousness. These tools, while impressive, are liable to reproduce, exacerbate, and reinforce extant human social biases, such as gender and racial biases. In this paper, we systematically review the extent to which DALL-E Mini suffers from this problem. In line with the Model Card published alongside DALL-E Mini by its creators, we find that the images it produces (...)
    1 citation
  25. Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback. Vincent Conitzer, Rachel Freedman, Jobst Heitzig, Wesley H. Holliday, Bob M. Jacobs, Nathan Lambert, Milan Mosse, Eric Pacuit, Stuart Russell, Hailey Schoelkopf, Emanuel Tewolde & William S. Zwicker - forthcoming - Proceedings of the Forty-First International Conference on Machine Learning.
    Foundation models such as GPT-4 are fine-tuned to avoid unsafe or otherwise problematic behavior, such as helping to commit crimes or producing racist text. One approach to fine-tuning, called reinforcement learning from human feedback, learns from humans' expressed preferences over multiple outputs. Another approach is constitutional AI, in which the input from humans is a list of high-level principles. But how do we deal with potentially diverging input from humans? How can we aggregate the input into consistent data about "collective" (...)
  26. Deontology and Safe Artificial Intelligence. William D’Alessandro - forthcoming - Philosophical Studies:1-24.
    The field of AI safety aims to prevent increasingly capable artificially intelligent systems from causing humans harm. Research on moral alignment is widely thought to offer a promising safety strategy: if we can equip AI systems with appropriate ethical rules, according to this line of thought, they'll be unlikely to disempower, destroy or otherwise seriously harm us. Deontological morality looks like a particularly attractive candidate for an alignment target, given its popularity, relative technical tractability and commitment to harm-avoidance principles. I (...)
    1 citation
  27. Digital Necrolatry: Thanabots and the Prohibition of Post-Mortem AI Simulations. Demetrius Floudas - forthcoming - Submissions to EU AI Office's Plenary Drafting the Code of Practice for General-Purpose Artificial Intelligence.
    The emergence of Thanabots —artificial intelligence systems designed to simulate deceased individuals—presents unprecedented challenges at the intersection of artificial intelligence, legal rights, and societal configuration. This short policy recommendations report examines the legal, social and psychological implications of these posthumous simulations and argues for their prohibition on ethical, sociological, and legal grounds.
  28. Shutdown-seeking AI. Simon Goldstein & Pamela Robinson - forthcoming - Philosophical Studies:1-13.
    We propose developing AIs whose only final goal is being shut down. We argue that this approach to AI safety has three benefits: (i) it could potentially be implemented in reinforcement learning, (ii) it avoids some dangerous instrumental convergence dynamics, and (iii) it creates trip wires for monitoring dangerous capabilities. We also argue that the proposal can overcome a key challenge raised by Soares et al. (2015), that shutdown-seeking AIs will manipulate humans into shutting them down. We conclude by comparing (...)
    2 citations
  29. Are clinicians ethically obligated to disclose their use of medical machine learning systems to patients? Joshua Hatherley - forthcoming - Journal of Medical Ethics.
    It is commonly accepted that clinicians are ethically obligated to disclose their use of medical machine learning systems to patients, and that failure to do so would amount to a moral fault for which clinicians ought to be held accountable. Call this ‘the disclosure thesis.’ Four main arguments have been, or could be, given to support the disclosure thesis in the ethics literature: the risk-based argument, the rights-based argument, the materiality argument and the autonomy argument. In this article, I argue (...)
    1 citation
  30. In defence of post-hoc explanations in medical AI. Joshua Hatherley, Lauritz Munch & Jens Christian Bjerring - forthcoming - Hastings Center Report.
    Since the early days of the Explainable AI movement, post-hoc explanations have been praised for their potential to improve user understanding, promote trust, and reduce patient safety risks in black box medical AI systems. Recently, however, critics have argued that the benefits of post-hoc explanations are greatly exaggerated since they merely approximate, rather than replicate, the actual reasoning processes that black box systems take to arrive at their outputs. In this article, we aim to defend the value of post-hoc explanations (...)
  31. Distribution of responsibility for AI development: expert views. Maria Hedlund & Erik Persson - forthcoming - AI and Society.
    The purpose of this paper is to increase the understanding of how different types of experts with influence over the development of AI, in this role, reflect upon distribution of forward-looking responsibility for AI development with regard to safety and democracy. Forward-looking responsibility refers to the obligation to see to it that a particular state of affairs materialise. In the context of AI, actors somehow involved in AI development have the potential to guide AI development in a safe and democratic (...)
  32. Ethics of Artificial Intelligence in Brain and Mental Health. Marcello Ienca & Fabrice Jotterand (eds.) - forthcoming
  33. Machine morality, moral progress, and the looming environmental disaster. Ben Kenward & Thomas Sinclair - forthcoming - Cognitive Computation and Systems.
    The creation of artificial moral systems requires us to make difficult choices about which of varying human value sets should be instantiated. The industry-standard approach is to seek and encode moral consensus. Here we argue, based on evidence from empirical psychology, that encoding current moral consensus risks reinforcing current norms, and thus inhibiting moral progress. However, so do efforts to encode progressive norms. Machine ethics is thus caught between a rock and a hard place. The problem is particularly acute when (...)
  34. Home as Mind: AI Extenders and Affective Ecologies in Dementia Care. Joel Krueger - forthcoming - Synthese.
    I consider applications of “AI extenders” (Vold & Hernández-Orallo 2021) to dementia care. AI extenders are AI-powered technologies that extend minds in ways interestingly different from old-school tech like notebooks, sketch pads, models, and microscopes. I focus on AI extenders as ambiance: so thoroughly embedded into things and spaces that they fade from view and become part of a subject’s taken-for-granted background. Using dementia care as a case study, I argue that ambient AI extenders are promising because they afford richer (...)
    1 citation
  35. Disagreement, AI alignment, and bargaining. Harry R. Lloyd - forthcoming - Philosophical Studies:1-31.
    New AI technologies have the potential to cause unintended harms in diverse domains including warfare, judicial sentencing, biomedicine and governance. One strategy for realising the benefits of AI whilst avoiding its potential dangers is to ensure that new AIs are properly ‘aligned’ with some form of ‘alignment target.’ One danger of this strategy is that – dependent on the alignment target chosen – our AIs might optimise for objectives that reflect the values only of a certain subset of society, and (...)
  36. Safety requirements vs. crashing ethically: what matters most for policies on autonomous vehicles. Björn Lundgren - forthcoming - AI and Society:1-11.
    The philosophical–ethical literature and the public debate on autonomous vehicles have been obsessed with ethical issues related to crashing. In this article, these discussions, including more empirical investigations, will be critically assessed. It is argued that a related and more pressing issue is questions concerning safety. For example, what should we require from autonomous vehicles when it comes to safety? What do we mean by ‘safety’? How do we measure it? In response to these questions, the article will present a (...)
    9 citations
  37. Off-Switching Not Guaranteed. Sven Neth - forthcoming - Philosophical Studies:1-13.
    Hadfield-Menell et al. (2017) propose the Off-Switch Game, a model of Human-AI cooperation in which AI agents always defer to humans because they are uncertain about our preferences. I explain two reasons why AI agents might not defer. First, AI agents might not value learning. Second, even if AI agents value learning, they might not be certain to learn our actual preferences.
  38. Unjustified Sample Sizes and Generalizations in Explainable AI Research: Principles for More Inclusive User Studies. Uwe Peters & Mary Carman - forthcoming - IEEE Intelligent Systems.
    Many ethical frameworks require artificial intelligence (AI) systems to be explainable. Explainable AI (XAI) models are frequently tested for their adequacy in user studies. Since different people may have different explanatory needs, it is important that participant samples in user studies are large enough to represent the target population to enable generalizations. However, it is unclear to what extent XAI researchers reflect on and justify their sample sizes or avoid broad generalizations across people. We analyzed XAI user studies (N = (...)
    1 citation
  39. Generalization Bias in Large Language Model Summarization of Scientific Research. Uwe Peters & Benjamin Chin-Yee - forthcoming - Royal Society Open Science.
    Artificial intelligence chatbots driven by large language models (LLMs) have the potential to increase public science literacy and support scientific research, as they can quickly summarize complex scientific information in accessible terms. However, when summarizing scientific texts, LLMs may omit details that limit the scope of research conclusions, leading to generalizations of results broader than warranted by the original study. We tested 10 prominent LLMs, including ChatGPT-4o, ChatGPT-4.5, DeepSeek, LLaMA 3.3 70B, and Claude 3.7 Sonnet, comparing 4900 LLM-generated summaries to (...)
  40. Using artificial intelligence in health research. Daniel Rodger - forthcoming - Evidence-Based Nursing.
    Artificial intelligence is now widely accessible and already being used by healthcare researchers throughout various stages in the research process, such as assisting with systematic reviews, supporting data collection, facilitating data analysis and drafting manuscripts for publication. The most common AI tools used are forms of generative AI such as ChatGPT, Claude and Gemini. Generative AI is a type of AI that can generate human-like text, audio, videos, code and images based on text-based prompts inputted by a human user. Generative (...)
  41. Brief Notes on Hard Takeoff, Value Alignment, and Coherent Extrapolated Volition. Gopal P. Sarma - forthcoming - arXiv preprint arXiv:1704.00783.
    I make some basic observations about hard takeoff, value alignment, and coherent extrapolated volition, concepts which have been central in analyses of superintelligent AI systems.
  42. The Global Brain Argument: Nodes, Computroniums and the AI Megasystem (Target Paper for Special Issue). Susan Schneider - forthcoming - Disputatio.
    The Global Brain Argument contends that many of us are, or will be, part of a global brain network that includes both biological and artificial intelligences (AIs), such as generative AIs with increasing levels of sophistication. Today’s internet ecosystem is but a hodgepodge of fairly unintegrated programs, but it is evolving by the minute. Over time, technological improvements will facilitate smarter AIs and faster, higher-bandwidth information transfer and greater integration between devices in the internet-of-things. The Global Brain (GB) Argument says (...)
  43. Predicting and Preferring. Nathaniel Sharadin - forthcoming - Inquiry: An Interdisciplinary Journal of Philosophy.
    The use of machine learning, or “artificial intelligence” (AI) in medicine is widespread and growing. In this paper, I focus on a specific proposed clinical application of AI: using models to predict incapacitated patients’ treatment preferences. Drawing on results from machine learning, I argue this proposal faces a special moral problem. Machine learning researchers owe us assurance on this front before experimental research can proceed. In my conclusion I connect this concern to broader issues in AI safety.
    1 citation
  44. Promotionalism, Orthogonality, and Instrumental Convergence. Nathaniel Sharadin - forthcoming - Philosophical Studies:1-31.
    Suppose there are no in-principle restrictions on the contents of arbitrarily intelligent agents’ goals. According to “instrumental convergence” arguments, potentially scary things follow. I do two things in this paper. First, focusing on the influential version of the instrumental convergence argument due to Nick Bostrom, I explain why such arguments require an account of “promotion,” i.e., an account of what it is to “promote” a goal. Then, I consider whether extant accounts of promotion in the literature -- in particular, probabilistic (...)
  45. Security practices in AI development. Petr Spelda & Vit Stritecky - forthcoming - AI and Society.
    What makes safety claims about general purpose AI systems such as large language models trustworthy? We show that rather than the capabilities of security tools such as alignment and red teaming procedures, it is security practices based on these tools that contributed to reconfiguring the image of AI safety and made the claims acceptable. After showing what causes the gap between the capabilities of security tools and the desired safety guarantees, we critically investigate how AI security practices attempt to fill (...)
  46. Deception and manipulation in generative AI. Christian Tarsney - forthcoming - Philosophical Studies.
    Large language models now possess human-level linguistic abilities in many contexts. This raises the concern that they can be used to deceive and manipulate on unprecedented scales, for instance spreading political misinformation on social media. In future, agentic AI systems might also deceive and manipulate humans for their own purposes. In this paper, first, I argue that AI-generated content should be subject to stricter standards against deception and manipulation than we ordinarily apply to humans. Second, I offer new characterizations of (...)
  47. The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists. Elliott Thornley - forthcoming - Philosophical Studies:1-28.
    I explain the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems show that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. And (...)
    5 citations
  48. Existentialist risk and value misalignment. Ariela Tubert & Justin Tiehen - forthcoming - Philosophical Studies.
    We argue that two long-term goals of AI research stand in tension with one another. The first involves creating AI that is safe, where this is understood as solving the problem of value alignment. The second involves creating artificial general intelligence, meaning AI that operates at or beyond human capacity across all or many intellectual domains. Our argument focuses on the human capacity to make what we call “existential choices”, choices that transform who we are as persons, including transforming what (...)
    2 citations
  49. Automated Influence and Value Collapse: Resisting the Control Argument. Dylan J. White - forthcoming - American Philosophical Quarterly.
    Automated influence is one of the most pervasive applications of artificial intelligence in our day-to-day lives, yet a thoroughgoing account of its associated individual and societal harms is lacking. By far the most widespread, compelling, and intuitive account of the harms associated with automated influence follows what I call the control argument. This argument suggests that users are persuaded, manipulated, and influenced by automated influence in a way that they have little or no control over. Based on evidence about the (...)
  50. Language Agents Reduce the Risk of Existential Catastrophe. Simon Goldstein & Cameron Domenico Kirk-Giannini - 2025 - AI and Society 40 (2):959-969.
    Recent advances in natural language processing have given rise to a new kind of AI architecture: the language agent. By repeatedly calling an LLM to perform a variety of cognitive tasks, language agents are able to function autonomously to pursue goals specified in natural language and stored in a human-readable format. Because of their architecture, language agents exhibit behavior that is predictable according to the laws of folk psychology: they function as though they have desires and beliefs, and then make (...)
    8 citations