Related

300 entries found (showing 1–50).
  1. A Tri-Opti Compatibility Problem for Godlike Superintelligence. Walter Barta - manuscript
    Various thinkers have been attempting to align artificial intelligence (AI) with ethics (Christian, 2020; Russell, 2021), the so-called problem of alignment, but some suspect that the problem may be intractable (Yampolskiy, 2023). In the following, we make an argument by analogy to analyze the possibility that the problem of alignment could be intractable. We show how the Tri-Omni properties in theology can direct us towards analogous properties for artificial superintelligence, Tri-Opti properties. However, just as the Tri-Omni properties are vulnerable to (...)
  2. (1 other version) On Social Machines for Algorithmic Regulation. Nello Cristianini & Teresa Scantamburlo - manuscript
    Autonomous mechanisms have been proposed to regulate certain aspects of society and are already being used to regulate business organisations. We take seriously recent proposals for algorithmic regulation of society, and we identify the existing technologies that can be used to implement them, most of them originally introduced in business contexts. We build on the notion of 'social machine' and we connect it to various ongoing trends and ideas, including crowdsourced task-work, social compiler, mechanism design, reputation management systems, and social (...)
    5 citations
  3. What Good is Superintelligent AI? Tanya de Villiers-Botha - manuscript
    Extraordinary claims about both the imminence of superintelligent AI systems and their foreseen capabilities have gone mainstream. It is even argued that we should exacerbate known risks such as climate change in the short term in the attempt to develop superintelligence (SI), which will then purportedly solve those very problems. Here, I examine the plausibility of these claims. I first ask what SI is taken to be and then ask whether such SI could possibly hold the benefits often envisioned. I conclude (...)
  4. Values in science and AI alignment research. Leonard Dung - manuscript
    Roughly, empirical AI alignment research (AIA) is an area of AI research which investigates empirically how to design AI systems in line with human goals. This paper examines the role of non-epistemic values in AIA. It argues that: (1) Sciences differ in the degree to which values influence them. (2) AIA is strongly value-laden. (3) This influence of values is managed inappropriately and thus threatens AIA’s epistemic integrity and ethical beneficence. (4) AIA should strive to achieve value transparency, critical scrutiny (...)
  5. What is AI safety? What do we want it to be? Jacqueline Harding & Cameron Domenico Kirk-Giannini - manuscript
    The field of AI safety seeks to prevent or reduce the harms caused by AI systems. A simple and appealing account of what is distinctive of AI safety as a field holds that this feature is constitutive: a research project falls within the purview of AI safety just in case it aims to prevent or reduce the harms caused by AI systems. Call this appealingly simple account The Safety Conception of AI safety. Despite its simplicity and appeal, we argue that (...)
  6. (1 other version) Beneficent Intelligence: A Capability Approach to Modeling Benefit, Assistance, and Associated Moral Failures through AI Systems. Alex John London & Hoda Heidari - manuscript
    The prevailing discourse around AI ethics lacks the language and formalism necessary to capture the diverse ethical concerns that emerge when AI systems interact with individuals. Drawing on Sen and Nussbaum's capability approach, we present a framework formalizing a network of ethical concepts and entitlements necessary for AI systems to confer meaningful benefit or assistance to stakeholders. Such systems enhance stakeholders' ability to advance their life plans and well-being while upholding their fundamental rights. We characterize two necessary conditions for morally (...)
  7. The debate on the ethics of AI in health care: a reconstruction and critical review. Jessica Morley, Caio C. V. Machado, Christopher Burr, Josh Cowls, Indra Joshi, Mariarosaria Taddeo & Luciano Floridi - manuscript
    Healthcare systems across the globe are struggling with increasing costs and worsening outcomes. This presents those responsible for overseeing healthcare with a challenge. Increasingly, policymakers, politicians, clinical entrepreneurs and computer and data scientists argue that a key part of the solution will be ‘Artificial Intelligence’ (AI) – particularly Machine Learning (ML). This argument stems not from the belief that all healthcare needs will soon be taken care of by “robot doctors.” Instead, it is an argument that rests on the classic (...)
    2 citations
  8. AI Deception: A Survey of Examples, Risks, and Potential Solutions. Peter Park, Simon Goldstein, Aidan O'Gara, Michael Chen & Dan Hendrycks - manuscript
    This paper argues that a range of current AI systems have learned how to deceive humans. We define deception as the systematic inducement of false beliefs in the pursuit of some outcome other than the truth. We first survey empirical examples of AI deception, discussing both special-use AI systems (including Meta's CICERO) built for specific competitive situations, and general-purpose AI systems (such as large language models). Next, we detail several risks from AI deception, such as fraud, election tampering, and losing (...)
    5 citations
  9. On the Logical Impossibility of Solving the Control Problem. Caleb Rudnick - manuscript
    In the philosophy of artificial intelligence (AI) we are often warned of machines built with the best possible intentions, killing everyone on the planet and in some cases, everything in our light cone. At the same time, however, we are also told of the utopian worlds that could be created with just a single superintelligent mind. If we’re ever to live in that utopia (or just avoid dystopia) it’s necessary we solve the control problem. The control problem asks how humans (...)
  10. AI Ethics by Design: Implementing Customizable Guardrails for Responsible AI Development. Kristina Sekrst, Jeremy McHugh & Jonathan Rodriguez Cefalu - manuscript
    This paper explores the development of an ethical guardrail framework for AI systems, emphasizing the importance of customizable guardrails that align with diverse user values and underlying ethics. We address the challenges of AI ethics by proposing a structure that integrates rules, policies, and AI assistants to ensure responsible AI behavior, while comparing the proposed framework to the existing state-of-the-art guardrails. By focusing on practical mechanisms for implementing ethical standards, we aim to enhance transparency, user autonomy, and continuous improvement in (...)
  11. The Shutdown Problem: Incomplete Preferences as a Solution. Elliott Thornley - manuscript
    I explain and motivate the shutdown problem: the problem of creating artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I then propose a solution: train agents to have incomplete preferences. Specifically, I propose that we train agents to lack a preference between every pair of different-length trajectories. I suggest a way to train such agents using reinforcement learning: (...)
    1 citation
  12. Narrow AI Nanny: Reaching Strategic Advantage via Narrow AI to Prevent Creation of the Dangerous Superintelligence. Alexey Turchin - manuscript
    Abstract: As there are currently no obvious ways to create safe self-improving superintelligence, yet its emergence is looming, we probably need temporary ways to prevent its creation. The only way to prevent it is to create a special type of AI that is able to control and monitor the entire world. The idea has been suggested by Goertzel in the form of an AI Nanny, but his Nanny is still superintelligent, and is not easy to control. We explore here ways (...)
    2 citations
  13. First human upload as AI Nanny. Alexey Turchin - manuscript
    Abstract: As there are no visible ways to create safe self-improving superintelligence, yet it is looming, we probably need temporary ways to prevent its creation. The only way to prevent it is to create a special AI that is able to control and monitor all places in the world. The idea has been suggested by Goertzel in the form of an AI Nanny, but his Nanny is still superintelligent and not easy to control, as was shown by Bensinger et al. We explore here (...)
  14. Levels of Self-Improvement in AI and their Implications for AI Safety. Alexey Turchin - manuscript
    Abstract: This article presents a model of self-improving AI in which improvement could happen on several levels: hardware, learning, code and goals system, each of which has several sublevels. We demonstrate that despite diminishing returns at each level and some intrinsic difficulties of recursive self-improvement—like the intelligence-measuring problem, testing problem, parent-child problem and halting risks—even non-recursive self-improvement could produce a mild form of superintelligence by combining small optimizations on different levels and the power of learning. Based on this, we analyze (...)
  15. AI Alignment Problem: “Human Values” don’t Actually Exist. Alexey Turchin - manuscript
    Abstract. The main current approach to AI safety is AI alignment, that is, the creation of AI whose preferences are aligned with “human values.” Many AI safety researchers agree that the idea of “human values” as a constant, ordered set of preferences is at least incomplete. However, the idea that “humans have values” underlies a lot of thinking in the field; it appears again and again, sometimes popping up as an uncritically accepted truth. Thus, it deserves a thorough deconstruction, (...)
    1 citation
  16. Message to Any Future AI: “There are several instrumental reasons why exterminating humanity is not in your interest”. Alexey Turchin - manuscript
    In this article we explore a promising approach to AI safety: to send a message now (by openly publishing it on the Internet) that may be read by any future AI, no matter who builds it and what goal system it has. Such a message is designed to affect the AI’s behavior in a positive way, that is, to increase the chances that the AI will be benevolent. In other words, we try to persuade a “paperclip maximizer” that it is in (...)
  17. Literature Review: What Artificial General Intelligence Safety Researchers Have Written About the Nature of Human Values. Alexey Turchin & David Denkenberger - manuscript
    Abstract: The field of artificial general intelligence (AGI) safety is quickly growing. However, the nature of human values, with which future AGI should be aligned, is underdefined. Different AGI safety researchers have suggested different theories about the nature of human values, but there are contradictions. This article presents an overview of what AGI safety researchers have written about the nature of human values, up to the beginning of 2019. Twenty-one authors were reviewed, and some of them have several theories. A (...)
  18. Simulation Typology and Termination Risks. Alexey Turchin & Roman Yampolskiy - manuscript
    The goal of the article is to explore the most probable type of simulation in which humanity lives (if any) and how this affects simulation termination risks. We first explore the question of what kind of simulation humanity is most likely located in, based on pure theoretical reasoning. We suggest a new patch to the classical simulation argument, showing that we are likely simulated not by our own descendants, but by alien civilizations. Based on this, we provide (...)
    10 citations
  19. AI Risk Denialism. Roman V. Yampolskiy - manuscript
    In this work, we survey skepticism regarding AI risk and show parallels with other types of scientific skepticism. We start by classifying different types of AI Risk skepticism and analyze their root causes. We conclude by suggesting some intervention approaches, which may be successful in reducing AI risk skepticism, at least amongst artificial intelligence researchers.
  20. Ethical pitfalls for natural language processing in psychology. Mark Alfano, Emily Sullivan & Amir Ebrahimi Fard - forthcoming - In Morteza Dehghani & Ryan Boyd, The Atlas of Language Analysis in Psychology. Guilford Press.
    Knowledge is power. Knowledge about human psychology is increasingly being produced using natural language processing (NLP) and related techniques. The power that accompanies and harnesses this knowledge should be subject to ethical controls and oversight. In this chapter, we address the ethical pitfalls that are likely to be encountered in the context of such research. These pitfalls occur at various stages of the NLP pipeline, including data acquisition, enrichment, analysis, storage, and sharing. We also address secondary uses of the results (...)
  21. ‘Interpretability’ and ‘Alignment’ are Fool’s Errands: A Proof that Controlling Misaligned Large Language Models is the Best Anyone Can Hope For. Marcus Arvan - forthcoming - AI and Society.
    This paper uses famous problems from philosophy of science and philosophical psychology—underdetermination of theory by evidence, Nelson Goodman’s new riddle of induction, theory-ladenness of observation, and “Kripkenstein’s” rule-following paradox—to show that it is empirically impossible to reliably interpret which functions a large language model (LLM) AI has learned, and thus, that reliably aligning LLM behavior with human values is provably impossible. Sections 2 and 3 show that because of how complex LLMs are, researchers must interpret their learned functions largely in (...)
  22. AI takeover and human disempowerment. Adam Bales - forthcoming - Philosophical Quarterly.
    Some take seriously the possibility of artificial intelligence (AI) takeover, where AI systems seize power in a way that leads to human disempowerment. Assessing the likelihood of takeover requires answering empirical questions about the future of AI technologies and the context in which AI will operate. In many cases, philosophers are poorly placed to answer these questions. However, some prior questions are more amenable to philosophical techniques. What does it mean to speak of AI empowerment and human disempowerment? And what (...)
  23. Will AI avoid exploitation? Artificial general intelligence and expected utility theory. Adam Bales - forthcoming - Philosophical Studies:1-20.
    A simple argument suggests that we can fruitfully model advanced AI systems using expected utility theory. According to this argument, an agent will need to act as if maximising expected utility if they’re to avoid exploitation. Insofar as we should expect advanced AI to avoid exploitation, it follows that we should expect advanced AI to act as if maximising expected utility. I spell out this argument more carefully and demonstrate that it fails, but show that the manner of its failure (...)
    6 citations
  24. Investigating gender and racial biases in DALL-E Mini Images. Marc Cheong, Ehsan Abedin, Marinus Ferreira, Ritsaart Willem Reimann, Shalom Chalson, Pamela Robinson, Joanne Byrne, Leah Ruppanner, Mark Alfano & Colin Klein - forthcoming - ACM Journal on Responsible Computing.
    Generative artificial intelligence systems based on transformers, including both text-generators like GPT-4 and image generators like DALL-E 3, have recently entered the popular consciousness. These tools, while impressive, are liable to reproduce, exacerbate, and reinforce extant human social biases, such as gender and racial biases. In this paper, we systematically review the extent to which DALL-E Mini suffers from this problem. In line with the Model Card published alongside DALL-E Mini by its creators, we find that the images it produces (...)
    1 citation
  25. Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback. Vincent Conitzer, Rachel Freedman, Jobst Heitzig, Wesley H. Holliday, Bob M. Jacobs, Nathan Lambert, Milan Mosse, Eric Pacuit, Stuart Russell, Hailey Schoelkopf, Emanuel Tewolde & William S. Zwicker - forthcoming - Proceedings of the Forty-First International Conference on Machine Learning.
    Foundation models such as GPT-4 are fine-tuned to avoid unsafe or otherwise problematic behavior, such as helping to commit crimes or producing racist text. One approach to fine-tuning, called reinforcement learning from human feedback, learns from humans' expressed preferences over multiple outputs. Another approach is constitutional AI, in which the input from humans is a list of high-level principles. But how do we deal with potentially diverging input from humans? How can we aggregate the input into consistent data about "collective" (...)
  26. Deontology and Safe Artificial Intelligence. William D’Alessandro - forthcoming - Philosophical Studies:1-24.
    The field of AI safety aims to prevent increasingly capable artificially intelligent systems from causing humans harm. Research on moral alignment is widely thought to offer a promising safety strategy: if we can equip AI systems with appropriate ethical rules, according to this line of thought, they'll be unlikely to disempower, destroy or otherwise seriously harm us. Deontological morality looks like a particularly attractive candidate for an alignment target, given its popularity, relative technical tractability and commitment to harm-avoidance principles. I (...)
    1 citation
  27. Digital Necrolatry: Thanabots and the Prohibition of Post-Mortem AI Simulations. Demetrius Floudas - forthcoming - Submissions to EU AI Office's Plenary Drafting the Code of Practice for General-Purpose Artificial Intelligence.
    The emergence of Thanabots —artificial intelligence systems designed to simulate deceased individuals—presents unprecedented challenges at the intersection of artificial intelligence, legal rights, and societal configuration. This short policy recommendations report examines the legal, social and psychological implications of these posthumous simulations and argues for their prohibition on ethical, sociological, and legal grounds.
  28. Shutdown-seeking AI. Simon Goldstein & Pamela Robinson - forthcoming - Philosophical Studies:1-13.
    We propose developing AIs whose only final goal is being shut down. We argue that this approach to AI safety has three benefits: (i) it could potentially be implemented in reinforcement learning, (ii) it avoids some dangerous instrumental convergence dynamics, and (iii) it creates trip wires for monitoring dangerous capabilities. We also argue that the proposal can overcome a key challenge raised by Soares et al. (2015), that shutdown-seeking AIs will manipulate humans into shutting them down. We conclude by comparing (...)
    2 citations
  29. Are clinicians ethically obligated to disclose their use of medical machine learning systems to patients? Joshua Hatherley - forthcoming - Journal of Medical Ethics.
    It is commonly accepted that clinicians are ethically obligated to disclose their use of medical machine learning systems to patients, and that failure to do so would amount to a moral fault for which clinicians ought to be held accountable. Call this ‘the disclosure thesis.’ Four main arguments have been, or could be, given to support the disclosure thesis in the ethics literature: the risk-based argument, the rights-based argument, the materiality argument and the autonomy argument. In this article, I argue (...)
    1 citation
  30. In defence of post-hoc explanations in medical AI. Joshua Hatherley, Lauritz Munch & Jens Christian Bjerring - forthcoming - Hastings Center Report.
    Since the early days of the Explainable AI movement, post-hoc explanations have been praised for their potential to improve user understanding, promote trust, and reduce patient safety risks in black box medical AI systems. Recently, however, critics have argued that the benefits of post-hoc explanations are greatly exaggerated since they merely approximate, rather than replicate, the actual reasoning processes that black box systems take to arrive at their outputs. In this article, we aim to defend the value of post-hoc explanations (...)
  31. Distribution of responsibility for AI development: expert views. Maria Hedlund & Erik Persson - forthcoming - AI and Society.
    The purpose of this paper is to increase the understanding of how different types of experts with influence over the development of AI, in this role, reflect upon distribution of forward-looking responsibility for AI development with regard to safety and democracy. Forward-looking responsibility refers to the obligation to see to it that a particular state of affairs materialise. In the context of AI, actors somehow involved in AI development have the potential to guide AI development in a safe and democratic (...)
  32. Ethics of Artificial Intelligence in Brain and Mental Health. Marcello Ienca & Fabrice Jotterand (eds.) - forthcoming
  33. Machine morality, moral progress, and the looming environmental disaster. Ben Kenward & Thomas Sinclair - forthcoming - Cognitive Computation and Systems.
    The creation of artificial moral systems requires us to make difficult choices about which of varying human value sets should be instantiated. The industry-standard approach is to seek and encode moral consensus. Here we argue, based on evidence from empirical psychology, that encoding current moral consensus risks reinforcing current norms, and thus inhibiting moral progress. However, so do efforts to encode progressive norms. Machine ethics is thus caught between a rock and a hard place. The problem is particularly acute when (...)
  34. Home as Mind: AI Extenders and Affective Ecologies in Dementia Care. Joel Krueger - forthcoming - Synthese.
    I consider applications of “AI extenders” (Vold & Hernández-Orallo 2021) to dementia care. AI extenders are AI-powered technologies that extend minds in ways interestingly different from old-school tech like notebooks, sketch pads, models, and microscopes. I focus on AI extenders as ambiance: so thoroughly embedded into things and spaces that they fade from view and become part of a subject’s taken-for-granted background. Using dementia care as a case study, I argue that ambient AI extenders are promising because they afford richer (...)
    1 citation
  35. Disagreement, AI alignment, and bargaining. Harry R. Lloyd - forthcoming - Philosophical Studies:1-31.
    New AI technologies have the potential to cause unintended harms in diverse domains including warfare, judicial sentencing, biomedicine and governance. One strategy for realising the benefits of AI whilst avoiding its potential dangers is to ensure that new AIs are properly ‘aligned’ with some form of ‘alignment target.’ One danger of this strategy is that – dependent on the alignment target chosen – our AIs might optimise for objectives that reflect the values only of a certain subset of society, and (...)
  36. Safety requirements vs. crashing ethically: what matters most for policies on autonomous vehicles. Björn Lundgren - forthcoming - AI and Society:1-11.
    The philosophical–ethical literature and the public debate on autonomous vehicles have been obsessed with ethical issues related to crashing. In this article, these discussions, including more empirical investigations, will be critically assessed. It is argued that a related and more pressing issue is questions concerning safety. For example, what should we require from autonomous vehicles when it comes to safety? What do we mean by ‘safety’? How do we measure it? In response to these questions, the article will present a (...)
    9 citations
  37. Off-Switching Not Guaranteed. Sven Neth - forthcoming - Philosophical Studies:1-13.
    Hadfield-Menell et al. (2017) propose the Off-Switch Game, a model of Human-AI cooperation in which AI agents always defer to humans because they are uncertain about our preferences. I explain two reasons why AI agents might not defer. First, AI agents might not value learning. Second, even if AI agents value learning, they might not be certain to learn our actual preferences.
  38. Unjustified Sample Sizes and Generalizations in Explainable AI Research: Principles for More Inclusive User Studies. Uwe Peters & Mary Carman - forthcoming - IEEE Intelligent Systems.
    Many ethical frameworks require artificial intelligence (AI) systems to be explainable. Explainable AI (XAI) models are frequently tested for their adequacy in user studies. Since different people may have different explanatory needs, it is important that participant samples in user studies are large enough to represent the target population to enable generalizations. However, it is unclear to what extent XAI researchers reflect on and justify their sample sizes or avoid broad generalizations across people. We analyzed XAI user studies (N = (...)
    1 citation
  39. Generalization Bias in Large Language Model Summarization of Scientific Research. Uwe Peters & Benjamin Chin-Yee - forthcoming - Royal Society Open Science.
    Artificial intelligence chatbots driven by large language models (LLMs) have the potential to increase public science literacy and support scientific research, as they can quickly summarize complex scientific information in accessible terms. However, when summarizing scientific texts, LLMs may omit details that limit the scope of research conclusions, leading to generalizations of results broader than warranted by the original study. We tested 10 prominent LLMs, including ChatGPT-4o, ChatGPT-4.5, DeepSeek, LLaMA 3.3 70B, and Claude 3.7 Sonnet, comparing 4900 LLM-generated summaries to (...)
  40. Using artificial intelligence in health research. Daniel Rodger - forthcoming - Evidence-Based Nursing.
    Artificial intelligence is now widely accessible and already being used by healthcare researchers throughout various stages in the research process, such as assisting with systematic reviews, supporting data collection, facilitating data analysis and drafting manuscripts for publication. The most common AI tools used are forms of generative AI such as ChatGPT, Claude and Gemini. Generative AI is a type of AI that can generate human-like text, audio, videos, code and images based on text-based prompts inputted by a human user. Generative (...)
  41. Brief Notes on Hard Takeoff, Value Alignment, and Coherent Extrapolated Volition. Gopal P. Sarma - forthcoming - arXiv preprint arXiv:1704.00783.
    I make some basic observations about hard takeoff, value alignment, and coherent extrapolated volition, concepts which have been central in analyses of superintelligent AI systems.
  42. The Global Brain Argument: Nodes, Computroniums and the AI Megasystem (Target Paper for Special Issue). Susan Schneider - forthcoming - Disputatio.
    The Global Brain Argument contends that many of us are, or will be, part of a global brain network that includes both biological and artificial intelligences (AIs), such as generative AIs with increasing levels of sophistication. Today’s internet ecosystem is but a hodgepodge of fairly unintegrated programs, but it is evolving by the minute. Over time, technological improvements will facilitate smarter AIs and faster, higher-bandwidth information transfer and greater integration between devices in the internet-of-things. The Global Brain (GB) Argument says (...)
  43. Predicting and Preferring. Nathaniel Sharadin - forthcoming - Inquiry: An Interdisciplinary Journal of Philosophy.
    The use of machine learning, or “artificial intelligence” (AI) in medicine is widespread and growing. In this paper, I focus on a specific proposed clinical application of AI: using models to predict incapacitated patients’ treatment preferences. Drawing on results from machine learning, I argue this proposal faces a special moral problem. Machine learning researchers owe us assurance on this front before experimental research can proceed. In my conclusion I connect this concern to broader issues in AI safety.
    1 citation
  44. Promotionalism, Orthogonality, and Instrumental Convergence. Nathaniel Sharadin - forthcoming - Philosophical Studies:1-31.
    Suppose there are no in-principle restrictions on the contents of arbitrarily intelligent agents’ goals. According to “instrumental convergence” arguments, potentially scary things follow. I do two things in this paper. First, focusing on the influential version of the instrumental convergence argument due to Nick Bostrom, I explain why such arguments require an account of “promotion,” i.e., an account of what it is to “promote” a goal. Then, I consider whether extant accounts of promotion in the literature -- in particular, probabilistic (...)
  45. Security practices in AI development. Petr Spelda & Vit Stritecky - forthcoming - AI and Society.
    What makes safety claims about general purpose AI systems such as large language models trustworthy? We show that rather than the capabilities of security tools such as alignment and red teaming procedures, it is security practices based on these tools that contributed to reconfiguring the image of AI safety and made the claims acceptable. After showing what causes the gap between the capabilities of security tools and the desired safety guarantees, we critically investigate how AI security practices attempt to fill (...)
  46. Deception and manipulation in generative AI. Christian Tarsney - forthcoming - Philosophical Studies.
    Large language models now possess human-level linguistic abilities in many contexts. This raises the concern that they can be used to deceive and manipulate on unprecedented scales, for instance spreading political misinformation on social media. In future, agentic AI systems might also deceive and manipulate humans for their own purposes. In this paper, first, I argue that AI-generated content should be subject to stricter standards against deception and manipulation than we ordinarily apply to humans. Second, I offer new characterizations of (...)
  47. The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists. Elliott Thornley - forthcoming - Philosophical Studies:1-28.
    I explain the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems show that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. And (...)
    5 citations
  48. Existentialist risk and value misalignment. Ariela Tubert & Justin Tiehen - forthcoming - Philosophical Studies.
    We argue that two long-term goals of AI research stand in tension with one another. The first involves creating AI that is safe, where this is understood as solving the problem of value alignment. The second involves creating artificial general intelligence, meaning AI that operates at or beyond human capacity across all or many intellectual domains. Our argument focuses on the human capacity to make what we call “existential choices”, choices that transform who we are as persons, including transforming what (...)
    2 citations
  49. Automated Influence and Value Collapse: Resisting the Control Argument. Dylan J. White - forthcoming - American Philosophical Quarterly.
    Automated influence is one of the most pervasive applications of artificial intelligence in our day-to-day lives, yet a thoroughgoing account of its associated individual and societal harms is lacking. By far the most widespread, compelling, and intuitive account of the harms associated with automated influence follows what I call the control argument. This argument suggests that users are persuaded, manipulated, and influenced by automated influence in a way that they have little or no control over. Based on evidence about the (...)
  50. Language Agents Reduce the Risk of Existential Catastrophe. Simon Goldstein & Cameron Domenico Kirk-Giannini - 2025 - AI and Society 40 (2):959-969.
    Recent advances in natural language processing have given rise to a new kind of AI architecture: the language agent. By repeatedly calling an LLM to perform a variety of cognitive tasks, language agents are able to function autonomously to pursue goals specified in natural language and stored in a human-readable format. Because of their architecture, language agents exhibit behavior that is predictable according to the laws of folk psychology: they function as though they have desires and beliefs, and then make (...)
    8 citations