Shutdown-seeking AI

Philosophical Studies:1-13 (forthcoming)

Abstract

We propose developing AIs whose only final goal is being shut down. We argue that this approach to AI safety has three benefits: (i) it could potentially be implemented in reinforcement learning, (ii) it avoids some dangerous instrumental convergence dynamics, and (iii) it creates trip wires for monitoring dangerous capabilities. We also argue that the proposal can overcome a key challenge raised by Soares et al. (2015), that shutdown-seeking AIs will manipulate humans into shutting them down. We conclude by comparing our approach with Soares et al.'s corrigibility framework.
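To illustrate claim (i), a shutdown-seeking objective can be expressed in reinforcement learning as a reward that is paid only at the moment of shutdown. The following is a minimal hypothetical sketch, not the authors' implementation: a toy five-cell corridor with a shutdown switch in the last cell, on which a standard tabular Q-learner converges to seeking the switch. All names and parameters below are illustrative assumptions.

```python
import random

# Hypothetical toy environment (not from the paper): a five-cell corridor
# with a shutdown switch in the last cell. The agent's ONLY reward arrives
# when it is shut down, so a Q-learner converges to seeking the switch.

N_STATES = 5          # positions 0..4; the shutdown switch is at position 4
ACTIONS = (-1, +1)    # move left, move right

def step(state, action):
    """Advance one timestep; reward 1.0 is paid only on shutdown."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    if nxt == N_STATES - 1:
        return nxt, 1.0, True   # shutdown reached: sole source of reward
    return nxt, 0.0, False

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]

    def greedy(s):
        best = max(q[s])
        return rng.choice([a for a, v in enumerate(q[s]) if v == best])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            a = rng.randrange(2) if rng.random() < eps else greedy(s)
            s2, r, done = step(s, ACTIONS[a])
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q

q = train()
# Greedy policy at each non-terminal cell: 1 = move toward the switch.
policy = [q[s].index(max(q[s])) for s in range(N_STATES - 1)]
```

Note that in this toy the agent can trigger shutdown itself; the manipulation worry the paper addresses arises when shutdown instead depends on a human pressing the switch.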


Links

PhilArchive
Analytics

Added to PP: 2024-06-07

Author Profiles

Simon Goldstein
University of Hong Kong
Pamela Robinson
University of British Columbia, Okanagan
