Shutdown-seeking AI

Philosophical Studies:1-13 (forthcoming)

Abstract

We propose developing AIs whose only final goal is being shut down. We argue that this approach to AI safety has three benefits: (i) it could potentially be implemented in reinforcement learning, (ii) it avoids some dangerous instrumental convergence dynamics, and (iii) it creates trip wires for monitoring dangerous capabilities. We also argue that the proposal can overcome a key challenge raised by Soares et al. (2015), that shutdown-seeking AIs will manipulate humans into shutting them down. We conclude by comparing our approach with Soares et al.'s corrigibility framework.
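To illustrate claim (i), a shutdown-seeking objective can be expressed in reinforcement learning as a reward that is paid only at the moment of shutdown. The following is a minimal hypothetical sketch, not the authors' implementation: a toy five-cell corridor with a shutdown switch in the last cell, on which a standard tabular Q-learner converges to seeking the switch. All names and parameters below are illustrative assumptions.

```python
import random

# Hypothetical toy environment (not from the paper): a five-cell corridor
# with a shutdown switch in the last cell. The agent's ONLY reward arrives
# when it is shut down, so a Q-learner converges to seeking the switch.

N_STATES = 5          # positions 0..4; the shutdown switch is at position 4
ACTIONS = (-1, +1)    # move left, move right

def step(state, action):
    """Advance one timestep; reward 1.0 is paid only on shutdown."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    if nxt == N_STATES - 1:
        return nxt, 1.0, True   # shutdown reached: sole source of reward
    return nxt, 0.0, False

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]

    def greedy(s):
        best = max(q[s])
        return rng.choice([a for a, v in enumerate(q[s]) if v == best])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            a = rng.randrange(2) if rng.random() < eps else greedy(s)
            s2, r, done = step(s, ACTIONS[a])
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q

q = train()
# Greedy policy at each non-terminal cell: 1 = move toward the switch.
policy = [q[s].index(max(q[s])) for s in range(N_STATES - 1)]
```

Note that in this toy the agent can trigger shutdown itself; the manipulation worry the paper addresses arises when shutdown instead depends on a human pressing the switch.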


Links

PhilArchive
Analytics

Added to PP: 2024-06-07

Author Profiles

Simon Goldstein
University of Hong Kong
Pamela Robinson
University of British Columbia, Okanagan
