An endangered species: how LLMs threaten Wikipedia’s sustainability

AI and Society:1-14 (forthcoming)

Abstract

As a collaboratively edited and open-access knowledge archive, Wikipedia offers a vast dataset for training artificial intelligence (AI) applications and models, enhancing data accessibility and access to information. However, reliance on the crowd-sourced encyclopedia raises ethical issues related to data provenance, knowledge production, curation, and digital labor. Drawing on critical data studies, feminist posthumanism, and recent research at the intersection of Wikimedia and AI, this study employs problem-centered expert interviews to investigate the relationship between Wikipedia and large language models (LLMs). Key findings include the unclear role of Wikipedia in LLM training, ethical issues, and potential solutions for systemic biases and sustainability challenges. By foregrounding these concerns, this study contributes to ongoing discourses on the responsible use of AI in digital knowledge production and information management. Ultimately, this article calls for greater transparency and accountability in how big tech entities use open-access datasets like Wikipedia, advocating for collaborative frameworks prioritizing ethical considerations and equitable representation.

Analytics

Added to PP
2025-02-21
