The Devil in the Data: Machine Learning & the Theory-Free Ideal

Abstract

Machine learning (ML) refers to a class of computer-facilitated methods of statistical modelling. ML modelling techniques are now being widely adopted across the sciences. A number of outspoken representatives of the general public, computer science, various scientific fields, and philosophy of science share the belief that ML will radically disrupt scientific practice or the variety of epistemic outputs science is capable of producing. This belief is held, at least in part, because its adherents take ML to rest on a novel epistemic footing relative to the classical mathematical or statistical modelling approaches used in science. Namely, they take modelling with ML to be a “theory-free” enterprise, in the sense that it does not rest essentially on input from human conceptual grasp of the target phenomenon or on domain expertise. I take this view to arise from the further, and more deeply entrenched, belief that data is worldly and objective; i.e., data is viewed as recapitulating or representing with perfect fidelity the structure or properties of the systems in nature from which it is sampled. Yet most contemporary philosophers of science accept, in one version or another, the thesis of the theory-ladenness or theory-mediation of observation or measurement. It follows that most philosophers of science (and, I will venture, many scientists) hold irreconcilable views on the nature of data, an internal tension which threatens the integrity of their appraisals of ML in science. If we take the thesis of theory-ladenness on board, and take its implications for the nature of data seriously, it follows that there is no reason to believe that ML differs fundamentally in its epistemic footing from established mathematical modelling approaches in the sciences.
I show that, in scientific projects which wield the tools of ML, usage and interpretation can differ from “standard practice” without our having to accept a difference in the epistemic or representational status of such tools.

Similar books and articles

What kind of novelties can machine learning possibly generate? The case of genomics. Emanuele Ratti - 2020 - Studies in History and Philosophy of Science Part A 83:86-96.
Do ML models represent their targets? Emily Sullivan - forthcoming - Philosophy of Science.
Predicting and explaining with machine learning models: Social science as a touchstone. Oliver Buchholz & Thomas Grote - 2023 - Studies in History and Philosophy of Science Part A 102 (C):60-69.
Machine Learning and the Future of Scientific Explanation. Florian J. Boge & Michael Poznic - 2021 - Journal for General Philosophy of Science / Zeitschrift für Allgemeine Wissenschaftstheorie 52 (1):171-176.

Analytics

Added to PP
2023-06-03

Author's Profile

Mel Andrews
University of Cincinnati (PhD)
