Abstract
In recent decades, supervised machine learning has seen the widespread growth of highly complex, non-interpretable models, of which deep neural networks are the most typical representative. Despite their complexity, these models have shown outstanding performance in a range of tasks, such as image recognition and machine translation. Recently, however, there has been an important debate over whether such non-interpretable models can provide any sort of understanding at all. For some scholars, only interpretable models can provide understanding. More popular, however, is the idea that understanding can come from a careful analysis of the dataset or from the model’s theoretical basis. In this paper, I examine the possible ways of obtaining understanding of such non-interpretable models. Two main strategies for providing understanding are analyzed. The first involves understanding without interpretability, either through external evidence for the model’s inner functioning or through analysis of the data. The second is based on the artificial production of interpretable structures, in three main forms: post hoc models, hybrid models, and quasi-interpretable structures. Finally, I consider some of the conceptual difficulties in attempting to create explanations for these models, and their implications for understanding.