Abstract
One of the fundamental research goals for explanation-based Natural Language Inference (NLI) is to build models that can reason in complex domains through the generation of natural language explanations. However, the methodologies for designing and evaluating explanation-based inference models are still poorly informed by theoretical accounts of the nature of explanation. In an attempt to provide an epistemologically grounded characterisation for NLI, this paper focuses on the scientific domain, aiming to bridge the gap between theory and practice regarding the notion of a scientific explanation. Specifically, the paper combines a detailed survey of the modern accounts of scientific explanation in Philosophy of Science with a systematic analysis of corpora of natural language explanations, clarifying the nature and function of explanatory arguments from both a top-down (categorical) and a bottom-up (corpus-based) perspective. Through a mixture of quantitative and qualitative methodologies, the study derives the following main conclusions: (1) Explanations cannot be entirely characterised in terms of inductive or deductive arguments, as their main function is to perform unification; (2) An explanation typically cites causes and mechanisms that are responsible for the occurrence of the event to be explained; (3) While natural language explanations possess an intrinsic causal-mechanistic nature, they are not limited to causes and mechanisms, also accounting for pragmatic elements such as definitions, properties and taxonomic relations; (4) Patterns of unification naturally emerge in corpora of explanations even if not intentionally modelled; (5) Unification is realised through a process of abstraction, whose function is to provide the inference mechanism for subsuming the event to be explained under recurring patterns and high-level regularities.
The paper contributes to addressing a fundamental gap in classical theoretical accounts of the nature of scientific explanations and their materialisation as linguistic artefacts. This characterisation can support a more principled design and evaluation of explanation-based AI systems that can better interpret, process, and generate natural language explanations.