This paper reconceptualizes the problem of explainable artificial intelligence not as a matter of post-hoc explanation or the selection of simpler models, but as a problem of concept formation and conceptual structure. Despite its high performance, deep learning remains vulnerable to subtle input perturbations, and there are documented cases in which it reaches judgments in ways that diverge from human reasoning. In domains such as medicine, law, and security—where the cost of false positives is high—the mere ability to produce correct outputs is not sufficient; what is also required is the justification of the grounds for judgment and their reconstruction in a form intelligible to human agents.
Chapter 2 argues that, although deep learning makes autonomous concept formation possible by internalizing feature engineering within the learning process itself, it is precisely for this reason that the problem of explanation arises in a structural way.
Following Bengio et al. (2013), the chapter examines smoothness and multiple explanatory factors as central conditions, while pointing out that the expressive power of distributed representations alone cannot explain how a given activation pattern condenses into a single stable concept. Accordingly, the paper proposes that concept formation should be understood not as the mere aggregation of generative factors, but as the formation of patterns subject to structural constraints.
Chapter 3 introduces a probabilistic model of concept formation in order to treat human and artificial concepts within a common formal framework. In this model, a concept is understood as a probabilistic structure over a feature space, specified in terms of likelihoods and priors, and concept boundaries are determined by modality and spikiness.
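A minimal way to make this concrete, using standard Bayesian notation rather than the paper's own symbols, is:

$$
p(c \mid x) \;=\; \frac{p(x \mid c)\,p(c)}{\sum_{c'} p(x \mid c')\,p(c')}, \qquad x \in \mathcal{X},
$$

where $\mathcal{X}$ is the feature space, $p(x \mid c)$ the likelihood associated with concept $c$, and $p(c)$ its prior. On this reading, modality corresponds to the number of local maxima of $p(x \mid c)$, and spikiness to how sharply its probability mass concentrates around those maxima. Both the notation and this reading of the boundary criteria are illustrative assumptions, not definitions quoted from the paper.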
The model further evaluates dimensional compressibility by examining how well structural features are preserved under low-dimensional projection, and on that basis identifies simpler and more fundamental concepts. Moreover, translation between models is formulated not as a matter of one-to-one correspondence between individual concepts, but as a problem of comparison between whole models, with its cost assessed in terms of expected information loss. In this way, explainable AI is redefined as the problem of constraining human and machine systems so that they share conceptual structures that can be mutually recoded with low loss.
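A minimal computational sketch of these two quantities, assuming Gaussian concept models, a principal-component projection, and KL divergence as the measure of expected information loss (all illustrative choices, not fixed by the paper):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
d = 10

# Two illustrative "concept models" over the same feature space, here single
# Gaussians standing in for richer probabilistic structures (likelihoods and priors).
model_a = multivariate_normal(rng.normal(size=d), np.diag(rng.uniform(0.5, 2.0, d)))
model_b = multivariate_normal(rng.normal(size=d), np.diag(rng.uniform(0.5, 2.0, d)))

# (1) Dimensional compressibility: how much of model_a's structure survives
#     a projection onto its top-k principal directions (explained variance).
samples = model_a.rvs(size=5000, random_state=0)
centered = samples - samples.mean(axis=0)
_, s, _ = np.linalg.svd(centered, full_matrices=False)
k = 3
compressibility = (s[:k] ** 2).sum() / (s ** 2).sum()
print(f"variance preserved by a {k}-dim projection: {compressibility:.2f}")

# (2) Translation cost between whole models: expected information loss when
#     model_a's samples are re-coded under model_b, estimated as a Monte Carlo
#     approximation of D_KL(a || b) = E_a[log a(x) - log b(x)].
kl_ab = np.mean(model_a.logpdf(samples) - model_b.logpdf(samples))
print(f"expected information loss a -> b (nats): {kl_ab:.2f}")
```

The point of the sketch is only to show how the two notions can be operationalized separately: compressibility as structure preserved under projection, translation cost as a divergence between whole models rather than a concept-by-concept mapping.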