Abstract
Network analysis views complex systems as networks with well-defined structural properties that account for their complexity. These properties, which include scale-free behavior, small worlds and communities, are not found in networks such as random graphs and lattices that do not correspond to complex systems. They therefore provide a robust ground for claiming the existence of “complex networks” as a non-trivial subset of networks. The theory of complex networks has thus succeeded in making the relevant marks of complexity systematically explicit in the form of structural properties, and this success is at the root of its current popularity. The definition of the properties of the building components of complex networks has, on the other hand, been much less systematic. The obvious assumption is that these components must be nodes and links. Generally, however, the internal structure of nodes is not taken into account, and links are identified ad hoc, depending on the perspective from which one looks at the network to be analyzed. For instance, if the nodes are Web pages that contain information about scientific papers, one point of view will identify the relevant links with hyperlinks to similar Web pages, and another with citations of other articles. We contribute here a systematic approach to identifying the components of a complex network that is based on information theory. The approach hinges on some recent results arising from the convergence between the theory of complex networks and probabilistic techniques for content mining. At its core is the idea that nodes in a complex network correspond to basic information units, from which links are extracted via machine learning methods. Hence the links themselves are viewed as emergent properties, similarly to the broader structural properties mentioned above.
Besides rounding out the theory, this learning-based approach has clear practical benefits, in that it lets networks emerge from arbitrary information domains. We provide examples and applications in a variety of contexts, starting from an information-theoretic reconstruction of the well-known distinction between “strong links” and “weak links” and then delving into specific applications such as business process management and the analysis of policy making.