Inference of Cascades and Correlated Networks

Dec 1, 2022, 9:00 am to 10:30 am
Friend Center 007
Event Description

This thesis makes fundamental contributions to a few statistical inference tasks on networks, with a focus on information-theoretic characterizations.

We first study the problem of estimating the source of a network cascade given a time series of noisy information about the spread. Initially, a single vertex (the source) is affected by the cascade, which then spreads across the network in discrete time steps. The cascade evolution is hidden, but one can observe a time series of noisy signals from each vertex. The signals from a vertex are assumed to be i.i.d. samples from a pre-change distribution before the cascade affects the vertex, and i.i.d. samples from a post-change distribution once the cascade has affected it. Given these signals, which can be viewed as noisy measurements of the cascade evolution, we aim to devise a procedure that reliably estimates the cascade source as quickly as possible.

We investigate Bayesian and minimax formulations of the source estimation problem and derive near-optimal estimators for simple cascade dynamics and network topologies. In the Bayesian setting, an estimator that observes samples until the error of the Bayes-optimal estimator falls below a threshold achieves optimal performance. In the minimax setting, optimal performance is achieved by a novel multi-hypothesis sequential probability ratio test (MSPRT). When the graph has n vertices, we find that these optimal estimators require log log(n) observations of the noisy time series when the network topology is a k-regular tree, and polylog(n) observations for lattices. Finally, we discuss how our methods may be extended to cascades on arbitrary graphs.
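As a rough illustration of the Bayesian stopping rule described above, the following Python sketch simulates a cascade and stops once the posterior error of the most likely source falls below a threshold. It is not the estimator from the thesis. Several details are assumptions made only for this example: the cascade reaches every vertex within distance t of the source at time t, the pre-change and post-change distributions are N(0, 1) and N(mu, 1), the prior over sources is uniform, and the graph is connected. The names `bfs_distances` and `estimate_source` are hypothetical.

```python
from collections import deque

import numpy as np


def bfs_distances(adj, root):
    """Hop distances from root to every vertex of a connected graph."""
    dist = [-1] * len(adj)
    dist[root] = 0
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if dist[w] < 0:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def estimate_source(adj, true_source, mu=1.0, error_threshold=0.05,
                    max_steps=200, seed=0):
    """Observe noisy signals until the posterior error drops below the threshold.

    Assumed model (illustration only): at time t the cascade has reached all
    vertices within distance t of the source; signals are N(0, 1) before the
    change and N(mu, 1) after it; the prior over sources is uniform.
    """
    rng = np.random.default_rng(seed)
    n = len(adj)
    # dist[v, u] = hop distance from candidate source v to vertex u.
    dist = np.array([bfs_distances(adj, v) for v in range(n)])
    log_post = np.full(n, -np.log(n))  # uniform prior, in log space

    for t in range(max_steps):
        # Simulate one round of signals from the true (hidden) cascade.
        infected = dist[true_source] <= t
        x = rng.normal(loc=np.where(infected, mu, 0.0), scale=1.0)
        # Per-vertex log-likelihood ratio of post-change vs pre-change.
        llr = mu * x - 0.5 * mu ** 2
        # Under hypothesis v, the vertices with dist[v, u] <= t are post-change.
        # The pre-change term common to all hypotheses cancels on normalizing.
        log_post += (dist <= t).astype(float) @ llr
        log_post -= np.logaddexp.reduce(log_post)
        posterior = np.exp(log_post)
        if 1.0 - posterior.max() < error_threshold:
            return int(posterior.argmax()), t + 1

    return int(np.exp(log_post).argmax()), max_steps
```

For example, on a path graph built as `adj = [[j for j in (i - 1, i + 1) if 0 <= j < 9] for i in range(9)]`, calling `estimate_source(adj, true_source=4, mu=6.0)` returns the estimated source together with the number of rounds observed before stopping.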

Next, we consider the tasks of graph matching and community recovery in networks with correlated structure. First, we study the problem of learning the latent vertex correspondence between two edge-correlated stochastic block models, focusing on the regime where the average degree is logarithmic in the number of vertices. We derive the precise information-theoretic threshold for exact recovery: above the threshold there exists an estimator that outputs the true correspondence with probability close to 1, while below it no estimator can recover the true correspondence with probability bounded away from 0. We then characterize the information-theoretic landscape of community recovery in correlated stochastic block models, which requires a delicate interplay between graph matching and community recovery algorithms. In particular, we uncover and characterize a region of the parameter space where exact community recovery is possible using multiple correlated graphs, even though (1) this is information-theoretically impossible using a single graph and (2) exact graph matching is also information-theoretically impossible. In this regime, we develop a novel algorithm that carefully synthesizes algorithms from the community recovery and graph matching literatures.
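To make the setting concrete, the sketch below samples a pair of edge-correlated stochastic block models in Python. It assumes a common construction that the abstract does not spell out: a parent graph is drawn from a two-community block model with edge probabilities a log(n)/n within communities and b log(n)/n across them, each of the two observed graphs keeps every parent edge independently with probability s, and the vertices of the second graph are then relabeled by a hidden uniform permutation. The function name `sample_correlated_sbms` and the parameters `a`, `b`, and `s` are hypothetical choices for this example.

```python
import numpy as np


def sample_correlated_sbms(n, a, b, s, seed=0):
    """Sample two edge-correlated SBMs via subsampling a shared parent graph.

    Assumed construction (illustration only): two balanced-in-expectation
    communities, logarithmic average degree, independent edge subsampling
    with probability s, and a hidden relabeling of the second graph.
    """
    rng = np.random.default_rng(seed)
    labels = rng.choice([-1, 1], size=n)
    p = min(1.0, a * np.log(n) / n)  # within-community edge probability
    q = min(1.0, b * np.log(n) / n)  # cross-community edge probability

    same = labels[:, None] == labels[None, :]
    edge_prob = np.where(same, p, q)
    # Sample each unordered pair once by restricting to the upper triangle.
    upper = np.triu(np.ones((n, n), dtype=bool), k=1)
    parent = (rng.random((n, n)) < edge_prob) & upper

    # Each observed graph keeps a parent edge independently with probability s.
    g1 = parent & (rng.random((n, n)) < s)
    g2 = parent & (rng.random((n, n)) < s)
    g1 = g1 | g1.T
    g2 = g2 | g2.T

    # Hide the correspondence: vertex i of g2 is renamed perm[i].
    perm = rng.permutation(n)
    g2_shuffled = np.zeros_like(g2)
    g2_shuffled[np.ix_(perm, perm)] = g2
    return g1, g2_shuffled, perm, labels
```

In this sketch, graph matching corresponds to recovering `perm` from `g1` and `g2_shuffled` alone, and community recovery corresponds to recovering `labels` (up to a global sign flip). With the true permutation in hand, `g2_shuffled[np.ix_(perm, perm)]` realigns the second graph with the first.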