## Statistical learning and inference over networks

##### Authors

Sihag, Saurabh

##### Issue Date

2020-12

##### Type

Electronic thesis

Thesis

##### Language

ENG

##### Keywords

Electrical engineering

##### Abstract

In this dissertation, we build on the fundamental principles of structure learning, parameter estimation, hypothesis testing, and group testing to analyze problems in network models that arise in practical applications. We first consider structure learning in the setting where the network structure is completely unknown and must be inferred from collected data samples. In applications such as genetic networks, social networks, and physical infrastructures, there may exist multiple network models that share similar structural properties; inference about the structure of one model then provides side information for inference on the others. Under this motivation, we study the structure learning of a pair of graphs with unknown, partially identical structures. We consider the families of Ising and Gaussian models and analyze the sample complexity of structure learning under a bounded probability of error in structure recovery. We provide algorithm-independent, information-theoretic bounds on the sample complexity that establish the statistical difficulty of the problem and serve as a benchmark for any algorithm. From an algorithmic perspective, we analyze a maximum-likelihood-based decoder and a tractable joint structure learning algorithm, and derive sufficient conditions on the sample complexity for a bounded probability of error.
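
As a toy illustration of the Gaussian case (not the dissertation's decoder), the sketch below builds two Gaussian graphical models with partially identical edge sets and recovers their structures by thresholding the inverse empirical covariance; the helper names, edge weights, and threshold `tau` are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_precision(p, edges):
    # Build a diagonally dominant (hence positive definite) precision
    # matrix from an edge set; weights are illustrative.
    K = np.eye(p) * 2.0
    for (i, j) in edges:
        K[i, j] = K[j, i] = 0.4
    return K

def learn_edges(samples, tau=0.2):
    # Declare an edge wherever the off-diagonal entry of the inverse
    # empirical covariance exceeds tau -- a simplified proxy for a
    # likelihood-based structure decoder.
    K_hat = np.linalg.inv(np.cov(samples, rowvar=False))
    p = K_hat.shape[0]
    return {(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(K_hat[i, j]) > tau}

p = 6
shared = {(0, 1), (1, 2), (2, 3)}            # edges common to both graphs
K1 = make_precision(p, shared | {(3, 4)})    # graph-specific edge
K2 = make_precision(p, shared | {(4, 5)})    # graph-specific edge

n = 5000
X1 = rng.multivariate_normal(np.zeros(p), np.linalg.inv(K1), size=n)
X2 = rng.multivariate_normal(np.zeros(p), np.linalg.inv(K2), size=n)

E1, E2 = learn_edges(X1), learn_edges(X2)
print(sorted(E1 & E2))  # the shared part of the two structures
```

Because the two graphs overlap, the edges recovered for one graph carry side information about the other; the joint algorithms in the dissertation exploit exactly this overlap to reduce the required number of samples.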

We next consider the problem of parameter estimation in data models when there is uncertainty about the true underlying model of the measurements. Such uncertainties may arise from disruptions or changes in the nominal model, e.g., line outages in power grids. We assume that the underlying model may deviate from the nominal model to one of a set of alternative models. The key observation is that the estimation quality is critically intertwined with successful isolation of the true model, yet the true model cannot be determined perfectly from noisy measurements. We formulate parameter estimation under uncertainty as a multiple composite hypothesis testing problem and provide closed-form optimal decision rules that account for estimation quality in the network under constraints on the error rates in isolating the true model.

Moving on from the problems of structure learning and estimation under model uncertainty, which address the challenges in statistical inference when the true network model is not known perfectly, we next focus on scalable and distributed inference techniques for large-scale networks. Many applications, such as power systems, surveillance systems, and the Internet of Things, can be modeled as networks of interconnected measurement devices collecting data for various inferential purposes. As the scale of the network grows, it becomes desirable to distribute the inference among autonomous entities. Under this motivation, we consider the problem of parameter estimation in networked data, where the network consists of multiple sensors collectively observing a set of unknown parameters, with each sensor observing only a subset of the complete set. In this setting, we propose a distributed estimation framework characterized by a cost function that incorporates both the quality of the estimates formed by autonomous decisions at the sensors and the consistency of the estimates of parameters commonly observed by different sensors. We provide closed-form estimation rules that optimize the proposed cost.

Given this distributed estimation framework, we next focus on sequentially estimating the parameters in the sensor network. In many applications, data acquisition is subject to logistical constraints. By imposing practical constraints on the number of data points the network can collect at each time instant, we design a sequential strategy that dynamically decides the minimum number of measurements required to form reliable estimates, the sensors from which these measurements should be collected, and the optimal estimates at each sensor. Specifically, the proposed strategy consists of a stopping rule for the sampling process, a data-adaptive control policy for selecting sensors over time, and a set of estimators, the combination of which is asymptotically optimal.

Finally, when detection of anomalous behavior in a group of items is the primary objective, group testing approaches are appealing due to their inherent scalability to large-scale problems. Standard group testing frameworks rely on multiple pooled tests in which items are selected for a pool independently of one another. In practice, however, pooling may be governed by underlying relationships that depend on the context, for example a network or graph model capturing context-specific relationships such as proximity, access, or correlation between items. Motivated by such scenarios, we study group testing in network models where a few nodes may be defective and the pool of nodes tested at each instant must conform to the network structure, i.e., form a subgraph. We provide multi-stage group testing approaches with limited adaptivity and characterize the sufficient number of tests for isolating the defective nodes in terms of properties of the network model.
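
The multiple composite hypothesis testing formulation described above can be illustrated with a toy sketch (not the dissertation's decision rules): candidate linear measurement models stand in for the nominal model and its alternatives, the true model is isolated by a generalized-likelihood-ratio rule, and the parameters are then estimated under the selected model. The matrices `H`, the noise level, and the function name are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Candidate measurement models y = H_k @ theta + noise, standing in for
# a nominal model and alternatives (e.g., line-outage variants).
n_obs, n_par = 40, 3
H = [rng.standard_normal((n_obs, n_par)) for _ in range(3)]

theta_true = np.array([1.0, -2.0, 0.5])
k_true = 2
y = H[k_true] @ theta_true + 0.1 * rng.standard_normal(n_obs)

def glrt_estimate(y, models):
    # For each candidate model, profile out theta by least squares; under
    # Gaussian noise, the model with the smallest residual maximizes the
    # likelihood (a generalized likelihood ratio rule). Estimate theta
    # under the selected model.
    best = min(
        range(len(models)),
        key=lambda k: np.linalg.lstsq(models[k], y, rcond=None)[1][0],
    )
    theta_hat, *_ = np.linalg.lstsq(models[best], y, rcond=None)
    return best, theta_hat

k_hat, theta_hat = glrt_estimate(y, H)
print(k_hat, np.round(theta_hat, 2))
```

The sketch shows the coupling the abstract emphasizes: the quality of `theta_hat` hinges on isolating the right model, and the isolation itself is noisy, which is why the dissertation imposes explicit constraints on the model-isolation error rates.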

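
The graph-constrained group testing problem described above can be illustrated in its simplest form (a single defective node on a path graph, not the dissertation's multi-stage algorithm): on a path, the connected subgraphs are exactly the contiguous intervals, so a binary search over intervals isolates the defective node while every pool respects the network structure.

```python
# Toy setting: a path graph 0-1-2-...-(n-1) with one defective node.
# Every pool must be a connected subgraph of the path, i.e., a
# contiguous interval; a pooled test reports whether the pool
# contains the defective node.

def isolate_defective(n, defective):
    tests = 0
    lo, hi = 0, n - 1
    while lo < hi:
        mid = (lo + hi) // 2
        pool = range(lo, mid + 1)   # contiguous => connected subgraph
        tests += 1
        if defective in pool:       # outcome of the pooled test
            hi = mid
        else:
            lo = mid + 1
    return lo, tests

node, t = isolate_defective(1024, defective=300)
print(node, t)
```

For richer graphs and multiple defectives the pools can no longer be simple intervals, which is where the properties of the network model enter the test-count characterization in the dissertation.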

##### Description

December 2020

School of Engineering

##### Publisher

Rensselaer Polytechnic Institute, Troy, NY