
dc.rights.license: Restricted to current Rensselaer faculty, staff and students. Access inquiries may be directed to the Rensselaer Libraries.
dc.contributor: Ji, Qiang, 1963-
dc.contributor: Wozny, M. J. (Michael J.)
dc.contributor: Wang, Meng
dc.contributor: Mitchell, John E.
dc.contributor.author: Nie, Siqi
dc.date.accessioned: 2021-11-03T08:42:47Z
dc.date.available: 2021-11-03T08:42:47Z
dc.date.created: 2017-01-13T09:41:55Z
dc.date.issued: 2016-12
dc.identifier.uri: https://hdl.handle.net/20.500.13015/1829
dc.description: December 2016
dc.description: School of Engineering
dc.description.abstract: Probabilistic graphical models (PGMs) are powerful tools for compactly representing probabilistic dependencies among random variables. With their powerful and intuitive representation ability, as well as a body of well-developed algorithms, PGMs have been widely applied to many real-world problems. Despite significant progress, learning and inference with PGMs remain intractable, particularly for large models and big data. This thesis focuses on developing advanced learning and inference methods for both directed and undirected PGMs to improve their performance on large domains.
dc.description.abstract: We first focus on reducing the inference complexity of directed PGMs, i.e., Bayesian networks (BNs). The BN inference problem is NP-hard in general, but can be solved in polynomial time if the network's treewidth is bounded. To balance a BN's representation power against its inference complexity, we propose methods for learning BN structures with bounded treewidth. For small BNs, we introduce an exact algorithm based on a mixed integer linear programming formulation. For large BNs, to avoid the intractable computation of the treewidth, we propose to use k-trees as super-structures that indirectly impose the treewidth constraint during BN structure learning. To identify the optimal k-tree, we propose different strategies for effectively searching the k-tree space. Experiments on benchmark machine learning data sets demonstrate significant improvements over state-of-the-art methods in both BN inference efficiency and accuracy.
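The methods above rely on bounding the treewidth of the learned structure. As a minimal Python sketch (not the thesis's MILP or k-tree algorithms), the code below moralizes a candidate DAG and bounds the treewidth of its moral graph with a greedy heuristic from networkx; the names moralize and respects_treewidth_bound are illustrative.

import networkx as nx
from networkx.algorithms.approximation import treewidth_min_degree

def moralize(dag):
    # Marry the parents of every node, then drop edge directions.
    moral = nx.Graph(dag.to_undirected())
    for node in dag.nodes():
        parents = list(dag.predecessors(node))
        for i in range(len(parents)):
            for j in range(i + 1, len(parents)):
                moral.add_edge(parents[i], parents[j])
    return moral

def respects_treewidth_bound(dag, k):
    # treewidth_min_degree returns an upper bound on the treewidth plus a decomposition,
    # so a True result guarantees the bound k is met.
    width, _decomposition = treewidth_min_degree(moralize(dag))
    return width <= k

# Example: a chain-structured BN has a path-shaped moral graph of treewidth 1.
dag = nx.DiGraph([("A", "B"), ("B", "C"), ("C", "D")])
print(respects_treewidth_bound(dag, k=2))  # True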
dc.description.abstract: For undirected PGMs, we focus on learning and inference with restricted Boltzmann machines (RBMs), a special kind of Markov network. Conventional RBMs assume the inputs to be scalars. To model high-dimensional motion data, we extend the RBM to capture both spatial and temporal interactions within and among vectors, producing the RBM with local interactions (LRBM). We extend the conventional contrastive divergence method for parameter learning in the LRBM. The LRBM is evaluated against state-of-the-art dynamic models on human action recognition and facial expression recognition, achieving superior performance.
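For context, the sketch below shows one step of standard contrastive divergence (CD-1) for an ordinary binary RBM with scalar visible units; it is not the LRBM extension with local spatial and temporal interactions, and the variable names are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.01):
    # One CD-1 step on a batch of binary visible vectors v0 (rows).
    # Positive phase: hidden activations given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step back to the visible layer and up again.
    pv1 = sigmoid(h0 @ W.T + b)
    ph1 = sigmoid(pv1 @ W + c)
    # Approximate log-likelihood gradient: data statistics minus model statistics.
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / v0.shape[0]
    b += lr * (v0 - pv1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c

# Toy usage: 6 visible units, 3 hidden units, a batch of 8 binary vectors.
W = 0.01 * rng.standard_normal((6, 3))
b, c = np.zeros(6), np.zeros(3)
v0 = (rng.random((8, 6)) < 0.5).astype(float)
W, b, c = cd1_update(v0, W, b, c)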
dc.description.abstract: Next, we investigate learning and inference methods for a probabilistic deep model. To explicitly capture the interactions among latent variables and to avoid an exponential number of parameters, we propose to construct the deep generative model from two kinds of building blocks: regression Bayesian networks (RBNs) and Noisy-Or Bayesian networks (NoBNs). The RBNs use a regression function to define the conditional probability, while the NoBNs use a Noisy-Or function to explicitly capture causal relationships. Compared to existing probabilistic deep models such as deep Boltzmann machines (DBMs) and deep belief networks (DBNs), the proposed deep directed model can better represent the data, its underlying patterns, and the uncertainties in the data.
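The Noisy-Or conditional probability mentioned above has a standard closed form, computed for binary parents in the small sketch below; the parameter names q and leak are illustrative, not the thesis's notation.

import numpy as np

def noisy_or(parent_states, q, leak=0.0):
    # P(child = 1 | parents) = 1 - (1 - leak) * prod_i (1 - q_i)^{x_i},
    # where x_i is the 0/1 state of parent i and q_i is the probability that
    # an active parent i alone turns the child on.
    parent_states = np.asarray(parent_states, dtype=float)
    q = np.asarray(q, dtype=float)
    p_no_cause = (1.0 - leak) * np.prod((1.0 - q) ** parent_states)
    return 1.0 - p_no_cause

# Example: two active parents with causal strengths 0.8 and 0.6.
print(noisy_or([1, 1, 0], q=[0.8, 0.6, 0.9]))  # 0.92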
dc.description.abstract: Model learning comprises parameter learning and structure learning. For parameter learning, we first perform layerwise unsupervised learning by maximizing the marginal likelihood through stochastic gradient ascent. The parameters of the entire model are then refined jointly by maximizing either the joint likelihood (generative) or the conditional likelihood (discriminative). For structure learning, we explore maximum likelihood learning with L1-regularization and a structural expectation maximization (EM) algorithm. The L1-regularization formulates structure learning as parameter learning under sparsity constraints. The EM algorithm approximates the E-step using stochastic samples and solves the M-step using hill climbing. We further combine the structural EM with local learning under L1-regularization. For inference, to overcome the computational intractability caused by the large number of dependent latent variables, we propose a pseudo-likelihood approximation to the posterior probability and employ a coordinate ascent algorithm to perform maximum a posteriori (MAP) inference. We thoroughly evaluate the deep directed model against other probabilistic and non-probabilistic deep models in terms of data representation, feature learning, and model-based classification.
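As a generic illustration of the coordinate ascent MAP idea (with a placeholder score function rather than the thesis's pseudo-likelihood objective), the sketch below updates one binary latent variable at a time, keeping the others fixed, until a full sweep changes nothing.

import numpy as np

def coordinate_ascent_map(log_score, h_init, max_sweeps=50):
    # log_score(h) -> unnormalized log-posterior of a binary assignment h.
    h = np.array(h_init, dtype=int)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(h)):
            old = h[i]
            scores = []
            for value in (0, 1):
                h[i] = value
                scores.append(log_score(h))
            h[i] = int(np.argmax(scores))  # keep the better of the two values
            changed = changed or (h[i] != old)
        if not changed:
            break
    return h

# Toy score: a pairwise agreement term plus per-variable biases (illustrative only).
A = np.array([[0.0, 1.0, -1.0],
              [1.0, 0.0, 0.5],
              [-1.0, 0.5, 0.0]])
bias = np.array([0.3, 0.0, -0.2])
score = lambda h: h @ A @ h / 2.0 + bias @ h
print(coordinate_ascent_map(score, h_init=[0, 0, 0]))  # e.g. [1 1 0]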
dc.language.iso: ENG
dc.publisher: Rensselaer Polytechnic Institute, Troy, NY
dc.relation.ispartof: Rensselaer Theses and Dissertations Online Collection
dc.subject: Electrical engineering
dc.title: Efficient learning and inference for probabilistic graphical models
dc.type: Electronic thesis
dc.type: Thesis
dc.digitool.pid: 177816
dc.digitool.pid: 177817
dc.digitool.pid: 177818
dc.rights.holder: This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.
dc.description.degree: PhD
dc.relation.department: Dept. of Electrical, Computer, and Systems Engineering

