Show simple item record

dc.rights.licenseRestricted to current Rensselaer faculty, staff and students. Access inquiries may be directed to the Rensselaer Libraries.
dc.contributorZaki, Mohammed J., 1971-
dc.contributorMagdon-Ismail, Malik
dc.contributorGoldberg, Mark
dc.contributorRavichandran, T.
dc.contributor.authorAnchuri, Pranay
dc.date.accessioned2021-11-03T08:28:15Z
dc.date.available2021-11-03T08:28:15Z
dc.date.created2015-10-01T11:33:09Z
dc.date.issued2015-08
dc.identifier.urihttps://hdl.handle.net/20.500.13015/1530
dc.descriptionAugust 2015
dc.descriptionSchool of Science
dc.description.abstractMany real-world graphs have complex labels on the nodes and edges. Mining only exact patterns yields limited insights, since it may be hard to find exact matches. However, in many domains it is relatively easy to compute some cost (or distance) between different labels. Using this information, it becomes possible to mine a much richer set of approximate subgraph patterns.
dc.description.abstractWe apply our models and methods to several real datasets, such as: i) configuration management databases representing the infrastructure entities and their inter-relationships in large IT companies; ii) protein-protein interaction networks in organisms such as yeast, and graphs representing the 3D structure of proteins.
dc.description.abstractApproximate Patterns: We propose three models for approximate matching of patterns in a given input graph. First, we allow for bounded label mismatches of the pattern. To find these approximate matches, we propose a neighborhood-label-based algorithm that can efficiently prune infeasible matches of the pattern. Second, we allow both bounded label and structural mismatches of the pattern. To efficiently find such approximate matches, we repeatedly use our label-only algorithm on a specially constructed subgraph of the pattern. Finally, we discuss why the existing models cannot be adapted to mine interesting patterns from uncertain graphs, where edges are associated with a probability of existence. We therefore propose coverage-based pattern mining, a novel way to think about pattern mining in uncertain graphs. Our algorithm essentially enumerates a set of patterns that covers distinct regions of the input graph with high probability.
dc.description.abstractApproximate Support: Mining frequent subgraph patterns from a database of graphs is a well-studied problem in the literature. However, the methods proposed for the multiple-input-graph scenario cannot be directly extended to mining patterns from a single large graph. This is because the support function is defined in terms of all the embeddings of a pattern, which can be exponential in number. We propose a network-flow-based approach that gives a polynomial-time approximation for the support function. We also propose a three-step procedure to summarize the output of graph mining algorithms.
dc.description.abstractGraph analytics is the process of discovering patterns and insights from data that can be modeled as graphs. Algorithms for graph analytics fall into two broad categories: mining and management. Graph mining algorithms are often used in graph management and vice versa. In recent times, these algorithms have become an indispensable tool for analyzing networks in domains such as i) computational biology, ii) infrastructure and mobile sectors, and iii) cybersecurity.
dc.description.abstractIn this thesis, we present methods for mining approximate frequent patterns from a single large graph. The approximate pattern mining algorithms that we consider fall into two main categories. First, instead of computing the exact support of a pattern, we approximate the support by an upper bound. Second, we tolerate bounded label and structural mismatches when finding matches of a pattern in the database. A common theme across both paradigms is that the problems are NP-hard even to approximate up to a constant factor of the optimal value. We formally prove the exponential complexity of all these problems and also present approximation algorithms for solving the problems at scale.
dc.language.isoENG
dc.publisherRensselaer Polytechnic Institute, Troy, NY
dc.relation.ispartofRensselaer Theses and Dissertations Online Collection
dc.subjectComputer science
dc.titleMining approximate frequent patterns from graph databases
dc.typeElectronic thesis
dc.typeThesis
dc.digitool.pid176726
dc.digitool.pid176727
dc.digitool.pid176728
dc.rights.holderThis electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.
dc.description.degreePhD
dc.relation.departmentDept. of Computer Science


