dc.rights.license | Restricted to current Rensselaer faculty, staff and students. Access inquiries may be directed to the Rensselaer Libraries. | |
dc.contributor | Zaki, Mohammed J., 1971- | |
dc.contributor | Magdon-Ismail, Malik | |
dc.contributor | Goldberg, Mark | |
dc.contributor | Ravichandran, T. | |
dc.contributor.author | Anchuri, Pranay | |
dc.date.accessioned | 2021-11-03T08:28:15Z | |
dc.date.available | 2021-11-03T08:28:15Z | |
dc.date.created | 2015-10-01T11:33:09Z | |
dc.date.issued | 2015-08 | |
dc.identifier.uri | https://hdl.handle.net/20.500.13015/1530 | |
dc.description | August 2015 | |
dc.description | School of Science | |
dc.description.abstract | Many real-world graphs have complex labels on the nodes and edges. Mining only exact patterns yields limited insights, since it may be hard to find exact matches. However, in many domains it is relatively easy to compute some cost (or distance) between different labels. Using this information, it becomes possible to mine a much richer set of approximate subgraph patterns. | |
dc.description.abstract | We apply our models and methods to several real datasets, such as: i) configuration management databases representing the infrastructure entities and their inter-relationships in large IT companies; ii) protein-protein interaction networks in organisms such as yeast, and graphs representing the 3D structure of proteins. | |
dc.description.abstract | Approximate Patterns: We propose three models for approximate matching of patterns in a given input graph. First, we allow for bounded label mismatches of the pattern. To find these approximate matches, we propose a neighborhood-label-based algorithm that can efficiently prune infeasible matches of the pattern. Second, we allow both bounded label and structural mismatches of the pattern. To efficiently find such approximate matches, we repeatedly use our label-only algorithm on a specially constructed subgraph of the pattern. Finally, we discuss why the existing models cannot be adapted to mine interesting patterns from uncertain graphs, where edges are associated with a probability of existence. We therefore propose coverage-based pattern mining, a novel way to think about pattern mining in uncertain graphs. Our algorithm essentially enumerates a set of patterns that covers distinct regions of the input graph with high probability. | |
dc.description.abstract | Approximate Support: Mining frequent subgraph patterns from a database of graphs is a well-studied problem in the literature. However, the methods proposed for the multiple input graphs scenario cannot be directly extended to mining patterns from a single large graph. This is because the support function is defined in terms of all the embeddings of a pattern, and the number of embeddings can be exponential. We propose a network-flow-based approach that gives a polynomial-time approximation for the support function. We also propose a three-step procedure to summarize the output of graph mining algorithms. | |
dc.description.abstract | Graph analytics is the process of discovering patterns and insights from data that can be modeled as graphs. Algorithms for graph analytics fall into two broad categories: mining and management. Graph mining algorithms are often used in graph management and vice versa. In recent times, these algorithms have become an indispensable tool for analyzing networks in domains such as i) computational biology, ii) infrastructure and mobile sectors, and iii) cybersecurity. | |
dc.description.abstract | In this thesis, we present methods for mining approximate frequent patterns from a single large graph. The approximate pattern mining algorithms that we consider cover two main categories. First, instead of computing the exact support of a pattern, we approximate the support by an upper bound. Second, we tolerate bounded label and structural mismatches when finding matches of a pattern in the database. A common theme across both these paradigms is that the problems are NP-hard to even approximate up to a constant factor of the optimal value. We formally prove the exponential complexity of all these problems and also present approximation algorithms for solving the problems at scale. | |
dc.language.iso | ENG | |
dc.publisher | Rensselaer Polytechnic Institute, Troy, NY | |
dc.relation.ispartof | Rensselaer Theses and Dissertations Online Collection | |
dc.subject | Computer science | |
dc.title | Mining approximate frequent patterns from graph databases | |
dc.type | Electronic thesis | |
dc.type | Thesis | |
dc.digitool.pid | 176726 | |
dc.digitool.pid | 176727 | |
dc.digitool.pid | 176728 | |
dc.rights.holder | This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author. | |
dc.description.degree | PhD | |
dc.relation.department | Dept. of Computer Science | |