Preprocessing and learning for graph structured data

Authors
Brissette, Christopher
Other Contributors
Szymanski, Boleslaw K.
Gao, Jianxi
Huang, Andy
Slota, George
Issue Date
2023-08
Keywords
Computer science
Degree
PhD
Terms of Use
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute (RPI), Troy, NY. Copyright of original work retained by author.
Abstract
Graphs are general structures that can describe any system or dataset whose elements are related. Because such data are prevalent, efficient and accurate algorithms for analyzing graphs are extremely important in numerical computing and in general data science tasks. Graph-structured datasets have grown steadily over the past decades and promise to keep growing. As companies like Google and Meta wrestle with peta- and exa-scale graph analysis problems, computational scientists face many of the same issues as simulations require ever-larger meshes. Acceleration and preprocessing techniques are therefore important to ensure that graph algorithms run efficiently and accurately. We investigate several preprocessing and acceleration techniques for performing tasks on graph-structured data. We develop a methodology for generating graph null models with a desired degree distribution. This is a problem that has interested network scientists for decades. Despite this, the fast, parallelizable subroutines used in algorithms for generating such graphs tend to yield inaccurate degree distributions. We propose a novel analysis technique for the popular Chung-Lu random graph generator and show that it provides a way to automatically generate parameters for Chung-Lu-like null models as a preprocessing step. We provide several methods for generating these null models and show that in all cases they significantly outperform standard Chung-Lu generation. We also suggest that such null models may be used to improve the accuracy of modularity maximization. Additionally, we examine the task of coarsening graphs while preserving the spectrum of the graph Laplacian. Coarsening is an important preprocessing step for many large-scale graph problems, which solve relatively small subproblems and reconstruct an approximate solution on the original graph.
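As a point of reference for the Chung-Lu discussion above, the sketch below implements only the textbook Chung-Lu edge probability p_ij = min(1, w_i w_j / sum_k w_k); it is not the thesis's analysis technique or its corrected generators. It assumes two common conventions, clipping probabilities at 1 and excluding self-loops, and both already make the expected degrees fall short of the requested weights w, most visibly for high-weight nodes. The helper names and the toy weight vector are hypothetical.

```python
import numpy as np

def chung_lu_edge_probs(w):
    """Textbook Chung-Lu edge probabilities p_ij = min(1, w_i * w_j / sum(w)).

    Assumptions for this sketch: probabilities are clipped at 1 and self-loops
    are excluded (diagonal set to 0).
    """
    w = np.asarray(w, dtype=float)
    p = np.minimum(1.0, np.outer(w, w) / w.sum())
    np.fill_diagonal(p, 0.0)
    return p

def sample_chung_lu(w, rng):
    """Sample one simple undirected graph as a symmetric 0/1 adjacency matrix."""
    p = chung_lu_edge_probs(w)
    # Flip one independent coin per unordered pair (strict upper triangle).
    upper = np.triu(rng.random(p.shape) < p, k=1)
    return (upper | upper.T).astype(int)

# Hypothetical target degree sequence with one heavy node.
w = np.array([8, 4, 2, 2, 1, 1, 1, 1], dtype=float)

# The expected degree of node i is the row sum of the probability matrix.
expected = chung_lu_edge_probs(w).sum(axis=1)
print("target  :", w)
print("expected:", np.round(expected, 2))

# One sampled graph; its realized degrees fluctuate around `expected`, not `w`.
A = sample_chung_lu(w, np.random.default_rng(0))
print("sampled :", A.sum(axis=1))
```

Here node 0 asks for degree 8 but has expected degree 4.2: the excluded self-loop removes w_0^2 / sum(w) = 3.2 of its mass, and the pair (0, 1), whose raw value is 1.6, is clipped to probability 1.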
Coarsening is used in clustering, partitioning, and multigrid methods for solving linear systems of equations. The graph Laplacian is an important operator for describing graph-structured data. It relates heat transfer on a graph to the graph's topology, so its eigenvectors and eigenvalues hold important information about edge cuts and clustering. We present a heuristic for preserving the spectrum of the graph Laplacian during coarsening, together with a parallel algorithm that uses this heuristic. This contrasts with prior publications on the subject, which focus on serial and k-means methods for spectrum-consistent coarsening. We further analyze the inverse problem and find that, given a coarse representation that approximates its spectrum, the original graph may be reconstructed to within some edge-weight error. This is a novel development in the graph coarsening literature and suggests that preserving a graph's spectrum during coarsening may be sufficient to preserve all of its structure. Finally, we investigate a technique for accelerating the training of graph neural networks using Koopman operator theory. Graph neural networks provide a powerful method for performing classification and prediction tasks on graphs, whereas traditional neural networks struggle with the unordered nature of nodes and edges. Because of this, a great deal of effort has been put into accelerating graph neural networks through techniques such as graph pooling; despite this, they are still often slow to train. We suggest a method for accelerating training by interleaving standard backpropagation steps with prediction steps that use simple matrix-vector multiplication. We apply our method to the task of node classification and find that it is prone to instability, but can achieve multi-fold speed-ups over Adam for well-chosen parameters.
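To make the spectrum-preserving coarsening discussed above concrete, the sketch below builds the combinatorial Laplacian L = D - A, merges groups of nodes with a 0/1 aggregation matrix P, and compares the eigenvalues of the size-normalized coarse operator (P^T P)^(-1/2) P^T L P (P^T P)^(-1/2) with those of L. This Galerkin-style projection is a standard textbook choice assumed here for illustration; it is not the parallel heuristic developed in the thesis, and the example graph (two triangles joined by one edge) and helper names are hypothetical.

```python
import numpy as np

def laplacian(A):
    """Combinatorial Laplacian L = D - A of a symmetric weighted adjacency matrix."""
    return np.diag(A.sum(axis=1)) - A

def aggregation_matrix(groups, n):
    """0/1 matrix P with P[i, k] = 1 when fine node i belongs to coarse node k."""
    P = np.zeros((n, len(groups)))
    for k, members in enumerate(groups):
        P[members, k] = 1.0
    return P

def coarse_spectrum(L, P):
    """Eigenvalues of the size-normalized Galerkin coarse Laplacian.

    P.T @ L @ P is the Laplacian of the quotient graph (edge weights summed
    between groups); scaling by group sizes makes its eigenvalues comparable
    with those of L.
    """
    Lc = P.T @ L @ P
    s = 1.0 / np.sqrt(P.sum(axis=0))  # 1 / sqrt(group size)
    return np.linalg.eigvalsh(s[:, None] * Lc * s[None, :])

# Two triangles {0,1,2} and {3,4,5} joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

L = laplacian(A)
fine = np.linalg.eigvalsh(L)
P = aggregation_matrix([[0, 1], [2], [3], [4, 5]], 6)  # 6 nodes -> 4
coarse = coarse_spectrum(L, P)
print("fine  :", np.round(fine, 3))
print("coarse:", np.round(coarse, 3))
```

Because the Fiedler vector of this graph takes equal values on nodes 0 and 1 (and on 4 and 5), merging those pairs keeps the second eigenvalue (5 - sqrt(17))/2, about 0.438, exactly; the coarse eigenvalues 0, 0.438, 3, and 4.56 are a subset of the fine ones. Merges that cut across an eigenvector's level sets would generally shift the corresponding eigenvalue.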
This work represents the first time Koopman training has been applied to graph neural networks, and the first time it has been applied on GPU.
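The idea of replacing some optimizer steps with matrix-vector products can be illustrated on a problem where it works perfectly. The sketch below is not the thesis's GNN training procedure: it fits a linear operator K to a short history of weight snapshots by least squares (a dynamic-mode-decomposition-style approximation of the Koopman operator) and then advances the weights by repeated products with K. The toy least-squares regression, the learning rate of 0.3, and the snapshot counts are assumptions chosen so that gradient descent is exactly affine in the weights, and hence exactly linear once a constant 1 is appended to the weight vector.

```python
import numpy as np

# Hypothetical toy problem: linear regression trained by full-batch gradient descent.
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

def gd_step(w, lr=0.3):
    """One full-batch gradient step on the mean-squared-error loss."""
    grad = X.T @ (X @ w - y) / n
    return w - lr * grad

# Record a short trajectory of weight snapshots using ordinary gradient steps.
w = np.zeros(d)
snapshots = [w]
for _ in range(8):
    w = gd_step(w)
    snapshots.append(w)

# Stack snapshots as columns of [w; 1]; the constant row captures the affine offset.
Z = np.vstack([np.append(s, 1.0) for s in snapshots]).T  # shape (d + 1, 9)

# Least-squares fit of K so that Z[:, 1:] is approximately K @ Z[:, :-1].
K = np.linalg.lstsq(Z[:, :-1].T, Z[:, 1:].T, rcond=None)[0].T

# "Prediction steps": advance 20 iterations using only matrix-vector products.
z = Z[:, -1]
for _ in range(20):
    z = K @ z
w_pred = z[:-1]

# Reference: actually run 20 more gradient steps from the last snapshot.
w_true = snapshots[-1]
for _ in range(20):
    w_true = gd_step(w_true)
print("max abs difference:", np.abs(w_pred - w_true).max())
```

For a real graph neural network the weight dynamics are nonlinear, so a fitted K is only a local approximation and long extrapolations are not guaranteed to stay close to the true training trajectory.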
Description
August 2023
School of Science
Department
Dept. of Computer Science
Publisher
Rensselaer Polytechnic Institute, Troy, NY
Relationships
Rensselaer Theses and Dissertations Online Collection
Access
Users may download and share copies with attribution in accordance with a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 license. No commercial use or derivatives are permitted without the explicit approval of the author.