##### Abstract

In the context of information propagation, community structures play an essential role in facilitating the local spread of information, because community members are more likely to accept inputs from each other than from outsiders. On the other hand, community structures slow down global information diffusion by trapping messages in dense regions and thus preventing global penetration. For this reason, we formulate a virality prediction framework for global online news that uses the community structure of online news media. The goal is to classify viral news at an early stage of spreading. Our virality prediction framework treats every news item as a cascade in the network formed by online media sites. When an online media site reports a piece of news, it is infected by the corresponding cascade. Based on statistical survival analysis, our model defines the probability of infection along the network edges given the propagation delay. Since news generally spreads among media sites with similar cultural backgrounds, our model assumes that each media site is affiliated with some communities and that a news cascade reaches each community with a certain probability. The mathematical model, after simplification, becomes a graph representation learning model that finds node embeddings from the news cascades. Given the node embeddings of the early adopters, the F1 score of viral news prediction produced by our approach exceeds that of feature-based approaches by almost 20% on the Global Database of Events, Language, and Tone (GDELT) dataset. We parallelize the corresponding graph representation learning algorithm for both shared-memory and distributed-memory machines. The parallelized stochastic gradient descent algorithm is shown to scale well to 10K+ cores on the IBM Blue Gene/Q supercomputer at RPI (ranked as one of the fastest IBM Blue Gene configurations in academia worldwide).
Moreover, to facilitate the spread of information in peer-to-peer (P2P) networks, we propose a P2P overlay network construction algorithm that maintains the power-law distribution of node degrees while peers dynamically join and leave the network.

Finally, we study patterns in the evolution of polarization by analyzing millions of roll-call votes in the legislative branches of the United States. The community structure of the members of these legislative branches is a critical factor in the development of political polarization. We propose a social dynamical model that explains the formation of polarization observed in real legislative voting data, and we derive the tipping points of the dynamical system, which provide early warning signals when the system is close to a bifurcation. Our dynamical model successfully explains the direction of polarization change in 28 of the past 30 U.S. Congresses. The hidden variable in the dynamical model, called the polarization utility, is shown to correlate well with critical historical events such as the civil rights movement and the rise of Super PACs.

Besides the maximization of modularity and its generalized version, an alternative approach to detecting communities is statistical inference that fits a generative graph model to the observed network data. The degree-corrected stochastic block model is one such random graph model; it can generate different network partitions, ranging from traditional assortative communities to disassortative structures. It does not impose any constraint on the mixing pattern of the resulting block assignments, so it is not guaranteed to return traditional assortative community structures. On top of the degree-corrected stochastic block model, we propose a generative random graph model that puts a constraint on the nodes' internal degree ratio.
This model stabilizes the inference of the block model, preventing inference algorithms such as Markov chain Monte Carlo from getting trapped in local optima of the log-likelihood. Unlike modularity maximization, which always attempts to find traditional assortative communities, this regularized model uses a single regularization parameter to control the mixing patterns discovered in the given network.

While most community detection algorithms take an unweighted graph as input by default, they can be extended to accept edge weights. In this thesis, we also show that appropriately assigned edge weights can improve the quality of the detected communities. We propose an edge weighting scheme that avoids the bias of modularity maximization towards merging well-formed small communities into large ones. The scheme works in a semi-supervised fashion: a regression model, penalized for merging small ground-truth communities, is trained to convert local edge features into edge weights. Experimental results show that, in addition to modularity maximization, five different state-of-the-art approaches are also significantly improved by the edge weights produced by our model.

The unified theory shows that, in generalized modularity maximization, well-formed communities whose densities are smaller than the resolution parameter are split into multiple clusters, whereas well-formed communities whose background inter-community edge densities are larger than the resolution parameter are merged into one large component. This result reveals a "plateaus" problem: no resolution parameter avoids such damaging splits and mergers when the density of any well-formed community is smaller than the background inter-community edge density among some other well-formed communities.
Therefore, we propose a progressive agglomerative heuristic algorithm, based on a statistical hypothesis testing framework, that systematically increases the resolution parameter to partition a graph recursively. The hypothesis test checks whether the partition found by each branch of the recursion is significant. If it is, the branch continues splitting the current graph at the next level; otherwise, the branch terminates, accepting the null hypothesis that the current subgraph is already a community.

Community structures are observed across a wide variety of networks, including the World Wide Web, the Internet, and research collaboration, transportation, social, and biochemical networks. Community detection aims to partition the network nodes into groups such that the edges inside each group are generally more numerous than the edges between groups. Modularity is perhaps the most widely used quality metric for evaluating a partition of network nodes. Despite being one of the most widely used state-of-the-art community detection approaches, modularity maximization suffers from the resolution limit problem, which arises from the implicit dependence of the modularity definition on a constant (defined explicitly in generalized modularity as the resolution parameter). In this thesis, we uncover the importance of this dependence, using random graph theory to explain the resolution limit of modularity. Specifically, we establish asymptotic theoretical upper and lower bounds on the resolution parameter of generalized modularity within which modularity maximization recovers the community structure correctly; this is the first work connecting the resolution limit of modularity with random graph models.

Emergent social phenomena like radicalization, civil unrest, and opinion migration can be explained by the accumulated influences between individuals in a social network.
Such emergent phenomena are often described as "nobody saw them coming" because of their explosive dynamics, and yet they have a profound impact on our society. Since these dynamics are significantly affected by the social network topology, there is a strong desire to study the interplay between the emergent dynamics and the underlying social network structure. In this doctoral thesis, we focus on the community structure of social networks and study its role in the prediction of emergent network dynamics.
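
As an illustration of the resolution parameter discussed in the abstract, the following is a minimal sketch of generalized modularity in its commonly used form, Q(gamma) = sum over communities c of [L_c / m - gamma * (d_c / 2m)^2], where L_c is the number of intra-community edges, d_c the total degree of the community, and m the number of edges. The function name `generalized_modularity` and the input layout (an edge list plus a node-to-community dictionary) are assumptions made for this example; they are not taken from the thesis, and the thesis may use a different formulation.

```python
from collections import defaultdict


def generalized_modularity(edges, community, gamma=1.0):
    """Generalized modularity of a partition of an undirected, unweighted graph.

    edges     -- list of (u, v) pairs, each undirected edge listed once
    community -- dict mapping every node to its community label
    gamma     -- resolution parameter (gamma = 1.0 gives standard modularity)
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")

    internal = defaultdict(int)    # L_c: edges with both endpoints in c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in c

    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1

    return sum(
        internal[c] / m - gamma * (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )
```

For two triangles joined by a single edge, this score prefers keeping the triangles separate at gamma = 1.0 but prefers merging them at a small value such as gamma = 0.1, which is the kind of dependence on the resolution parameter that the abstract refers to.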

##### Description

May 2019; School of Science

##### Department

Dept. of Computer Science

##### Publisher

Rensselaer Polytechnic Institute, Troy, NY

##### Relationships

Rensselaer Theses and Dissertations Online Collection

##### Access

Restricted to current Rensselaer faculty, staff and students. Access inquiries may be directed to the Rensselaer Libraries.