dc.rights.license: CC BY-NC-ND. Users may download and share copies with attribution in accordance with a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License. No commercial use or derivatives are permitted without the explicit approval of the author.
dc.contributor: Szymański, Bolesław
dc.contributor: Anshelevich, Elliot
dc.contributor: Korniss, Gyorgy
dc.contributor: Adali, Sibel
dc.contributor.author: Chen, Mingming
dc.date.accessioned: 2021-11-03T08:32:07Z
dc.date.available: 2021-11-03T08:32:07Z
dc.date.created: 2016-02-25T14:34:12Z
dc.date.issued: 2015-12
dc.identifier.uri: https://hdl.handle.net/20.500.13015/1601
dc.description: December 2015
dc.description: School of Science
dc.description.abstract: Many networks exhibit community structure: groups of nodes within which connections are denser than between them. Detecting and characterizing such structure, known as community detection, is one of the fundamental problems in the study of network systems, and it has received considerable attention in recent years. Numerous techniques have been developed for efficient and effective community detection. The most popular approach has been to maximize the community quality metric known as Newman's modularity over all possible partitions of a network. This metric measures the difference between the fraction of all edges that lie within the actual communities and the expected fraction of such edges in a randomized graph with the same number of nodes and the same degree sequence. It is widely used to measure the strength of the community structure found by community detection algorithms.
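The modularity definition above can be made concrete with a small sketch. It assumes an undirected, unweighted graph given as an edge list and uses the standard per-community form (internal edge fraction minus the squared degree fraction); the function name `modularity` and its arguments are illustrative and are not taken from the thesis.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman's modularity of a partition of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")

    internal = defaultdict(int)    # edges with both endpoints in community c
    degree_sum = defaultdict(int)  # total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1

    # Observed internal edge fraction minus the fraction expected in a
    # random graph with the same degree sequence, summed over communities.
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = modularity(edges, two_triangles)
```

For this toy graph the two-triangle partition scores 5/14, while placing all six nodes in one community scores 0.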
dc.description.abstract: Although community detection is one of the fundamental techniques of network science, the community structure discovered by community detection algorithms often does not reflect reality. The primary reason is the incompleteness and inaccuracy of current network data collection methods, which may cause datasets to appear less modular than the underlying networks really are. Thus, in this thesis we aim to recover or improve network community structure that may be hidden or impaired by missing edges or by incorrectly identified extraneous edges. To this end, we introduce a method for improving the network structure. This method uses the scores obtained from different link prediction techniques to replace a certain fraction of low-ranking existing links with the top-ranked predicted links.
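The link replacement step described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which link prediction techniques are used, so a common-neighbors score stands in for them, and the names `common_neighbors` and `replace_links` are hypothetical.

```python
from itertools import combinations


def common_neighbors(adj, u, v):
    """Assumed link-prediction score: number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def replace_links(edges, fraction, score=common_neighbors):
    """Swap the lowest-scoring existing edges for the highest-scoring absent pairs.

    edges: iterable of (u, v) pairs of an undirected graph with sortable nodes.
    fraction: share of existing edges to replace, between 0 and 1.
    Returns a set of (u, v) tuples with u < v.
    """
    existing = {tuple(sorted(e)) for e in edges}
    adj = {}
    for u, v in existing:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    k = int(fraction * len(existing))  # number of edges to swap
    absent = [p for p in combinations(sorted(adj), 2) if p not in existing]

    # All scores are computed on the original graph; ties are broken by
    # node order so that the result is deterministic.
    weakest = sorted(existing, key=lambda p: (score(adj, *p), p))[:k]
    strongest = sorted(absent, key=lambda p: (-score(adj, *p), p))[:k]

    return (existing - set(weakest)) | set(strongest)


# A 4-clique missing the edge (0, 3), bridged by (3, 4) to a triangle.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3),
         (3, 4), (4, 5), (4, 6), (5, 6)]
improved = replace_links(edges, 0.12)  # 12% of 9 edges: one swap
```

On this toy graph the single swap removes the bridge (3, 4), which has no common neighbors, and adds the missing clique edge (0, 3), which has two.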
dc.description.abstract: We then propose a novel fine-tuned disjoint community detection algorithm that repeatedly attempts to improve a quality metric by splitting and merging the communities of a given community structure. This algorithm can be used to optimize any community quality metric; in this thesis, however, we consider only modularity and modularity density.
dc.description.abstract: In addition, modularity can only be used to quantify the quality of disjoint communities. However, it is more realistic to expect that nodes in real-world networks belong to more than one community, resulting in overlapping communities. In the past few years, several overlapping extensions of modularity have been proposed to measure the quality of overlapping community structure. These extensions differ only in how they define the belonging coefficient and the belonging function, yet a systematic comparison of them is lacking. To fill this gap, we review the overlapping extensions of modularity and generalize them with a uniform definition that allows different belonging coefficients and belonging functions to be applied so that the best can be selected. In addition, we extend localized modularity, modularity density, and eight local community quality metrics to enable their use for overlapping communities.
dc.description.abstract: However, modularity maximization suffers from two opposite yet concurrent problems. In some cases, it tends to split large communities into smaller communities. In other cases, it tends to form large communities by merging communities that are smaller than a certain threshold, which depends on the total number of edges in the network and on the degree of inter-connectivity between the communities. The latter problem is well known in the literature as the resolution limit problem. To solve these two problems simultaneously, we propose a new community quality metric, which we term Modularity Density, as an alternative to modularity. First, we show that modularity decreased by Split Penalty, defined as the fraction of edges that connect nodes of different communities, resolves the issue of favoring small communities. Then, we demonstrate that incorporating community densities into modularity and Split Penalty eliminates the problem of favoring large communities, namely the resolution limit problem.
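As a rough illustration of the Split Penalty described above, the sketch below computes the fraction of edges whose endpoints lie in different communities, following the wording of the abstract. The function name `split_penalty` is an assumption, and the density terms of Modularity Density are not shown because the abstract does not define them.

```python
def split_penalty(edges, community_of):
    """Fraction of edges that connect nodes of different communities.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    """
    edges = list(edges)
    if not edges:
        raise ValueError("graph has no edges")
    crossing = sum(1 for u, v in edges if community_of[u] != community_of[v])
    return crossing / len(edges)


# Two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
singletons = {n: n for n in range(6)}

sp_two = split_penalty(edges, two_triangles)  # one of seven edges crosses
sp_single = split_penalty(edges, singletons)  # every edge crosses
```

Subtracting this value from modularity lowers the score of partitions that cut many edges, which is how the abstract describes countering the tendency to split communities: the singleton partition is penalized by the full 1.0, the two-triangle partition only by 1/7.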
dc.language.iso: ENG
dc.publisher: Rensselaer Polytechnic Institute, Troy, NY
dc.relation.ispartof: Rensselaer Theses and Dissertations Online Collection
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 United States
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/us/
dc.subject: Computer science
dc.title: Discovering community structure by optimizing community quality metrics
dc.type: Electronic thesis
dc.type: Thesis
dc.digitool.pid: 177009
dc.digitool.pid: 177011
dc.digitool.pid: 177014
dc.rights.holder: This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.
dc.description.degree: PhD
dc.relation.department: Dept. of Computer Science

