Show simple item record

dc.rights.licenseRestricted to current Rensselaer faculty, staff and students. Access inquiries may be directed to the Rensselaer Libraries.
dc.contributorJi, Heng
dc.contributorFox, Peter A.
dc.contributorHendler, James A.
dc.contributorLin, Chin-Yew
dc.contributorSun, Yizhou
dc.contributor.authorHuang, Hongzhao
dc.date.accessioned2021-11-03T08:27:09Z
dc.date.available2021-11-03T08:27:09Z
dc.date.created2015-06-09T13:58:54Z
dc.date.issued2015-05
dc.identifier.urihttps://hdl.handle.net/20.500.13015/1496
dc.descriptionMay 2015
dc.descriptionSchool of Science
dc.description.abstractMicroblogging, a new type of online information-sharing platform built on short messages of up to 140 characters, has grown quickly and received increasing attention in recent years. A microblogging platform (e.g., Twitter) enables both individuals and organizations to disseminate information, from current affairs to breaking news, in a timely fashion, which makes it a valuable source of very fresh knowledge. For example, during Hurricane Irene in 2011, updates from users living in New York City and transportation and evacuation posts from the government were very useful for people tracking the disaster. Therefore, Natural Language Processing (NLP) research on this new genre is needed to assist knowledge mining and discovery.
dc.description.abstractTo achieve our goals, we propose to leverage and model heterogeneous information networks (HINs), in contrast to most existing NLP approaches on traditional genres (e.g., news), which explore only a single type of information (e.g., text). Microblogging contains heterogeneous types of information, from social network structures to cross-genre linkages, forming rich HINs. By designing effective approaches to model both unstructured texts and structured HINs, we can incorporate additional evidence from HIN structures beyond texts. In this thesis, we present different approaches to construct HINs from cross-genre, cross-source, and cross-type information by incorporating the existing clean social relations, as well as by performing deep content analysis with some well-developed NLP approaches. We also present various effective approaches, including unsupervised propagation, semi-supervised graph regularization, supervised learning-to-rank, and deep neural networks, to model HINs for ranking, classification, and similarity measurement. Our experimental results demonstrate that heterogeneous information network analysis approaches are also powerful in the field of NLP.
dc.description.abstractUnlike semi-structured knowledge bases (e.g., Wikipedia) and traditional news, microblogs tend to be noisy, short, and informal. Information implicitness is also more prominent and pervasive in microblogging. These characteristics pose unique challenges for people reading and understanding microblogs, as well as for many knowledge mining and discovery tasks. To alleviate these problems, in this thesis we propose to filter noisy and uninformative information, enrich short microblogs with background knowledge from knowledge bases such as Wikipedia, and resolve informal and implicit information to its regular referents.
dc.language.isoENG
dc.publisherRensselaer Polytechnic Institute, Troy, NY
dc.relation.ispartofRensselaer Theses and Dissertations Online Collection
dc.subjectComputer science
dc.titleModeling heterogeneous networks for information ranking, enrichment and resolution on microblogs
dc.typeElectronic thesis
dc.typeThesis
dc.digitool.pid176074
dc.digitool.pid176075
dc.digitool.pid176076
dc.rights.holderThis electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.
dc.description.degreePhD
dc.relation.departmentDept. of Computer Science

