
    Sentiment analysis of Twitter data

    Author
    Yuan, Bo
    View/Open
    177221_Yuan_rpi_0185N_10833.pdf (2.567Mb)
    Other Contributors
    Szymański, Bolesław; Adali, Sibel; Magdon-Ismail, Malik
    Date Issued
    2016-05
    Subject
    Computer science
    Degree
    MS
    Terms of Use
    This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.
    URI
    https://hdl.handle.net/20.500.13015/1661
    Abstract
    Sentiment analysis and opinion mining have become a research hot spot with the rapid development of social network websites. Twitter is a typical social network application, with millions of users expressing their sentiment every day. In this work, we comprehensively explored the methodologies applied to sentiment classification over Twitter data: lexicon-based, rule-based, and machine learning-based methods. Our data set was crawled and manually cleaned following the principle of Naturally Annotated Big Data. It contains 20,000 tweets spanning ten popular topics.

    For lexicon-based methods, we experimented with the Simple Word Count approach and the Feature Scoring approach using the most popular sentiment lexicons and semantic resources, namely the MPQA subjectivity lexicon, SentiWordNet, the Vader Sentiment Lexicon, Bing Liu's lexicon, and General Inquirer. We built customized sentiment lexicons, designed feature scores, and compared ten classifiers on real-world Twitter data. Further, we designed Linguistic Inference Rules (LIR) to improve lexicon-based classifiers. LIR aims to handle negation, valence shift, and contrast conjunctions in natural language. For machine learning-based methods, we used state-of-the-art supervised learning models: Naive Bayes, Maximum Entropy, and Support Vector Machines. Two sets of features are compared. The first is Bag-of-Words with N-grams; the second is Part-of-Speech linguistic annotation.
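To make the lexicon-based idea concrete, the sketch below shows one possible reading of a simple word count classifier combined with a single negation rule. It is not the thesis's implementation: the toy lexicon, the negator list, the three-token negation window, and the names `TOY_LEXICON`, `NEGATORS`, `NEGATION_WINDOW`, `word_count_score`, and `classify` are all assumptions made for illustration, and none of the lexicons named in the abstract are used.

```python
import re

# Toy lexicon for illustration only (assumption); a real system would load
# one of the published sentiment lexicons instead.
TOY_LEXICON = {
    "good": 1, "great": 1, "love": 1, "happy": 1,
    "bad": -1, "terrible": -1, "hate": -1, "sad": -1,
}

# Assumed negator list and scope: a negator flips the polarity of the
# next NEGATION_WINDOW tokens.
NEGATORS = {"not", "no", "never", "don't", "isn't", "can't"}
NEGATION_WINDOW = 3


def tokenize(text):
    """Lowercase the text and split it into word tokens (ASCII apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def word_count_score(text):
    """Sum lexicon polarities, flipping those inside a negator's window."""
    score = 0
    flip_remaining = 0
    for token in tokenize(text):
        if token in NEGATORS:
            flip_remaining = NEGATION_WINDOW
            continue
        polarity = TOY_LEXICON.get(token, 0)
        if flip_remaining > 0:
            polarity = -polarity
            flip_remaining -= 1
        score += polarity
    return score


def classify(text):
    """Map the summed score to a three-way sentiment label."""
    score = word_count_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in ("I love this great phone",
                  "I do not love this phone",
                  "The phone arrived today"):
        print(tweet, "->", classify(tweet))
```

A fixed-size negation window is only one of several plausible scoping choices; valence shifters and contrast conjunctions would need additional rules that this sketch does not attempt.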
    Description
    May 2016; School of Science
    Department
    Dept. of Computer Science
    Publisher
    Rensselaer Polytechnic Institute, Troy, NY
    Relationships
    Rensselaer Theses and Dissertations Online Collection
    Access
    Restricted to current Rensselaer faculty, staff and students. Access inquiries may be directed to the Rensselaer Libraries.
    Collections
    • RPI Theses Online (Complete)
