Application of unsupervised learning methods to an author separation task
Author
Barlett, Kevin W.
Other Contributors
Szymański, Bolesław
Date Issued
2008-08
Subject
Computer science
Degree
MS
Terms of Use
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author. Attribution-NonCommercial-NoDerivs 3.0 United States
Abstract
As the amount of information stored as text grows, being able to efficiently organize and sort the available information becomes crucial. Supervised learning methods provide a solution for some of these tasks; however, they rely upon human interaction for the initial classification of texts. Unsupervised learning methods not only provide a solution to some of these tasks but, by removing the human component, also increase speed and efficiency, thereby lowering the barriers to applying them. This work explores the similar structure taken by unsupervised learning methods when applied to text mining problems, followed by a brief overview of the four components: feature extraction, feature selection, clustering, and cluster evaluation. This framework is then applied to an author separation problem, in which many excerpts of literary works are presented and the task is to divide them into groupings corresponding to individual authors. The applicability of various learning methods is then considered based upon their relative performance on the given task.
Description
August 2008; School of Science
Department
Dept. of Computer Science
Publisher
Rensselaer Polytechnic Institute, Troy, NY
Relationships
Rensselaer Theses and Dissertations Online Collection
Access
CC BY-NC-ND. Users may download and share copies with attribution in accordance with a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License. No commercial use or derivatives are permitted without the explicit approval of the author.