Author
Barlett, Kevin W.
Other Contributors
Szymański, Bolesław
Date Issued
2008-08
Subject
Computer science
Degree
MS
Terms of Use
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.
Abstract
As the amount of information stored as text grows, the ability to organize and sort that information efficiently becomes crucial. Supervised learning methods provide a solution for some of these tasks; however, they rely upon human interaction for the initial classification of texts. Unsupervised learning methods not only provide a solution to some of these tasks but also, by removing the human component, increase speed and efficiency, thereby lowering the barriers to applying these methods. This work explores the common structure shared by unsupervised learning methods when applied to text mining problems, followed by a brief overview of its four components: feature extraction, feature selection, clustering, and cluster evaluation. This framework is then applied to an author separation problem, in which many excerpts of literary works are presented and the task is to divide them into groups corresponding to individual authors. The applicability of various learning methods is then considered based on their relative performance on the given task.
Description
August 2008; School of Science
Department
Dept. of Computer Science
Publisher
Rensselaer Polytechnic Institute, Troy, NY
Relationships
Rensselaer Theses and Dissertations Online Collection
Access
Users may download and share copies with attribution in accordance with a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License. No commercial use or derivatives are permitted without the explicit approval of the author.