Unsupervised learning : evaluation, distributed setting, and privacy
dc.rights.license | CC BY-NC-ND. Users may download and share copies with attribution in accordance with a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License. No commercial use or derivatives are permitted without the explicit approval of the author. | |
dc.contributor | Magdon-Ismail, Malik | |
dc.contributor | Ji, Heng | |
dc.contributor | Xia, Lirong | |
dc.contributor | Mitchell, John E. | |
dc.contributor.author | Tsikhanovich, Maksim | |
dc.date.accessioned | 2021-11-03T08:59:33Z | |
dc.date.available | 2021-11-03T08:59:33Z | |
dc.date.created | 2018-07-27T14:57:11Z | |
dc.date.issued | 2018-05 | |
dc.identifier.uri | https://hdl.handle.net/20.500.13015/2175 | |
dc.description | May 2018 | |
dc.description | School of Science | |
dc.description.abstract | Chapter 1 is an overview of topic modeling as a set of unsupervised learning tasks. We present the Latent Dirichlet Allocation (LDA) model, and show how k-means as well as non-negative matrix factorization (NMF) can also be interpreted as topic models. We present a variety of quantitative and qualitative evaluation techniques that aim to capture different properties of the model. Finally, we show how we can leverage evaluation techniques and hyperparameter optimization tools to answer typical parameter selection questions. We hope to facilitate future research on topic modeling by encapsulating each of the above parts as a robust and re-usable set of tools, so that a future researcher can focus on one part at a time. | |
dc.description.abstract | In Chapter 2 we present two algorithms for the data-distributed non-negative matrix factorization (NMF) task, and one for the singular value decomposition (SVD). In the offline setting, M parties have already computed NMF models of their local data. Our algorithm ensembles these into a global model by minimizing an upper bound on the reconstruction error for the original data in terms of reconstruction error on the local models. In the online setting, the M parties are all participating in a synchronous distributed computation. We present an algorithm that reconstructs the centralized NMF solution exactly if given the same initialization. Finally, we present an online SVD algorithm. We compare these algorithms in terms of how well they initialize NMF. | |
dc.description.abstract | In Chapter 3 we study empirical measures of Distributional Differential Privacy. We want to measure to what extent one participant in a distributed computation can correctly identify the presence of a single document in another participant’s database. We propose a measure based on the p-value of the Kolmogorov-Smirnov two-sample hypothesis test. We compare our measure to existing measures such as Differential Privacy, and use it to evaluate the privacy of our online algorithms. | |
dc.language.iso | ENG | |
dc.publisher | Rensselaer Polytechnic Institute, Troy, NY | |
dc.relation.ispartof | Rensselaer Theses and Dissertations Online Collection | |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 United States | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/us/ | * |
dc.subject | Computer science | |
dc.title | Unsupervised learning : evaluation, distributed setting, and privacy | |
dc.type | Electronic thesis | |
dc.type | Thesis | |
dc.digitool.pid | 178937 | |
dc.digitool.pid | 178938 | |
dc.digitool.pid | 178939 | |
dc.rights.holder | This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author. | |
dc.description.degree | PhD | |
dc.relation.department | Dept. of Computer Science |
This item appears in the following Collection(s)
- RPI Theses Online (Complete): Rensselaer theses from 2006; many restricted to current RPI Students, Faculty and Staff
- RPI Theses Open Access: Rensselaer Theses and Dissertations with Creative Commons Licenses