dc.rights.license | Restricted to current Rensselaer faculty, staff and students. Access inquiries may be directed to the Rensselaer Libraries. | |
dc.contributor | Wang, Meng | |
dc.contributor | Chow, J. H. (Joe H.), 1951- | |
dc.contributor | Radke, Richard J., 1974- | |
dc.contributor | Xu, Yangyang | |
dc.contributor.author | Wang, Ren | |
dc.date.accessioned | 2021-11-03T09:22:33Z | |
dc.date.available | 2021-11-03T09:22:33Z | |
dc.date.created | 2021-02-22T15:33:49Z | |
dc.date.issued | 2020-08 | |
dc.identifier.uri | https://hdl.handle.net/20.500.13015/2630 | |
dc.description | August 2020 | |
dc.description | School of Engineering | |
dc.description.abstract | As data volumes grow and deep learning models are applied ever more widely, low-quality data and low-security models have become problems for industry and users. For example, some data are noisy, corrupted, partially observed, or even highly quantized. The real information is hidden in these low-quality data and is difficult to mine. Even more disturbingly, models can be poisoned by attackers using well-designed methods and data, and users then make decisions based on these low-security models. Hence, extracting accurate information from questionable sources is extremely important. We address the problem from two perspectives: (1) estimating the ground-truth data and the clusters from low-quality measurements, and (2) extracting information about attack patterns from model parameters. | |
dc.description.abstract | In the first part, our goal is to extract information from low-quality data (including partially observed, partially corrupted, and quantized data) by developing data recovery and clustering tools. We first propose a framework that can recover data from noisy, quantized, and erroneous measurements. We then build general frameworks that can handle multi-copy measurements and missing values. The recovery errors are proven to be order-wise optimal and to decay as the dimensions increase. Our framework is applied to privacy protection of Phasor Measurement Unit (PMU) data. We then extend the framework to data belonging to a union of subspaces (UoS). Recovery and clustering are achieved simultaneously by solving a nonconvex constrained maximum likelihood problem. We provide guarantees on both the recovery accuracy and the clustering accuracy, and we apply this framework to low-quality image recovery. Under similar assumptions, we further develop a distributed data collection and processing framework that simultaneously preserves the privacy of individual users' power usage data and the accuracy of the operator's information. We also extend the recovery framework to higher-order tensors, and show both theoretically and empirically that its recovery error is smaller than that of existing quantized matrix recovery and 1-bit tensor recovery methods. The new framework performs well on a recommender dataset and on low-quality images. Finally, we study the robust matrix completion problem for time-varying models. Leveraging the low-rank property and the temporal information of the data, we develop novel methods to recover the original data from partially observed and corrupted measurements. Results on a real video dataset demonstrate the effectiveness of our method. | |
dc.description.abstract | In the second part, we consider deep convolutional neural networks (CNNs) that are attacked in the training phase. The so-called Trojan attack (also known as a backdoor attack) can manipulate the predictions of a CNN in the presence of a designed trigger pattern. By exploring the intrinsic properties of the CNN, we propose a novel TrojanNet detector (TND) that extracts information about attack patterns without any data samples and then detects whether a CNN is poisoned. The detector extracts the attack information of the model through a nonconvex neuron activation maximization approach. Specifically, it can simultaneously detect Trojan models, identify target classes, and reveal Trojan triggers. The detector is validated on different datasets, various Trojan triggers, and different attacks. | |
dc.language.iso | ENG | |
dc.publisher | Rensselaer Polytechnic Institute, Troy, NY | |
dc.relation.ispartof | Rensselaer Theses and Dissertations Online Collection | |
dc.subject | Electrical engineering | |
dc.title | Information extraction from low-quality or questionable sources | |
dc.type | Electronic thesis | |
dc.type | Thesis | |
dc.digitool.pid | 180385 | |
dc.digitool.pid | 180386 | |
dc.digitool.pid | 180387 | |
dc.rights.holder | This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author. | |
dc.description.degree | PhD | |
dc.relation.department | Dept. of Electrical, Computer, and Systems Engineering | |