Efficient and robust federated learning in heterogeneous networks

Castiglia, Timothy
Other Contributors
Gittens, Alex
Yener, Bulent
Wang, Shiqiang
Patterson, Stacy
Computer science
Terms of Use
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute (RPI), Troy, NY. Copyright of original work retained by author.
Abstract
In the modern day, data for machine learning tasks are generated by globally distributed devices. These data are often sensitive, containing private information that must not be shared. Federated learning (FL) algorithms were introduced as a means to learn from distributed data in a privacy-preserving, communication-efficient manner. However, a number of challenges remain. The globally distributed systems in which these algorithms run often have high communication latency. The network topology of devices does not always match the topology assumed by FL algorithms. Participants may have varying computing speeds due to heterogeneous hardware or available compute resources. Data can be partitioned among parties by feature space, rather than the often-assumed partitioning by sample space. Data can be incomplete, e.g., missing labels and features. Data can also be spurious, containing irrelevant features that distract from the prediction task. In this thesis, we address these challenges to make FL efficient and robust.

We first present a federated learning algorithm for multi-level networks: a set of disjoint sub-networks, each with a single hub and multiple workers. In our model, workers may run at different operating rates. We provide a unified mathematical framework and theoretical analysis that show the dependence of the convergence error on worker node heterogeneity, hub network topology, and the number of local, sub-network, and global iterations. In our experiments, we find that our algorithm can converge up to 2× as fast as other federated learning algorithms for multi-level networks.

We then present a vertical federated learning (VFL) algorithm for cases in which parties store data with the same sample IDs but different feature sets; this is known as vertically partitioned data. Our algorithm applies compression to the intermediate values shared between parties.
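As a generic illustration of this idea (uniform scalar quantization, not necessarily the thesis's exact compression scheme), a party can quantize its local-model embedding to 8 bits per entry before sharing it, cutting the message size by 4× relative to 32-bit floats:

```python
import numpy as np

def quantize(x, num_bits=8):
    """Uniform scalar quantization of a party's intermediate embedding.

    Illustrative sketch: maps each entry to one of 2**num_bits levels
    spanning the tensor's range, so the message needs num_bits per
    entry instead of 32.
    """
    lo, hi = x.min(), x.max()
    levels = 2 ** num_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.round((x - lo) / scale).astype(np.uint8)  # compressed payload
    return q, lo, scale

def dequantize(q, lo, scale):
    """Reconstruct an approximate embedding on the receiving party."""
    return q.astype(np.float32) * scale + lo

# A party compresses its local-model output before sharing it.
rng = np.random.default_rng(0)
embedding = rng.standard_normal((4, 16)).astype(np.float32)
q, lo, scale = quantize(embedding, num_bits=8)
recovered = dequantize(q, lo, scale)
err = np.max(np.abs(recovered - embedding))  # bounded by scale / 2
```

The reconstruction error is bounded by half the quantization step, which is the kind of bounded-noise assumption a convergence analysis of compressed VFL can build on.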
Our work provides the first theoretical analysis of the effect that message compression has on the convergence of vertical federated learning algorithms. We show experimentally that compression can reduce communication by over 90% without a significant decrease in accuracy compared to VFL without compression.

Next, we present another vertical federated learning algorithm, this one aimed at accommodating device heterogeneity. We consider a system in which the parties' operating rates, local model architectures, and optimizers may differ from one another and, further, may change over time. We provide theoretical convergence analysis and show that the convergence rate is constrained by the party operating rates and local optimizer parameters. We apply this analysis and extend our algorithm to adapt party learning rates in response to changing operating rates and local optimizer parameters. In our experiments, we find that our algorithm can reach target accuracies up to 4× faster than other vertical federated learning algorithms, and that our adaptive extension provides an additional 30% improvement in time-to-target accuracy.

Next, we present three Self-Supervised Vertical Federated Learning (SS-VFL) algorithms. These algorithms learn from unlabeled and non-overlapping data in a setting with vertically partitioned data, applying self-supervised and data imputation methods. We compare our algorithms against supervised VFL in a series of experiments and show that SS-VFL algorithms can achieve up to twice the accuracy of supervised VFL when labeled data are scarce. We also show that SS-VFL algorithms can greatly reduce the communication cost of reaching target accuracies relative to supervised VFL.

Finally, we present a feature selection method for vertical federated learning. Our method removes spurious features from the dataset in order to improve generalization, efficiency, and explainability.
Our method requires far less communication between parties than other VFL feature selection methods. We analytically prove that our method removes spurious features from model training, and we provide extensive empirical evidence that it achieves high accuracy and removes spurious features at a fraction of the communication cost of other feature selection approaches.
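To make the notion of removing spurious features concrete, here is a self-contained sketch of one standard (non-federated) approach: lasso-regularized regression solved by proximal gradient descent, whose soft-thresholding step drives the weights of irrelevant features exactly to zero. This is a generic illustration of sparse feature selection, not the thesis's algorithm:

```python
import numpy as np

# Synthetic data: 6 features, of which only the first 2 affect the label.
rng = np.random.default_rng(1)
n, d = 200, 6
X = rng.standard_normal((n, d))
w_true = np.array([2.0, -3.0, 0.0, 0.0, 0.0, 0.0])  # last 4 features spurious
y = X @ w_true + 0.1 * rng.standard_normal(n)

# Proximal gradient descent (ISTA) for the lasso objective
#   (1/2n) * ||X w - y||^2 + lam * ||w||_1
w = np.zeros(d)
lr, lam = 0.01, 0.5
for _ in range(2000):
    grad = X.T @ (X @ w - y) / n        # gradient of the smooth part
    w -= lr * grad
    # soft-thresholding shrinks weights of spurious features to zero
    w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)

selected = np.nonzero(np.abs(w) > 1e-3)[0]  # surviving feature indices
```

The l1 penalty zeroes out the four spurious coordinates while keeping the two informative ones, so the surviving index set identifies the relevant features.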
School of Science
Dept. of Computer Science
Rensselaer Polytechnic Institute, Troy, NY
Rensselaer Theses and Dissertations Online Collection
Users may download and share copies with attribution in accordance with a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 license. No commercial use or derivatives are permitted without the explicit approval of the author.