Resource-aware distributed analytics and machine learning for hybrid edge-cloud systems

Das, Anirban
Other Contributors
Zaki, Mohammed J., 1971-
Varela, Carlos A.
Brunschwiler, Thomas
Patterson, Stacy
Issue Date
Computer science
Terms of Use
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute (RPI), Troy, NY. Copyright of original work retained by author.
Full Citation
As applications become more intelligent, data analytics and inference at the edge are proliferating as a complement to traditional computation in a centralized cloud. At the same time, distributed machine learning training at the edge of the network, near the data producers, is gaining popularity, mainly because of its benefits for security, privacy, and communication cost. However, edge devices are often resource-constrained, and there may be communication bottlenecks between the edge and the cloud. Successful solutions for these edge computing workloads must address the challenges posed by constrained computation and communication resources. The first part of the thesis focuses on scheduling and task placement of data processing, analytics, and inference workloads. The goal is to meet a quality-of-service target, such as low latency or reduced cost, in edge-cloud architectures. We start by benchmarking the leading industry edge computing platforms that use serverless computing as the execution model. We next consider single-stage serverless applications and propose a framework that executes them jointly across an edge device and the public cloud. The aim is to decide whether to execute user jobs at the edge or in the public cloud based on given latency or cost constraints. Finally, we consider a hybrid cloud scenario in which a private cloud takes the place of the single edge device. Here, we study the problem of task placement and scheduling of multi-stage serverless applications between a private cloud and the public cloud to minimize the cost of public cloud usage.
In the second part of the thesis, we consider machine learning training workloads on edge-cloud platforms, specifically federated learning. As in the first part, we begin with a feasibility study of federated learning algorithms on resource-constrained devices. Next, we study an algorithm for horizontal federated learning in a hierarchical communication network. We analyze the convergence of the algorithm when the data distribution among the participants is non-IID. Our analysis shows that a non-IID data distribution can have a significant impact on the convergence error of the algorithm. This insight paves the way for more sophisticated algorithm designs that reduce this performance gap. We then turn to vertical federated learning in a hierarchical network. We propose a new algorithm for model training in which data is vertically partitioned across silos in the top tier and horizontally partitioned among the clients inside each silo in the bottom tier. We present a theoretical analysis of our algorithm and show how the convergence rate depends on the number of vertical partitions, the number of local updates, and the number of clients in each hub. We close with a summary and a discussion of future research directions and open questions.
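The two-tier data layout described above (features split across silos, samples split across the clients inside each silo) can be illustrated with a small sketch. This is not the thesis's algorithm or code; it only shows one possible partitioning of a toy dataset. The function names, the contiguous column blocks, the round-robin row assignment, and the assumption that every silo holds its feature block for all samples are hypothetical choices made for illustration.

```python
from typing import List

Matrix = List[List[int]]


def vertical_split(data: Matrix, num_silos: int) -> List[Matrix]:
    """Split the feature columns into contiguous blocks, one block per silo (assumed scheme)."""
    num_features = len(data[0])
    bounds = [i * num_features // num_silos for i in range(num_silos + 1)]
    return [[row[bounds[s]:bounds[s + 1]] for row in data] for s in range(num_silos)]


def horizontal_split(block: Matrix, num_clients: int) -> List[Matrix]:
    """Deal the sample rows of one silo's block round-robin to its clients (assumed scheme)."""
    return [block[c::num_clients] for c in range(num_clients)]


def two_tier_partition(data: Matrix, num_silos: int, clients_per_silo: int) -> List[List[Matrix]]:
    """Return shards[silo][client]: top tier splits features, bottom tier splits samples."""
    return [horizontal_split(block, clients_per_silo)
            for block in vertical_split(data, num_silos)]


# Toy dataset: 4 samples x 4 features; each value encodes 10 * row + column.
data = [[10 * r + c for c in range(4)] for r in range(4)]
shards = two_tier_partition(data, num_silos=2, clients_per_silo=2)

for s, silo in enumerate(shards):
    for c, shard in enumerate(silo):
        print(f"silo {s}, client {c}: {shard}")
```

In this layout, no single client sees a full sample: each client holds a subset of the rows for only its silo's feature columns, which is the setting the abstract describes.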
December 2021
School of Science
Dept. of Computer Science
Rensselaer Polytechnic Institute, Troy, NY
Rensselaer Theses and Dissertations Online Collection
Restricted to current Rensselaer faculty, staff and students in accordance with the Rensselaer Standard license. Access inquiries may be directed to the Rensselaer Libraries.