Evaluating next generation HPC interconnection networks
Authors
Wolfe, Noah
Issue Date
2019-05
Type
Electronic thesis
Thesis
Language
ENG
Keywords
Computer science
Abstract
The common theme throughout this thesis is quantifying and understanding the performance of application workloads on current and potential future HPC interconnection networks. To achieve this goal, network systems are modeled and analyzed using discrete-event simulation. Unlike traditional cycle-accurate simulation, the discrete-event modeling methodology, and specifically parallel discrete-event modeling, makes it possible to execute large ensembles of simulations and generate the comprehensive set of results needed for exhaustive network design and performance studies in a reasonable amount of time. The discrete-event network models are used to evaluate and predict the performance of large-scale HPC systems in various theoretical configurations under a wide range of workloads, including synthetic traffic, CPU applications, and neuromorphic computing applications.
In the first part of this thesis, we describe a subset of network topologies chosen for evaluation. The networks are chosen because they are either currently used in a deployed HPC system or possess characteristics, such as a low diameter, that make them a promising option as the interconnection network in a next-generation supercomputer. We describe the Fat-Tree network and the extensions made to represent pruned multi-rail configurations. Additionally, we discuss two approaches to Dragonfly networks selected for comparison, which leverage all-to-all connections and 2D grid connectivity within router groups. We also cover a recently proposed theoretical network topology called the Slim Fly. The topology layouts, connectivity and routing algorithms, and model validation are discussed to provide a clear picture of each network's theoretical capabilities and simulator accuracy.
In the second part of this thesis, we perform numerous evaluations analyzing the scaling performance of the simulation framework as well as the performance of these networks at large scale in response to various workloads and HPC environment conditions. The back-end discrete-event simulator is analyzed, showing the effectiveness of the approach in reducing simulation run times by running in parallel. The discrete-event network models are then used in a number of studies to predict and quantify the performance of the networks. We test the Slim Fly at large scale under CPU workloads to observe the effect of routing on end-time performance. We study the performance benefits of additional rails in the Fat-Tree network by analyzing rail scaling, job placements, multi-job execution, and increased computational power per compute node. We then test equally provisioned Dragonfly, Fat-Tree, and Slim Fly networks under synthetically generated workloads as well as real CPU application and novel neuromorphic application trace workloads to provide a fair comparison across a wide range of traffic workloads. Lastly, the results of the comparisons are summarized and compared with physical system costs in an attempt to provide a single figure of merit for comparing each network's performance as an HPC system interconnect.
Description
May 2019
School of Science
Publisher
Rensselaer Polytechnic Institute, Troy, NY