Author
McGlohon, Neil
Other Contributors
Carothers, Christopher D.; Shephard, M. S. (Mark S.); Slota, George M.; Ross, Robert B., 1972-
Date Issued
2021-08
Subject
Computer science
Degree
PhD
Terms of Use
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute (RPI), Troy, NY. Copyright of original work retained by author.
Abstract
When designing the architecture for a supercomputer, there are many facets of design to consider. Among them, and possibly most important, is the choice of communication network that interconnects the thousands of processors. The selected communication network forms the backbone of the system, allowing for massive-scale parallelism and inter-process coordination. Different patterns of interconnection, or network topologies, have different strengths and weaknesses. The cost of building a supercomputer is often a significant factor that influences the choice of network topology. Prospective system builders aim to get the highest level of expected performance given their budget, and poor or ill-informed choices in system design can be very costly mistakes. Thus, having reliable predictions of how different communication network topologies behave is a critical step in system acquisition. Simulation allows for the rapid testing and collection of expected performance metrics of full-scale networked systems without needing to physically build them or compromise testing scale. Network topologies connect switches and any attached compute nodes to each other, forming paths of communication from one endpoint to another. As compute nodes inject traffic into the network, packets containing the contents of communication are routed from one switch to another until finally reaching their destination. With increased levels of traffic, switches may become overburdened, receiving packets faster than they are able to process and route them. This imbalance means that any packets traversing the overloaded switch are delayed as they sit in a queue in the switch's memory waiting to be routed. An overloaded switch is a point of local congestion. Eventually, if the situation remains unresolved, the buffer space on the switch becomes full and the switch cannot receive any more packets until a packet already in its memory is routed away.
If other switches have packets destined for the overloaded and full switch, then they may find themselves waiting to forward packets and, consequently, their own buffer space begins to fill up. The local congestion previously found on a single switch begins to spread to other nearby switches and the problem worsens, interfering with many more packets and resulting in poor application performance. Network topologies can be designed to be more resilient to the effects of network congestion. For example, having a high diversity of possible paths between any two endpoints can provide more alternative routes for packets should one become congested. Clever routing schemes that more effectively balance load across the network, or that route around observed points of local congestion, can mitigate the effects of congestion and thus minimize packet interference. In this work, I study, through effective simulation, the situations in which congestion occurs as well as technologies and methods to mitigate and resolve it. This document provides an overview of various methods for the avoidance of congestion, including the usage of adaptive routing, quality-of-service techniques, and network topology design. It also explores two techniques for the mitigation and treatment of congestion through detection, causal identification, and abatement strategies. Lastly, this document proposes new techniques for effectively simulating large parallel discrete event simulations with simultaneous events -- such as network simulation -- and demonstrates how they can be used to gain deeper insight into the characteristics of simulated models.
Description
August 2021; School of Science
Department
Dept. of Computer Science
Publisher
Rensselaer Polytechnic Institute, Troy, NY
Relationships
Rensselaer Theses and Dissertations Online Collection
Access
Restricted to current Rensselaer faculty, staff and students in accordance with the Rensselaer Standard license. Access inquiries may be directed to the Rensselaer Libraries.