    Exploration of communication interconnection network congestion and methods of mitigation through simulation

    Author
    McGlohon, Neil
    View/Open
    McGlohon_rpi_0185E_11837.pdf (25.19Mb)
    Other Contributors
    Carothers, Christopher D.; Shephard, M. S. (Mark S.); Slota, George M.; Ross, Robert B., 1972-
    Date Issued
    2021-08
    Subject
    Computer science
    Degree
    PhD
    Terms of Use
    This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute (RPI), Troy, NY. Copyright of original work retained by author.
    URI
    https://hdl.handle.net/20.500.13015/6069
    Abstract
    When designing the architecture for a supercomputer, there are many facets of design to consider. Among them, and possibly most important, is the choice of communication network that interconnects the thousands of processors. The selected communication network forms the backbone of the system, allowing for massive-scale parallelism and inter-process coordination. Different patterns of interconnection, or network topologies, have different strengths and weaknesses. The cost of building a supercomputer is often a significant factor that influences the choice of network topology. Prospective system builders aim to get the highest level of expected performance given their budget, and poor or ill-informed choices in system design can be very costly mistakes. Thus, having reliable predictions of how different communication network topologies behave is a critical step in system acquisition. Simulation allows for the rapid testing and collection of expected performance metrics of full-scale networked systems without needing to physically build them or compromise testing scale.
    Network topologies connect switches and any attached compute nodes to each other, forming paths of communication from one endpoint to another. As compute nodes inject traffic into the network, packets containing the contents of communication are routed from one switch to another until finally reaching their destination. With increased levels of traffic, switches may become overburdened, receiving packets at a rate faster than they are able to process and route them. This imbalance means that any packets traversing the overloaded switch are delayed as they sit in a queue in the switch's memory waiting to be routed. An overloaded switch is a point of local congestion. Eventually, if the situation remains unresolved, the buffer space on the switch becomes full and the switch cannot receive any more packets until one already in its memory is routed away. If other switches have packets destined for the overloaded and full switch, then they may find themselves waiting to forward packets and, consequently, their buffer space begins to fill up. The local congestion previously found on a single switch begins to spread to other nearby switches and the problem worsens, interfering with many more packets and resulting in poor application performance.
    Network topologies can be designed to be more resilient to the effects of network congestion. For example, having a high diversity of possible paths between any two endpoints can provide more alternative routes for packets should one become congested. Clever routing schemes that more effectively balance load across the network or route around observed points of local congestion can mitigate the effects of congestion and thus minimize packet interference. In this work, I study, through effective simulation, the situations in which congestion occurs as well as technologies and methods to mitigate and resolve it. This document provides an overview of various methods for the avoidance of congestion, including the usage of adaptive routing, quality-of-service techniques, and network topology design. It also explores two techniques for the mitigation and treatment of congestion through detection, causal identification, and abatement strategies. Lastly, this document proposes new techniques for effectively running large parallel discrete event simulations with simultaneous events -- such as network simulation -- and demonstrates how they can be used to gain deeper insight into the characteristics of simulated models.
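    The spread of congestion described in the abstract can be illustrated with a toy model. The sketch below is not taken from the thesis and does not reflect its simulator; it assumes a single chain of switches with equal finite buffers, a fixed per-step link rate, and a final switch that drains more slowly than traffic is injected. The function name simulate_chain and all of its parameters are hypothetical, chosen only for this illustration.

```python
def simulate_chain(num_switches=4, buffer_size=5, inject_rate=2,
                   link_rate=2, drain_rate=1, steps=30):
    """Return the buffer occupancy of every switch after each time step.

    Packets enter at switch 0, hop toward the last switch, and leave the
    network from the last switch at `drain_rate` packets per step.
    A switch may forward a packet only if the next switch has a free
    buffer slot, which is what lets congestion spread upstream.
    """
    occupancy = [0] * num_switches
    history = []
    for _ in range(steps):
        # The last switch routes packets out of the network (the bottleneck).
        occupancy[-1] -= min(occupancy[-1], drain_rate)

        # Walk upstream so that slots freed this step can be reused.
        for i in range(num_switches - 2, -1, -1):
            free_slots = buffer_size - occupancy[i + 1]
            moved = min(occupancy[i], link_rate, free_slots)
            occupancy[i] -= moved
            occupancy[i + 1] += moved

        # Compute nodes inject traffic only while switch 0 has room.
        free_slots = buffer_size - occupancy[0]
        occupancy[0] += min(inject_rate, free_slots)

        history.append(list(occupancy))
    return history


if __name__ == "__main__":
    for step, occ in enumerate(simulate_chain(), start=1):
        print(step, occ)
```

    Under these assumed defaults, the switch nearest the bottleneck fills first and the switches upstream of it fill in turn, which is the local-to-global spread the abstract describes. Real interconnects add multiple paths, adaptive routing, and flow control details that this sketch deliberately leaves out.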
    Description
    August 2021; School of Science
    Department
    Dept. of Computer Science
    Publisher
    Rensselaer Polytechnic Institute, Troy, NY
    Relationships
    Rensselaer Theses and Dissertations Online Collection
    Access
    Restricted to current Rensselaer faculty, staff and students in accordance with the Rensselaer Standard license. Access inquiries may be directed to the Rensselaer Libraries.
    Collections
    • RPI Theses Online (Complete)
