    Using parallel simulation for extreme-scale network systems co-design

    Author
    Mubarak, Misbah
    View/Open
    176054_Mubarak_rpi_0185E_10573.pdf (3.733Mb)
    Other Contributors
    Carothers, Christopher D.; Ross, Robert B., 1972-; Shephard, M. S. (Mark S.); Trinkle, Jeffrey C.
    Date Issued
    2015-05
    Subject
    Computer science
    Degree
    PhD
    Terms of Use
    This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.
    URI
    https://hdl.handle.net/20.500.13015/1489
    Abstract
    A high-bandwidth, low-latency interconnect network is a critical component in the design of future High Performance Computing (HPC) systems. With a number of network topologies being proposed for future HPC systems, the research community has turned to simulation to find a topology that yields high performance. One emerging class of HPC network topologies is the low-diameter, low-latency topologies, such as the dragonfly, that use high-radix routers to yield high bisection bandwidth. Another candidate is the torus network topology, which uses multidimensional network links to improve path diversity and exploit locality between nodes. Exploring the design space of these candidate interconnects by simulation, before building real HPC systems, is critical.

    In the first part of the thesis, we present a methodology for the modeling and simulation of very large-scale dragonfly and torus network topologies at a detailed fidelity using the Rensselaer Optimistic Simulation System (ROSS). We evaluate various configurations of a million-node torus network in order to determine the effect of torus dimensionality on network performance using challenging HPC traffic patterns. We also explore a million-node dragonfly network model and investigate the implications of its configurations on network performance using ROSS. We evaluate the performance of our simulations to demonstrate that we can execute large-scale network simulations efficiently on today's leadership-class supercomputers such as the Blue Gene/Q systems at Argonne Leadership Computing Facility (ALCF) and RPI's Computational Center for Innovation (CCI). We show that our simulations can achieve an event rate of 1.33 billion events/second with a total of 872 billion committed events on the AMOS Blue Gene/Q system. We validate the accuracy of our torus and dragonfly network models using empirical measurements from Blue Gene supercomputers and simulated results from the cycle-accurate simulator 'booksim', respectively.

    MPI collective communication is a frequently used part of most large-scale scientific applications. In the second part of the thesis, we extend our torus and dragonfly network models to simulate MPI collective communication operations using the optimistic event scheduling capability of ROSS. We also demonstrate that both small- and large-scale dragonfly and torus collective models can execute efficiently on today's massively parallel architectures.

    In the context of HPC system simulations, having an end-to-end simulation tool that can characterize the behavior of large-scale scientific applications on future HPC systems is highly beneficial. The last part of the thesis describes how we have introduced the dragonfly and torus network models as an 'interconnect component' of the 'CO-Design of multi-layer Exascale Storage architectures (CODES)' storage and network system simulator, so that the CODES HPC models can use these high-fidelity networks as their underlying interconnect backbone. Additionally, in order to effectively evaluate the behavior of scientific applications on simulated HPC networks, we have introduced a workload generator component in CODES that takes real scientific application workloads from leadership-class HPC systems as the basis for running the dragonfly and torus network simulations. We also present the performance results of the torus and dragonfly network models, at a modest scale, using network traces from real applications supplied through the CODES network workload generator.
    Description
    May 2015; School of Science
    Department
    Dept. of Computer Science
    Publisher
    Rensselaer Polytechnic Institute, Troy, NY
    Relationships
    Rensselaer Theses and Dissertations Online Collection
    Access
    Restricted to current Rensselaer faculty, staff and students. Access inquiries may be directed to the Rensselaer Libraries.
    Collections
    • RPI Theses Online (Complete)