Extreme-scale neuromorphic architecture modeling using massively parallel optimistic simulation

Authors
Plagge, Mark Philip
Other Contributors
Carothers, Christopher D.
Hendler, James A.
Rakheja, Shaloo
Slota, George M.
Issue Date
2020-12
Keywords
Computer science
Degree
PhD
Terms of Use
Attribution-NonCommercial-NoDerivs 3.0 United States
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.
Full Citation
Plagge, Mark Philip. Extreme-scale neuromorphic architecture modeling using massively parallel optimistic simulation. PhD thesis, Rensselaer Polytechnic Institute, Troy, NY, December 2020.
Abstract
The cumulative work presented here shows an end-to-end, spike-accurate neuromorphic hardware simulation system. This system is massively scalable and high performance, and it provides performance metrics, virtual network communication patterns, and multi-model simulations. To help design multi-user neuromorphic acceleration systems, the simulation system can run multiple virtual models managed by a neuromorphic process scheduler. Together, these capabilities give hardware designers, HPC architects, and others the ability to simulate novel neuromorphic hardware running in multi-user and multi-process environments as a first step towards neuromorphic hardware/software co-design.
Exascale computing is rapidly becoming the new normal in the high-performance computing (HPC) world, bringing with it major architecture changes. Cutting-edge HPC designs are eschewing the traditional configuration of many small nodes in favor of fewer, more powerful compute nodes. Modern supercomputer designs integrate powerful CPUs, often coupled with dedicated acceleration hardware and high-speed local storage. The addition of specialized hardware designed to offload specific types of computation is fast becoming standard.
As an example of the dramatic increase in specialized compute hardware in HPC systems, the Summit supercomputer, recently installed at Oak Ridge National Laboratory, features 4,608 compute nodes, each capable of roughly 42 teraflops. Summit replaces the Titan supercomputer, which used 18,688 compute nodes, each delivering roughly 1.4 teraflops. Each node in the Summit supercomputer has NVIDIA Volta GPU boards along with a much more powerful CPU. These acceleration devices, coupled with the more powerful CPU, are the primary driving force behind the dramatic increase in per-node compute performance.
Given these trends in HPC design, as well as the increasing prevalence of deep learning techniques in scientific computing, there is significant interest in adding machine learning acceleration hardware to new systems. One promising machine learning hardware technology implements spiking neural networks directly in hardware, bypassing the "von Neumann bottleneck," and provides significant power-consumption improvements over GPU acceleration. This specialized hardware, called "neuromorphic computing" or "neuromorphic hardware," has the potential to dramatically increase machine learning performance in the HPC domain while affecting power requirements minimally. In this work, we address this new field through the analysis of neuromorphic hardware model simulations. We have developed NeMo, a high-resolution neuromorphic hardware simulation model built on the ROSS parallel discrete-event simulation framework. We demonstrate this model's performance capability, showing that it achieves high-performance simulations at massive scales. Using known biological neuron behaviors, we show NeMo's ability to replicate known spiking neuron models accurately. This work further leverages the NeMo simulation model to generate synthetic multi-processor communication traffic by creating network communication traces of simulated multi-neuromorphic-processor systems. These techniques provide the capability to generate traces that can be used to assess the impact of neuromorphic hardware on HPC networks.
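The spiking neuron behavior NeMo reproduces is, at its core, leaky integrate-and-fire (LIF) dynamics. The standalone C sketch below illustrates that class of dynamics under assumed parameter values; it is not NeMo's actual event-driven ROSS implementation, and the struct fields, thresholds, and constant input drive are hypothetical.

/* Minimal standalone sketch of a leaky integrate-and-fire (LIF) neuron,
 * the class of spiking dynamics NeMo models at the hardware level.
 * Illustrative only: parameter values and the constant input are assumptions. */
#include <stdio.h>
#include <stdbool.h>

typedef struct {
    double potential;   /* membrane potential V        */
    double leak;        /* leak subtracted each tick   */
    double threshold;   /* firing threshold            */
    double reset;       /* potential after a spike     */
} lif_neuron;

/* Integrate one weighted input, apply the leak, and report whether
 * the neuron fires on this tick. */
static bool lif_step(lif_neuron *n, double weighted_input)
{
    n->potential += weighted_input;
    n->potential -= n->leak;
    if (n->potential < 0.0)
        n->potential = 0.0;
    if (n->potential >= n->threshold) {
        n->potential = n->reset;   /* fire and reset */
        return true;
    }
    return false;
}

int main(void)
{
    lif_neuron n = { .potential = 0.0, .leak = 0.5,
                     .threshold = 4.0, .reset = 0.0 };
    /* Drive the neuron with a constant input and print spike times. */
    for (int t = 0; t < 20; t++) {
        if (lif_step(&n, 1.0))
            printf("spike at tick %d\n", t);
    }
    return 0;
}

In the actual NeMo model, each neuron update is processed as a discrete event within ROSS rather than in a fixed time-stepped loop like the one above.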
Further leveraging the NeMo neuromorphic simulation tool, this work explores the potential of a self-contained neuromorphic hardware acceleration system. We present spiking neuron implementations of two scheduling algorithms, First Come First Serve and Round Robin, running in a neuromorphic arbiter. This arbiter is shown to correctly process incoming jobs, scheduling them to run on available simulated neuromorphic hardware. This implementation represents a first step towards a complete embedded operating system running entirely on neuromorphic hardware.
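For reference, the sketch below states the two scheduling policies in conventional (non-spiking) C, to clarify the behavior the neuromorphic arbiter is expected to reproduce. The job structure, core count, and dispatch loop are illustrative assumptions, not the thesis's spiking-network encoding.

/* Conventional reference sketch of the two policies the neuromorphic
 * arbiter reproduces: First Come First Serve (FCFS) dispatch order and
 * Round Robin assignment of jobs to cores.  Job fields and the core
 * count are hypothetical. */
#include <stdio.h>

#define NUM_CORES 4   /* simulated neuromorphic cores (assumed count) */
#define NUM_JOBS  6

typedef struct {
    int id;
    int arrival;      /* arrival order; FCFS dispatches in this order */
} job;

int main(void)
{
    job queue[NUM_JOBS];
    for (int i = 0; i < NUM_JOBS; i++)
        queue[i] = (job){ .id = i, .arrival = i };

    /* FCFS: dispatch jobs strictly in arrival order.
     * Round Robin: rotate the target core for each dispatched job. */
    int next_core = 0;
    for (int i = 0; i < NUM_JOBS; i++) {
        printf("dispatch %d: job %d (arrived %d) -> core %d\n",
               i, queue[i].id, queue[i].arrival, next_core);
        next_core = (next_core + 1) % NUM_CORES;   /* round-robin rotation */
    }
    return 0;
}

In the thesis, these same decisions are encoded in the firing patterns of spiking neurons rather than in explicit loop and counter logic.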
Description
December 2020
School
School of Science
Department
Dept. of Computer Science
Publisher
Rensselaer Polytechnic Institute, Troy, NY
Relationships
Rensselaer Theses and Dissertations Online Collection
Access
CC BY-NC-ND. Users may download and share copies with attribution in accordance with a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License. No commercial use or derivatives are permitted without the explicit approval of the author.