Author
Raman, Srikumar
Other Contributors
McDonald, John F. (John Francis), 1942-; Zhang, Tong; Schoch, Paul M.; Carothers, Christopher D.
Date Issued
2015-12
Subject
Electrical engineering
Degree
PhD
Terms of Use
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.
Abstract
In the current era, the computing industry has become firmly entrenched in the multi-core approach. Aided by Moore's law and technology scaling, the number of cores on state-of-the-art multi-core processors has been steadily increasing. While this has aided parallel computing efforts to a large extent, very little has been done to improve serial code performance in the last decade. This research explores the use of a high clock rate core to accelerate serial sections of the workload, in conjunction with a set of slower cores that operate in parallel and continue to take advantage of the parallelizability of the workload.
Furthermore, as the advantages of a high clock rate core become clear, the challenge remains in finding a suitable technology for the implementation of the core. This research explores one such alternative technology, namely a lateral heterojunction bipolar transistor, which remains fully compatible with existing CMOS technology. The device is evaluated for suitability against three major criteria: speed, power consumption, and process compatibility with CMOS.
This research considers both the architectural organization of a single-ISA performance-asymmetric multi-core system and the technological feasibility of designing and fabricating a high-speed compute core. A design space exploration of the architectural organization of a multi-core system is conducted using full-system and bare-bones micro-architectural simulators running popular parallel benchmark workloads. The use of a 3D memory subsystem as a means of mitigating the memory wall problem and maintaining cache coherency between the heterogeneous cores is explored. We demonstrate by simulation that a performance-asymmetric chip with one high clock rate unit to accelerate serial content could deliver up to an 80% reduction in the execution time of some scientific workloads compared with a homogeneous system with the same number of cores.
Description
December 2015; School of Engineering
Department
Dept. of Electrical, Computer, and Systems Engineering
Publisher
Rensselaer Polytechnic Institute, Troy, NY
Relationships
Rensselaer Theses and Dissertations Online Collection
Access
Restricted to current Rensselaer faculty, staff and students. Access inquiries may be directed to the Rensselaer Libraries.