ERIC Number: ED526664
Record Type: Non-Journal
Publication Date: 2009
Pages: 121
Abstractor: As Provided
Reference Count: 0
ISBN: 978-1-1097-4344-9
Efficient High Performance Collective Communication for Distributed Memory Environments
Ali, Qasim
ProQuest LLC, Ph.D. Dissertation, Purdue University
Collective communication allows efficient communication and synchronization among a collection of processes, unlike point-to-point communication, which involves only a pair of communicating processes. Achieving high performance for both kernels and full-scale applications running on a distributed memory system requires an efficient implementation of collective communication operations. Developing an efficient implementation requires attention to both algorithmic and hardware issues. This dissertation proposes and describes the implementation of collective communication algorithms that are both novel and extremely efficient. These algorithms target distributed memory machines: both clusters (with nodes that are either SMPs or uniprocessors) and accelerator-based machines (e.g., IBM's Cell processor, which is used as the accelerator core in IBM's Roadrunner, the world's fastest supercomputer at the time of writing). For the cluster of workstations environment, the dissertation also proposes efficient asynchronous and concurrent collective operations, a generalized reduction algorithm, and parallel reductions. For the Cell processor, this dissertation describes the implementation of very fast barrier synchronization, "broadcast", "all-gather", "reduce" and "all-reduce" collectives, which work on both single- and dual-Cell machines. These collectives take into account the impacts of both concurrency and data traffic on the on-chip and off-chip interconnects. The implementations for both a cluster of workstations and the Cell processor achieve performance that is superior to the previously published state of the art. This dissertation also presents and validates performance models for a variety of high-performance collective communication algorithms for systems with Cell processors.
The models extend the PLogP model, a well-known point-to-point performance model, by accounting for the unique hardware characteristics of the Cell (e.g., heterogeneous interconnects and DMA engines) and by applying the model to collective communication. Finally, the dissertation presents experimental results validating the algorithm designs and the effectiveness of the models. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page:]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site:
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A