[ Publications ]
[ Research Opportunities ]
[ Partners & Supporters ]
[ Earlier Work ]
Memory System Performance in a NUMA Multicore Multiprocessor

Zoltan Majo,
Thomas R. Gross,
Memory System Performance in a NUMA Multicore
Multiprocessor, Proceedings of SYSTOR'11, ACM, May 2011.
[SYSTOR_2011.pdf
SYSTOR_2011.ps]

Modern multicore processors with an on-chip memory
controller form the base for NUMA (non-uniform memory architecture)
multiprocessors. Each processor accesses part of the physical memory
directly and has access to the other parts via the memory controller
of other processors. These other processors are reached via the
cross-processor interconnect. As a consequence, a processor's memory
controller must satisfy two kinds of requests: those that are
generated by the local cores and those that arrive via the
interconnect from other processors. On the other hand, a core
(respectively the core's cache) can obtain data from multiple sources:
data can be supplied by the local memory controller or by a remote
memory controller on another processor. In this paper we
experimentally analyze the behavior of the memory controllers of a
commercial multicore processor, the Intel Xeon 5520 (Nehalem). We
develop a simple model to characterize the sharing of local and remote
memory bandwidth. The uneven treatment of local and remote accesses
has implications for mapping applications onto such a NUMA multicore
multiprocessor. Maximizing data locality does not always minimize
execution time; it may be more advantageous to allocate data on a
remote processor (and then to fetch these data via the cross-processor
interconnect) than to store the data of all processes in local memory
(and consequently overload the on-chip memory
controller).
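
The trade-off described in the abstract can be illustrated with a toy calculation. The sketch below is not the model developed in the paper; it is a minimal illustration under stated assumptions: all processes run on one processor, each process wants the same bandwidth, a memory controller and the interconnect each have a fixed capacity that is split evenly among the processes using them, and latency is ignored. All function names and numbers are hypothetical.

```python
def achieved_bandwidth(n_procs, n_remote, demand, mc_capacity, link_capacity):
    """Per-process bandwidth for local and remote data under even sharing.

    Returns (local_bw, remote_bw); an entry is None if no process uses it.
    Toy assumption: remote traffic is limited by the slower of the
    interconnect and the (otherwise idle) remote memory controller.
    """
    n_local = n_procs - n_remote
    local_bw = min(demand, mc_capacity / n_local) if n_local else None
    remote_cap = min(link_capacity, mc_capacity)
    remote_bw = min(demand, remote_cap / n_remote) if n_remote else None
    return local_bw, remote_bw


def makespan(n_procs, n_remote, demand, mc_capacity, link_capacity, volume=1.0):
    """Time until the slowest process has moved `volume` units of data."""
    rates = achieved_bandwidth(n_procs, n_remote, demand,
                               mc_capacity, link_capacity)
    slowest = min(bw for bw in rates if bw is not None)
    return volume / slowest


def best_placement(n_procs, demand, mc_capacity, link_capacity):
    """Number of processes whose data should be remote to minimize makespan."""
    return min(
        range(n_procs + 1),
        key=lambda k: makespan(n_procs, k, demand, mc_capacity, link_capacity),
    )


if __name__ == "__main__":
    # Hypothetical values in arbitrary bandwidth units, not measurements.
    n, demand, mc, link = 4, 6.0, 12.0, 6.0
    for k in range(n + 1):
        print(k, "remote ->", round(makespan(n, k, demand, mc, link), 3))
    print("best number of remote placements:", best_placement(n, demand, mc, link))
```

With these made-up numbers, keeping all data local makes four processes share one memory controller, while moving the data of one process to the other processor relieves that controller and shortens the run. Moving more than one saturates the interconnect instead.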