
| From: Lennart Sorensen <lsorense@csclub.uwaterloo.ca>

| On Mon, Jul 20, 2015 at 01:48:45PM -0400, Scott Allen wrote:
| > Probably not much slower, though. Perhaps just a few percent. This
| > test is rather old but the situation is somewhat similar:
| > <http://www.tomshardware.com/reviews/Intel-Core-i7-Nehalem,2057-13.html>

It's an interesting test, but a bit superficial.

DDR memory modules are 64 bits wide (if I remember correctly), so typically the bottom 3 bits of an address select which byte in a module is being used (this need not be the case). The easy way to interleave memory accesses between a power-of-two number of channels is to use the next low-order address bits as the module select. That spreads the load between modules and exploits their parallel operation quite well, because nearby references are common.

When you have three channels (as in the baseline for this test), I don't know where you put the third channel in the address space. That's me being lazy: it is surely written up somewhere. But I'd guess that it isn't as useful as the power-of-two case, and the figures seem to support that.

It would have been nice to see some synthetic benchmarks in that test.

| I remember going from 1 to 2 sticks of ram on a Core 2 Duo machine in
| the past made the intel graphics (using shared ram) a lot faster.
| So at least on some CPUs it has made a big difference.

Perhaps the much-improved memory system of the i series makes a difference? See section 5 of that article.

Maybe caches are so big that real memory access is infrequent for many programs. Perhaps page-mode access to RAM captures a lot of the possible advantage of interleaving?

In any case, my intuition would have predicted a bigger difference. As always, careful, informed, exploratory measurement trumps intuition when it comes to performance.