From: Michael Siemon on
The scattering of benchmarks Apple has advertised for the new 12-core
MacPro (entry model, at 2.93 GHz, for $5000), vs. the previous
generation 8-core top-of-the-line (also at 2.93 GHz), shows it running
about 30%-50% faster. If these benchmarks in fact make good use of
multiple cores,
then that suggests that the main advantage of the new systems derives
purely from the additional cores -- though I suspect there is likely to
be some conflation of architectural enhancements along with
not-really-optimal use of the extra cores.

I also see that Geekbench scores for the current 2.66 GHz 8-core system
run barely (less than 5%) behind the 2.93 GHz scores. This suggests that
buying a 2.66 GHz refurbished model as they become available is going to
give one a system reasonably close to (and at least $1000 less than) the
$5K 12-core system.
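
A quick back-of-the-envelope sketch of that comparison (the figures
below simply restate the numbers above; the "linear with clock speed"
ceiling is an assumption for illustration, since real scaling is
usually sublinear):

```python
# Hypothetical sanity check: if performance scaled linearly with clock
# speed, how much faster could the 2.93 GHz machine possibly be than
# the 2.66 GHz one?
fast_ghz, slow_ghz = 2.93, 2.66
clock_ceiling = fast_ghz / slow_ghz - 1   # ~10% upper bound from clock alone
observed_gap = 0.05                       # "less than 5%" per Geekbench

print(f"clock-alone ceiling: {clock_ceiling:.1%}")
print(f"observed Geekbench gap: under {observed_gap:.0%}")
```

Since the observed gap is well under the ~10% clock-speed ceiling, the
cheaper 2.66 GHz machine really does look like most of the performance
for substantially less money.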

Comments? Is this line of "reasoning" utterly bogus, or does it merit
some serious thought?
From: Kevin McMurtrie on
In article <mlsiemon-95A68A.16565330072010(a)news.individual.net>,
Michael Siemon <mlsiemon(a)sonic.net> wrote:

> The scattering of benchmarks Apple has advertised for the new 12-core
> MacPro (entry model, at 2.93 GHz, for $5000), vs. the previous
> generation 8-core top-of-the-line (also at 2.93 GHz), shows it running
> about 30%-50% faster. If these benchmarks in fact make good use of
> multiple cores,
> then that suggests that the main advantage of the new systems derives
> purely from the additional cores -- though I suspect there is likely to
> be some conflation of architectural enhancements along with
> not-really-optimal use of the extra cores.
>
> I also see that Geekbench scores for the current 2.66 GHz 8-core system
> run barely (less than 5%) behind the 2.93 GHz scores. This suggests that
> buying a 2.66 GHz refurbished model as they become available is going to
> give one a system reasonably close to (and at least $1000 less than) the
> $5K 12-core system.
>
> Comments? Is this line of "reasoning" utterly bogus, or does it merit
> some serious thought?

Adding more cores doesn't produce a linear speed increase because shared
resources become a bottleneck. At 8+ cores, you start seeing severe
bottlenecks on RAM and CPU cache synchronization.
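
That sublinear scaling can be sketched with Amdahl's law (the 90%
parallel fraction below is an illustrative assumption, not a measured
figure for any of these machines):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper-bound speedup for a workload with a fixed serial component."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# Even before counting RAM/cache-synchronization contention, 12 cores
# give far less than 12x: the speedup flattens as the serial part
# comes to dominate the runtime.
for n in (1, 8, 12):
    print(f"{n:2d} cores -> {amdahl_speedup(0.9, n):.2f}x")
```

Note that this model ignores the shared-resource contention described
above, which only makes real-world scaling worse.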
--
I won't see Google Groups replies because I must filter them as spam
From: thepixelfreak on
On 2010-07-30 20:37:13 -0700, Kevin McMurtrie <mcmurtrie(a)pixelmemory.us> said:

> In article <mlsiemon-95A68A.16565330072010(a)news.individual.net>,
> Michael Siemon <mlsiemon(a)sonic.net> wrote:
>
>> The scattering of benchmarks Apple has advertised for the new 12-core
>> MacPro (entry model, at 2.93 GHz, for $5000), vs. the previous
>> generation 8-core top-of-the-line (also at 2.93 GHz), shows it running
>> about 30%-50% faster. If these benchmarks in fact make good use of
>> multiple cores,
>> then that suggests that the main advantage of the new systems derives
>> purely from the additional cores -- though I suspect there is likely to
>> be some conflation of architectural enhancements along with
>> not-really-optimal use of the extra cores.
>>
>> I also see that Geekbench scores for the current 2.66 GHz 8-core system
>> run barely (less than 5%) behind the 2.93 GHz scores. This suggests that
>> buying a 2.66 GHz refurbished model as they become available is going to
>> give one a system reasonably close to (and at least $1000 less than) the
>> $5K 12-core system.
>>
>> Comments? Is this line of "reasoning" utterly bogus, or does it merit
>> some serious thought?
>
> Adding more cores doesn't produce a linear speed increase because shared
> resources become a bottleneck. At 8+ cores, you start seeing severe
> bottlenecks on RAM and CPU cache synchronization.

Maybe in the Apple implementation. I trust you're not making a
generalization about all multi-core systems.

--

thepixelfreak

From: JF Mezei on
Kevin McMurtrie wrote:

> Adding more cores doesn't produce a linear speed increase because shared
> resources become a bottleneck. At 8+ cores, you start seeing severe
> bottlenecks on RAM and CPU cache synchronization.

This was the case for x86 prior to the CSI/QuickPath interconnect (i.e.,
Nehalem).

The QuickPath interconnect (which started as the "Common System
Interconnect," or CSI, back in 2004 when those plans were announced)
gets its heritage from Alpha supercomputers, which had two generations
of NUMA (non-uniform memory access); the second generation (EV7) made
great strides toward linear performance increases with each CPU added.
Alphas could scale to 64 CPUs.

Remember that when Compaq/HP murdered Alpha on June 25, 2001, Intel got
not only many of the engineers (they were traded as part of the deal)
but also free access to all of the Alpha IP, including the memory
subsystem.

AMD also inherited some of the Alpha engineers and was first to market
with an x86 chip that had a better memory subsystem, but it lacked some
features, such as a shared cache for cores on the same chip (which IBM
already had with its Power architecture).

QuickPath not only provides a shared cache between cores, but also
multiple very fast paths to memory, allowing each core/CPU to fetch its
memory contents quickly with greatly reduced bottlenecks.

HP's Itanic "mainframes" are now blade servers with the blade
interconnect carrying QuickPath in the backplane. So different blades in
the same enclosure share memory via QuickPath, and at QuickPath speeds.
This is essentially a new way to build a "mainframe"-class machine out
of commodity blades.

The fact that industry-standard builders have not yet scaled x86
servers to 32 or 64 CPUs probably has more to do with market needs than
with the technical capabilities of QuickPath. But there are specialized
servers out there with a lot of CPUs in them.

From: thepixelfreak on
On 2010-08-06 12:38:11 -0700, JF Mezei <jfmezei.spamnot(a)vaxination.ca> said:

> This was the case for x86 prior to the CSI/QuickPath interconnect (i.e.,
> Nehalem).
>
> The QuickPath interconnect (which started as the "Common System
> Interconnect," or CSI, back in 2004 when those plans were announced)
> gets its heritage from Alpha supercomputers, which had two generations
> of NUMA (non-uniform memory access); the second generation (EV7) made
> great strides toward linear performance increases with each CPU added.
> Alphas could scale to 64 CPUs.

SGI has been doing this with NUMAlink (ccNUMA) for years. In fact, we
scale all the way up to 1024 discrete processors with Itanium as a
single OS instance, and now we're doing the very same with Nehalem and
QPI, scaling all the way to 2048 cores in a single OS instance.

--

thepixelfreak