From: Niels Jørgen Kruse on
<http://www.redbooks.ibm.com/redpieces/pdfs/sg247833.pdf>

Rather impressive. It is not many years ago that Z-systems lagged POWER
badly.

Briefly:

5.2 GHz
OoO, hints at POWER4-like 5-instruction grouping
up to 3 instructions decoded per clock
up to 5 instructions dispatched to functional units per clock
128 KB L1D, 64 KB L1I
1.5 MB L2
24 MB L3 per 4 cores
up to 768 MB L4
256 byte line sizes at all levels.
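As a rough illustration of what 256-byte lines mean at each level, the sketch below divides each quoted capacity by the line size. It assumes binary units (1 KB = 1024 bytes) and treats the L4 figure as the maximum configuration. `caches` and `lines_in` are illustrative names, not anything from the redbook.

```python
# Rough count of 256-byte lines at each cache level, using the sizes quoted
# above and assuming binary units (1 KB = 1024 bytes, 1 MB = 1024 KB).
LINE_BYTES = 256
KB = 1024
MB = 1024 * KB

caches = {
    "L1D": 128 * KB,
    "L1I": 64 * KB,
    "L2": int(1.5 * MB),
    "L3 (per 4 cores)": 24 * MB,
    "L4 (max)": 768 * MB,
}

def lines_in(size_bytes, line_bytes=LINE_BYTES):
    """Number of whole cache lines a cache of size_bytes can hold."""
    return size_bytes // line_bytes

for name, size in caches.items():
    print(f"{name:18} {lines_in(size):>10,} lines")
```

So even the 128 KB L1D holds only 512 lines under these assumptions.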

--
Mvh./Regards, Niels Jørgen Kruse, Vanløse, Denmark
From: Andy Glew "newsgroup at on
On 7/27/2010 6:16 AM, Niels Jørgen Kruse wrote:
> <http://www.redbooks.ibm.com/redpieces/pdfs/sg247833.pdf>
>
> Rather impressive. It is not many years ago that Z-systems lagged POWER
> badly.
>
> Briefly:
>
> 5.2 GHz
> OoO, hints at POWER4-like 5-instruction grouping
> up to 3 instructions decoded per clock
> up to 5 instructions dispatched to functional units per clock
> 128 KB L1D, 64 KB L1I
> 1.5 MB L2
> 24 MB L3 per 4 cores
> up to 768 MB L4
> 256 byte line sizes at all levels.

256 *BYTE*?

2048 bits?

Line sizes 4X the typical 64B line size of x86?

These aren't cache lines. They are disk blocks.

Won't make Robert Myers happy.
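The arithmetic behind "2048 bits" and "4X" can be sketched as follows, assuming power-of-two line sizes and plain byte addressing. The helper names are hypothetical. A 256-byte line uses 8 low-order address bits for the byte offset, versus 6 for a 64-byte line.

```python
def offset_bits(line_bytes):
    """Low-order address bits that select a byte within one line."""
    assert line_bytes > 0 and line_bytes & (line_bytes - 1) == 0, "power of two"
    return line_bytes.bit_length() - 1

def split(addr, line_bytes):
    """Split a byte address into (line number, byte offset within the line)."""
    b = offset_bits(line_bytes)
    return addr >> b, addr & (line_bytes - 1)

for line in (64, 256):
    print(f"{line:3} B line = {line * 8:4} bits, {offset_bits(line)} offset bits, "
          f"{line // 64}x a 64 B line")
```

For example, `split(0x12345, 256)` gives line `0x123` and offset `0x45`, while `split(0x12345, 64)` gives line `0x48D` and offset `0x05`.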

From: Terje Mathisen "terje.mathisen at on
Andy Glew wrote:
> On 7/27/2010 6:16 AM, Niels Jørgen Kruse wrote:
>> 24 MB L3 per 4 cores
>> up to 768 MB L4
>> 256 byte line sizes at all levels.
>
> 256 *BYTE*?

Yes, that one rather screamed at me as well.
>
> 2048 bits?
>
> Line sizes 4X the typical 64B line size of x86?
>
> These aren't cache lines. They are disk blocks.

Yes. So what?

I (and Nick, and you afair) have talked for years about how current CPUs
are just like mainframes of old:

new       old
DISK  ->  TAPE : Sequential access only
RAM   ->  DISK : HW-controlled, block-based transfer
CACHE ->  RAM  : Actual random access, but blocks are still faster

>
> Won't make Robert Myers happy.
>
768 MB of L4 means your problem size is limited to a little less than
that; beyond it, random access is out.
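A back-of-the-envelope sketch of why large lines hurt once random access spills past the cache: if each access reads one 8-byte word from a line that is never reused (an assumed worst case, not a measured workload), only word/line of every transferred line is useful. `useful_fraction` is a hypothetical helper.

```python
def useful_fraction(word_bytes, line_bytes):
    """Fraction of each fetched line actually used when every access
    touches one word in a different, not-yet-cached line."""
    return word_bytes / line_bytes

for line in (64, 256):
    print(f"{line:3} B lines: {useful_fraction(8, line):.1%} of fetched bytes used")
```

Under that assumption, 256-byte lines waste four times as much memory bandwidth per random word as 64-byte lines.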

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
From: Niels Jørgen Kruse on
Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:

> Andy Glew wrote:
> > On 7/27/2010 6:16 AM, Niels Jørgen Kruse wrote:
> >> 24 MB L3 per 4 cores
> >> up to 768 MB L4
> >> 256 byte line sizes at all levels.
> >
> > 256 *BYTE*?
>
> Yes, that one rather screamed at me as well.

Another surprising thing I spotted while browsing through the redbook is the
claim of single-cycle L1D access. That must be the array access only, so
there is at least an address-generation cycle before it and a format cycle
after it. Still, 3-cycle loads from a 128 KB L1D at 5.2 GHz must show up in
the power budget.
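Converting the cycle counts to wall-clock time gives a feel for how tight that timing is. This assumes the 5.2 GHz clock and the guessed 3-cycle load path above (address generation, one array cycle, format); `cycles_to_ns` is a hypothetical helper.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in clock cycles to nanoseconds (1 GHz = 1 cycle/ns)."""
    return cycles / clock_ghz

CLOCK_GHZ = 5.2
print(f"cycle time:   {cycles_to_ns(1, CLOCK_GHZ):.3f} ns")
print(f"3-cycle load: {cycles_to_ns(3, CLOCK_GHZ):.3f} ns")
```

That works out to roughly 0.19 ns for the array access itself and about 0.58 ns for the whole load.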

--
Mvh./Regards, Niels Jørgen Kruse, Vanløse, Denmark
From: Robert Myers on
On Jul 27, 9:25 am, Andy Glew <"newsgroup at comp-arch.net"> wrote:
> On 7/27/2010 6:16 AM, Niels Jørgen Kruse wrote:
>
> > <http://www.redbooks.ibm.com/redpieces/pdfs/sg247833.pdf>
>
> > Rather impressive. It is not many years ago that Z-systems lagged POWER
> > badly.
>
> > Briefly:
>
> > 5.2 GHz
> > OoO, hints at POWER4-like 5-instruction grouping
> > up to 3 instructions decoded per clock
> > up to 5 instructions dispatched to functional units per clock
> > 128 KB L1D, 64 KB L1I
> > 1.5 MB L2
> > 24 MB L3 per 4 cores
> > up to 768 MB L4
> > 256 byte line sizes at all levels.
>
> 256 *BYTE*?
>
> 2048 bits?
>
> Line sizes 4X the typical 64B line size of x86?
>
> These aren't cache lines.  They are disk blocks.
>
> Won't make Robert Myers happy.

So much for my hopes for Blue Waters--unless IBM has some other tricks
up its sleeve, which wouldn't surprise me.

Robert.