Redbook on the new z196 mainframes. [Computer Architecture]

From: Niels Jørgen Kruse on 27 Jul 2010 09:16

<http://www.redbooks.ibm.com/redpieces/pdfs/sg247833.pdf>

Rather impressive. It is not many years ago that Z-systems lagged POWER
badly.

Briefly:

5.2 GHz
OoO, hints at POWER4 like 5 instruction grouping
up to 3 instructions decode per clock
up to 5 instruction dispatch to functional units per clock
128 KB L1D, 64 KB L1I
1.5 MB L2
24 MB L3 per 4 cores
up to 768 MB L4
256 byte line sizes at all levels.

--
Mvh./Regards, Niels Jørgen Kruse, Vanløse, Denmark

From: Andy Glew <"newsgroup at comp-arch.net"> on 27 Jul 2010 09:25

On 7/27/2010 6:16 AM, Niels Jørgen Kruse wrote:
> <http://www.redbooks.ibm.com/redpieces/pdfs/sg247833.pdf>
>
> Rather impressive. It is not many years ago that Z-systems lagged POWER
> badly.
>
> Briefly:
>
> 5.2 GHz
> OoO, hints at POWER4 like 5 instruction grouping
> up to 3 instructions decode per clock
> up to 5 instruction dispatch to functional units per clock
> 128 KB L1D, 64 KB L1I
> 1.5 MB L2
> 24 MB L3 per 4 cores
> up to 768 MB L4
> 256 byte line sizes at all levels.

256 *BYTE*?

2048 bits?

Line sizes 4X the typical 64B line size of x86?

These aren't cache lines. They are disk blocks.

Won't make Robert Myers happy.
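
For scale, the arithmetic behind these figures can be sketched in Python. The capacities and the 256-byte line come from the summary quoted above, and the 64-byte x86 line from the comparison; treating KB and MB as binary units and the helper name line_count are assumptions made for illustration.

```python
# Cache geometry implied by the quoted summary.
# Assumption: KB and MB are binary units (1024-based).
KB = 1024
MB = 1024 * KB

LINE_BYTES = 256       # "256 byte line sizes at all levels"
X86_LINE_BYTES = 64    # the typical x86 line size used for comparison

caches = {
    "L1D": 128 * KB,
    "L1I": 64 * KB,
    "L2": 1536 * KB,   # 1.5 MB
    "L3": 24 * MB,     # per 4 cores
    "L4": 768 * MB,    # "up to", so this is an upper bound
}

def line_count(capacity_bytes, line_bytes=LINE_BYTES):
    """Number of lines a cache of the given capacity holds (hypothetical helper)."""
    return capacity_bytes // line_bytes

line_bits = LINE_BYTES * 8
ratio = LINE_BYTES // X86_LINE_BYTES

print(f"line size: {LINE_BYTES} bytes = {line_bits} bits, {ratio}x a 64-byte line")
for name, size in caches.items():
    print(f"{name}: {line_count(size)} lines of {LINE_BYTES} bytes")
```

With 256-byte lines the 128 KB L1D holds only 512 lines, which is the flip side of the large block size.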

From: Terje Mathisen <"terje.mathisen at tmsw.no"> on 27 Jul 2010 11:08

Andy Glew wrote:
> On 7/27/2010 6:16 AM, Niels Jørgen Kruse wrote:
>> 24 MB L3 per 4 cores
>> up to 768 MB L4
>> 256 byte line sizes at all levels.
>
> 256 *BYTE*?

Yes, that one rather screamed at me as well.
>
> 2048 bits?
>
> Line sizes 4X the typical 64B line size of x86?
>
> These aren't cache lines. They are disk blocks.

Yes. So what?

I (and Nick, and you afair) have talked for years about how current CPUs
are just like mainframes of old:

new      old
DISK  -> TAPE : Sequential access only
RAM   -> DISK : HW-controlled, block-based transfer
CACHE -> RAM  : Actual random access, but blocks are still faster

>
> Won't make Robert Myers happy.
>
768 MB of L4 means your problem size is limited to a little less than
that, otherwise random access is out.
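
A rough sketch of why block size matters once random access spills out of the cache: if each random access uses only one word of a line, the rest of the transferred block is wasted. The 8-byte word size, the function names, and the binary reading of "MB" are assumptions for illustration, not figures from the redbook.

```python
MB = 1024 * 1024        # assumption: binary megabytes
L4_BYTES = 768 * MB     # "up to 768 MB L4" from the summary

def useful_fraction(word_bytes, line_bytes):
    """Fraction of a transferred line actually used by one random access."""
    return word_bytes / line_bytes

def fits_in_l4(working_set_bytes, l4_bytes=L4_BYTES):
    """True if the working set could, at best, be held entirely in L4."""
    return working_set_bytes <= l4_bytes

for line in (64, 256):
    used = useful_fraction(8, line)
    print(f"{line}-byte line, 8-byte word: {used:.1%} of each transfer is used")

print("512 MB working set fits in L4:", fits_in_l4(512 * MB))
print("1 GB working set fits in L4:", fits_in_l4(1024 * MB))
```

This is an upper bound on what fits: code, other data, and conflict misses all reduce the usable capacity, hence "a little less than that".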

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

From: Niels Jørgen Kruse on 27 Jul 2010 12:37

Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:

> Andy Glew wrote:
> > On 7/27/2010 6:16 AM, Niels Jørgen Kruse wrote:
> >> 24 MB L3 per 4 cores
> >> up to 768 MB L4
> >> 256 byte line sizes at all levels.
> >
> > 256 *BYTE*?
>
> Yes, that one rather screamed at me as well.

Another surprising thing I spotted while browsing through the redbook is
the claim of single-cycle L1D access. That must be the array access only,
so there is at least an address generation cycle before it and a format
cycle after it. Still, 3-cycle loads from a 128 KB L1D at 5.2 GHz must
show up in the power budget.
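
To put those cycle counts in wall-clock terms, here is a small Python sketch. The 5.2 GHz clock is from the summary; the 3-cycle load is the estimate above (address generation, array access, format), not a figure confirmed by the redbook, and the helper name cycles_to_ns is hypothetical.

```python
CLOCK_HZ = 5.2e9   # 5.2 GHz, from the summary

def cycles_to_ns(cycles, clock_hz=CLOCK_HZ):
    """Convert a cycle count to nanoseconds at the given clock rate."""
    return cycles / clock_hz * 1e9

# 1 cycle: the claimed L1D array access; 3 cycles: the estimated full load.
for cycles in (1, 3):
    print(f"{cycles} cycle(s) at 5.2 GHz = {cycles_to_ns(cycles):.3f} ns")
```

One cycle is a little under 0.2 ns, so even the estimated 3-cycle load completes in well under a nanosecond.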

--
Mvh./Regards, Niels Jørgen Kruse, Vanløse, Denmark

From: Robert Myers on 27 Jul 2010 13:25

On Jul 27, 9:25 am, Andy Glew <"newsgroup at comp-arch.net"> wrote:
> On 7/27/2010 6:16 AM, Niels Jørgen Kruse wrote:
>
> > <http://www.redbooks.ibm.com/redpieces/pdfs/sg247833.pdf>
>
> > Rather impressive. It is not many years ago that Z-systems lagged POWER
> > badly.
>
> > Briefly:
>
> > 5.2 GHz
> > OoO, hints at POWER4 like 5 instruction grouping
> > up to 3 instructions decode per clock
> > up to 5 instruction dispatch to functional units per clock
> > 128 KB L1D, 64 KB L1I
> > 1.5 MB L2
> > 24 MB L3 per 4 cores
> > up to 768 MB L4
> > 256 byte line sizes at all levels.
>
> 256 *BYTE*?
>
> 2048 bits?
>
> Line sizes 4X the typical 64B line size of x86?
>
> These aren't cache lines. They are disk blocks.
>
> Won't make Robert Myers happy.

So much for my hopes for Blue Waters--unless IBM has some other tricks
up its sleeve, which wouldn't surprise me.

Robert.
