Why does Intel favor thin rectangular CPUs? [Computer Architecture]


From: Robert Myers on 2 Mar 2010 14:48

On Mar 2, 2:12 pm, Stephen Fuld <SF...(a)alumni.cmu.edu.invalid> wrote:
> On 3/2/2010 9:56 AM, Robert Myers wrote:
>
>
>
> Well, some of the details of Mitch's proposal aren't clearly specified.
> I can't tell if he intends the off chip DRAM to be part of the
> processor's address space or not. If it is, then the on-chip DRAM is
> essentially a level 4 cache. But it could be that it isn't, in which
> case, it is more like a fast paging device with some extra features.
>
Help me out here. Isn't the page file part of the processor's address
space? When something is paged out, you don't have to worry about
coherence because you can't touch it.

Maybe there is some genius-level subtlety I'm missing.

Robert.

From: Stephen Fuld on 2 Mar 2010 16:30

On 3/2/2010 11:48 AM, Robert Myers wrote:
> On Mar 2, 2:12 pm, Stephen Fuld<SF...(a)alumni.cmu.edu.invalid> wrote:
>> On 3/2/2010 9:56 AM, Robert Myers wrote:
>>
>>
>>
>> Well, some of the details of Mitch's proposal aren't clearly specified.
>> I can't tell if he intends the off chip DRAM to be part of the
>> processor's address space or not. If it is, then the on-chip DRAM is
>> essentially a level 4 cache. But it could be that it isn't, in which
>> case, it is more like a fast paging device with some extra features.
>>
> Help me out here. Isn't the page file part of the processor's address
> space?

No. In most systems, bringing a page into main memory from the page
file causes the page to be mapped into the processor address space (I
know, Del, not true of AS/400). That is, the page tables are updated.
If this were not the case, you couldn't have multiple programs all
running whose aggregate size totals more than the processor address
space. So on a 32-bit processor, you couldn't have more than 4 GB of
programs. But of course you can do this, as the pages that aren't
active are in the page file and don't take up processor address space.
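
A minimal sketch of that argument in Python, under loudly stated
assumptions: 4 KiB pages, FIFO eviction, and hypothetical `Process` and
`ToyPager` classes that model no real OS and are not taken from Mitch's
proposal. It only illustrates that a page appears in a page table while
it is resident, so programs totalling more than 4 GB can coexist.

```python
# Toy demand-paging model. All names here are hypothetical illustrations.

PAGE_SIZE = 4096            # assumed 4 KiB pages
ADDRESS_SPACE = 2 ** 32     # what a 32-bit address can reach: 4 GiB


class Process:
    def __init__(self, name, size_bytes):
        self.name = name
        self.num_pages = size_bytes // PAGE_SIZE
        # Page table: virtual page number -> physical frame.
        # Only resident pages have an entry; the rest live in the page file.
        self.page_table = {}


class ToyPager:
    def __init__(self, num_frames):
        self.free_frames = list(range(num_frames))
        self.resident = []  # FIFO queue of (process, vpn) used for eviction
        self.faults = 0

    def touch(self, proc, vpn):
        """Access a page; on a fault, page it in and update the page table."""
        if not 0 <= vpn < proc.num_pages:
            raise IndexError("page outside the process image")
        if vpn in proc.page_table:
            return proc.page_table[vpn]
        self.faults += 1
        if not self.free_frames:
            # Page out the oldest resident page: its mapping disappears,
            # so it no longer occupies any address the processor can touch.
            victim, victim_vpn = self.resident.pop(0)
            self.free_frames.append(victim.page_table.pop(victim_vpn))
        frame = self.free_frames.pop()
        proc.page_table[vpn] = frame
        self.resident.append((proc, vpn))
        return frame


# Three 3 GiB programs: 9 GiB in aggregate, on a machine with 4 frames.
procs = [Process(f"p{i}", 3 * 2 ** 30) for i in range(3)]
total_bytes = sum(p.num_pages for p in procs) * PAGE_SIZE

pager = ToyPager(num_frames=4)
for p in procs:
    for vpn in (0, 1):
        pager.touch(p, vpn)

mapped_pages = sum(len(p.page_table) for p in procs)
```

In this toy run the aggregate program size exceeds `ADDRESS_SPACE`, yet
only `mapped_pages` pages are ever mapped at once; everything else sits
in the (implicit) page file until it is touched.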

> When something is paged out, you don't have to worry about
> coherence because you can't touch it.

That's true, but not the point.

> Maybe there is some genius-level subtlety I'm missing.

That's too easy, so I won't respond. :-)

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

From: Robert Myers on 2 Mar 2010 16:50

On Mar 2, 4:30 pm, Stephen Fuld <SF...(a)alumni.cmu.edu.invalid> wrote:
> On 3/2/2010 11:48 AM, Robert Myers wrote:

> > When something is paged out, you don't have to worry about
> > coherence because you can't touch it.
>
> That's true, but not the point.
>
Sorry, but it does seem like the critical point. If what used to be
the main memory is now nothing but a page file, then only one
processor can have that page in Level 4 cache, or whatever you choose
to call it, just as there can only be one copy of something in main
memory. Processors would have shared access through some kind of NUMA
architecture.

If, on the other hand, the Level 4 cache operates like a cache, then
processors share access through cache snooping.

If there is some other detail that Mitch left out, I'm still missing
it. I will admit that I confused the issue by using the term "Level 4
cache," although I might point out that IBM keeps Level 4 cache in
main memory in some architectures.

Robert.

From: Stephen Fuld on 2 Mar 2010 17:16

On 3/2/2010 1:50 PM, Robert Myers wrote:
> On Mar 2, 4:30 pm, Stephen Fuld<SF...(a)alumni.cmu.edu.invalid> wrote:
>> On 3/2/2010 11:48 AM, Robert Myers wrote:
>
>>> When something is paged out, you don't have to worry about
>>> coherence because you can't touch it.
>>
>> That's true, but not the point.
>>
> Sorry, but it does seem like the critical point. If what used to be
> the main memory is now nothing but a page file, then only one
> processor can have that page in Level 4 cache, or whatever you choose
> to call it, just as there can only be one copy of something in main
> memory.

Yes.

> Processors would have shared access through some kind of NUMA
> architecture.

Mitch's original proposal was multiple cores on a single chip. When you
say multiple processors, are you talking about the multiple cores on the
chip or multiple chips?

> If, on the other hand, the Level 4 cache operates like a cache, then
> processors share access through cache snooping.

If it is a single L4 cache on a chip, then there is no coherence issue
for it on the chip. Multiple chips have the same coherence issues that
current Intel and AMD chips have with their on-chip caches.

Let me try a different explanation. Consider a system with DRAM main
memory, but with the page file resident on a solid state disk (SSD)
attached via, say, a SATA port. Except for the "details" of interconnect
speed and I/O, this is entirely analogous, yet no one would call the
DRAM memory of this system a level 4 cache.

Please remember that I am not saying that what Mitch proposed actually
operates this way, but that given the information he has provided, it is
possible.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

From: Robert Myers on 2 Mar 2010 17:57

On Mar 2, 5:16 pm, Stephen Fuld <SF...(a)alumni.cmu.edu.invalid> wrote:
> On 3/2/2010 1:50 PM, Robert Myers wrote:
>
> > On Mar 2, 4:30 pm, Stephen Fuld<SF...(a)alumni.cmu.edu.invalid> wrote:
> >> On 3/2/2010 11:48 AM, Robert Myers wrote:
>
> >>> When something is paged out, you don't have to worry about
> >>> coherence because you can't touch it.
>
> >> That's true, but not the point.
>
> > Sorry, but it does seem like the critical point. If what used to be
> > the main memory is now nothing but a page file, then only one
> > processor can have that page in Level 4 cache, or whatever you choose
> > to call it, just as there can only be one copy of something in main
> > memory.
>
> Yes.
>
> > Processors would have shared access through some kind of NUMA
> > architecture.
>
> Mitch's original proposal was multiple cores on a single chip. When you
> say multiple processors, are you talking about the multiple cores on the
> chip or multiple chips?
>
> > If, on the other hand, the Level 4 cache operates like a cache, then
> > processors share access through cache snooping.
>
> If it is a single L4 cache on a chip, then there is no coherence issue
> for it on the chip. Multiple chips have the same coherence issues that
> current Intel and AMD chips have now with their on chip caches.
>
> Let me try a different explanation. Consider a system with DRAM main
> memory, but with the page file resident on a solid state disk (SSD)
> attached via, say, a SATA port. Except for the "details" of interconnect
> speed and I/O, this is entirely analogous, yet no one would call the
> DRAM memory of this system a level 4 cache.
>
> Please remember that I am not saying that what Mitch proposed actually
> operates this way, but that given the information he has provided, it is
> possible.
>
I have only myself to blame for my cavalier use of language. It
seemed natural to call memory resident on the die "cache." Nothing
would have tempted me to call the main memory in your SSD proposal
cache, although, as I pointed out, IBM has already blurred the lines.
I try to avoid arguments over terminology whenever possible. Never
again will I refer to something that doesn't in every way conform to
your notion of cache as "cache."

Assuming that the on-chip memory is acting like main memory, then all
cores on a die would have equal access to the main memory resident on
the die (or chip, which could conceivably have multiple dies). If a
core on another chip with its own distinct memory needed that data, it
could only store a copy of that data in Level 3 cache, and you would
have to deal with cache coherence. The situation seems completely
analogous to what you would have with a multi-socket Nehalem system,
except that the main memory has migrated to the chip and your page
file would reside in motherboard memory (in all likelihood).
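
A rough Python sketch of that situation, with every name hypothetical:
each `Chip` owns some on-chip main memory, a remote chip may only hold a
copy in its L3, and the home chip invalidates those copies when it
writes. The directory-style invalidation used here is an assumption
chosen for brevity; it is not snooping, and it is not a claim about how
Nehalem or Mitch's proposal actually works.

```python
# Toy write-invalidate model between chips. Purely illustrative.

class Chip:
    def __init__(self, name):
        self.name = name
        self.memory = {}    # on-chip main memory this chip is "home" for
        self.l3 = {}        # copies of data homed on other chips
        self.sharers = {}   # home-side record: address -> chips holding a copy

    def read(self, addr, home):
        """Read addr, whose home chip is `home`."""
        if home is self:
            return self.memory[addr]
        if addr not in self.l3:
            # Miss: fetch a copy from the home chip and register as a sharer.
            self.l3[addr] = home.memory[addr]
            home.sharers.setdefault(addr, set()).add(self)
        return self.l3[addr]

    def write_local(self, addr, value):
        """Home chip updates its memory and invalidates remote L3 copies."""
        self.memory[addr] = value
        for chip in self.sharers.pop(addr, set()):
            chip.l3.pop(addr, None)


a, b = Chip("A"), Chip("B")
a.write_local(0x100, 1)
first = b.read(0x100, home=a)       # B now holds a copy in its L3
a.write_local(0x100, 2)             # A's write invalidates B's copy
stale_dropped = 0x100 not in b.l3
second = b.read(0x100, home=a)      # B misses and refetches the new value
```

Without the invalidation step, the second read on chip B would return
the stale value from its L3, which is the coherence problem in
miniature.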

Such an arrangement *still* leaves an interconnect bandwidth problem
if there are multiple sockets in the system, as surely there would be
where super-high bandwidth is a requirement. Maybe if you can jam
enough into a single socket, you can justify an optical interconnect
between sockets.

Robert.
