From: Chris Gray on
Terje Mathisen <Terje.Mathisen(a)tmsw.no> writes:

> The usual programming paradigm for such a system is to have many
> threads running the same algorithm, which means that training
> information from one thread is likely to be useful for another, or at
> least not detrimental.

That doesn't mean that the ability to have multiple predictor states
is bad. You need a way for the OS to tell the CPU "thread with key Y
is going to be very similar to thread with key X". That means that
the key Y state should be seeded with the key X state, or that the
two states can be merged into one larger, more detailed state.
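
A minimal sketch of that idea, under loud assumptions: predictor state is
modelled as a small table of 2-bit saturating counters per key, and the
names KeyedPredictor, seed and merge are hypothetical, not any real CPU or
OS interface. The merge shown simply averages the counters and shares the
result; it does not model a "larger, more detailed" combined state.

```python
# Hypothetical model: each key owns a table of 2-bit saturating counters
# (0..3; a value >= 2 predicts "taken"). Nothing here mirrors real hardware.
TABLE_SIZE = 16

class KeyedPredictor:
    def __init__(self):
        self.tables = {}  # key -> list of counters

    def _table(self, key):
        # Unknown keys start in the weakly-not-taken state.
        return self.tables.setdefault(key, [1] * TABLE_SIZE)

    def predict(self, key, pc):
        return self._table(key)[pc % TABLE_SIZE] >= 2

    def update(self, key, pc, taken):
        table = self._table(key)
        i = pc % TABLE_SIZE
        table[i] = min(3, table[i] + 1) if taken else max(0, table[i] - 1)

    def seed(self, new_key, from_key):
        # "Thread Y is going to be very similar to thread X": copy X's state.
        self.tables[new_key] = list(self._table(from_key))

    def merge(self, key_a, key_b):
        # One possible merge policy (an assumption): average the counters,
        # rounding up, and let both keys share the resulting table.
        a, b = self._table(key_a), self._table(key_b)
        merged = [(x + y + 1) // 2 for x, y in zip(a, b)]
        self.tables[key_a] = self.tables[key_b] = merged

p = KeyedPredictor()
for _ in range(3):
    p.update("X", 0x40, taken=True)  # train X: the branch at 0x40 is taken
cold = p.predict("Y", 0x40)          # Y has no history of its own yet
p.seed("Y", "X")
warm = p.predict("Y", 0x40)          # Y now starts from X's training
print(cold, warm)
```

Seeding copies the state, so the two threads can still diverge afterwards;
merging makes them share one table for good.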

I guess it comes down to the question of just how valuable a more
accurate predictor is: how many gates can you afford, and is it worth
a few extra instructions to initialize it?

--
Experience should guide us, not rule us.

Chris Gray cg(a)GraySage.COM
http://www.Nalug.ORG/ (Lego)
http://www.GraySage.COM/cg/ (Other)
From: Quadibloc on
On Nov 8, 2:00 pm, Terje Mathisen <Terje.Mathi...(a)tmsw.no> wrote:

> The usual programming paradigm for such a system is to have many threads
> running the same algorithm, which means that training information from
> one thread is likely to be useful for another, or at least not detrimental.

Ah, I thought the usual operation of a multicore system is to have an
operating system running multiple different applications at once, and
the operating system itself, so that the system would have more
different applications providing threads than there were cores.

Thus, on a Windows PC, when I look at Task Manager, under the
Processes tab, I usually find more than four things listed there.

Admittedly, if I was using multicore chips in a supercomputer in order
to do massively-parallel number-crunching, I probably _would_ be using
the system as you describe. In fact, even on a PC, if I was playing
certain graphics-intensive computer games, that may well be what I
want. So the situation you describe, even if not "usual", is the one
that applies... the only times when performance is critical.

John Savard
From: Robert Myers on
On Nov 8, 10:58 pm, Quadibloc <jsav...(a)ecn.ab.ca> wrote:

>
> Ah, I thought the usual operation of a multicore system is to have an
> operating system running multiple different applications at once, and
> the operating system itself, so that the system would have more
> different applications providing threads than there were cores.
>
> Thus, on a Windows PC, when I look at Task Manager, under the
> Processes tab, I usually find more than four things listed there.
>
> Admittedly, if I was using multicore chips in a supercomputer in order
> to do massively-parallel number-crunching, I probably _would_ be using
> the system as you describe. In fact, even on a PC, if I was playing
> certain graphics-intensive computer games, that may well be what I
> want. So the situation you describe, even if not "usual", is the one
> that applies... the only times when performance is critical.
>
Windows, VNC, an embedded virtual Linux machine, and Chrome with many
open tabs keep this i7 pretty busy. Add in bloated messengers, and
it's sometimes not enough. Going back to anything less is sort of
depressing, actually. Most of what has been said here and elsewhere
about the uselessness of multiple cores/multi-threading has been "all
computing is like the computing I'm used to, and it always will be."

Robert.

From: Ken Hagan on
On Sun, 08 Nov 2009 16:52:21 -0000, Quadibloc <jsavard(a)ecn.ab.ca> wrote:

> If one has a multithreaded core, branch predictor information should
> be labelled by thread, so that information gathered about the branches
> in one thread isn't used to control how branches in another thread are
> handled. The branch predictor should not simply ignore the fact that
> multiple different threads are being executed in the core.

I'm still slightly confused, perhaps as much by other people's responses
as by your suggestion.

When we speak of "thread" here, are these CPU hyper-threads or OS threads
(or indeed, some other OS-supplied tag, allowing for groups of
behaviourally similar threads to learn from one another)? Since the
subject came up in the context of multithreaded cores, I presumed the
former, but possibly you were thinking of the latter. If so, would that be
useful even on a single-threaded core?
From: Quadibloc on
On Nov 9, 3:03 am, "Ken Hagan" <K.Ha...(a)thermoteknix.com> wrote:
> On Sun, 08 Nov 2009 16:52:21 -0000, Quadibloc <jsav...(a)ecn.ab.ca> wrote:

> > If one has a multithreaded core, branch predictor information should
> > be labelled by thread, so that information gathered about the branches
> > in one thread isn't used to control how branches in another thread are
> > handled. The branch predictor should not simply ignore the fact that
> > multiple different threads are being executed in the core.

> I'm still slightly confused, perhaps as much by other people's responses
> as by your suggestion.

> When we speak of "thread" here, are these CPU hyper-threads or OS threads
> (or indeed, some other OS-supplied tag, allowing for groups of
> behaviourally similar threads to learn from one another)? Since the
> subject came up in the context of multithreaded cores, I presumed the
> former, but possibly you were thinking of the latter. If so, would that be
> useful even on a single-threaded core?

My thinking was that at a given moment, there might be, say, 128 OS
threads... and in a commercial CPU, eight of those threads might
actually be executing at that moment - two in each core of a quad-core
CPU. I was thinking that since the threads were likely to be unrelated
in a conventional Windows PC environment, if the cores are "hyper-
threaded" with the grand total of two threads each, they should have
two separate branch predictors each.
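
As a rough illustration of why unrelated threads might want separate
predictors, here is a toy model. It assumes a tiny table of 2-bit
saturating counters indexed by the low bits of the branch address; the
function name, table size and trace are all made up for the example.

```python
def mispredictions(trace, per_slot):
    """Count mispredictions for a trace of (slot, pc, taken) tuples.

    per_slot=True gives each hardware thread slot its own counter table;
    per_slot=False makes both slots share a single table.
    """
    size = 8  # assumed table size, kept tiny so aliasing is easy to see
    tables = {}
    misses = 0
    for slot, pc, taken in trace:
        table = tables.setdefault(slot if per_slot else 0, [1] * size)
        i = pc % size
        if (table[i] >= 2) != taken:   # counter >= 2 predicts "taken"
            misses += 1
        table[i] = min(3, table[i] + 1) if taken else max(0, table[i] - 1)
    return misses

# Slot 0 runs a branch that is always taken; slot 1 runs an unrelated
# branch that is never taken and happens to map to the same table entry.
trace = [(0, 0x10, True), (1, 0x18, False)] * 20

shared = mispredictions(trace, per_slot=False)
separate = mispredictions(trace, per_slot=True)
print(shared, separate)
```

In this deliberately worst-case trace the two threads keep undoing each
other's training in the shared table, while the per-slot tables settle
after the first branch. Real workloads would alias far less often.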

Allocating a different OS thread to a core thread slot, I figured,
would take place "infrequently", say during a timer interrupt 60 times
a second (that's how often they had timer interrupts on my
grandpappy's IBM 360...) and so I viewed flushing the branch
predictor, rather than trying to give it the ability to cope with the
operating system's idea of what constitutes a thread, as an acceptable
departure from optimization.
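
A small sketch of the flush-on-reassignment policy, assuming the same kind
of 2-bit counter table as a stand-in for real predictor state. The class
SlotPredictor, its assign hook and the 3 GHz clock are hypothetical values
chosen only for illustration.

```python
class SlotPredictor:
    """Branch predictor state for one hardware thread slot (hypothetical)."""
    SIZE = 16  # assumed number of 2-bit counters

    def __init__(self):
        self.owner = None                 # OS thread currently in the slot
        self.flushes = 0
        self.counters = [1] * self.SIZE   # 1 = weakly not taken

    def assign(self, os_thread):
        # Called when the scheduler puts an OS thread into this slot.
        # A different thread means the old history is thrown away.
        if os_thread != self.owner:
            self.counters = [1] * self.SIZE
            self.flushes += 1
            self.owner = os_thread

    def predict(self, pc):
        return self.counters[pc % self.SIZE] >= 2

    def update(self, pc, taken):
        i = pc % self.SIZE
        c = self.counters[i]
        self.counters[i] = min(3, c + 1) if taken else max(0, c - 1)

slot = SlotPredictor()
slot.assign("thread A")
slot.update(0x40, True)
slot.update(0x40, True)

slot.assign("thread A")            # same thread keeps the slot: no flush
same_owner = slot.predict(0x40)

slot.assign("thread B")            # different thread: history is flushed
after_switch = slot.predict(0x40)

# Rough cost argument, with an assumed clock rate.
TICKS_PER_SECOND = 60
ASSUMED_CLOCK_HZ = 3_000_000_000
cycles_per_tick = ASSUMED_CLOCK_HZ // TICKS_PER_SECOND
print(same_owner, after_switch, slot.flushes, cycles_per_tick)
```

With those assumed numbers a flush happens at most once every 50 million
cycles per slot, which is the sense in which the retraining cost might be
acceptable.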

John Savard