From: Stephen Fuld on
nmm1(a)cam.ac.uk wrote:

snip

> I still think that there is serious potential in using a cooperative
> approach to branch prediction, as I have posted before. It involves
> adding a few, simple instructions that have no effect on the actions
> on the program, but may be used to pass information from the compiler
> to the prediction unit.

Let me make sure I understand what you are proposing.

> Roughly:
>
> SET_PRED n <registers>
> Sets predictor n to an undefined hash of the register values.

So this takes some number of registers, whose names are given in the
instruction, creates a hash of the values in those registers and saves
the hash in some dedicated location, indexed by n? Is that an accurate
description?

> USE_PRED n <confidence>
> Uses that predictor, with confidence between (say) -7 and 7; negative
> values imply the confidence of it not being taken.

So this takes the same hash of the same registers and compares the value
of the hash to the one stored at location n. If the values match, use
the confidence value in the instruction to predict the branch. If the
hash values don't match, don't use the confidence value (and perhaps
default to some other predictor). Is that correct?

> PUSH_PREDS <predictors>
> POP_PREDS <predictors>
> Used at function calls and elsewhere to indicate the change in use.
> Simply clearing a predictor is always a permitted action, and most
> hardware would keep only a very small stack of them.

What is specified by the <predictors> in these instructions? I
understand the intent, but not exactly what the instructions specify.

This whole scheme as I described it sounds problematical, so before I
spend much time discussing the problems, please let me know if my
descriptions are accurate, or where I went wrong.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)
From: nmm1 on
In article <he1bb8$hmg$1(a)news.eternal-september.org>,
Stephen Fuld <SFuld(a)Alumni.cmu.edu.invalid> wrote:
>
> > Roughly:
>>
>> SET_PRED n <registers>
>> Sets predictor n to an undefined hash of the register values.
>
>So this takes some number of registers, whose names are given in the
>instruction, creates a hash of the values in those registers and saves
>the hash in some dedicated location, indexed by n? Is that an accurate
>description?

Not quite. It looks up the hash in a branch predictor structure,
and saves a confidence value, according to whether it is expected
to correspond with branches being taken. That structure is likely
to be a simple LRU.

>> USE_PRED n <confidence>
>> Uses that predictor, with confidence between (say) -7 and 7; negative
>> values imply the confidence of it not being taken.
>
>So this takes the same hash of the same registers and compares the value
>of the hash to the one stored at location n. If the values match, use
>the confidence value in the instruction to predict the branch. If the
>hash values don't match, don't use the confidence value (and perhaps
>default to some other predictor). Is that correct?

No, not at all. It uses the prediction value directly by looking
up the value indexed by n, and matching signs. It will also update
that value, according to the success of the prediction.

>> PUSH_PREDS <predictors>
>> POP_PREDS <predictors>
>> Used at function calls and elsewhere to indicate the change in use.
>> Simply clearing a predictor is always a permitted action, and most
>> hardware would keep only a very small stack of them.
>
>What is specified by the <predictors> in these instructions? I
>understand the intent, but not exactly what the instructions specify.

The indices of the predictors - i.e. the 'n's above.

>This whole scheme as I described it sounds problematical, so before I
>spend much time discussing the problems, please let me know if my
>descriptions are accurate, or where I went wrong.

Certainly. I apologise for being obscure. The basic idea is that
there are a small number of predictor states, which are selected
and used according to code generated by the compiler. The actual
prediction is handled entirely by the hardware, based on what has
happened before (i.e. just as at present).

The only real difference from the current mechanisms is that the
compiler can tell the hardware "these are the registers that control
which object I am handling", rather than the lookup being purely
based on the instruction location (and possibly thread number).


Regards,
Nick Maclaren.
From: Stephen Fuld on
nmm1(a)cam.ac.uk wrote:
> In article <he1bb8$hmg$1(a)news.eternal-september.org>,
> Stephen Fuld <SFuld(a)Alumni.cmu.edu.invalid> wrote:
>>> Roughly:
>>>
>>> SET_PRED n <registers>
>>> Sets predictor n to an undefined hash of the register values.
>> So this takes some number of registers, whose names are given in the
>> instruction, creates a hash of the values in those registers and saves
>> the hash in some dedicated location, indexed by n? Is that an accurate
>> description?
>
> Not quite. It looks up the hash in a branch predictor structure,
> and saves a confidence value, according to whether it is expected
> to correspond with branches being taken. That structure is likely
> to be a simple LRU.

So the structure is a two dimensional table, one dimension indexed by n,
and the other implementation dependent, but treated as an LRU? What
does each entry in the table contain? It must have the hash, and
presumably a few (4?) bit field to indicate prediction direction and
confidence.. Anything else?

So we take the hash. If it found, we move the corresponding entry to
the top of the LRU. If the hash is not found, it evicts the LRU entry
and creates a new MRU entry with the hash. Is that right?

>>> USE_PRED n <confidence>
>>> Uses that predictor, with confidence between (say) -7 and 7; negative
>>> values imply the confidence of it not being taken.
>> So this takes the same hash of the same registers and compares the value
>> of the hash to the one stored at location n. If the values match, use
>> the confidence value in the instruction to predict the branch. If the
>> hash values don't match, don't use the confidence value (and perhaps
>> default to some other predictor). Is that correct?
>
> No, not at all. It uses the prediction value directly by looking
> up the value indexed by n, and matching signs.

Signs??? Do you mean the sign of the predictor value in the table and
the sign of the confidence field? So we go the n array in the table and
match what against the hash value found in each entry? Don't we need to
do the hash on the registers to have something to match against?
Otherwise, how do we know which entry to use? Or do we just use the MRU
entry? Once we pick an entry, what do we do if the signs match versus
if they don't?

> It will also update
> that value, according to the success of the prediction.

So you increment or decrement the value in the entry in the table?


It's getting clearer, but I am still not clear on the details.


--
- Stephen Fuld
(e-mail address disguised to prevent spam)
From: nmm1 on
In article <he1hiv$jmf$1(a)news.eternal-september.org>,
Stephen Fuld <SFuld(a)Alumni.cmu.edu.invalid> wrote:
>
>So the structure is a two dimensional table, one dimension indexed by n,
>and the other implementation dependent, but treated as an LRU? What
>does each entry in the table contain? It must have the hash, and
>presumably a few (4?) bit field to indicate prediction direction and
>confidence.. Anything else?

No, and it needn't even be that. The 'n' need not be stored in the
structure. The hash is looked up and the confidence value (which
includes the direction) copied to the separate register file that
is indexed by n (and, again, contains just a confidence value).

>So we take the hash. If it found, we move the corresponding entry to
>the top of the LRU. If the hash is not found, it evicts the LRU entry
>and creates a new MRU entry with the hash. Is that right?

Yes.

>Signs??? Do you mean the sign of the predictor value in the table and
>the sign of the confidence field? So we go the n array in the table and
>match what against the hash value found in each entry? Don't we need to
>do the hash on the registers to have something to match against?
>Otherwise, how do we know which entry to use? Or do we just use the MRU
>entry? Once we pick an entry, what do we do if the signs match versus
>if they don't?

When using the data, we have the 'n', so we just fetch the confidence
value from the register file indexed by that. This is negated if the
confidence in the USE_PRED is negative, and its sign used to predict
the branch.

>> It will also update
>> that value, according to the success of the prediction.
>
>So you increment or decrement the value in the entry in the table?

Or whatever. That's what I was thinking of as a first experiment.

>It's getting clearer, but I am still not clear on the details.

Nor am I! What I am describing is an approach that might lead to
better prediction for O-O codes. But the details would be modified
according to whatever techniques are already known to be effective
for branch prediction, and as a result of simulation.

My key idea is to get away from using just the instruction address
and thread index for branch prediction, and to use the address of
the object being worked on (or whatever the compiler regards as
appropriate). Really no more than that. But doing even that needs
some handshaking between the compiler and the hardware.

Oh, the other main idea is 'advisory instructions' only, so that
a CPU could ignore them without behavioural changes. But that isn't
a new idea.

Regards,
Nick Maclaren.
From: EACA on
"The only bad thing is that some guys I know at AMD say that Bulldozer
is
not really all that great a product, but is shipping just because AMD
needs a model refresh."

Andy:
Can you elaborate more on this issue, because I find it irrational
that AMD's own engineers (or whoever told you that) don't "trust
enough" their new micro-architecture.

Waiting anxiously for your reply.