From: Anton Ertl on
Terje Mathisen <"terje.mathisen at tmsw.no"> writes:
>Anton Ertl wrote:
>> And what do you mean with ABA problem? What I understand as ABA
>> problem is not a problem here: If the speculative load loads the right
>> value, that value and any computation based on that will be correct
>> even if the content of memory location changes several times between
>> the speculative load and the checking load.
>
>I'm thinking of a multi-level structure where the critical value is a
>pointer:
>
>First you load it and get A, then load an item in the block A points at,
>then another process comes and does the following:
>
>Load A, process what it points at and free that block. (At this point
>A=NULL).
>
>Next the same or yet another process allocates a new block and gets to
>reuse the area A used to point to, but this time it is filled by another
>set of data, OK?
>
>Finally you are rescheduled, finish the processing you started and do a
>compare against the original value of A to make sure it has all been
>safe, before committing your updates.
>
>I.e. a single final compare isn't sufficient if the meaning can change,
>you have to verify every single item you have loaded that depended upon
>that speculatively loaded item.

Yes, if the memory semantics guarantee that kind of consistency (which
they do for IA-64 AFAIK), and if you cannot exclude the possibility
that another thread/process changes this stuff, you have to recheck
all the loads in the dependence chain (and in the right order), even
if not all of them are speculative. I don't think that this is a big
deal, because I guess that most dependent loads that happen before the
checking load will also be speculative and will have to be checked
anyway.
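A minimal single-threaded sketch of that double-loading discipline in C (the names `node`, `head`, and `read_payload_consistently` are illustrative, not from any real API; a real multi-threaded version would also need atomics or fences to keep the compiler and CPU from reordering the loads):

```c
#include <assert.h>
#include <stddef.h>

struct node { int payload; };

static struct node *volatile head;   /* shared; other threads may change it */

static struct node demo = { 42 };    /* single-threaded demo data */

/* Double-loading approach: speculate, compute, then re-load and compare
   every location in the dependence chain, in the original order. */
static int read_payload_consistently(void)
{
    for (;;) {
        struct node *p = head;       /* speculative load of the pointer */
        if (p == NULL)
            continue;
        int v = p->payload;          /* dependent (also speculative) load */
        /* Re-check the whole chain: first the pointer, then the data.
           Checking only 'head' would miss a free/reuse of the same
           address with different contents (Terje's ABA scenario). */
        if (head == p && p->payload == v)
            return v;                /* both loads re-validated */
    }
}
```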

>The ALAT is similar to LLSC in that it will detect all modifications,
>including a rewrite of the same value.

I don't think that the ALAT does any better. It will notice that
something was stored at the loaded address, but it won't notice if
somebody changed something that dependent loads accessed, so the
program will have to check them with the ALAT in all situations where
they will have to do it with the double-loading approach.

- anton
--
M. Anton Ertl Some things have to be seen to be believed
anton(a)mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html
From: Quadibloc on
On Apr 27, 1:22 pm, n...(a)cam.ac.uk wrote:
> That is one of the reasons
> that many of the specialist supercomputers have been essentially
> usable only from Fortran - it is a far less Von Neumann language
> than C/C++ and their followers.

If your ambitions are _that_ modest, one might cite the CELL
processor, or even MMX, as evidence of movement towards a new
computing paradigm.

I am dismayed that progress in microprocessors towards improved vector
capabilities like those of the Cray-1 has not come more quickly, but
I just see that as a way of increasing performance by grabbing the
cheapest possible form of parallelism, through making full use of
pipelining... not anything really transformational.

John Savard
From: Anton Ertl on
Robert Myers <rbmyersusa(a)gmail.com> writes:
>Anton Ertl wrote:
>> IIRC I read about the hardware for transparent register stack engine
>> operation not working, requiring a fallback to exception-driven
>> software spilling and refilling. That would not be a big problem on
>> most workloads. AFAIK SPARC and AMD29k have always used
>> exception-driven software spilling and refilling.
>>
>And that says what about Itanium, which had a completely different set
>of priorities? The fact that register spills could be handled
>asynchronously meant that you could use registers with reckless
>abandon--unless the RSE never worked the way it should have, in which
>case you couldn't.

Even with an asynchronous RSE you should not use registers with
reckless abandon, because they still have a cost even then. You use
them when they pay off.

>Then you had the cost of all those architectural
>registers without a commensurate payback.

My impression is that having many registers pays off mainly for
certain numerical software where the RSE plays little role, because
most of the time is spent in inner loops that don't contain calls
(IIRC the RSE only saves integer registers anyway, not FP). So even
if the register stack is a little more costly than envisioned, I doubt
that it's a big problem.

>>> If the RSE didn't really work the way it was supposed to, then there
>>> would have been a fairly big downside to aggressive use of a large
>>> number of registers in any given procedure, thus limiting software
>>> pipelining to short loops.
>>
>> Not really, because software pipelining is beneficial mainly for inner
>> loops with many iterations; if you have that, then any register
>> spilling and refilling overhead is amortized over many executed
>> instructions. Of course, all of this depends on the compiler being
>> able to predict which loops have many iterations. But this is no
>> problem for SPEC CPU, which uses profile feedback; and of course, SPEC
>> CPU performance is what's relevant.
>>
>In other words, if Itanium hadn't attempted to embrace a design
>philosophy that is still apparently unwelcome to you, there shouldn't
>have been a problem. Are you being serious, or are you just jerking my
>chain?

Hmm, I guess I got a little distracted :-).

So, given that IA-64 seems to be relatively fast for numerical code
(well, not really anymore, if we compare SPEC CFP2006 results with the
faster Core i7s), and the CFP2006 Baseline results are not too far
away from the Peak results, I guess that they manage to guess the trip
counts well enough even without profile feedback. A simple heuristic
"integer->low trip count", "FP->high trip count" would explain the
observation that rotation is not used for integer loops.

- anton
--
M. Anton Ertl Some things have to be seen to be believed
anton(a)mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html
From: Terje Mathisen "terje.mathisen at tmsw.no" on
Anton Ertl wrote:
> Terje Mathisen<"terje.mathisen at tmsw.no"> writes:
>> The ALAT is similar to LLSC in that it will detect all modifications,
>> including a rewrite of the same value.
>
> I don't think that the ALAT does any better. It will notice that
> something was stored at the loaded address, but it won't notice if
> somebody changed something that dependent loads accessed, so the
> program will have to check them with the ALAT in all situations where
> they will have to do it with the double-loading approach.

No, the critical difference is the ability to detect all writes to
protected locations, including rewrites of the same value:

This means that any app which blindly writes the entire chain when
updating would only need to check the start of the chain.
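The ALAT and LL/SC are hardware mechanisms, but the property described here, detecting every store rather than comparing values, has a familiar software analogue in the seqlock: a counter bumped on every write, so even a rewrite of the same value forces the reader to retry. A sketch with illustrative names (single writer assumed; a real implementation would need atomics and memory barriers):

```c
#include <assert.h>

static unsigned seq;    /* incremented twice per update: odd = in progress */
static int data;        /* the protected location */

static void write_data(int v)
{
    seq++;              /* odd: write in progress */
    data = v;
    seq++;              /* even: write complete */
}

static int read_data(void)
{
    unsigned s;
    int v;
    do {
        s = seq;        /* snapshot the counter before reading */
        v = data;
    } while (s != seq || (s & 1));  /* retry on ANY intervening write,
                                       even one storing the same value */
    return v;
}
```

A plain value compare cannot distinguish "unchanged" from "rewritten with the same bits"; the counter can, which is the sense in which write detection is strictly stronger.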

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
From: MitchAlsup on
On Apr 27, 4:36 pm, Terje Mathisen <"terje.mathisen at tmsw.no">
wrote:
> Anton Ertl wrote:
> > Terje Mathisen<"terje.mathisen at tmsw.no">  writes:
> >> The ALAT is similar to LLSC in that it will detect all modifications,
> >> including a rewrite of the same value.
>
> > I don't think that the ALAT does any better.  It will notice that
> > something was stored at the loaded address, but it won't notice if
> > somebody changed something that dependent loads accessed, so the
> > program will have to check them with the ALAT in all situations where
> > they will have to do it with the double-loading approach.
>
> No, the critical difference is the ability to detect all writes to
> protected locations, including rewrites of the same value:
>
> This means that any app which blindly writes the entire chain when
> updating would only need to check the start of the chain.

With respect to the ABA problem and synchronization, a rewrite of a
critical location even with the same resulting bit pattern is still an
event that must terminate a synchronization attempt. At the very
minimum, it is exceedingly dangerous to let a synchronization attempt
assume that it succeeded when some other party has write-touched one of
its critical storage locations. This was THE situation where the ABA
problem acquired its name.
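The classic software workaround for exactly this hazard is to pair the compared word with a generation count, so that any write, including one that restores the same bit pattern, changes what the synchronization compares. A sketch with illustrative names (a real lock-free version would perform the compare with a double-width CAS or LL/SC; this only shows the bookkeeping):

```c
#include <assert.h>
#include <stdint.h>

/* Pointer plus generation count; the pair is what gets compared. */
struct tagged { uintptr_t ptr; uintptr_t gen; };

static int tagged_equal(struct tagged a, struct tagged b)
{
    return a.ptr == b.ptr && a.gen == b.gen;
}

/* Every store bumps the generation, so a rewrite of the same pointer
   value still produces a pair that compares unequal to the original. */
static struct tagged store_tagged(struct tagged old, uintptr_t newptr)
{
    struct tagged t = { newptr, old.gen + 1 };
    return t;
}
```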

With respect to other programming model uses outside of
synchronization, I don't know.

Mitch