From: Ole-Hjalmar Kristensen on
>>>>> "FW" == Florian Weimer <fw(a)deneb.enyo.de> writes:

FW> * Ole-Hjalmar Kristensen:
>> Second, I cannot find anything in the RM which says you can make *any*
>> assumptions about objects which are outside the protected object so I
>> cannot see how signaling will help you.

FW> I think that's what 9.10 (Ada 95 without TC 1) is about.

Yes, you are right. Thanks. I looked in the part describing protected
objects without finding it.

But one part of 9.10 puzzles me:

9.c Reason: The point of this distinction is so that on
multiprocessors with inconsistent caches, the caches only need
to be refreshed at the beginning of an entry body, and forced
out at the end of an entry body or protected procedure that
leaves an entry open. Protected function calls, and protected
subprogram calls for entryless protected objects do not require
full cache consistency. Entryless protected objects are
intended to be treated roughly like atomic objects -- each
operation is indivisible with respect to other operations
(unless both are reads), but such operations cannot be used to
synchronize access to other nonvolatile shared variables.

If you do not refresh the cache at the beginning of a protected
procedure, how do you avoid reading stale data within the protected
object on a multiprocessor? And why the wording "cannot be used to
synchronize access to other *nonvolatile* shared variables"? Is the
implication that it *can* be used to synchronize other *volatile*
shared variables?

Btw., I ran a simple test on a SPRAC multiprocessor with an entryless
protected object containing a single integer versus an integer
declared with pragma atomic, and as expected the pragma atomic
solution was much (40x) faster.

--
C++: The power, elegance and simplicity of a hand grenade.
From: Ole-Hjalmar Kristensen on
>>>>> "(b" == (see below) <yaldnif.w(a)blueyonder.co.uk> writes:

<snip>

(b> I think the wording is trying to cover all the bases.
(b> One clue is the phrase "_full_ cache consistency".
(b> The caches certainly need to be consistent with respect to the
(b> protected data, even for protected functions and procedures,
(b> but only entries ensure global consistency and so provide
(b> synchronization of data that is not local to the protected object.

Yes, that seems reasonable.

>> Btw., I ran a simple test on a SPRAC multiprocessor with an entryless
>> protected object containing a single integer versus an integer
>> declared with pragma atomic, and as expected the pragma atomic
>> solution was much (40x) faster.

(b> Unfortunately, we can't usefully apply that pragma even
(b> to a pair of integers. (I don't mean the pair's components!)

Agreed, but you may able to cheat and pack a pair of integers into
a 64-bit atomic, and a compare-and-swap is also much cheaper than a
protected object it seems.

(b> --
(b> Bill Findlay
(b> <surname><forename> chez blueyonder.co.uk



--
C++: The power, elegance and simplicity of a hand grenade.
From: Ole-Hjalmar Kristensen on
>>>>> "(b" == (see below) <yaldnif.w(a)blueyonder.co.uk> writes:

<snip>

(b> Also, simply declaring a variable atomic does not in itself
(b> ensure global consistency of view. For that you also need
(b> to execute appropriate memory barrier operations for the
(b> architecture. (I'm sure you know that.)

Yes, I know. So pragma atomic does not help very much by itself in
implementing lock-free algorithms.

>> Agreed, but you may able to cheat and pack a pair of integers into
>> a 64-bit atomic, and a compare-and-swap is also much cheaper than a
>> protected object it seems.

(b> Yes, but this is completely implementation-dependent.
(b> Not much above the semantic level of assembly.
(b> I would take a lot of convincing that the performance
(b> improvement was actually necessary.

I agree that it rarely should be necessary. It would have been nice to
have a way of implementing lock-free algorithms in Ada efficiently,
but entryless protected objects seems to be the best solution if
you want a portable program.

(b> --
(b> Bill Findlay
(b> <surname><forename> chez blueyonder.co.uk


--
C++: The power, elegance and simplicity of a hand grenade.
From: Ole-Hjalmar Kristensen on
Interesting. I had not thought of Ada.Synchronous_Task_Control. Apart
from that, I agree that a "best effort" implementation of standard
library like the membar_ops and atomic_ops which are part of the
Solaris C library would likely be sufficient.


>>>>> "(b" == (see below) <yaldnif.w(a)blueyonder.co.uk> writes:

(b> On 02/04/2008 08:22, in article wvbr4pakzpo1.fsf(a)astra06.norway.sun.com,
(b> "Ole-Hjalmar Kristensen"
(b> <ole-hjalmar.kristensen(a)substitute_employer_here.com> wrote:

>> Yes, I know. So pragma atomic does not help very much by itself in
>> implementing lock-free algorithms.

(b> Yes, it's only one essential aspect.

>>>> Agreed, but you may able to cheat and pack a pair of integers into
>>>> a 64-bit atomic, and a compare-and-swap is also much cheaper than a
>>>> protected object it seems.
>>
>> (b> Yes, but this is completely implementation-dependent.
>> (b> Not much above the semantic level of assembly.
>> (b> I would take a lot of convincing that the performance
>> (b> improvement was actually necessary.
>>
>> I agree that it rarely should be necessary. It would have been nice to
>> have a way of implementing lock-free algorithms in Ada efficiently,
>> but entryless protected objects seems to be the best solution if
>> you want a portable program.

(b> It would not take too much - perhaps just a standard library including
(b> read-barrier and write-barrier operations, and a selection of things like
(b> CAS, TAS, EXCH, etc; mapped to m/c codes where these are provided,
(b> and implemented using the architecture's native sync operations where not.

(b> Is this the basis for an AI?

(b> Following on my second post to the point, I was reminded of the package
(b> Ada.Synchronous_Task_Control, which must also impose memory barriers is it
(b> is to work reliably on a multiprocessor; so I tried that in my Simpson's
(b> algorithm test as well. Here is the code:

>> with Ada.Synchronous_Task_Control;
>> package body Wait_Free_Atomicity is
>>
>> procedure Sync is
>> use Ada.Synchronous_Task_Control;
>> Sema : Suspension_Object;
>> Bool : Boolean := Current_State(Sema); -- either this
>> begin
>> -- Set_True(Sema); -- or
>> -- Suspend_Until_True(Sema); -- this
>> end Sync;
>>
>> procedure Update (Atomic_Item : in out an_Atomic; Item : in an_Item) is
>> Row : constant a_Bistate := not Atomic_Item.Last_Row_Inspected;
>> Col : constant a_Bistate := not Atomic_Item.Last_Column_Updated(Row);
>> begin
>> Atomic_Item.Data_Slot_Matrix(Row, Col) := Item;
>> Atomic_Item.Last_Column_Updated(Row) := Col;
>> Atomic_Item.Last_Row_Updated := Row;
>> Sync;
>> end Update;
>>
>> procedure Inspect (Atomic_Item : in out an_Atomic; Item : out an_Item) is
>> Row : constant a_Bistate := Atomic_Item.Last_Row_Updated;
>> Col : a_Bistate;
>> pragma Atomic (Col);
>> begin
>> Atomic_Item.Last_Row_Inspected := Row;
>> Col := Atomic_Item.Last_Column_Updated(Row);
>> Item := Atomic_Item.Data_Slot_Matrix(Row, Col);
>> Sync;
>> end Inspect;
>>
>> end Wait_Free_Atomicity;

(b> It works nicely, and is an order of magnitude faster than a protected
(b> object:

(b> 20_000_000 updates

(b> with Suspend_Until_True sync:
(b> No consistency failures.
(b> 7.30 real 14.14 user 0.04 sys

(b> with Current_State sync:
(b> No consistency failures.
(b> 3.57 real 6.97 user 0.02 sys

(b> --
(b> Bill Findlay
(b> <surname><forename> chez blueyonder.co.uk

--
C++: The power, elegance and simplicity of a hand grenade.
From: Ole-Hjalmar Kristensen on
After reading your post about use of suspension objects, I looked in
the RM to see what guarantees we have about the effect of sharing
variables between tasks synchronized with suspension objects. I
completely agree that to work reliably on a multiprocessor the
suspension operations *has* to impose a memory barrier, but I was
unable to find any explicit statement about sharing variables
synchronized with suspension objects.

The sections I have reproduced below is the closest I could get. Since
Suspend_Until_True is potentially blocking, it *could* be signaled (as
defined in 9.10), but I find it strange that it is not mentioned
explicitly. Also, the statement "can be used for two-stage suspend
operations and as a simple building block for implementing
higher-level queues", seems to indicate that the intent indeed is to
be able to use suspension objects to synchronize access to shared
variables. Comments, anyone?

From the AARM:

9.10 Shared Variables

9.a Reason: The underlying principle here is that for one action to
``signal'' a second, the second action has to follow a
potentially blocking operation, whose blocking is dependent on
the first action in some way. Protected procedures are not
potentially blocking, so they can only be "signalers," they
cannot be signaled.

and

D.10 Synchronous Task Control

1 [This clause describes a language-defined private semaphore (suspension
object), which can be used for two-stage suspend operations and as a simple
building block for implementing higher-level queues.]
....
9 The procedure Suspend_Until_True blocks the calling task until the state
of the object S is true; at that point the task becomes ready and the state
of the object becomes false.

10 {potentially blocking operation [Suspend_Until_True]} {blocking,
potentially [Suspend_Until_True]} {Program_Error (raised by failure of
run-time check)} Program_Error is raised upon calling Suspend_Until_True if
another task is already waiting on that suspension object. Suspend_Until_
True is a potentially blocking operation (see 9.5.1).


>>>>> "(b" == (see below) <yaldnif.w(a)blueyonder.co.uk> writes:

<snip>

(b> Following on my second post to the point, I was reminded of the package
(b> Ada.Synchronous_Task_Control, which must also impose memory barriers is it
(b> is to work reliably on a multiprocessor; so I tried that in my Simpson's
(b> algorithm test as well. Here is the code:

>> with Ada.Synchronous_Task_Control;
>> package body Wait_Free_Atomicity is
>>
>> procedure Sync is
>> use Ada.Synchronous_Task_Control;
>> Sema : Suspension_Object;
>> Bool : Boolean := Current_State(Sema); -- either this
>> begin
>> -- Set_True(Sema); -- or
>> -- Suspend_Until_True(Sema); -- this
>> end Sync;
>>
>> procedure Update (Atomic_Item : in out an_Atomic; Item : in an_Item) is
>> Row : constant a_Bistate := not Atomic_Item.Last_Row_Inspected;
>> Col : constant a_Bistate := not Atomic_Item.Last_Column_Updated(Row);
>> begin
>> Atomic_Item.Data_Slot_Matrix(Row, Col) := Item;
>> Atomic_Item.Last_Column_Updated(Row) := Col;
>> Atomic_Item.Last_Row_Updated := Row;
>> Sync;
>> end Update;
>>
>> procedure Inspect (Atomic_Item : in out an_Atomic; Item : out an_Item) is
>> Row : constant a_Bistate := Atomic_Item.Last_Row_Updated;
>> Col : a_Bistate;
>> pragma Atomic (Col);
>> begin
>> Atomic_Item.Last_Row_Inspected := Row;
>> Col := Atomic_Item.Last_Column_Updated(Row);
>> Item := Atomic_Item.Data_Slot_Matrix(Row, Col);
>> Sync;
>> end Inspect;
>>
>> end Wait_Free_Atomicity;

(b> It works nicely, and is an order of magnitude faster than a protected
(b> object:

(b> 20_000_000 updates

(b> with Suspend_Until_True sync:
(b> No consistency failures.
(b> 7.30 real 14.14 user 0.04 sys

(b> with Current_State sync:
(b> No consistency failures.
(b> 3.57 real 6.97 user 0.02 sys

(b> --
(b> Bill Findlay
(b> <surname><forename> chez blueyonder.co.uk



--
C++: The power, elegance and simplicity of a hand grenade.