|
From: (see below) on 31 Mar 2008 09:03 On 31/03/2008 08:59, in article wvbrd4pbz5mj.fsf(a)astra06.norway.sun.com, "Ole-Hjalmar Kristensen" <ole-hjalmar.kristensen(a)substitute_employer_here.com> wrote: > But one part of 9.10 puzzles me: > > 9.c Reason: The point of this distinction is so that on > multiprocessors with inconsistent caches, the caches only need > to be refreshed at the beginning of an entry body, and forced > out at the end of an entry body or protected procedure that > leaves an entry open. Protected function calls, and protected > subprogram calls for entryless protected objects do not require > full cache consistency. Entryless protected objects are > intended to be treated roughly like atomic objects -- each > operation is indivisible with respect to other operations > (unless both are reads), but such operations cannot be used to > synchronize access to other nonvolatile shared variables. > > If you do not refresh the cache at the beginning of a protected > procedure, how do you avoid reading stale data within the protected > object on a multiprocessor? And why the wording "cannot be used to > synchronize access to other *nonvolatile* shared variables"? Is the > implication that it *can* be used to synchronize other *volatile* > shared variables? I think the wording is trying to cover all the bases. One clue is the phrase "_full_ cache consistency". The caches certainly need to be consistent with respect to the protected data, even for protected functions and procedures, but only entries ensure global consistency and so provide synchronization of data that is not local to the protected object. > Btw., I ran a simple test on a SPRAC multiprocessor with an entryless > protected object containing a single integer versus an integer > declared with pragma atomic, and as expected the pragma atomic > solution was much (40x) faster. Unfortunately, we can't usefully apply that pragma even to a pair of integers. (I don't mean the pair's components!) -- Bill Findlay <surname><forename> chez blueyonder.co.uk
From: (see below) on 31 Mar 2008 10:17 On 31/03/2008 14:03, in article C4169FA1.E214C%yaldnif.w(a)blueyonder.co.uk, "(see below)" <yaldnif.w(a)blueyonder.co.uk> wrote: > On 31/03/2008 08:59, in article wvbrd4pbz5mj.fsf(a)astra06.norway.sun.com, > "Ole-Hjalmar Kristensen" > <ole-hjalmar.kristensen(a)substitute_employer_here.com> wrote: >> If you do not refresh the cache at the beginning of a protected >> procedure, how do you avoid reading stale data within the protected >> object on a multiprocessor? And why the wording "cannot be used to >> synchronize access to other *nonvolatile* shared variables"? Is the >> implication that it *can* be used to synchronize other *volatile* >> shared variables? > > I think the wording is trying to cover all the bases. > One clue is the phrase "_full_ cache consistency". > The caches certainly need to be consistent with respect to the > protected data, even for protected functions and procedures, > but only entries ensure global consistency and so provide > synchronization of data that is not local to the protected object. In practice, I would expect even protected functions and procedures to need a mutex, and that that would execute a memory barrier operation, so enforcing global consistency. To verify this I ran some simple tests myself. I implemented Simpson's algorithm for a lock-free shared variable. Here is the relevant code: > package Wait_Free_Atomicity is > > type an_Atomic is limited private; > > procedure Update (Atomic_Item : in out an_Atomic; Item : in an_Item); > procedure Inspect (Atomic_Item : in out an_Atomic; Item : out an_Item); > > private > > type a_Bistate is new Boolean; > pragma Atomic (a_Bistate); > > type a_Slot_Matrix is array (a_Bistate, a_Bistate) of an_Item; > pragma Volatile_Components (a_Slot_Matrix); > > type a_Full_Column is array (a_Bistate) of a_Bistate; > pragma Atomic_Components (a_Full_Column); > > type an_Atomic is > limited record > Data_Slot_Matrix : a_Slot_Matrix; > Last_Column_Updated : a_Full_Column := > (others => a_Bistate'First); > Last_Row_Inspected, Last_Row_Updated : a_Bistate := > a_Bistate'First; > pragma Atomic (Last_Row_Inspected); > pragma Atomic (Last_Row_Updated); > end record; > > end Wait_Free_Atomicity; > > package body Wait_Free_Atomicity is > > procedure Update (Atomic_Item : in out an_Atomic; Item : in an_Item) is > Row : constant a_Bistate := not Atomic_Item.Last_Row_Inspected; > Col : constant a_Bistate := not Atomic_Item.Last_Column_Updated(Row); > begin > Atomic_Item.Data_Slot_Matrix(Row, Col) := Item; > Atomic_Item.Last_Column_Updated(Row) := Col; > Atomic_Item.Last_Row_Updated := Row; > -- no explicit membar sync > end Update; > > procedure Inspect (Atomic_Item : in out an_Atomic; Item : out an_Item) is > Row : constant a_Bistate := Atomic_Item.Last_Row_Updated; > Col : a_Bistate; > pragma Atomic (Col); > begin > Atomic_Item.Last_Row_Inspected := Row; > Col := Atomic_Item.Last_Column_Updated(Row); > Item := Atomic_Item.Data_Slot_Matrix(Row, Col); > -- no explicit membar sync > end Inspect; > > end Wait_Free_Atomicity; The test was to run it with the Item type being a 5-tuple of consecutive integers, and checking that the inspecting task received correct tuples. When run on a single-processor machine (Mac PowerBook) there were no consistency failures. When run on a dual-core machine (MacBook Pro with Core 2 Duo CPU) there were many consistency failures. When the body was modified as follows: > package body Wait_Free_Atomicity is > > protected Mem_Bar is > entry Sync; > end Mem_Bar; > > protected body Mem_Bar is > entry Sync when Boolean'(True) is > begin > null; > end Sync; > end Mem_Bar; > > procedure Update (Atomic_Item : in out an_Atomic; Item : in an_Item) is > Row : constant a_Bistate := not Atomic_Item.Last_Row_Inspected; > Col : constant a_Bistate := not Atomic_Item.Last_Column_Updated(Row); > begin > Atomic_Item.Data_Slot_Matrix(Row, Col) := Item; > Atomic_Item.Last_Column_Updated(Row) := Col; > Atomic_Item.Last_Row_Updated := Row; > Mem_Bar.Sync; > end Update; > > procedure Inspect (Atomic_Item : in out an_Atomic; Item : out an_Item) is > Row : constant a_Bistate := Atomic_Item.Last_Row_Updated; > Col : a_Bistate; > pragma Atomic (Col); > begin > Atomic_Item.Last_Row_Inspected := Row; > Col := Atomic_Item.Last_Column_Updated(Row); > Item := Atomic_Item.Data_Slot_Matrix(Row, Col); > Mem_Bar.Sync; > end Inspect; > > end Wait_Free_Atomicity; Note that the tuples are external to the protected object. The consistency failures went away. The same result held when the Mem_Bar.Sync entry was replaced by a protected procedure; ditto with a protected function. For the sake of interest, here are the test results, with execution times (CPU times exceed real times because there are 2 cores working simultaneously): 20_000_000 updates with no sync: 3689 consistency failures. 0.90 real 0.73 user 0.09 sys with entry sync: No consistency failures. 222.90 real 138.52 user 265.78 sys with procedure sync: No consistency failures. 149.36 real 127.65 user 164.96 sys with function sync: No consistency failures. 142.43 real 120.33 user 155.93 sys -- Bill Findlay <surname><forename> chez blueyonder.co.uk
From: (see below) on 1 Apr 2008 10:12 On 01/04/2008 10:02, in article wvbr8wzyymkw.fsf(a)astra06.norway.sun.com, "Ole-Hjalmar Kristensen" <ole-hjalmar.kristensen(a)substitute_employer_here.com> wrote: >>>>>> "(b" == (see below) <yaldnif.w(a)blueyonder.co.uk> writes: > > <snip> > > (b> I think the wording is trying to cover all the bases. > (b> One clue is the phrase "_full_ cache consistency". > (b> The caches certainly need to be consistent with respect to the > (b> protected data, even for protected functions and procedures, > (b> but only entries ensure global consistency and so provide > (b> synchronization of data that is not local to the protected object. > > Yes, that seems reasonable. > >>> Btw., I ran a simple test on a SPRAC multiprocessor with an entryless >>> protected object containing a single integer versus an integer >>> declared with pragma atomic, and as expected the pragma atomic >>> solution was much (40x) faster. > > (b> Unfortunately, we can't usefully apply that pragma even > (b> to a pair of integers. (I don't mean the pair's components!) Also, simply declaring a variable atomic does not in itself ensure global consistency of view. For that you also need to execute appropriate memory barrier operations for the architecture. (I'm sure you know that.) > Agreed, but you may able to cheat and pack a pair of integers into > a 64-bit atomic, and a compare-and-swap is also much cheaper than a > protected object it seems. Yes, but this is completely implementation-dependent. Not much above the semantic level of assembly. I would take a lot of convincing that the performance improvement was actually necessary. -- Bill Findlay <surname><forename> chez blueyonder.co.uk
From: (see below) on 2 Apr 2008 10:59 On 02/04/2008 08:22, in article wvbr4pakzpo1.fsf(a)astra06.norway.sun.com, "Ole-Hjalmar Kristensen" <ole-hjalmar.kristensen(a)substitute_employer_here.com> wrote: > Yes, I know. So pragma atomic does not help very much by itself in > implementing lock-free algorithms. Yes, it's only one essential aspect. >>> Agreed, but you may able to cheat and pack a pair of integers into >>> a 64-bit atomic, and a compare-and-swap is also much cheaper than a >>> protected object it seems. > > (b> Yes, but this is completely implementation-dependent. > (b> Not much above the semantic level of assembly. > (b> I would take a lot of convincing that the performance > (b> improvement was actually necessary. > > I agree that it rarely should be necessary. It would have been nice to > have a way of implementing lock-free algorithms in Ada efficiently, > but entryless protected objects seems to be the best solution if > you want a portable program. It would not take too much - perhaps just a standard library including read-barrier and write-barrier operations, and a selection of things like CAS, TAS, EXCH, etc; mapped to m/c codes where these are provided, and implemented using the architecture's native sync operations where not. Is this the basis for an AI? Following on my second post to the point, I was reminded of the package Ada.Synchronous_Task_Control, which must also impose memory barriers is it is to work reliably on a multiprocessor; so I tried that in my Simpson's algorithm test as well. Here is the code: > with Ada.Synchronous_Task_Control; > package body Wait_Free_Atomicity is > > procedure Sync is > use Ada.Synchronous_Task_Control; > Sema : Suspension_Object; > Bool : Boolean := Current_State(Sema); -- either this > begin > -- Set_True(Sema); -- or > -- Suspend_Until_True(Sema); -- this > end Sync; > > procedure Update (Atomic_Item : in out an_Atomic; Item : in an_Item) is > Row : constant a_Bistate := not Atomic_Item.Last_Row_Inspected; > Col : constant a_Bistate := not Atomic_Item.Last_Column_Updated(Row); > begin > Atomic_Item.Data_Slot_Matrix(Row, Col) := Item; > Atomic_Item.Last_Column_Updated(Row) := Col; > Atomic_Item.Last_Row_Updated := Row; > Sync; > end Update; > > procedure Inspect (Atomic_Item : in out an_Atomic; Item : out an_Item) is > Row : constant a_Bistate := Atomic_Item.Last_Row_Updated; > Col : a_Bistate; > pragma Atomic (Col); > begin > Atomic_Item.Last_Row_Inspected := Row; > Col := Atomic_Item.Last_Column_Updated(Row); > Item := Atomic_Item.Data_Slot_Matrix(Row, Col); > Sync; > end Inspect; > > end Wait_Free_Atomicity; It works nicely, and is an order of magnitude faster than a protected object: 20_000_000 updates with Suspend_Until_True sync: No consistency failures. 7.30 real 14.14 user 0.04 sys with Current_State sync: No consistency failures. 3.57 real 6.97 user 0.02 sys -- Bill Findlay <surname><forename> chez blueyonder.co.uk
From: (see below) on 4 Apr 2008 09:56 On 04/04/2008 07:36, in article wvbry77uxh19.fsf(a)astra06.norway.sun.com, "Ole-Hjalmar Kristensen" <ole-hjalmar.kristensen(a)substitute_employer_here.com> wrote: > Interesting. I had not thought of Ada.Synchronous_Task_Control. Apart > from that, I agree that a "best effort" implementation of standard > library like the membar_ops and atomic_ops which are part of the > Solaris C library would likely be sufficient. Is there online documentation/code for that library? -- Bill Findlay <surname><forename> chez blueyonder.co.uk
|
Next
|
Last
Pages: 1 2 Prev: Text_IO from a stream Next: Generics with concrete and class-wide types |