From: Pops on
Ben (C++ MVP) recommended using the Interlocked Singly Linked Lists API:

http://msdn2.microsoft.com/en-us/library/ms686962.aspx

as a more efficient synchronization method for a recent thread
discussion in regards to a FIFO queue design.

If Ben is reading or if anyone else is familiar with this API, I have a
few questions:

First, based on what I see, this is for LIFO designs (i.e., stacks)?
Not FIFO. Am I missing something? I only see Push/Pop functions, no
head/tail functions.

Seconds, can you tell how it works with IPC, can it work for IPC?

Thanks

--
HLS



From: Ben Voigt [C++ MVP] on

"Pops" <dude(a)nospamnospamnospam.com> wrote in message
news:%23tndYNsJIHA.4712(a)TK2MSFTNGP04.phx.gbl...
> Ben (C++ MVP) recommended using the Interlocked Singly Linked Lists API:
>
> http://msdn2.microsoft.com/en-us/library/ms686962.aspx
>
> as a more efficient synchronization method for a recent thread discussion
> in regards to a FIFO queue design.
>
> If Ben is reading or if anyone else is familiar with this API, I have a
> few questions:
>
> First, based on what I see, this is for LIFO designs (i.e., stacks)? Not
> FIFO. Am I missing something? I only see Push/Pop functions, no
> head/tail functions.

We were discussing a multiple writer, single consumer queue. For this the
writers each use the InterlockedPushEntrySList function, while the consumer
uses the InterlockedFlushSList function to retrieve the entire queue as an
atomic operation. The consumer is then responsible for iterating through
the list of items retrieved in FIFO order.

>
> Seconds, can you tell how it works with IPC, can it work for IPC?

Because there is no name assigned, the SList cannot establish an IPC
channel. However, because the caller has control over the placement of all
data structures involved, it should be possible to use with shared memory.
It would be highly desirable in such cases to obtain the same virtual
address space in each of the processes. If this is not possible, then you
probably can't use the Win32 API implementation, though the underlying
algorithm will still work (using relative addresses / indices instead of
pointers).

>
> Thanks
>
> --
> HLS
>
>
>


From: Pops on
Ben Voigt [C++ MVP] wrote:

> We were discussing a multiple writer, single consumer queue. For this the
> writers each use the InterlockedPushEntrySList function, while the consumer
> uses the InterlockedFlushSList function to retrieve the entire queue as an
> atomic operation. The consumer is then responsible for iterating through
> the list of items retrieved in FIFO order.

Gotcha!

>> Seconds, can you tell how it works with IPC, can it work for IPC?
>
> Because there is no name assigned, the SList cannot establish an IPC
> channel. However, because the caller has control over the placement of all
> data structures involved, it should be possible to use with shared memory.
> It would be highly desirable in such cases to obtain the same virtual
> address space in each of the processes. If this is not possible, then you
> probably can't use the Win32 API implementation, though the underlying
> algorithm will still work (using relative addresses / indices instead of
> pointers).

Ok. Right.

I was interested to see if it can be applied in our virtual I/O
communication/channel framework which currently uses RPC. Probably won't
apply for remote machines operations, but might prove useful as an
improved alternative to local RPC operations.

My whole mantra these days is to increase client/server performance and
scalability. So this api might just have some purpose for us.


--
HLS
From: Chris Thomasson on
"Pops" <dude(a)nospamnospamnospam.com> wrote in message
news:%23tndYNsJIHA.4712(a)TK2MSFTNGP04.phx.gbl...
> Ben (C++ MVP) recommended using the Interlocked Singly Linked Lists API:
>
> http://msdn2.microsoft.com/en-us/library/ms686962.aspx
>
> as a more efficient synchronization method for a recent thread discussion
> in regards to a FIFO queue design.
>
> If Ben is reading or if anyone else is familiar with this API, I have a
> few questions:
>
> First, based on what I see, this is for LIFO designs (i.e., stacks)? Not
> FIFO. Am I missing something? I only see Push/Pop functions, no
> head/tail functions.
>
> Seconds, can you tell how it works with IPC, can it work for IPC?

Yes. You can create a node cache in shared memory and allocate/deallocate
your SLIST_ENTRY from there. Something like:


(quick and dirty pseudo-code sketch)
____________________________________________________________

#define L2CACHE_LINESZ() 128
#define CACHE_DEPTH() 1024


typedef struct shm_node_s shm_node;
typedef struct shm_node_data_s shm_node_data;
typedef struct shm_node_cache_s shm_node_cache;

struct shm_node_data_s {
/* some data */
};

struct shm_node_s {
SLIST_ENTRY slnode;
shm_node_data data;
};

struct shm_node_cache_s {
SLIST_HEADER anchor;
char pad[L2CACHE_LINESZ() - sizeof(SLIST_HEADER)];
shm_node nodes[CACHE_DEPTH()];
};


shm_node_cache*
shm_node_cache_create(
char const* const name
) {
int i;
shm_node_cache* const _this =
/* create file map with name, alloc and align shm */

InitializeSListHead(&_this->anchor);

for(i = 0; i < CACHE_DEPTH(); ++i) {
InterlockedPushEntrySList(&_this->anchor,
&_this->nodes[i].slnode);
}

return _this;
}


shm_node*
shm_node_cache_pop(
shm_node_cache* const _this
) {
return (shm_node*)InterlockedPopEntrySList(&_this->anchor);
}

void
shm_node_cache_push(
shm_node_cache* const _this,
shm_node* const node
) {
InterlockedPushEntrySList(&_this->anchor, &node->slnode);
}




Now you can create your queue. Something like:


typedef struct shm_queue_s shm_queue;

struct shm_queue_s {
SLIST_HEADER anchor;
char pad[L2CACHE_LINESZ() - sizeof(SLIST_HEADER)];
shm_node_cache* cache;
};


shm_queue*
shm_queue_create(
shm_node_cache* const cache,
char const* const name
) {
shm_queue* const _this =
/* create file map with name, alloc and align shm */
InitializeSListHead(&_this->anchor);
_this->cache = cache;
return _this;
}


int
shm_queue_process(
shm_queue* const _this,
void (*fp_process) (shm_node_data*)
) {
int i = 0;
shm_node* node = (shm_node*)InterlockedFlushSList(&_this->anchor);

/* reverse order of node list to create FIFO */

while (node ) {
shm_node* const nx = (shm_node*)nx->slnode.Next;
fp_process(&node->data);
shm_node_cache_push(_this->cache, node);
++i;
node = nx;
}
return i;
}


BOOL
shm_queue_push(
shm_queue* const _this,
shm_node_data* const data
) {
shm_node* const node = shm_node_cache_pop(_this->cache);
if (! node) { return FALSE; }
node->data = *data;
InterlockedPushEntrySList(&_this->anchor, &node->slnode);
return TRUE;
}

____________________________________________________________




Also, for what its worth, here is how Microsoft implements the SList API
under 32-bit systems:



typedef struct slist_anchor_s slist_anchor;
typedef struct slist_node_s slist_node;

struct slist_node_s {
slist_node* nx;
};

struct slist_anchor_s {
slist_node* head;
/* monotonic counter to avoid the infamous aba problem */
__int32 aba;
};


void
slist_anchor_init(
slist_anchor* const _this
) {
_this->head = 0;
_this->aba = 0;
}


void
slist_anchor_push(
slist_anchor* const _this,
slist_node* node
) {
slist_node* cmp;
do {
cmp = _this->head;
node->nx = cmp;
} while (! CAS(&_this->anchor, cmp, node));
}



slist_node*
slist_anchor_pop(
slist_anchor* const _this
) {
slist_anchor cmp, xchg;
do {
cmp = *_this;
if (! cmp.head) { return 0; }

/* hazard! refer to PDR algorihtm for more details... */
xchg.head = cmp.head->nx;

/* increment ABA */
xchg.aba = cmp.aba + 1;

} while (! DWCAS(_this, &cmp, &xchg));

return cmp.head;
}




Any thoughts?

From: Chris Thomasson on

"Pops" <dude(a)nospamnospamnospam.com> wrote in message
news:%23tndYNsJIHA.4712(a)TK2MSFTNGP04.phx.gbl...
> Ben (C++ MVP) recommended using the Interlocked Singly Linked Lists API:
>
> http://msdn2.microsoft.com/en-us/library/ms686962.aspx
>
> as a more efficient synchronization method for a recent thread discussion
> in regards to a FIFO queue design.
>
> If Ben is reading or if anyone else is familiar with this API, I have a
> few questions:
>
> First, based on what I see, this is for LIFO designs (i.e., stacks)? Not
> FIFO. Am I missing something? I only see Push/Pop functions, no
> head/tail functions.
>
> Seconds, can you tell how it works with IPC, can it work for IPC?

FWIW, if you want to create a blocking lock-less queue (e.g., consumer waits
while queue is empty), well here is an algorithm I invented a while back:

http://groups.google.de/group/comp.programming.threads/msg/632b6bdc2a137459


__________________________________________________

< pseudo-code >


typedef struct node_
{
struct node_ *next;
const void *state;



} node_t;


typedef struct stack_anchor_
{
node_t *front;
unsigned __int32 state;


} volatile stack_anchor_t;


typedef struct stack_
{
stack_anchor_t anchor;
HANDLE waitset;


} stack_t;


/* returns non-zero and updates comprand on failure */
extern int dwcas( volatile void *dest, void *cmp, const void *xchg );

void push( stack_t *_this, node_t *node )
{
stack_anchor_t xchg, cmp;


xchg.front = node;
cmp = _this->anchor;


do
{
node->next = cmp.front;


if ( cmp.state & 0x0000FFFF )
{
xchg.state = cmp.state - 1; /* waiters */
}


else { xchg.state = cmp.state; }


} while ( dwcas( &_this->anchor, &cmp, &xchg ) );


if ( cmp.state & 0x0000FFFF )
{
/* wake one */
if ( ! ReleaseSemaphore
( _this->waitset,
1,
0 ) )
{
assert( 0 ); abort();
}
}



}


node_t* pop( stack_t *_this )
{
stack_anchor_t xchg, cmp = _this->anchor;

for ( ;; )
{
if ( cmp.front ) /* condition predicate */
{
xchg.front = cmp.front->next; /* hazard */
xchg.state = cmp.state + 0x10000; /* aba */
}


else /* stack is empty */
{
xchg.front = cmp.front;
xchg.state = cmp.state + 1; /* waiters */
}


if ( ! dwcas( &_this->anchor, &cmp, &xchg ) )
{
if ( cmp.front ) { break; }


if ( WaitForSingleObject
( _this->waitset,
INFINITE ) != WAIT_OBJECT_0 )
{
assert( 0 ); abort();
}


cmp = _this->anchor;
}
}


return cmp.front;



}

___________________________________________________________