From: jim clark on
I'm investigating an issue with some old multi-threaded code that uses
SetWaitableTimer. Everything below occurs in the main thread:

The code creates and configures the timer to fire at some point in the
future and uses the pfnCompletionRoutine parameter to configure an APC
routine that will be called at that moment.

The majority of the time the main thread sits in a loop, waiting using
WaitForMultipleObjects (not Ex) for approx 1 minute on various events that
occur in other threads. If one of the events fires it performs some other
actions and then starts waiting again. This process continues forever.

The APC routine configured with SetWaitableTimer has always fired at the
appropriate moment and has run without fault on hundreds of computers for
months. However, it has now started to fail on a few systems for no obvious
reason.

When I re-read the docs I realised that I should probably be using
WaitForMultipleObjectsEx with the alertable flag (bAlertable) set. When I
tried this it seemed to work as expected...



Why did it work so well for so long and then fail on a few computers? (It
still works fine everywhere else, same code, same installed software, same
everything as far as I can tell)

Is the fix as simple as changing to WaitForMultipleObjectsEx? Have I missed
something (I feel pretty confused)?


James


From: jim clark on
To offer a partial answer to my own question:

Could it be that it has never worked, but some other API call *normally*
called a Wait...Ex function, thus putting my thread in an alertable state,
except on these problem computers due to some currently unknown factor?


From: Jeroen Mostert on
jim clark wrote:
> To offer a partial answer to my own question:
>
> Could it be that it has never worked, but some other API call *normally*
> called a Wait...Ex function, thus putting my thread in an alertable state,
> except on these problem computers due to some currently unknown factor?
>
Absolutely. In fact, this is one of the more serious problems with alertable
wait states: they're supposed to guarantee that APCs are only triggered when
the application is ready to handle them, but functions mostly don't document
whether they can (indirectly) enter an alertable wait state, so you might
find yourself unexpectedly handling APCs. See the discussion in
http://blogs.msdn.com/oldnewthing/archive/2006/05/03/589110.aspx

You would expect functions to do their best *not* to enter a wait state (let
alone an alertable one), so it's certainly possible that code you were
relying on to enter an alertable wait state... doesn't.

--
J.