From: Armin Zingler on
Michael Ober schrieb:
> The code below appears to work, but it eventually freezes [...]

I am not able to analyze the whole code, therefore I give only a general
answer. If an application freezes for an unknown reason, I use to press Ctrl+Break
and look at the callstacks of all threads to see what it's currently doing.
Have you tried it already? Dead locks can be one reason.


--
Armin



From: Peter Duniho on
Michael Ober wrote:
> The code below appears to work, but it eventually freezes and I have to
> reboot the server. Killing and restarting the program itself doesn't
> work as the listener socket is still bound.

For what it's worth, if you wait for the TCP CLOSE_WAIT to finish,
you'll be able to restart the server. While it's possible to do so, you
don't want to bypass the CLOSE_WAIT, because there is theoretically the
possibility of network messages still in transit that, if received after
you've restarted the server, could corrupt the state of your server.

> I have put debugging
> statements in to verify that the ThreadPool threads are being released
> and the most thread pool threads that this code used was 3, even under
> heavy load. The problem appears to be when there are no clients for an
> extended period and then a client attempts to connect. At that point,
> the application freezes.

When you break in the debugger, where are all the threads? What are
they doing?

> I have left out the BankInfo object which
> contains the bank name, address, phone number, and routing number as
> well as Library Routines such as WriteLog and SMTPMail.<method>. The
> library routines have been in use in other applications for several
> years now and are thoroughly debugged.

Unfortunately, your code example is far from concise, nor is it complete
(in particular, lacking the entire other half of the network
connection). Without any way for anyone else to run your program and
see it fail, there's no way to know for sure what's wrong.

That said, here are some highlights of what appear to me to be problem
areas. Some may even be related to the problem you're seeing:

-- Use of "On Error Resume Next". Don't do this. You are
basically just ignoring exceptions. Bad. Even in VB.NET, it's bad.

-- Use of ReaderWriterLockSlim.IsReadLockHeld and .IsWriteLockHeld.
It's bad enough that, contrary to the documentation's instructions,
you are using the property to control program flow. But to do it in a loop?

-- Unsynchronized access to the _ReceiveResults and _SendResults
members. You should at a minimum be using Thread.VolatileRead() and
..VolatileWrite(). Even better would be to just use SyncLock or similar.

-- That said, the mere presence of _ReceiveResults and _SendResults
is a red flag to me. It's possible that with a purely transactional
connection, this could work, and maybe that's what you're dealing with.
But it's bad form in any case, and if you have _any_ possibility of
overlapping send or receive operations, you've got big trouble. This
design is a maintenance problem, and potentially a serious one.

-- The code potentially raises a ConnectionClosed event for a
connection for which a NewConnection event was never raised. If nothing
else, this is IMHO poor design. If you fix the blank-check error
handling, it will likely be a real problem.

There may be other problems I didn't notice. Between the fact that I
hardly ever use VB.NET and the fact that the code example is not a
concise-but-complete one, I admit there may be things I overlooked.
But, at the very least, there are some things in the code that I
consider to be serious problems, and it's possible that fixing those
aspects of the code will make the problem go away (I'm particularly
concerned about the lack of synchronized access to fields that are used
to coordinate between threads and operations).

Pete
From: Michael Ober on

"Peter Duniho" <no.peted.spam(a)no.nwlink.spam.com> wrote in message
news:OnE9wC5eKHA.1596(a)TK2MSFTNGP06.phx.gbl...
> Michael Ober wrote:
>> The code below appears to work, but it eventually freezes and I have to
>> reboot the server. Killing and restarting the program itself doesn't
>> work as the listener socket is still bound.
>
> For what it's worth, if you wait for the TCP CLOSE_WAIT to finish, you'll
> be able to restart the server. While it's possible to do so, you don't
> want to bypass the CLOSE_WAIT, because there is theoretically the
> possibility of network messages still in transit that, if received after
> you've restarted the server, could corrupt the state of your server.

The IPServer class is actually used in two different server applications.
In this particular instance, which runs on Windows Server 2003 R2, I can
indeed wait for the TCP CLOSE_WAIT - it's faster to reboot. In the other
application, which runs on Windows XP, it appears that XP never closes the
listening socket. I suspect the XP behavior has more to do with another
server that must run on that particular box for which I have no source code
than it does with XP itself.

>
>> I have put debugging statements in to verify that the ThreadPool threads
>> are being released and the most thread pool threads that this code used
>> was 3, even under heavy load. The problem appears to be when there are
>> no clients for an extended period and then a client attempts to connect.
>> At that point, the application freezes.
>
> When you break in the debugger, where are all the threads? What are they
> doing?
>

Monday I'll modify our network to use my workstation as the server for this
particular application and I'll start a VS 2008 session. On Tuesday I'll be
able to see what happens. In the meantime I'm hoping for feedback on the
code to try to correct the problems it has. What's really strange is that
this application had been running fine for several months until I had to
rebuild the server after a disk crash.

>> I have left out the BankInfo object which contains the bank name,
>> address, phone number, and routing number as well as Library Routines
>> such as WriteLog and SMTPMail.<method>. The library routines have been
>> in use in other applications for several years now and are thoroughly
>> debugged.
>
> Unfortunately, your code example is far from concise, nor is it complete
> (in particular, lacking the entire other half of the network connection).
> Without any way for anyone else to run your program and see it fail,
> there's no way to know for sure what's wrong.

The other end is a currently a standard TCP client. It sends a request and
waits for the response. The client applications do have to be coded so
that if the server timeout expires they detect the dropped connection and
reconnect if they need to send again.

>
> That said, here are some highlights of what appear to me to be problem
> areas. Some may even be related to the problem you're seeing:
>
> -- Use of "On Error Resume Next". Don't do this. You are basically
> just ignoring exceptions. Bad. Even in VB.NET, it's bad.
>

I only use this where I don't care about the error but do want each
instruction to attempt execution. Is there a clean way of handling this
with try ... catch ... end try? In this particular case, removing the On
Error Resume Next statement was easily handled by checking for Nothing
(null) and collection.Contains() before doing the work.

> -- Use of ReaderWriterLockSlim.IsReadLockHeld and .IsWriteLockHeld.
> It's bad enough that, contrary to the documentation's instructions, you
> are using the property to control program flow. But to do it in a loop?

After static analisys of the code, a finally clause on each of the two try
blocks was sufficient to handle this. Since OnReceive would be called in
different threads and not be reentrant, the loops can safely be removed.
The purpose of this was to ensure I didn't have a ReaderWriterLockSlim
causing a connection to stall.

>
> -- Unsynchronized access to the _ReceiveResults and _SendResults
> members. You should at a minimum be using Thread.VolatileRead() and
> .VolatileWrite(). Even better would be to just use SyncLock or similar.
>
> -- That said, the mere presence of _ReceiveResults and _SendResults is
> a red flag to me. It's possible that with a purely transactional
> connection, this could work, and maybe that's what you're dealing with.
> But it's bad form in any case, and if you have _any_ possibility of
> overlapping send or receive operations, you've got big trouble. This
> design is a maintenance problem, and potentially a serious one.
>

Do you have a suggestion for a better design? I'm trying to keep this as
asynchronous as possible. The design concept was that if the routine
MustOverride routine has to do a lot of work and send back large numbers of
results, it can call the underlying send method multiple times before
returning, although looking at the code, I don't provide that particular
routine the ability to access the Send method.

Like I said in my original post, this program seems to lock up when there
are long periods (measured in hours) of no clients. If access to these two
variables was a problem, I would expect problems to show up during heavy
usage, not during idle periods. This server class is also used in another
application running on an XP SP3 system that is under significantly heavier
load with more multithreading going on than this particular application.

> -- The code potentially raises a ConnectionClosed event for a
> connection for which a NewConnection event was never raised. If nothing
> else, this is IMHO poor design. If you fix the blank-check error
> handling, it will likely be a real problem.
>

In my logs, I have never seen the exception handler for the OnAccept(ar as
IAsyncResults) subroutine called. I understand your comment, however. The
RaiseEvent ConnectionClosed event was actually added as part of the
troubleshooting so removing it isn't an issue.

> There may be other problems I didn't notice. Between the fact that I
> hardly ever use VB.NET and the fact that the code example is not a
> concise-but-complete one, I admit there may be things I overlooked. But,
> at the very least, there are some things in the code that I consider to be
> serious problems, and it's possible that fixing those aspects of the code
> will make the problem go away (I'm particularly concerned about the lack
> of synchronized access to fields that are used to coordinate between
> threads and operations).
>
> Pete

Pete, I definitely thank you for the feedback.

Mike.

From: Tom Shelton on
On 2009-12-13, Michael Ober <obermd> wrote:
>
> "Peter Duniho" <no.peted.spam(a)no.nwlink.spam.com> wrote in message
> news:OnE9wC5eKHA.1596(a)TK2MSFTNGP06.phx.gbl...
>> Michael Ober wrote:
>>> The code below appears to work, but it eventually freezes and I have to
>>> reboot the server. Killing and restarting the program itself doesn't
>>> work as the listener socket is still bound.
>>
>> For what it's worth, if you wait for the TCP CLOSE_WAIT to finish, you'll
>> be able to restart the server. While it's possible to do so, you don't
>> want to bypass the CLOSE_WAIT, because there is theoretically the
>> possibility of network messages still in transit that, if received after
>> you've restarted the server, could corrupt the state of your server.
>
> The IPServer class is actually used in two different server applications.
> In this particular instance, which runs on Windows Server 2003 R2, I can
> indeed wait for the TCP CLOSE_WAIT - it's faster to reboot. In the other
> application, which runs on Windows XP, it appears that XP never closes the
> listening socket. I suspect the XP behavior has more to do with another
> server that must run on that particular box for which I have no source code
> than it does with XP itself.
>
>>
>>> I have put debugging statements in to verify that the ThreadPool threads
>>> are being released and the most thread pool threads that this code used
>>> was 3, even under heavy load. The problem appears to be when there are
>>> no clients for an extended period and then a client attempts to connect.
>>> At that point, the application freezes.
>>
>> When you break in the debugger, where are all the threads? What are they
>> doing?
>>
>
> Monday I'll modify our network to use my workstation as the server for this
> particular application and I'll start a VS 2008 session. On Tuesday I'll be
> able to see what happens. In the meantime I'm hoping for feedback on the
> code to try to correct the problems it has. What's really strange is that
> this application had been running fine for several months until I had to
> rebuild the server after a disk crash.
>
>>> I have left out the BankInfo object which contains the bank name,
>>> address, phone number, and routing number as well as Library Routines
>>> such as WriteLog and SMTPMail.<method>. The library routines have been
>>> in use in other applications for several years now and are thoroughly
>>> debugged.
>>
>> Unfortunately, your code example is far from concise, nor is it complete
>> (in particular, lacking the entire other half of the network connection).
>> Without any way for anyone else to run your program and see it fail,
>> there's no way to know for sure what's wrong.
>
> The other end is a currently a standard TCP client. It sends a request and
> waits for the response. The client applications do have to be coded so
> that if the server timeout expires they detect the dropped connection and
> reconnect if they need to send again.
>
>>
>> That said, here are some highlights of what appear to me to be problem
>> areas. Some may even be related to the problem you're seeing:
>>
>> -- Use of "On Error Resume Next". Don't do this. You are basically
>> just ignoring exceptions. Bad. Even in VB.NET, it's bad.
>>
>
> I only use this where I don't care about the error but do want each
> instruction to attempt execution. Is there a clean way of handling this
> with try ... catch ... end try? In this particular case, removing the On
> Error Resume Next statement was easily handled by checking for Nothing
> (null) and collection.Contains() before doing the work.
>
>> -- Use of ReaderWriterLockSlim.IsReadLockHeld and .IsWriteLockHeld.
>> It's bad enough that, contrary to the documentation's instructions, you
>> are using the property to control program flow. But to do it in a loop?
>
> After static analisys of the code, a finally clause on each of the two try
> blocks was sufficient to handle this. Since OnReceive would be called in
> different threads and not be reentrant, the loops can safely be removed.
> The purpose of this was to ensure I didn't have a ReaderWriterLockSlim
> causing a connection to stall.
>
>>
>> -- Unsynchronized access to the _ReceiveResults and _SendResults
>> members. You should at a minimum be using Thread.VolatileRead() and
>> .VolatileWrite(). Even better would be to just use SyncLock or similar.
>>
>> -- That said, the mere presence of _ReceiveResults and _SendResults is
>> a red flag to me. It's possible that with a purely transactional
>> connection, this could work, and maybe that's what you're dealing with.
>> But it's bad form in any case, and if you have _any_ possibility of
>> overlapping send or receive operations, you've got big trouble. This
>> design is a maintenance problem, and potentially a serious one.
>>
>
> Do you have a suggestion for a better design? I'm trying to keep this as
> asynchronous as possible. The design concept was that if the routine
> MustOverride routine has to do a lot of work and send back large numbers of
> results, it can call the underlying send method multiple times before
> returning, although looking at the code, I don't provide that particular
> routine the ability to access the Send method.
>
> Like I said in my original post, this program seems to lock up when there
> are long periods (measured in hours) of no clients. If access to these two
> variables was a problem, I would expect problems to show up during heavy
> usage, not during idle periods. This server class is also used in another
> application running on an XP SP3 system that is under significantly heavier
> load with more multithreading going on than this particular application.
>
>> -- The code potentially raises a ConnectionClosed event for a
>> connection for which a NewConnection event was never raised. If nothing
>> else, this is IMHO poor design. If you fix the blank-check error
>> handling, it will likely be a real problem.
>>
>
> In my logs, I have never seen the exception handler for the OnAccept(ar as
> IAsyncResults) subroutine called. I understand your comment, however. The
> RaiseEvent ConnectionClosed event was actually added as part of the
> troubleshooting so removing it isn't an issue.
>
>> There may be other problems I didn't notice. Between the fact that I
>> hardly ever use VB.NET and the fact that the code example is not a
>> concise-but-complete one, I admit there may be things I overlooked. But,
>> at the very least, there are some things in the code that I consider to be
>> serious problems, and it's possible that fixing those aspects of the code
>> will make the problem go away (I'm particularly concerned about the lack
>> of synchronized access to fields that are used to coordinate between
>> threads and operations).
>>
>> Pete
>
> Pete, I definitely thank you for the feedback.
>
> Mike.
>

Mike... You mention that it seems to happen after long periods of
inactivity. I remember at a former employer having a similar issue with a
server - it had to do with the network card drivers power settings.
Basically, the card was being shutdown and not waking up properly when a
connection was made.... Just a thought :)

--
Tom Shelton
From: Michael Ober on

"Tom Shelton" <tom_shelton(a)comcastXXXXXXX.net> wrote in message
news:%23pu7DN6eKHA.4112(a)TK2MSFTNGP06.phx.gbl...
> On 2009-12-13, Michael Ober <obermd> wrote:
>>
>> "Peter Duniho" <no.peted.spam(a)no.nwlink.spam.com> wrote in message
>> news:OnE9wC5eKHA.1596(a)TK2MSFTNGP06.phx.gbl...
>>> Michael Ober wrote:
>>>> The code below appears to work, but it eventually freezes and I have to
>>>> reboot the server. Killing and restarting the program itself doesn't
>>>> work as the listener socket is still bound.
>>>
>>> For what it's worth, if you wait for the TCP CLOSE_WAIT to finish,
>>> you'll
>>> be able to restart the server. While it's possible to do so, you don't
>>> want to bypass the CLOSE_WAIT, because there is theoretically the
>>> possibility of network messages still in transit that, if received after
>>> you've restarted the server, could corrupt the state of your server.
>>
>> The IPServer class is actually used in two different server applications.
>> In this particular instance, which runs on Windows Server 2003 R2, I can
>> indeed wait for the TCP CLOSE_WAIT - it's faster to reboot. In the other
>> application, which runs on Windows XP, it appears that XP never closes
>> the
>> listening socket. I suspect the XP behavior has more to do with another
>> server that must run on that particular box for which I have no source
>> code
>> than it does with XP itself.
>>
>>>
>>>> I have put debugging statements in to verify that the ThreadPool
>>>> threads
>>>> are being released and the most thread pool threads that this code used
>>>> was 3, even under heavy load. The problem appears to be when there are
>>>> no clients for an extended period and then a client attempts to
>>>> connect.
>>>> At that point, the application freezes.
>>>
>>> When you break in the debugger, where are all the threads? What are
>>> they
>>> doing?
>>>
>>
>> Monday I'll modify our network to use my workstation as the server for
>> this
>> particular application and I'll start a VS 2008 session. On Tuesday I'll
>> be
>> able to see what happens. In the meantime I'm hoping for feedback on the
>> code to try to correct the problems it has. What's really strange is
>> that
>> this application had been running fine for several months until I had to
>> rebuild the server after a disk crash.
>>
>>>> I have left out the BankInfo object which contains the bank name,
>>>> address, phone number, and routing number as well as Library Routines
>>>> such as WriteLog and SMTPMail.<method>. The library routines have been
>>>> in use in other applications for several years now and are thoroughly
>>>> debugged.
>>>
>>> Unfortunately, your code example is far from concise, nor is it complete
>>> (in particular, lacking the entire other half of the network
>>> connection).
>>> Without any way for anyone else to run your program and see it fail,
>>> there's no way to know for sure what's wrong.
>>
>> The other end is a currently a standard TCP client. It sends a request
>> and
>> waits for the response. The client applications do have to be coded so
>> that if the server timeout expires they detect the dropped connection and
>> reconnect if they need to send again.
>>
>>>
>>> That said, here are some highlights of what appear to me to be problem
>>> areas. Some may even be related to the problem you're seeing:
>>>
>>> -- Use of "On Error Resume Next". Don't do this. You are basically
>>> just ignoring exceptions. Bad. Even in VB.NET, it's bad.
>>>
>>
>> I only use this where I don't care about the error but do want each
>> instruction to attempt execution. Is there a clean way of handling this
>> with try ... catch ... end try? In this particular case, removing the On
>> Error Resume Next statement was easily handled by checking for Nothing
>> (null) and collection.Contains() before doing the work.
>>
>>> -- Use of ReaderWriterLockSlim.IsReadLockHeld and .IsWriteLockHeld.
>>> It's bad enough that, contrary to the documentation's instructions, you
>>> are using the property to control program flow. But to do it in a loop?
>>
>> After static analisys of the code, a finally clause on each of the two
>> try
>> blocks was sufficient to handle this. Since OnReceive would be called in
>> different threads and not be reentrant, the loops can safely be removed.
>> The purpose of this was to ensure I didn't have a ReaderWriterLockSlim
>> causing a connection to stall.
>>
>>>
>>> -- Unsynchronized access to the _ReceiveResults and _SendResults
>>> members. You should at a minimum be using Thread.VolatileRead() and
>>> .VolatileWrite(). Even better would be to just use SyncLock or similar.
>>>
>>> -- That said, the mere presence of _ReceiveResults and _SendResults
>>> is
>>> a red flag to me. It's possible that with a purely transactional
>>> connection, this could work, and maybe that's what you're dealing with.
>>> But it's bad form in any case, and if you have _any_ possibility of
>>> overlapping send or receive operations, you've got big trouble. This
>>> design is a maintenance problem, and potentially a serious one.
>>>
>>
>> Do you have a suggestion for a better design? I'm trying to keep this as
>> asynchronous as possible. The design concept was that if the routine
>> MustOverride routine has to do a lot of work and send back large numbers
>> of
>> results, it can call the underlying send method multiple times before
>> returning, although looking at the code, I don't provide that particular
>> routine the ability to access the Send method.
>>
>> Like I said in my original post, this program seems to lock up when there
>> are long periods (measured in hours) of no clients. If access to these
>> two
>> variables was a problem, I would expect problems to show up during heavy
>> usage, not during idle periods. This server class is also used in
>> another
>> application running on an XP SP3 system that is under significantly
>> heavier
>> load with more multithreading going on than this particular application.
>>
>>> -- The code potentially raises a ConnectionClosed event for a
>>> connection for which a NewConnection event was never raised. If nothing
>>> else, this is IMHO poor design. If you fix the blank-check error
>>> handling, it will likely be a real problem.
>>>
>>
>> In my logs, I have never seen the exception handler for the OnAccept(ar
>> as
>> IAsyncResults) subroutine called. I understand your comment, however.
>> The
>> RaiseEvent ConnectionClosed event was actually added as part of the
>> troubleshooting so removing it isn't an issue.
>>
>>> There may be other problems I didn't notice. Between the fact that I
>>> hardly ever use VB.NET and the fact that the code example is not a
>>> concise-but-complete one, I admit there may be things I overlooked. But,
>>> at the very least, there are some things in the code that I consider to
>>> be
>>> serious problems, and it's possible that fixing those aspects of the
>>> code
>>> will make the problem go away (I'm particularly concerned about the lack
>>> of synchronized access to fields that are used to coordinate between
>>> threads and operations).
>>>
>>> Pete
>>
>> Pete, I definitely thank you for the feedback.
>>
>> Mike.
>>
>
> Mike... You mention that it seems to happen after long periods of
> inactivity. I remember at a former employer having a similar issue with a
> server - it had to do with the network card drivers power settings.
> Basically, the card was being shutdown and not waking up properly when a
> connection was made.... Just a thought :)
>
> --
> Tom Shelton

It's a VMWare guest. I just checked the guest OS from home and I'll check
the host on Monday. Thanks for the idea.

Mike.