From: Peter Duniho on
Goran Sliskovic wrote:
> CLOSE_WAIT means that the TCP connection has been terminated, but the
> socket descriptor has not been closed (by calling close by application,
> of course). There is no possibility that there are messages still in
> transit (at least messages that could do any damage). TIME_WAIT state is
> there to ensure this.

Sorry�I have confused the two. Thanks.

> The socket is bound in OP's case because it is not closed, for whatever
> reason. Connection is closed, however socket is not.

The OP mentions another case where the connection indefinitely remains
in CLOSE_WAIT even after the process exits. Can you think of a reason
that might happen? I couldn't.

Pete
From: Goran Sliskovic on

"Peter Duniho" <no.peted.spam(a)no.nwlink.spam.com> wrote in message
news:eUPItEJfKHA.5020(a)TK2MSFTNGP02.phx.gbl...
....
> The OP mentions another case where the connection indefinitely remains in
> CLOSE_WAIT even after the process exits. Can you think of a reason that
> might happen? I couldn't.
>
> Pete

Once I saw this with WebRequest class. Either .NET uses some other process
that cleans up the socket (or some other class) or process is not shown in
task manager but is still somehow active.But this would be internals of
windows/.net that I'm not familiar with. I'm pretty sure that exiting the
process should close all sockets.
The OP problem looks to me as a case of deadlock, possibly in the finalizer.
The only way to see what is going on is to attach debugger (eg. Windbg with
SOS.dll).

Regards,
Goran

From: Michael Ober on

"Goran Sliskovic" <gsliskov(a)yahoo.com> wrote in message
news:%23BVPjuQfKHA.5608(a)TK2MSFTNGP05.phx.gbl...
>
> "Peter Duniho" <no.peted.spam(a)no.nwlink.spam.com> wrote in message
> news:eUPItEJfKHA.5020(a)TK2MSFTNGP02.phx.gbl...
> ...
>> The OP mentions another case where the connection indefinitely remains in
>> CLOSE_WAIT even after the process exits. Can you think of a reason that
>> might happen? I couldn't.
>>
>> Pete
>
> Once I saw this with WebRequest class. Either .NET uses some other process
> that cleans up the socket (or some other class) or process is not shown in
> task manager but is still somehow active.But this would be internals of
> windows/.net that I'm not familiar with. I'm pretty sure that exiting the
> process should close all sockets.
> The OP problem looks to me as a case of deadlock, possibly in the
> finalizer. The only way to see what is going on is to attach debugger (eg.
> Windbg with SOS.dll).
>
> Regards,
> Goran

Unfortunately I can't attach a debugger to the XP system, but I'll try the
updated server code on Wednesday to see if the problem is corrected.

Mike.

From: Michael Ober on
Peter,

I think I found it this morning. It actually wasn't in the IPServer
namespace at all. Using Remote Desktop, I started the server inside the
debugger on my workstation yesterday and checked it a couple of times during
the day several hours apart. No problems - making the "long idle" period
unlikely. This morning I put a breakpoint on the OnAccept method and
connected, single stepping through it. Inside this method there is a call
to WriteLog, which creates consistent log files across multiple Windows
development environments (not dotNet logging) and during the change of day
processing an infinite recursive loop occurred resulting in a stack
overflow. I fixed that problem and tested by changing the clock on my
system. I'll check tomorrow morning to verify. What baffles me is why I
didn't get a Stack Overflow error on the console, even in a Debug release.
The My Namespace event handlers were a late addition to the application and
I would have expected a Stack Overflow error to be reported prior to my
adding them. That would have made it a lot quicker to find and fix.

I think I corrected all but the _SendResults and _bufOut issues that you
pointed out and I'm simply not sure how to fix that one, short of using
SyncLock. Assuming the server is working tomorrow morning, I'll post the
updated code tomorrow evening.

I also realized that creating a disconnect timer for each connection was
extremely inefficient so I recoded this feature to use a single timer in the
IPServer class and a decrement counter in each ClientInformation object.
Since timeouts are always measured in minutes and the exact timeout period
isn't critical, this change too hard to do. The decrement counter is set to
the max inactivity time passed to the IPServer.New method and decremented
once a minute by a timertick handler in the IPServer class itself. The Send
and Receive methods of the ClientInformation class reset this counter to the
time out value. Yes, there can be a race condition here, but if it occurs,
one of two things will happen. Either the server will break down the
connection, which still complies with the server's contract with its
clients, or the connection timeout will be reset, which will simply keep the
connection alive longer. In either case, if the client disconnects, the
EndRecieve method of the socket will return 0 bytes and the connection will
be shutdown by the server.

The events being raised were really for user feedback, so I combined them
into a new Event that better describes what it's for. This broke the
interface "contract" presented by the IPServer class to the application, but
as there were only two applications using this class, that wasn't a big
deal. If this had been for external use or there had been a lot of
applications using this interface, I would have added the new event and left
the old ones there.

The new code actually hides the Clients List and the ConnectionInformation
class by making them private to the IPServer class as well as flattening the
NameSpace. The original desired design was to inherit this class and
override the ProcessMessage method for the business logic, hiding all the
complexity of handling the TCP communications. There wasn't supposed to be
external access to the client connections. I was close previously, but not
quite there. This has the additional benefit of allowing multiple IPServers
to exist in the same application without the possibility of them interfering
with each other. Each IPServer override could then implement different
business logic, simplifying the ProcessMessage override. In the environment
I work in, this is useful as I have a server that collects information from
multiple sources and updates multiple external applications. This
particular application is the other one that uses this class. It uses the
IPServer class and the MSMQ interface to collect data and it updates three
different systems, providing an integrated view of the data in a near
real-time environment. The vendor of the external system would like to do
away with the MSMQ interface and go strictly TCP for updates.

As for the recompile - I'm actually surprised this application worked at all
during date changes. After the rebuild I did restore from tape and the
restored executable also crashed. The WriteLog stack overflow problem has
been in the code since it was ported from VC6 and VB6 to VB2005 several
years ago. This bug doesn't exist in the VC6 version as I have another
application server, writtin in VC++/MFC, running on the same server that
doesn't exhibit this problem. I don't know if it exists in the VB6 version
of the class and will probably never find out as all new development is in
C# and VB 2008. In my environment, there are no extended execution TCP
servers written in VB6. I used VC++/MFC for those as it was much easier for
me to handle TCP messages in VC++/MFC than in VB6.

Once again, thanks for all your feedback.

Mike Ober.

"Peter Duniho" <no.peted.spam(a)no.nwlink.spam.com> wrote in message
news:eXhLzCJfKHA.5292(a)TK2MSFTNGP06.phx.gbl...
> Michael Ober wrote:
>> [...]
>> I can duplicate this with a simple connect/disconnect. The problem
>> occurs on the connection.
>
> According to your original problem statement, the problem occurs _after_
> there has been some activity, followed by a period of idle time, and then
> a new connection. Until you've completely isolated the problem, it's not
> possible to rule out potential dependencies on the client with respect to
> the problem.
>
> If you _can_ in fact reproduce the problem with a simple connect, perhaps
> after some specific period of being idle, even so...you need to provide
> the code for that. If you expect people to reproduce the problem, you
> can't expect them to write new code to do that.
>
> If you are satisfied with conjecture, then don't provide a
> concise-but-complete code example.
>
>> [...]
>> Let me see if I understand. Dump the two class level variables and pass
>> the class (Me) through the Begin... method. Then pull the class out of
>> the AsyncState property for access to the buffers.
>
> I'm not sure I understand the above. Generally, an i/o system will track
> two kinds of data structures: "per-connection", and "per-operation". You
> seem to have a per-connection structure now. That structure would not
> contain any per-operation data (i.e. buffers). Instead, you'd have a
> separate data structure for that data.
>
>> [...]
>> I have made several changes based on your comments. I haven't touched
>> the _bufOut or the _SendResults variables, but the _ReceiveResults
>> variable was actually not used anywhere and is now gone. I've started
>> the program inside the VS debugger. I've confirmed that it is accepting
>> connections and returning results today and tomorrow morning I'll see
>> what happens when I try to connect.
>
> Just to be clear: I have not identified any "smoking gun" in the code you
> posted. There are a number of things that I feel are potential issues, at
> least with respect to maintainability if not correctness. But none are
> obviously causing problems, never mind the problem you describe.
>
> I remain curious about the answers to the questions I asked in my other
> reply, with respect to your comments that the problem started occurring
> after you had other problems with the computer and had recompiled the code
> that had been working fine for nearly a year.
>
> In spite of all the "fishy" things about the code, it's entirely possible
> that there's nothing wrong with the code that would lead to incorrect
> behavior, and that the problem lies elsewhere.
>
> Pete

From: Peter Duniho on
Michael Ober wrote:
> Peter,
>
> I think I found it this morning. It actually wasn't in the IPServer
> namespace at all.

Just as I suggested might be the case.

I'm glad you were able to find the problem. But I hope you can see,
based on this experience, why a concise-but-complete code example is so
important in terms of obtaining help. Without one, all anyone can do is
make random guesses and suggestions, none of which may have anything at
all to do with the problem.

While comments you received to your question might have been helpful in
a general sense, I doubt any of them were instrumental in leading you to
a solution to the specific problem you were asking about.

> [...]
> I think I corrected all but the _SendResults and _bufOut issues that you
> pointed out and I'm simply not sure how to fix that one, short of using
> SyncLock.

Nothing wrong with a SyncLock, IMHO. It's always most important to make
sure the code works right, then worry about whether there's a better,
more efficient solution. And .NET's Monitor class is reasonably efficient.

> [...]
> I also realized that creating a disconnect timer for each connection was
> extremely inefficient so I recoded this feature to use a single timer in
> the IPServer class and a decrement counter in each ClientInformation
> object.

If I were to code the timeout/disconnect logic from scratch, I might use
a sorted event queue, sorted by the event time and with a single thread,
and a call to Thread.Sleep() to delay until the next event. Add the
various logic to deal with resetting timers, adding new ones with a
future time earlier than the earliest current one in the queue, etc. and
you're done.

Fortunately, I believe that's basically what the System.Timers.Timer
timer does. It's specifically designed for server-related timers (or at
least, so the documentation says :) ), and I'm pretty sure one of its
main features is the ability to track a large number of timed events as
timers, without putting a significant load on the system.

Note that as compared to this approach:

> Since timeouts are always measured in minutes and the exact
> timeout period isn't critical, this change too hard to do. The
> decrement counter is set to the max inactivity time passed to the
> IPServer.New method and decremented once a minute by a timertick handler
> in the IPServer class itself. The Send and Receive methods of the
> ClientInformation class reset this counter to the time out value.

Advantage to your approach: reset of a timer is simply a new counter
value. Disadvantage to your approach: on every tick of the timer, you
have to inspect every connected client's data structure.

One downside to the sorted list of future events is the cost of the
insertion of a new element (either O(n) or O(log n), depending on the
implementation, which I don't know), borne every time you add a timer or
reset one. And of course, for large numbers of clients, this can be
large too.

Of course, if all the connections use the same timeout (which seems
likely), than any reset timer always winds up at the very end of the
sorted list, and the cost of doing that can be almost as cheap as
resetting a counter, because you already know where the new position is
in the list rather than having to search for it.

Obviously it depends somewhat on the specifics. But based on what I
think is likely about your implementation (specifically, every
connection uses the same timeout), you may find it preferable to
actually just create your own single-threaded event queue for the
timing, taking advantage of the fact that you know every new event added
(whether a new connection or one for which i/o operation happened and
thus needs a reset timer) goes to the end of the list.

I don't know for sure, but it's possible System.Timers.Timer includes an
optimization to check for end-of-list insertions, and so could in fact
perform basically as well as a custom implementation would. (I took a
quick look in Reflector, but unfortunately, the meat of the
implementation seems to be in native code, which doesn't show up in
Reflector; I can, however, verify that System.Timers.Timer just uses
System.Threading.Timer underneath, so whatever characteristics one has,
the other is pretty much the same :) ).

> Yes,
> there can be a race condition here, but if it occurs, one of two things
> will happen. Either the server will break down the connection, which
> still complies with the server's contract with its clients, or the
> connection timeout will be reset, which will simply keep the connection
> alive longer. In either case, if the client disconnects, the EndRecieve
> method of the socket will return 0 bytes and the connection will be
> shutdown by the server. [...]

Right. Timeouts always involve race conditions. The best you can do is
just make sure you have a way of knowing who won, and then get that part
right without corrupting your data. :)

Pete