From: Kevin on
I have Solaris 9 on my production server which runs Apache and some
other web servers. During peak load I see around 12,000 connections in
time_wait. When I run netstat -a | grep -i wait | wc -l, I see
around 12,000. However, when I run lsof -i | grep -i wait | wc -l,
I see only 100. Is this normal?

Also, how can I use lsof to see which process is holding the most
connections in a wait state? Using lsof -i on the or lsof -p does not
show the correct connections in time_wait.

I really wish Solaris provided a "netstat -p" option to list
processes with netstat; it would make my life so much easier!

Kevin.

From: Rick Jones on
Kevin <kejoseph(a)hotmail.com> wrote:
> I have Solaris 9 on my production server which runs Apache and some
> other web servers. During peak load I see around 12,000 connections
> in time_wait. When I perform a netstat -a | grep -i wait | wc -l, I
> see around 12,000. However, when I perform a lsof -i | grep -i wait
> | wc -l, I see only 100. Is this normal?

Because grep -i wait matches any state name containing "wait", your
commands will count CLOSE_WAIT, FIN_WAIT_1 and FIN_WAIT_2 in addition
to TIME_WAIT. You should probably also add '-n' to that netstat
command, as there isn't much point in resolving IPs and port numbers
to names just to count lines.
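
To make the point about the different *_WAIT states concrete, here is a rough Python sketch that tallies connections per exact state name instead of grepping for "wait". It assumes the state is the last column of each line, as in netstat -an output; the sample lines, the addresses in them, and the helper name count_states are made up for illustration.

```python
from collections import Counter

# State names as netstat prints them on Solaris (Linux spells a few of
# them differently, e.g. FIN_WAIT1). Extend the set as needed.
TCP_STATES = {
    "ESTABLISHED", "LISTEN", "SYN_SENT", "SYN_RCVD",
    "FIN_WAIT_1", "FIN_WAIT_2", "CLOSE_WAIT", "CLOSING",
    "LAST_ACK", "TIME_WAIT",
}

def count_states(netstat_text):
    """Count connections per TCP state in `netstat -an` style text."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumption: the state is the last column; header and blank
        # lines fall through because their last field is not a state.
        if fields and fields[-1] in TCP_STATES:
            counts[fields[-1]] += 1
    return counts

# Invented sample in the Solaris column layout, not real server data.
SAMPLE = """\
   Local Address        Remote Address    Swind Send-Q Rwind Recv-Q  State
192.0.2.10.80        198.51.100.7.51234   49640      0 49640      0 TIME_WAIT
192.0.2.10.80        198.51.100.7.51240   49640      0 49640      0 TIME_WAIT
192.0.2.10.80        198.51.100.8.40022   49640      0 49640      0 CLOSE_WAIT
192.0.2.10.80        198.51.100.9.40100   49640      0 49640      0 FIN_WAIT_2
192.0.2.10.80        198.51.100.9.40101   49640      0 49640      0 ESTABLISHED
      *.80                 *.*                0      0 49152      0 LISTEN
"""

if __name__ == "__main__":
    for state, n in sorted(count_states(SAMPLE).items()):
        print(state, n)
```

On this sample, grep -i wait would match four lines, but only two of them are actually TIME_WAIT.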

> Also, how can I use lsof to see which process is taking up maximum
> connections in wait state ? Using lsof -i on the or lsof -p does not
> show the correct connections in time_wait.

> I just so wish that Solaris would provide a "netstat -p" option to
> list processes with netstat, would make my life so much easier !

I'm not sure it would. 99 times out of ten, a TCP connection is in
TIME_WAIT after both sides have called close(), which means there is
no longer any association with a process. About the only time there
would still be a chance of an association with a process is if the
process used shutdown() and had not yet called close(). That would be
the 100th time out of ten.
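
The shutdown() versus close() distinction can be sketched in a few lines of Python over the loopback interface. This is only an illustration, not a way to inspect kernel TCP states: it shows that after shutdown() the process still holds a descriptor, while after close() it holds none, which is why a tool like lsof has nothing left to list. The function name descriptor_after is hypothetical.

```python
import socket

def descriptor_after(action):
    """Return the client's descriptor number after 'shutdown' or 'close'."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    try:
        if action == "shutdown":
            # Sends a FIN, but the process keeps the descriptor open.
            client.shutdown(socket.SHUT_WR)
        else:
            # Sends a FIN and releases the descriptor; whatever TCP state
            # follows is tracked by the kernel alone.
            client.close()
        return client.fileno()         # -1 once the socket is closed
    finally:
        client.close()
        server_side.close()
        listener.close()

if __name__ == "__main__":
    print("after shutdown():", descriptor_after("shutdown"))
    print("after close():   ", descriptor_after("close"))
```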

For a complete description of things like the TCP connection states,
the works of the late W. Richard Stevens et al and/or the not late
Stallings could be useful.

rick jones

ftp://ftp.cup.hp.com/dist/networking/tools/connhist - might need a
little polish to remove bitrot, but it may be of some help

--
denial, anger, bargaining, depression, acceptance, rebirth...
where do you want to be today?
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

From: Darren Dunham on
Kevin <kejoseph(a)hotmail.com> wrote:
> Also, how can I use lsof to see which process is taking up maximum
> connections in wait state ? Using lsof -i on the or lsof -p does not
> show the correct connections in time_wait.

I'm not sure there is one. TIME_WAIT is a state for the connection
after the process has closed it. So I wouldn't expect a process to be
associated with such a state. Even if you could discover the process
that used to have the connection, it might not be running any longer.

> I just so wish that Solaris would provide a "netstat -p" option to
> list processes with netstat, would make my life so much easier !

There was some discussion about that on one of the OpenSolaris forums.
I don't recall any specifics.

--
Darren Dunham ddunham(a)taos.com
Senior Technical Consultant TAOS http://www.taos.com/
Got some Dr Pepper? San Francisco, CA bay area
< This line left intentionally blank to confuse you. >

From: Kevin on
Rick/Darren,

First off thanks for your reply. Based on what you are stating then,
I take it you mean that once a process is in time_wait, it does not
use up a file handle (which is why, as per your argument, lsof does
not show it). However, I am having a tough time buying it because
based on what I know even a connection in time_wait or close_wait is a
network connection and has to be associated with a process; and if it
is associated with a process it has to take up a file handle. This
explains why many times you see a process "run out of file handles"
and notice thousands of network connections in time_wait (while there
are hardly any regular files in use). Again, in Solaris it's not
possible to see this as lsof does not show it, but I have definitely
seen these in Linux and Win2k.

But, if there is a link you can provide me which validates your point
I will be more than willing to go over it.

PS: Why do I think it needs to be associated with a process? Let's
take a step back and try to understand why a process enters TIME_WAIT
and does not get closed immediately. It enters TIME_WAIT so as to
"allow time for any remaining packets to arrive before the port gets
reused" (taken from gottry.com). Now, assuming the network connection
is not associated with a process, then if a packet arrives at that
port, it would have no way of knowing which process to associate it
with.

Thanks again,
Kevin.


From: Frank Cusack on
On 5 Mar 2007 10:12:39 -0800 "Kevin" <kejoseph(a)hotmail.com> wrote:
> First off thanks for your reply. Based on what you are stating then,
> I take it you mean that once a process is in time_wait, it does not
> use up a file handle (which is why, as per your argument, lsof does
> not show it). However, I am having a tough time buying it because
> based on what I know even a connection in time_wait or close_wait is a
> network connection and has to be associated with a process ; and if it

what you know is wrong :-)

> is associated with a process it has to take up a file handle. This
> explains why many times you see a process "run out of file handles"
> and notice thousands of network connections in time_wait (while there
> are hardly any regular files in use). Again, in Solaris its not
> possible to see this as lsof does not show it, but I have definitely
> seen these in Linux and Win2k.
>
> But, if there is a link you can provide me which validates your point
> I will be more than willing to go over it.

Pick up the Stevens TCP/IP book.

> PS: Why do I think it needs to be associated with a process ? Lets
> take a step back and try to understand why a process enters TIME_WAIT
> and does not get closed immediately. It enters TIME_WAIT so as to
> "allow time for any remaining packets to arrive before the port gets
> reused" (taken from gottry.com). Now, assuming the network connection
> is not associated with a process, then if a packet arrives at that
> port, it would have no way of knowing which process to associate it
> with.

Then the packet is dropped, just exactly the same as if a process is
associated with it. CLOSE_WAIT is a different case: there the peer has
closed but the local process has not yet called close(), so it still
holds the descriptor. TIME_WAIT does not require any process-specific
handling.
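
For reference, the teardown part of the TCP state diagram (RFC 793, also drawn in the Stevens book) can be written down as a small lookup table. This Python sketch is not exhaustive, and the event labels and the walk helper are invented for the example. It shows that TIME_WAIT is reached on the side that closes first, while the side that closes second goes through CLOSE_WAIT and LAST_ACK.

```python
# Teardown transitions from the RFC 793 state diagram (not exhaustive).
# Keys are (state, event); the event labels are invented for this sketch.
TEARDOWN = {
    ("ESTABLISHED", "app_close"):    "FIN_WAIT_1",  # we close first
    ("ESTABLISHED", "recv_fin"):     "CLOSE_WAIT",  # peer closes first
    ("FIN_WAIT_1",  "recv_ack"):     "FIN_WAIT_2",
    ("FIN_WAIT_1",  "recv_fin"):     "CLOSING",     # simultaneous close
    ("FIN_WAIT_2",  "recv_fin"):     "TIME_WAIT",
    ("CLOSING",     "recv_ack"):     "TIME_WAIT",
    ("TIME_WAIT",   "timeout_2msl"): "CLOSED",      # kernel timer only
    ("CLOSE_WAIT",  "app_close"):    "LAST_ACK",    # waits on the process
    ("LAST_ACK",    "recv_ack"):     "CLOSED",
}

def walk(state, events):
    """Follow a list of events and return every state visited."""
    path = [state]
    for event in events:
        state = TEARDOWN[(state, event)]
        path.append(state)
    return path

if __name__ == "__main__":
    # Side that closes first: ends up in TIME_WAIT until the timer expires.
    print(walk("ESTABLISHED",
               ["app_close", "recv_ack", "recv_fin", "timeout_2msl"]))
    # Side that closes second: sits in CLOSE_WAIT until the process closes.
    print(walk("ESTABLISHED", ["recv_fin", "app_close", "recv_ack"]))
```

Only the transitions labelled app_close need a process to act; everything from FIN_WAIT_2 onward on the first path is driven by incoming segments and a timer.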

-frank