From: bzaman on
Hi,

We are seeing an NFS issue on our production hosts.

One of our processes is failing regularly because it is unable to read
some files from an NFS filer.

The following error messages are seen:

======
/net/edw12/warehouse/stores//links//20091112/l/us/data/170.gz: uncompress failed
/net/edw12/warehouse/stores/links/20091112/l/us/data/270.gz: uncompress failed

Broken pipe
/net/edw12/warehouse/stores/links/20091112/l/us/data/271.gz: uncompress failed
======
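
One way to check whether a given file can be read end to end from the
affected host, independently of the failing process, is to decompress it
and report the first error. This is only a minimal sketch, assuming
Python 3 is available on the host; check_gzip is a hypothetical helper
name, and the path to test is whatever file the process complained about.

```python
import gzip
import zlib


def check_gzip(path, chunk_size=1 << 20):
    """Read a .gz file to the end.

    Returns None if the whole file decompresses cleanly, otherwise a
    short string describing the first error that was raised.
    """
    try:
        with gzip.open(path, "rb") as fh:
            # Read in chunks so large files are not held in memory.
            while fh.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error) as exc:
        # OSError covers I/O failures (including NFS read errors) and
        # invalid gzip headers; EOFError covers truncated files.
        return f"{type(exc).__name__}: {exc}"
    return None
```

Running it against the same file from the affected host and from a
healthy host would show whether the file itself is damaged or only the
read from this host fails.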

We suspect that high load on the NFS filer is preventing the files from
being read. I ran nfsstat, and the results do appear to point to an NFS
problem, but I am not able to interpret them because the manual page
does not say enough about what the output means.
The following is the output of nfsstat:

======
-bash-3.2$ nfsstat -s -W

Server Info:
  Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
       14     19847    119217         7      4122     26662     12739      3409
   Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
    10071         0         0      5556       163      2086         0   5750206
    Mknod    Fsstat    Fsinfo  PathConf    Commit
        0   1740332       123         0     14498
Server Ret-Failed
    62491
Server Faults
        0
Server Cache Stats:
   Inprog      Idem  Non-idem    Misses
        0         0         0   7709227
Server Write Gathering:
 WriteOps  WriteRPC   Opsaved
    26662     26662         0
=====


Many other processes on different hosts are accessing the filer, but we
are seeing the issue on one particular host only. The above output is
from the affected host. The Server Ret-Failed field has a large value
there, while it is zero on all other hosts. Does this indicate that we
are seeing timeouts while accessing files on the NFS filer from this
host? The filer is busy at the time this process tries to access it,
which is probably why we are not seeing the read failure on other hosts.
Please clarify. If you need any further info, please let me know.
Also, any pointers to documentation on nfsstat would be highly
appreciated.
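
To compare counters such as Server Ret-Failed between hosts, the nfsstat
output can be turned into name/value pairs. The following is only a
rough sketch: parse_nfsstat is a hypothetical helper, and it assumes
that each row of counter names is immediately followed by its row of
values on a single line, as nfsstat prints it on a wide terminal. The
sample uses a few of the figures quoted above.

```python
def parse_nfsstat(text):
    """Map counter names to values from nfsstat-style output.

    Assumes a row of names is directly followed by a row of numbers.
    """
    counters = {}
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for header, values in zip(lines, lines[1:]):
        nums = values.split()
        if not all(n.isdigit() for n in nums):
            continue  # the following line is not a row of counters
        names = header.split()
        if any(n.isdigit() for n in names):
            continue  # this line is itself a row of counters
        if len(nums) == 1:
            # A multi-word name such as "Server Ret-Failed" with one value.
            counters[header] = int(nums[0])
        elif len(names) == len(nums):
            counters.update(zip(names, map(int, nums)))
    return counters


SAMPLE = """\
Server Info:
  Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
       14     19847    119217         7      4122     26662     12739      3409
Server Ret-Failed
    62491
Server Faults
        0
"""

stats = parse_nfsstat(SAMPLE)
print(stats["Server Ret-Failed"])  # prints 62491
```

Running the same parser on the output from each host would make it easy
to list only the counters that differ on the affected one.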


Thanks in advance,
Zaman