From: zedkay on
On 08/06/2010 03:36 PM, J G Miller wrote:
> On Fri, 06 Aug 2010 07:22:31 +0000, Huge wrote:
> If writing data to an NFS mounted disk, then it is much safer to use
> hard to avoid possible data corruption issues, but if your data is
> really that critical, you should not be writing it to an NFS mounted disk,
> eg save locally and then use rsync, and obviously for multi client
> operation you would be using a database anyways ...

Actually, one should use a clustered filesystem,
http://en.wikipedia.org/wiki/Clustered_file_system
and then preferably in a SAN environment.

--
Please do not reply to my Email address. It is a faux Email address.
Cyberpunk FPS/MMORG www.neocron.com
Runs on Windows, platinum in latest WINE/Ubuntu. Running since 2002.
From: Ignoramus29207 on
On 2010-08-09, zedkay <zedkay(a)maileater.com> wrote:
> On 08/06/2010 03:36 PM, J G Miller wrote:
>> On Fri, 06 Aug 2010 07:22:31 +0000, Huge wrote:
>> If writing data to an NFS mounted disk, then it is much safer to use
>> hard to avoid possible data corruption issues, but if your data is
>> really that critical, you should not be writing it to an NFS mounted disk,
>> eg save locally and then use rsync, and obviously for multi client
>> operation you would be using a database anyways ...
>
> Actually, one should use a clustered filesystem,
> http://en.wikipedia.org/wiki/Clustered_file_system
> and then preferably in a SAN environment.
>

Guys, with all due respect for these ideas, I would really like to find an
answer to my problem. The key to it seems to be this message in dmesg:

[44743.592195] nfs: server myserver not responding, still trying
[45103.592528] nfs: server myserver OK
[45980.844190] nfs: server myserver not responding, still trying

The message is not really true: the server (at least at the time of
checking) was accessible.
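
To see how long each of these stalls lasts, the gaps can be read straight
off the dmesg timestamps. A minimal Python sketch, assuming the log lines
look exactly like the three above; the function name nfs_outages and the
SAMPLE string are made up for illustration:

```python
import re

# Matches lines such as:
#   [44743.592195] nfs: server myserver not responding, still trying
#   [45103.592528] nfs: server myserver OK
LINE_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+nfs: server (\S+) (not responding|OK)"
)


def nfs_outages(lines):
    """Pair each 'not responding' line with the next 'OK' for that server.

    Returns a list of (server, start, end) tuples, with timestamps in
    seconds since boot. end is None if the server never came back
    within the given lines.
    """
    pending = {}   # server -> timestamp of the first unanswered message
    outages = []
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        stamp = float(match.group(1))
        server = match.group(2)
        state = match.group(3)
        if state == "not responding":
            # Keep the earliest timestamp if the message repeats.
            pending.setdefault(server, stamp)
        elif server in pending:
            outages.append((server, pending.pop(server), stamp))
    # Anything still pending had no matching "OK" line.
    for server, start in pending.items():
        outages.append((server, start, None))
    return outages


# The three lines from the dmesg output above.
SAMPLE = """\
[44743.592195] nfs: server myserver not responding, still trying
[45103.592528] nfs: server myserver OK
[45980.844190] nfs: server myserver not responding, still trying
"""

for server, start, end in nfs_outages(SAMPLE.splitlines()):
    if end is None:
        print(f"{server}: stalled since {start:.1f}, no OK seen yet")
    else:
        print(f"{server}: stalled for {end - start:.1f} s")
```

On the lines above, the first stall works out to about 360 seconds. On a
live machine the same function could be fed the output of dmesg instead
of SAMPLE.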

i
From: Ignoramus29207 on
On 2010-08-05, Rahul <nospam(a)nospam.invalid> wrote:
> Ignoramus16841 <ignoramus16841(a)NOSPAM.16841.invalid> wrote in
> news:5_KdnW29vrfPisbRnZ2dnUVZ_rmdnZ2d(a)giganews.com:
>
>> Could be some sort of timing bug in the NFS server code?
>>>
>>
>> It could be anything, but I am more inclined to blame the client, as
>>
>> 1) Many clients work just fine
>> 2) Even my bad desktop works fine when it connects after a reboot
>>
>
> Does it matter if it is a soft or hard mount?
>
> I'm just guessing here. I've a long mount option string on my server:
>
> eustorage:/opt /opt nfs rw,nodev,noatime,nfsvers=3,timeo=110,retrans=50,hard,intr,proto=udp,rsize=32768,wsize=32768 0 0
>
> But I'd be hard pressed to explain why I chose each option that I have in
> there. :)
>
>

None of this really makes any difference.

i
From: Ignoramus29207 on
On 2010-08-05, Rahul <nospam(a)nospam.invalid> wrote:
> Ignoramus16841 <ignoramus16841(a)NOSPAM.16841.invalid> wrote in
> news:5_KdnW29vrfPisbRnZ2dnUVZ_rmdnZ2d(a)giganews.com:
>
>> Could be some sort of timing bug in the NFS server code?
>>>
>>
>> It could be anything, but I am more inclined to blame the client, as
>>
>> 1) Many clients work just fine
>> 2) Even my bad desktop works fine when it connects after a reboot
>>
>
> Does it matter if it is a soft or hard mount?
>
> I'm just guessing here. I've a long mount option string on my server:
>
> eustorage:/opt /opt nfs rw,nodev,noatime,nfsvers=3,timeo=110,retrans=50,hard,intr,proto=udp,rsize=32768,wsize=32768 0 0
>
> But I'd be hard pressed to explain why I chose each option that I have in
> there. :)
>
>

I could explain some, but what is broken is something much more basic.

i
From: Rahul on
Ignoramus29207 <ignoramus29207(a)NOSPAM.29207.invalid> wrote in
news:2Oqdnb96RYTHvf3RnZ2dnUVZ_tOdnZ2d(a)giganews.com:

> [44743.592195] nfs: server myserver not responding, still trying
> [45103.592528] nfs: server myserver OK
> [45980.844190] nfs: server myserver not responding, still trying

Now that's interesting. I had the exact same problem on many machines a few
months ago. In that case we diagnosed it to be a problem with the Broadcom
drivers that caused them to hang under load.

http://bugs.centos.org/view.php?id=3832
http://lopsa.org/node/1836

The solution was to add this line to modprobe.conf:

options bnx2 disable_msi=1

Now it would be a stretch to imagine this is the same problem you are
facing, but I thought I'd throw it out there.

In the past I've also seen this same error when many clients mount the
same server and the number of NFS threads is too low. But if you see it
on only one client, I doubt this is your issue. Have you tried
cat /proc/net/rpc/nfsd on the server to see if the threads are often all busy?
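
If reading that file by eye is a pain, here is a rough Python sketch that
pulls out the thread line. It assumes the layout I remember from kernels
of this vintage: a line starting with "th", then the thread count, then
the number of times all threads were busy at once, then a ten-bucket
busy histogram. Newer kernels may simply report zeros after the thread
count, so treat the field meanings as an assumption. The function name
parse_nfsd_threads is made up for illustration:

```python
def parse_nfsd_threads(text):
    """Extract the 'th' line from the contents of /proc/net/rpc/nfsd.

    Returns a dict with the thread count, the all-threads-busy counter,
    and whatever histogram values follow, or None if no 'th' line exists.
    The field meanings are an assumption about the kernel's format.
    """
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "th":
            return {
                "threads": int(fields[1]),
                "all_busy_count": int(fields[2]),
                "busy_histogram": [float(x) for x in fields[3:]],
            }
    return None


# Usage on the NFS server (left commented out so that the sketch
# itself does not depend on the file being present):
#
#     with open("/proc/net/rpc/nfsd") as f:
#         print(parse_nfsd_threads(f.read()))
```

An all_busy_count that keeps climbing between two readings would hint
that the server is short of nfsd threads.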

--
Rahul