From: Dave Chinner on
On Mon, May 03, 2010 at 01:54:38PM +0200, Peter Palfrader wrote:
> Hi,
>
> I have an xfs filesystem in a KVM domain with 512megs of memory and 2 gigs of
> swap.
>
> The filesystem is 750g in size, of which some 500g are in use in about 6
> million files. (This XFS filesystem is exported via nfs4. I haven't tested if
> this makes any difference.)
>
> Starting in 2.6.32.12 running something like "find | wc -l" on this
> filesystem's mountpoint causes the OOM killer to kill off most of the
> system. (See kern.log[1])

Known problem.

As a workaround, you can increase the frequency at which xfssyncd
runs, so that the interval between background reclaim runs is
shorter than the default 30s.
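
A minimal sketch of that workaround, assuming the interval is exposed
through the fs.xfs.xfssyncd_centisecs sysctl (value in hundredths of a
second, 3000 by default, which matches the 30s above). The helper names
and the lower bound of 100 are illustrative assumptions, not something
stated in this thread; check the XFS documentation for your kernel
before relying on them. Writing the real file requires root.

```python
from pathlib import Path

# Assumption: xfssyncd's wakeup interval is controlled by this sysctl,
# expressed in centiseconds (default 3000 = 30s).
XFSSYNCD_SYSCTL = Path("/proc/sys/fs/xfs/xfssyncd_centisecs")

# Assumption: the kernel rejects values below 100 centiseconds (1s).
MIN_CENTISECS = 100


def seconds_to_centisecs(seconds):
    """Convert an interval in seconds to the sysctl's centisecond unit."""
    return int(round(seconds * 100))


def set_xfssyncd_interval(seconds, sysctl_path=XFSSYNCD_SYSCTL):
    """Write a new xfssyncd interval and return (old, new) in centisecs.

    sysctl_path is a parameter only so the helper can be tried against
    an ordinary file before touching /proc.
    """
    new_value = seconds_to_centisecs(seconds)
    if new_value < MIN_CENTISECS:
        raise ValueError("interval below the assumed minimum of 1s")
    old_value = int(sysctl_path.read_text().strip())
    sysctl_path.write_text(f"{new_value}\n")
    return old_value, new_value
```

For example, set_xfssyncd_interval(5) run as root would ask for a
background reclaim pass every 5 seconds instead of every 30. The
setting does not survive a reboot unless it is also added to the
system's sysctl configuration.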

> With 2.6.32.11 the system does not behave like this.
>
> Bisecting turned up the following commit. Reverting it in 2.6.32.12
> also results in a system that works.
>
> | 9e1e9675fb29c0e94a7c87146138aa2135feba2f is first bad commit
> | commit 9e1e9675fb29c0e94a7c87146138aa2135feba2f
> | Author: Dave Chinner <david(a)fromorbit.com>
> | Date: Fri Mar 12 09:42:10 2010 +1100
> |
> | xfs: reclaim all inodes by background tree walks

Reverting this leaves you running with a subtly altered and
completely untested reclaim path that I'm not sure does the right
thing in all situations. I wouldn't run that revert on my machines,
nor recommend it for anyone else. But it's up to you if you want to
run it on your machines....

The fix for this problem only got to mainline a couple of days ago.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=9bf729c0af67897ea8498ce17c29b0683f7f2028

I've got to backport it to the stable kernel tree so the next stable
kernel should fix this.

Cheers,

Dave.
--
Dave Chinner
david(a)fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Palfrader on
On Mon, 03 May 2010, Dave Chinner wrote:

> > Starting in 2.6.32.12 running something like "find | wc -l" on this
> > filesystem's mountpoint causes the OOM killer to kill off most of the
> > system. (See kern.log[1])
>
> Known problem.

> The fix for this problem only got to mainline a couple of days ago.
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=9bf729c0af67897ea8498ce17c29b0683f7f2028
>
> I've got to backport it to the stable kernel tree so the next stable
> kernel should fix this.

Thanks, I'll stay on .11 on that machine for now then.

--
                           |  .''`.  ** Debian GNU/Linux **
      Peter Palfrader      | : :' :      The  universal
 http://www.palfrader.org/ | `. `'      Operating System
                           |   `-    http://www.debian.org/