Do not call ->writepage[s] from direct reclaim and use a_ops->writepages() where possible [Kernel]


From: Christoph Hellwig on 11 Jun 2010 12:30

On Tue, Jun 08, 2010 at 10:28:14AM +0100, Mel Gorman wrote:
> > - we also need to care about ->releasepage. At least for XFS it
> > can end up in the same deep allocator chain as ->writepage because
> > it does all the extent state conversions, even if it doesn't
> > start I/O.
>
> Dang.
>
> > I haven't managed yet to decode the ext4/btrfs codepaths
> > for ->releasepage yet to figure out how they release a page that
> > covers a delayed allocated or unwritten range.
> >
>
> If ext4/btrfs are also very deep call-chains and this series is going more
> or less the right direction, then avoiding calling ->releasepage from direct
> reclaim is one, somewhat unfortunate, option. The second is to avoid it on
> a per-filesystem basis for direct reclaim using PF_MEMALLOC to detect
> reclaimers and PF_KSWAPD to tell the difference between direct
> reclaimers and kswapd.

I went through this a bit more and I can't actually hit that code in
XFS ->releasepage anymore. I've also audited the caller and can't see
how we could theoretically hit it anymore. Do the VM gurus know a case
where we would call ->releasepage on a page that's actually dirty and
hasn't been through block_invalidatepage before?
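
For readers following along, the per-filesystem option Mel describes in
the quoted text (PF_MEMALLOC to detect reclaimers, PF_KSWAPD to tell
kswapd apart from direct reclaimers) can be sketched in userspace C.
This is only an illustration under stated assumptions: the flag values
below are placeholders rather than a claim about the kernel headers, and
is_direct_reclaimer() and may_do_deep_release() are hypothetical helper
names, not existing kernel functions.

```c
#include <stdbool.h>

/* Illustrative bit values only; the real definitions live in the
 * kernel's include/linux/sched.h and may differ. */
#define PF_MEMALLOC 0x00000800u
#define PF_KSWAPD   0x00040000u

/* Hypothetical helper: true when the task is reclaiming memory on its
 * own behalf (PF_MEMALLOC set) and is not the kswapd daemon. */
static bool is_direct_reclaimer(unsigned int task_flags)
{
	return (task_flags & PF_MEMALLOC) && !(task_flags & PF_KSWAPD);
}

/* Hypothetical policy: a filesystem would skip stack-hungry work in
 * ->releasepage only when called from direct reclaim. */
static bool may_do_deep_release(unsigned int task_flags)
{
	return !is_direct_reclaimer(task_flags);
}
```

In a real filesystem the argument would presumably be current->flags;
kswapd and ordinary (non-reclaim) callers would still be allowed to take
the deep path.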

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Christoph Hellwig on 11 Jun 2010 12:40

On Fri, Jun 11, 2010 at 01:33:20PM +0100, Mel Gorman wrote:
> Ok, I was under the mistaken impression that filesystems wanted to be
> given ranges of pages where possible. Considering that there has been no
> reaction to the patch in question from the filesystem people cc'd, I'll
> drop the problem for now.

Yes, we'd prefer them if possible. Then again we'd really prefer to
get as much I/O as possible from the flusher threads, and not kswapd.


From: Chris Mason on 11 Jun 2010 15:20

On Fri, Jun 11, 2010 at 12:29:12PM -0400, Christoph Hellwig wrote:
> On Tue, Jun 08, 2010 at 10:28:14AM +0100, Mel Gorman wrote:
> > > - we also need to care about ->releasepage. At least for XFS it
> > > can end up in the same deep allocator chain as ->writepage because
> > > it does all the extent state conversions, even if it doesn't
> > > start I/O.
> >
> > Dang.
> >
> > > I haven't managed yet to decode the ext4/btrfs codepaths
> > > for ->releasepage yet to figure out how they release a page that
> > > covers a delayed allocated or unwritten range.
> > >
> >
> > If ext4/btrfs are also very deep call-chains and this series is going more
> > or less the right direction, then avoiding calling ->releasepage from direct
> > reclaim is one, somewhat unfortunate, option. The second is to avoid it on
> > a per-filesystem basis for direct reclaim using PF_MEMALLOC to detect
> > reclaimers and PF_KSWAPD to tell the difference between direct
> > reclaimers and kswapd.
>
> I went through this a bit more and I can't actually hit that code in
> XFS ->releasepage anymore. I've also audited the caller and can't see
> how we could theoretically hit it anymore. Do the VM gurus know a case
> where we would call ->releasepage on a page that's actually dirty and
> hasn't been through block_invalidatepage before?

Which part of xfs releasepage are you trying to avoid?

	dirty = xfs_page_state_convert(inode, page, &wbc, 0, 0);
	if (dirty == 0 && !unwritten)
		goto free_buffers;

I'd expect the above was fixed by page_mkwrite, which should be dealing
with all the funny corners that we used to have to mess with in
releasepage.

btrfs_release_page does no allocations, it only checks to see if the
page is busy somehow (dirty/writeback etc).

-chris

From: Andrea Arcangeli on 15 Jun 2010 10:10

Hi Mel,

I know lots of people don't like direct reclaim, but I personally do,
and I think that if memory pressure is hard enough we should eventually
enter direct reclaim at full force, including ->writepage, to avoid
false-positive OOM failures. Transparent hugepage allocation in fact
won't even wake up kswapd, which would insist on creating hugepages and
shrink an excessive amount of memory. (That was especially true before
memory compaction was merged. It should be tried again, but if memory
compaction fails in kswapd context, kswapd should definitely stop
immediately and not go ahead trying to create hugepages the blind way;
blind order-awareness in kswapd is surely detrimental and pointless.)

When memory pressure is low, not going into ->writepage may be
beneficial from a latency perspective too. (But again, it depends on
how much it matters to go in LRU order and how beneficial the cache is,
to know whether it's worth taking clean cache away even if it is hotter
than dirty cache.)

About the stack overflow: did you ever get any stack-debug error? We
have plenty of instrumentation, and ->writepage definitely runs with
irqs enabled, so if there's any issue it can't possibly go unnoticed.
The worry about stack overflow should be backed by numbers.

You posted lots of latency numbers (surely latency will improve, but
it's only a safe approach under light memory pressure; under heavy
pressure, not calling ->writepage will OOM early, and if the cache is
very important and the system has little RAM, not going in LRU order
may also hurt fs-cache performance), but I didn't see any hard
max-stack-usage numbers to back the claim that we're going to overflow.

In any case I'd prefer to still be able to call ->writepage if memory
pressure is high (at some point when priority is going down and
collecting clean cache still doesn't satisfy the allocation) during
allocations in direct reclaim, and to increase THREAD_SIZE, rather than
doing this purely for stack reasons, as the VM will lose reliability if
we forbid ->writepage entirely in direct reclaim. Throttling on kswapd
is possible but it's probably less efficient: on the stack we know
exactly which kind of memory we should allocate, whereas kswapd doesn't
and works globally.

From: Christoph Hellwig on 15 Jun 2010 10:20

On Tue, Jun 15, 2010 at 04:00:11PM +0200, Andrea Arcangeli wrote:
> collecting clean cache doesn't still satisfy the allocation), during
> allocations in direct reclaim and increase the THREAD_SIZE than doing
> this purely for stack reasons as the VM will lose reliability if we

This basically means doubling the stack size, as you can splice together
two extremely stack-hungry codepaths in the worst case. Do you really
want order-2 stack allocations?
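
The arithmetic behind "order 2" can be sketched as follows, assuming
4 KiB pages and a current THREAD_SIZE of 8 KiB (both are assumptions
that depend on architecture and configuration). alloc_order() is a
hypothetical helper written for this illustration, not a kernel API: it
returns the smallest n such that 2^n contiguous pages cover the request.

```c
#include <stddef.h>

/* Hypothetical helper: smallest order n with (page_size << n) >= bytes. */
static unsigned int alloc_order(size_t bytes, size_t page_size)
{
	unsigned int order = 0;
	size_t span = page_size;

	while (span < bytes) {
		span <<= 1;	/* each order doubles the contiguous span */
		order++;
	}
	return order;
}
```

Under those assumptions an 8 KiB stack is an order-1 allocation (two
contiguous pages), and doubling it to 16 KiB requires four contiguous
pages, i.e. order 2, which is harder to satisfy when memory is
fragmented.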

