From: Christoph Hellwig on
On Wed, Oct 07, 2009 at 03:38:49PM +0800, Wu Fengguang wrote:
> A background flush work may run for ever. So it's reasonable for it to
> mimic the kupdate behavior of syncing old/expired inodes first.

I've looked at this a bit again after you pointed to this thread in
the direct reclaim thread, and I think we should be even more aggressive
in pushing out old inodes.

We basically have two types of I/O done from wb_do_writeback:

- either we want to write all inodes for a given bdi/superblock. That
includes all WB_SYNC_ALL callers, but also things like
writeback_inodes_sb and the wakeup_flusher_threads call from
sys_sync.
- or we have a specific goal, like for the background writeback or
the wakeup_flusher_threads from free_more_memory.

For the first case there's obviously no point in doing any
older_than_this processing as we write out all inodes anyway.

For the second case we should always do a older_than_this pass _first_.
Rationale: we really should get the old inodes out ASAP, so that we
keep the amount of changes lost on a crash in the boundaries.
Furthermore the callers only need N pages cleaned, and they don't care
where from. So if we can reach our goal with the older_than_this
writeback we're fine. If the writeback loop is long enough we can
keep doing more of these later on as well.

Doing this should also help cleaning the code up a bit by moving the
wb_check_old_data_flush logic into wb_writeback and getting rid of the
for_kupdate paramter in struct wb_writeback_work. I'm not even sure
it's worth keeping it in struct writeback_control.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Wu Fengguang on
On Mon, Jul 12, 2010 at 11:01:29AM +0800, Christoph Hellwig wrote:
> On Wed, Oct 07, 2009 at 03:38:49PM +0800, Wu Fengguang wrote:
> > A background flush work may run for ever. So it's reasonable for it to
> > mimic the kupdate behavior of syncing old/expired inodes first.
>
> I've looked at this a bit again after you pointed to this thread in
> the direct reclaim thread, and I think we should be even more aggressive
> in pushing out old inodes.

Agreed.

> We basically have two types of I/O done from wb_do_writeback:
>
> - either we want to write all inodes for a given bdi/superblock. That
> includes all WB_SYNC_ALL callers, but also things like
> writeback_inodes_sb and the wakeup_flusher_threads call from
> sys_sync.
> - or we have a specific goal, like for the background writeback or
> the wakeup_flusher_threads from free_more_memory.
>
> For the first case there's obviously no point in doing any
> older_than_this processing as we write out all inodes anyway.

We may also do older_than_this even for the sync-the-whole-world case,
as long as this simplifies wb_writeback() and/or other code. This may
make a difference for slow devices.

> For the second case we should always do a older_than_this pass _first_.

Agree in general.

> Rationale: we really should get the old inodes out ASAP, so that we
> keep the amount of changes lost on a crash in the boundaries.
> Furthermore the callers only need N pages cleaned, and they don't care
> where from. So if we can reach our goal with the older_than_this
> writeback we're fine. If the writeback loop is long enough we can
> keep doing more of these later on as well.

Right.

> Doing this should also help cleaning the code up a bit by moving the
> wb_check_old_data_flush logic into wb_writeback and getting rid of the
> for_kupdate paramter in struct wb_writeback_work. I'm not even sure
> it's worth keeping it in struct writeback_control.

I'd also like to see less for_kupdate tests. Whether or not we can
totally get rid of the explicit for_kupdate case, there are always
four main writeback goals/semantics:
- periodic stop when all 30s-old inodes are written
- background stop when background threshold is reached
- nr_pages stop when nr_pages written (or when all clean)
- sync stop when all older-than-sync-time inodes are written

Note that
- the "sync" goal is obviously a superset of the "periodic" goal
- the "background" goal may be expanded to include the "periodic" goal
- the latter three goals may all do some "periodic" goal loops, with
a moving "old" criterion.

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Christoph Hellwig on
On Mon, Jul 19, 2010 at 02:11:29PM +0100, Mel Gorman wrote:
> From: Wu Fengguang <fengguang.wu(a)intel.com>
>
> A background flush work may run for ever. So it's reasonable for it to
> mimic the kupdate behavior of syncing old/expired inodes first.
>
> This behavior also makes sense from the perspective of page reclaim.
> File pages are added to the inactive list and promoted if referenced
> after one recycling. If not referenced, it's very easy for pages to be
> cleaned from reclaim context which is inefficient in terms of IO. If
> background flush is cleaning pages, it's best it cleans old pages to
> help minimise IO from reclaim.

Yes, we absolutely do this. Wu, do you have an improved version of the
pending or should we put it in this version for now?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Christoph Hellwig on
On Mon, Jul 19, 2010 at 03:40:47PM +0100, Mel Gorman wrote:
> > Yes, we absolutely do this.
>
> Do you mean we absolutely want to do this?

Ermm yes, sorry.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Wu Fengguang on
On Mon, Jul 19, 2010 at 10:21:45PM +0800, Christoph Hellwig wrote:
> On Mon, Jul 19, 2010 at 02:11:29PM +0100, Mel Gorman wrote:
> > From: Wu Fengguang <fengguang.wu(a)intel.com>
> >
> > A background flush work may run for ever. So it's reasonable for it to
> > mimic the kupdate behavior of syncing old/expired inodes first.
> >
> > This behavior also makes sense from the perspective of page reclaim.
> > File pages are added to the inactive list and promoted if referenced
> > after one recycling. If not referenced, it's very easy for pages to be
> > cleaned from reclaim context which is inefficient in terms of IO. If
> > background flush is cleaning pages, it's best it cleans old pages to
> > help minimise IO from reclaim.
>
> Yes, we absolutely do this. Wu, do you have an improved version of the
> pending or should we put it in this version for now?

Sorry for the delay! The code looks a bit hacky, and there is a problem:
it only decrease expire_interval and never increase/reset it.
So it's possible when dirty workload first goes light then goes heavy,
expire_interval may be reduced to 0 and never be able to grow up again.
In the end we revert to the old behavior of ignoring dirtied_when totally.

A more complete solution would be to make use of older_than_this not
only for the kupdate case, but also for the background and sync cases.
The policies can be most cleanly carried out in move_expired_inodes().

- kupdate: older_than_this = jiffies - 30s
- background: older_than_this = TRY FROM (jiffies - 30s) TO (jiffies),
UNTIL get some inodes to sync
- sync: older_than_this = start time of sync

I'll post an untested RFC patchset for the kupdate and background
cases. The sync case will need two more patch series due to other
problems.

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/