writeback: lessen sync_supers wakeup count [Kernel]

Prev: staging: hv: Fix race condition on IC channel initialization (modified)
Next: [PATCH] power_end event (Resend)

From: Artem Bityutskiy on 29 May 2010 04:10

On Fri, 2010-05-28 at 13:29 -0700, Andrew Morton wrote:
> void mark_sb_dirty(struct super_block *sb)
> {
> sb->s_dirty = 1;
>
> if (!supers_timer_armed) {
> spin_lock(&supers_timer_lock);
> if (!supers_timer_armed) {
> bdi_arm_supers_timer();
> supers_timer_armed = 1;
> }
> } else if (supers_timer_armed == -1)
> spin_lock(&supers_timer_lock);
> if (supers_timer_armed == -1)
> supers_timer_armed = 1;
> spin_unlock(&supers_timer_lock);
> }
> }
>
> I didn't try very hard there, but you get the idea: examine the state
> before taking that expensive global spinlock, so we only end up taking
> the lock once per five seconds, rather than once per possible
> superblock dirtying. That's like a six-orders-of-magnitude reduction
> in locking frequency, which is worth putting some effort into.

Andrew, thanks for review!

I just did not consider spinlock to be expensive because I thought that
marking superblock as dirty is a relatively rare operation. And my small
experiments kind of confirmed that.

But Nick suggested a good locking scheme which uses only smp_mb() in
this thread, which I am going to stick with.

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Artem Bityutskiy on 31 May 2010 04:30

On Fri, 2010-05-28 at 01:44 +1000, Nick Piggin wrote:
> > if (supers_dirty)
> > bdi_arm_supers_timer();
> > set_current_state(TASK_INTERRUPTIBLE);
> > schedule();

> But we cannot do the above, because again the timer might go off
> before we set current state. We'd lose the wakeup and never wake
> up again.
>
> Putting it inside set_current_state() should be OK. I suppose.

Hmm, but it looks like we cannot do that either. If we do

set_current_state(TASK_INTERRUPTIBLE);
if (supers_dirty)
bdi_arm_supers_timer();
schedule();

and the kernel is preemptive, is it possible that we get preempted
before we run 'bdi_arm_supers_timer()', but after we do
'set_current_state(TASK_INTERRUPTIBLE)'. And we will never wake up if
the timer armed in mark_sb_dirty() went off.

So it looks like this is the way to go:

/*
* Disable preemption for a while to make sure we are not
* preempted before the timer is armed.
*/
preempt_disable();
set_current_state(TASK_INTERRUPTIBLE);
if (supers_dirty)
bdi_arm_supers_timer();
preempt_enable();
schedule();

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Nick Piggin on 31 May 2010 04:40

On Mon, May 31, 2010 at 11:25:52AM +0300, Artem Bityutskiy wrote:
> On Fri, 2010-05-28 at 01:44 +1000, Nick Piggin wrote:
> > > if (supers_dirty)
> > > bdi_arm_supers_timer();
> > > set_current_state(TASK_INTERRUPTIBLE);
> > > schedule();
>
> > But we cannot do the above, because again the timer might go off
> > before we set current state. We'd lose the wakeup and never wake
> > up again.
> >
> > Putting it inside set_current_state() should be OK. I suppose.
>
> Hmm, but it looks like we cannot do that either. If we do
>
> set_current_state(TASK_INTERRUPTIBLE);
> if (supers_dirty)
> bdi_arm_supers_timer();
> schedule();
>
> and the kernel is preemptive, is it possible that we get preempted
> before we run 'bdi_arm_supers_timer()', but after we do
> 'set_current_state(TASK_INTERRUPTIBLE)'. And we will never wake up if
> the timer armed in mark_sb_dirty() went off.
>
> So it looks like this is the way to go:
>
> /*
> * Disable preemption for a while to make sure we are not
> * preempted before the timer is armed.
> */
> preempt_disable();
> set_current_state(TASK_INTERRUPTIBLE);
> if (supers_dirty)
> bdi_arm_supers_timer();
> preempt_enable();
> schedule();

This should not be required because preempt is transparent to these
task sleep/schedule APIs.

The preempt event will not clear TASK_INTERRUPTIBLE, and so the timer
wakeup will set it to TASK_RUNNING (whether or not it has called
schedule() yet and whether or not it is currently preempted).

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Artem Bityutskiy on 31 May 2010 10:10

On Thu, 2010-05-27 at 07:50 +0100, Al Viro wrote:
> Note, e.g., that in your series you've touched udf; it can set s_dirt
> until the cows come home, but without ->write_super() it'll be ignored
> by everything in VFS and fs/udf itself never looks at the damn thing.
>
> A look around it shows fs/sysv, where we never clean the damn flag anymore
> for r/w mounts. Yes, really (got broken a year ago, nobody noticed).
>
> Or, e.g., BFS - there we have ->write_super() mark the buffer_head that
> contains on-disk sb dirty, and the only place that sets ->s_dirt is doing
> that immediately after having marked the same bh dirty itself. Interesting
> place, at that - bfs_fill_super() at r/w mount time... Note that ->sync_fs()
> there does *not* wait for anything, which is not the right thing to do.
>
> IOW, this thing is a good topic for code review; I suspect that quite a few
> users might be gone as the result.

Al,

you requested me to review s_dirt usage, well, I'm trying now. One thing
I do not understand is s_dirt serialization, which seems to be just
absent in some FSes. I checked affs and ext2. E.g., affs does:

affs_alloc_block()
{
mark_buffer_dirty(bh);
sb->s_dirt = 1;
}

vs

affs_write_super()
{
affs_commit_super();
/* YYY: what if sb is marked as dirty right here? */
sb->s_dirt = 0;
}

vs

/* This wakes up periodically */
sync_super()
{
if (sb->s_root && sb->s_dirt)
sb->s_op->write_super(sb);
}

ext2 seems to be doing something similar. It seems to me that FSes
should serialize s_dirt changes somehow, but they don't? Why this is not
a problem?

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Al Viro on 4 Jun 2010 00:30

On Mon, May 31, 2010 at 05:07:00PM +0300, Artem Bityutskiy wrote:
> you requested me to review s_dirt usage, well, I'm trying now. One thing
> I do not understand is s_dirt serialization, which seems to be just
> absent in some FSes. I checked affs and ext2. E.g., affs does:
>
> affs_alloc_block()
> {
> mark_buffer_dirty(bh);
> sb->s_dirt = 1;
> }
>
> vs
>
> affs_write_super()
> {
> affs_commit_super();
> /* YYY: what if sb is marked as dirty right here? */
> sb->s_dirt = 0;
> }
>
> vs
>
> /* This wakes up periodically */
> sync_super()
> {
> if (sb->s_root && sb->s_dirt)
> sb->s_op->write_super(sb);
> }
>
> ext2 seems to be doing something similar. It seems to me that FSes
> should serialize s_dirt changes somehow, but they don't? Why this is not
> a problem?

I suspect that most of those used to rely on lock_super() way back.
In case of ext2_sync_super() we probably want just to move ->s_dirt = 0
into the very beginning; no serialization is really needed beyond (_maybe_)
some barriers. No idea about affs, needs to be checked.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

First | Prev | Next | Last
Pages: 1 2 3 4
Prev: staging: hv: Fix race condition on IC channel initialization (modified)
Next: [PATCH] power_end event (Resend)