From: Jan Kara on
> Badari Pulavarty wrote:
>
> >Here is what I think is happening..
> >
> >journal_unmap_buffer() - cleaned the buffer, since its outside EOF, but
> >its a part of the same page. So it remained on the page->buffers
> >list. (at this time its not part of any transaction).
> >
> >Then, ordererd_commit_write() called journal_dirty_data() and we added
> >all these buffers to BJ_SyncData list. (at this time buffer is clean -
> >not dirty).
> >
> >Now msync() called __set_page_dirty_buffers() and dirtied *all* the
> >buffers attached to this page.
> >
> >journal_submit_data_buffers() got around to this buffer and tried to
> >submit the buffer...
Yes, this is certainly one we need to fix.

> This seems about right, but one thing bothers me in the traces; it seems
> like there is some locking that is missing. In
> http://people.redhat.com/esandeen/traces/eric_ext3_oops1.txt
> for example, it looks like journal_dirty_data gets started, but then the
> buffer_head is acted on by journal_unmap_buffer, which decides this buffer
> is part of the running transaction, past EOF, and clears mapped, dirty,
It's part of the committing transaction.

> etc. Then journal_dirty_data picks up again, decides that the buffer is
> not on the right list (now BJ_None) and puts it back on BJ_SyncData. Then
> it gets picked up by journal_submit_data_buffers and submitted, and oops.
Now it is put on the running transaction's BJ_SyncData list. But
otherwise you're right.

> Talking with Stephen, it seemed like the page lock should synchronize these
> threads, but I've found that we can get to journal_dirty_data acting on the
> buffer heads w/o having the page locked...
Yes, PageLock should protect us. Where can we call
journal_dirty_data() without PageLock? I see the following callers:
ext3_ordered_commit_write - should have PageLock
ext3_ordered_writepage - has PageLock
ext3_block_truncate_page - has PageLock

And that are all callers from ext3. Am I missing something?

Honza
--
Jan Kara <jack(a)suse.cz>
SuSE CR Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Eric Sandeen on
Jan Kara wrote:

>> Talking with Stephen, it seemed like the page lock should synchronize these
>> threads, but I've found that we can get to journal_dirty_data acting on the
>> buffer heads w/o having the page locked...
> Yes, PageLock should protect us. Where can we call
> journal_dirty_data() without PageLock? I see the following callers:
> ext3_ordered_commit_write - should have PageLock
> ext3_ordered_writepage - has PageLock
> ext3_block_truncate_page - has PageLock
>
> And that are all callers from ext3. Am I missing something?
>
> Honza

I put an assert about the page being locked in journal_dirty_data, and hit it
right away. I'll look a bit more but I think this is how I got there:


ext3_ordered_writepage <-- assert PageLocked
...
block_write_full_page
__block_write_full_page
unlock_page()
...
walk_page_buffers
journal_dirty_data_fn
ext3_journal_dirty_data
journal_dirty_data <-- find page unlocked

there's a comment in ext3_ordered_writepage:

/*
* The page can become unlocked at any point now, and
* truncate can then come in and change things. So we
* can't touch *page from now on. But *page_bufs is
* safe due to elevated refcount.
*/

-Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Eric Sandeen on
Andrew Morton wrote:
> On Thu, 12 Oct 2006 14:28:20 +0200
> Jan Kara <jack(a)suse.cz> wrote:
>
>> Where can we call
>> journal_dirty_data() without PageLock?
>
> block_write_full_page() will unlock the page, so ext3_writepage()
> will run journal_dirty_data_fn() against an unlocked page.
>
> I haven't looked into the exact details of the race, but it should
> be addressable via jbd_lock_bh_state() or j_list_lock coverage.

Yep, that's what I've been hashing out with Stephen today...

In one of my cases journal_dirty_data has dropped & re-acquired the
bh_state lock and j_list_lock, and journal_unmap_buffer has come along
in the meantime.

So it looks like we are missing some state tests, i.e. buffer_mapped(),
at a couple points after we acquire jbd_lock_bh_state().

-Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Andrew Morton on
On Thu, 12 Oct 2006 14:28:20 +0200
Jan Kara <jack(a)suse.cz> wrote:

> Where can we call
> journal_dirty_data() without PageLock?

block_write_full_page() will unlock the page, so ext3_writepage()
will run journal_dirty_data_fn() against an unlocked page.

I haven't looked into the exact details of the race, but it should
be addressable via jbd_lock_bh_state() or j_list_lock coverage.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Eric Sandeen on
Andrew Morton wrote:

> On Thu, 12 Oct 2006 14:28:20 +0200
> Jan Kara <jack(a)suse.cz> wrote:
>
>
>> Where can we call
>> journal_dirty_data() without PageLock?
>>
>
> block_write_full_page() will unlock the page, so ext3_writepage()
> will run journal_dirty_data_fn() against an unlocked page.
>
> I haven't looked into the exact details of the race, but it should
> be addressable via jbd_lock_bh_state() or j_list_lock coverage
I'm testing with something like this now; seem sane?

journal_dirty_data & journal_unmap_data both check do
jbd_lock_bh_state(bh) close to the top... journal_dirty_data_fn has checked
buffer_mapped before getting into journal_dirty_data, but that state may
change before the lock is grabbed. Similarly re-check after we drop the lock.

-Eric

Index: linux-2.6.18-1.2737.fc6/fs/jbd/transaction.c
===================================================================
--- linux-2.6.18-1.2737.fc6.orig/fs/jbd/transaction.c
+++ linux-2.6.18-1.2737.fc6/fs/jbd/transaction.c
@@ -967,6 +967,13 @@ int journal_dirty_data(handle_t *handle,
*/
jbd_lock_bh_state(bh);
spin_lock(&journal->j_list_lock);
+
+ /* Now that we have bh_state locked, are we really still mapped? */
+ if (!buffer_mapped(bh)) {
+ JBUFFER_TRACE(jh, "unmapped, bailing out");
+ goto no_journal;
+ }
+
if (jh->b_transaction) {
JBUFFER_TRACE(jh, "has transaction");
if (jh->b_transaction != handle->h_transaction) {
@@ -1028,6 +1036,11 @@ int journal_dirty_data(handle_t *handle,
sync_dirty_buffer(bh);
jbd_lock_bh_state(bh);
spin_lock(&journal->j_list_lock);
+ /* Since we dropped the lock... */
+ if (!buffer_mapped(bh)) {
+ JBUFFER_TRACE(jh, "Got unmapped");
+ goto no_journal;
+ }
/* The buffer may become locked again at any
time if it is redirtied */
}

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/