From: Nick Piggin on
On Sat, Jul 03, 2010 at 03:06:52PM +1000, Nick Piggin wrote:
> It is possible that locking can be reduced if some things are verified
> and carefully shown not to matter. I just don't see the need yet and it
> would make things overly complicated I think. Introducing any more
> complexity will sink this patchset.

By overly complicated, I mean for this patchset, where the locking has
already been rewritten. It would then be no more complicated (actually
far less) than equivalently trying to lift inode_lock only from the
parts of the code where it is causing contention.

From: Dave Chinner on
On Sat, Jul 03, 2010 at 03:06:52PM +1000, Nick Piggin wrote:
> So it makes a lot of sense to have a lock to rule the inode (as opposed
> to now, where we have a lock to rule *all* inodes).

I don't disagree with this approach - I object to the fact that you
repurpose an existing lock and change its locking rules to "rule
the inode". We don't have any one lock that "rules the inode",
anyway, so adding a new "i_list_lock" for the new VFS level locking
strategies makes it a lot more self-contained. Fundamentally I'm
less concerned about the additional memory usage than I am about
having landmines planted around i_lock...
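
Roughly what I mean, as a sketch only (obviously not a patch):

struct inode {
	...
	spinlock_t	i_lock;		/* existing lock, existing rules;
					   filesystems keep using it as-is */
	spinlock_t	i_list_lock;	/* new: for the VFS-level locking
					   (list membership and friends) */
	...
};

i.e. put the new VFS-level rules behind a new lock and leave the current
i_lock users completely alone.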

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
From: Nick Piggin on
On Tue, Jul 06, 2010 at 08:41:06AM +1000, Dave Chinner wrote:
> On Sat, Jul 03, 2010 at 03:06:52PM +1000, Nick Piggin wrote:
> > So it makes a lot of sense to have a lock to rule the inode (as opposed
> > to now, where we have a lock to rule *all* inodes).
>
> I don't disagree with this approach - I object to the fact that you
> repurpose an existing lock and change its locking rules to "rule
> the inode". We don't have any one lock that "rules the inode",

"rule the inode" was in regard to the inode cache / dirty / writeout
handling; ie. what inode_lock is currently for. The idea is that in
these code paths, we tend to want to take an inode, lock it, and then
manipulate it (including putting it on or taking it off data
structures).
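
In those paths the code ends up roughly this shape (an illustrative
sketch only, not the actual patch):

	spin_lock(&inode->i_lock);
	if (!(inode->i_state & (I_FREEING | I_WILL_FREE))) {
		/* bump the refcount, change i_state, move the inode
		 * between lists (taking the relevant list lock nested
		 * inside i_lock), and so on */
	}
	spin_unlock(&inode->i_lock);

i.e. one per-inode lock covers the inode's own state for the whole
operation, instead of everything serialising on the global inode_lock.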


> anyway, so adding a new "i_list_lock" for the new VFS level locking
> strategies makes it a lot more self-contained. Fundamentally I'm
> less concerned about the additional memory usage than I am about
> having landmines planted around i_lock...

(it's not just the lists, it's the refcount and i_state too)

I just don't see a problem. _No_ filesystem takes any of the locks
that nest inside i_lock. They may call some inode.c functions, but
shouldn't do so while holding i_lock if they are treating i_lock as
an innermost lock. So they can keep using i_lock. I did go through
and attempt to look at all filesystems.

Within inode.c, lock ordering is 2 levels deep, same as it was when
we had inode_lock and i_lock.
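
Schematically (the inner entry here is a placeholder for whichever list
lock is being taken, not a real identifier):

	old:	inode_lock
		    inode->i_lock

	new:	inode->i_lock
		    lock for the list being modified (hash / sb / writeback)

Still two levels; the per-inode lock just moves to the outside.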

If some filesystem introduces a lock ordering problem from not
reading the newly added documentation, lockdep should catch it pretty
quick.

So I don't see a landmine here that they don't already have (in much
larger doses) with i_mutex, reentrant reclaim, page lock, buffer lock,
mmap_sem, etc.
From: Theodore Tso on

On Jul 6, 2010, at 12:34 AM, Nick Piggin wrote:
> On Tue, Jul 06, 2010 at 08:41:06AM +1000, Dave Chinner wrote:
>>
>>
>> I don't disagree with this approach - I object to the fact that you
>> repurpose an existing lock and change its locking rules to "rule
>> the inode". We don't have any one lock that "rules the inode",
>> anyway, so adding a new "i_list_lock" for the new VFS level locking
>> strategies makes it a lot more self-contained. Fundamentally I'm
>> less concerned about the additional memory usage than I am about
>> having landmines planted around i_lock...
>
> If some filesystem introduces a lock ordering problem from not
> reading the newly added documentation, lockdep should catch it pretty
> quick.

I assume you mean inline documentation in the source, because I
quickly scanned the source and couldn't find any significant changes
to any files in Documentation.

It would be nice if the new state of affairs were documented in a single file,
so that people who want to understand this new locking system don't
have to go crawling through the code or searching mailing list archives
to figure out what's going on.

A lot of the text in this mail thread, including your discussion of the new
locking hierarchy and why things are the way they are, would be good
fodder for a new documentation file. And if you don't want to rename
i_lock, because no better name can be found, we should at least
document that, as of 2.6.35/36, the meaning of i_lock changed.

-- Ted

From: Nick Piggin on
On Tue, Jul 06, 2010 at 06:38:28AM -0400, Theodore Tso wrote:
>
> On Jul 6, 2010, at 12:34 AM, Nick Piggin wrote:
> > On Tue, Jul 06, 2010 at 08:41:06AM +1000, Dave Chinner wrote:
> >>
> >>
> >> I don't disagree with this approach - I object to the fact that you
> >> repurpose an existing lock and change its locking rules to "rule
> >> the inode". We don't have any one lock that "rules the inode",
> >> anyway, so adding a new "i_list_lock" for the new VFS level locking
> >> strategies makes it a lot more self-contained. Fundamentally I'm
> >> less concerned about the additional memory usage than I am about
> >> having landmines planted around i_lock...
> >
> > If some filesystem introduces a lock ordering problem from not
> > reading the newly added documentation, lockdep should catch it pretty
> > quick.
>
> I assume you mean inline documentation in the source, because I
> quickly scanned the source and couldn't find any significant changes
> to any files in Documentation.
>
> It would be nice if the new state of affairs were documented in a single file,
> so that people who want to understand this new locking system don't
> have to go crawling through the code or searching mailing list archives
> to figure out what's going on.

I think these kinds of internal lock ordering details work best in the
source files themselves (see rmap.c and filemap.c), so they stay a little
closer to the code. Here that means inode.c and dcache.c.
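
For flavour, I mean a block comment at the top of the file, rmap.c-style
(this is a paraphrase, not the exact text from the patch):

/*
 * Usage and lock ordering:
 *
 * inode->i_lock protects i_state, the refcount, and the inode's
 * presence on the various VFS lists; the lock for each of those
 * lists nests inside it.
 */

so anyone reading inode.c sees the rules right next to the code that
relies on them.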

Locking rules for the filesystem callback APIs, I agree, are just as well
placed in Documentation/filesystems/Locking (which I need to update a
bit), but that file has never been used for this kind of internal detail
before. I'm always open to suggestions on how to document it better,
though.


> A lot of the text in this mail thread, including your discussion of the new
> locking hierarchy and why things are the way they are, would be good
> fodder for a new documentation file. And if you don't want to rename
> i_lock, because no better name can be found, we should at least
> document that, as of 2.6.35/36, the meaning of i_lock changed.

Well, there is not much of a definition of what i_lock is. It is really
not an "innermost" lock anyway (whatever that exactly means): CEPH even
takes dcache_lock inside i_lock, and NFS uses it pretty extensively too.

So as far as I can see it really is *the* non-sleeping lock for
protecting inode fields.
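
(For example, the existing pattern in those filesystems is along the
lines of

	spin_lock(&inode->i_lock);
	spin_lock(&dcache_lock);
	/* look at the inode's dentries, etc. */
	spin_unlock(&dcache_lock);
	spin_unlock(&inode->i_lock);

-- schematic only, not a quote of the ceph or nfs code -- so i_lock
already sits above other core locks in places.)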

As far as filesystems go, the inode locking changes matter very little
really, but the best I can do is to comment and document the locking and
try to audit each filesystem.
