From: Dave Chinner on
On Thu, Jun 24, 2010 at 01:02:41PM +1000, npiggin(a)suse.de wrote:
> Protect inode->i_count with i_lock, rather than having it atomic.
> Next step should also be to move things together (eg. the refcount increment
> into d_instantiate, which will remove a lock/unlock cycle on i_lock).
......
> Index: linux-2.6/fs/inode.c
> ===================================================================
> --- linux-2.6.orig/fs/inode.c
> +++ linux-2.6/fs/inode.c
> @@ -33,14 +33,13 @@
> * inode_hash_lock protects:
> * inode hash table, i_hash
> * inode->i_lock protects:
> - * i_state
> + * i_state, i_count
> *
> * Ordering:
> * inode_lock
> * sb_inode_list_lock
> * inode->i_lock
> - * inode_lock
> - * inode_hash_lock
> + * inode_hash_lock
> */
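
For reference, the change described in the patch can be sketched in
userspace roughly as follows. This is only an illustration, assuming a
pthread mutex in place of the kernel spinlock; struct toy_inode and the
toy_* helpers are hypothetical names, not anything from fs/inode.c.
The last helper shows the "move things together" idea: the refcount
increment and a state update share one lock/unlock cycle on i_lock.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for struct inode; only the fields discussed here. */
struct toy_inode {
    pthread_mutex_t i_lock;   /* plays the role of inode->i_lock */
    unsigned int i_count;     /* plain integer, guarded by i_lock */
    unsigned int i_state;     /* also guarded by i_lock */
};

static void toy_inode_init(struct toy_inode *inode)
{
    pthread_mutex_init(&inode->i_lock, NULL);
    inode->i_count = 1;       /* the creator holds the first reference */
    inode->i_state = 0;
}

/* Take a reference: i_count only changes while i_lock is held. */
static void toy_iget(struct toy_inode *inode)
{
    pthread_mutex_lock(&inode->i_lock);
    assert(inode->i_count > 0);
    inode->i_count++;
    pthread_mutex_unlock(&inode->i_lock);
}

/* Take a reference and set a state flag in one critical section. */
static void toy_iget_and_mark(struct toy_inode *inode, unsigned int flag)
{
    pthread_mutex_lock(&inode->i_lock);
    assert(inode->i_count > 0);
    inode->i_count++;
    inode->i_state |= flag;
    pthread_mutex_unlock(&inode->i_lock);
}

/* Drop a reference; returns 1 when the last reference went away. */
static int toy_iput(struct toy_inode *inode)
{
    int last;

    pthread_mutex_lock(&inode->i_lock);
    assert(inode->i_count > 0);
    last = (--inode->i_count == 0);
    pthread_mutex_unlock(&inode->i_lock);
    return last;
}
```

With an atomic counter, toy_iget_and_mark() would need the atomic
increment plus a separate i_lock round trip for the state change.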

I thought that the rule governing the use of inode->i_lock was that
it can be used anywhere as long as it is the innermost lock.

Hmmm, no references in the code or documentation. Google gives a
pretty good reference:

http://www.mail-archive.com/linux-ext4(a)vger.kernel.org/msg02584.html

Perhaps a different/new lock needs to be used here?

Cheers,

Dave.
--
Dave Chinner
david(a)fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Nick Piggin on
On Wed, Jun 30, 2010 at 05:27:02PM +1000, Dave Chinner wrote:
> On Thu, Jun 24, 2010 at 01:02:41PM +1000, npiggin(a)suse.de wrote:
> > Protect inode->i_count with i_lock, rather than having it atomic.
> > Next step should also be to move things together (eg. the refcount increment
> > into d_instantiate, which will remove a lock/unlock cycle on i_lock).
> .....
> > Index: linux-2.6/fs/inode.c
> > ===================================================================
> > --- linux-2.6.orig/fs/inode.c
> > +++ linux-2.6/fs/inode.c
> > @@ -33,14 +33,13 @@
> > * inode_hash_lock protects:
> > * inode hash table, i_hash
> > * inode->i_lock protects:
> > - * i_state
> > + * i_state, i_count
> > *
> > * Ordering:
> > * inode_lock
> > * sb_inode_list_lock
> > * inode->i_lock
> > - * inode_lock
> > - * inode_hash_lock
> > + * inode_hash_lock
> > */
>
> I thought that the rule governing the use of inode->i_lock was that
> it can be used anywhere as long as it is the innermost lock.
>
> Hmmm, no references in the code or documentation. Google gives a
> pretty good reference:
>
> http://www.mail-archive.com/linux-ext4(a)vger.kernel.org/msg02584.html
>
> Perhaps a different/new lock needs to be used here?

Well I just changed the order (and documented it to boot :)). It's
pretty easy to verify that lock order reversal (LOR) is no problem.
inode_hash_lock is only taken in a very few places, so other code
outside inode.c is fine to use i_lock as an innermost lock.
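
A rough userspace sketch of the documented order (inode->i_lock
outside, inode_hash_lock inside) might look like this. It assumes
pthread mutexes instead of spinlocks, and every toy_* name, the bucket
count and the "hashed" flag are made up for illustration; the real
hash code in fs/inode.c is organised differently.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TOY_HASH_BUCKETS 8

/* Hypothetical miniature inode with just enough state for the example. */
struct toy_inode {
    pthread_mutex_t i_lock;     /* outer lock in this ordering */
    unsigned long i_ino;
    int hashed;                 /* guarded by i_lock */
    struct toy_inode *i_hash;   /* bucket chain, guarded by toy_hash_lock */
};

/* Plays the role of inode_hash_lock: always taken inside i_lock. */
static pthread_mutex_t toy_hash_lock = PTHREAD_MUTEX_INITIALIZER;
static struct toy_inode *toy_hash[TOY_HASH_BUCKETS];

static void toy_inode_init(struct toy_inode *inode, unsigned long ino)
{
    pthread_mutex_init(&inode->i_lock, NULL);
    inode->i_ino = ino;
    inode->hashed = 0;
    inode->i_hash = NULL;
}

static void toy_insert_hash(struct toy_inode *inode)
{
    struct toy_inode **head = &toy_hash[inode->i_ino % TOY_HASH_BUCKETS];

    pthread_mutex_lock(&inode->i_lock);     /* outer: i_lock */
    pthread_mutex_lock(&toy_hash_lock);     /* inner: hash lock */
    inode->i_hash = *head;
    *head = inode;
    pthread_mutex_unlock(&toy_hash_lock);
    inode->hashed = 1;
    pthread_mutex_unlock(&inode->i_lock);
}

static void toy_remove_hash(struct toy_inode *inode)
{
    struct toy_inode **pp = &toy_hash[inode->i_ino % TOY_HASH_BUCKETS];

    pthread_mutex_lock(&inode->i_lock);     /* same order on every path */
    if (inode->hashed) {
        pthread_mutex_lock(&toy_hash_lock);
        while (*pp != inode) {
            assert(*pp != NULL);            /* a hashed inode must be on its chain */
            pp = &(*pp)->i_hash;
        }
        *pp = inode->i_hash;
        pthread_mutex_unlock(&toy_hash_lock);
        inode->i_hash = NULL;
        inode->hashed = 0;
    }
    pthread_mutex_unlock(&inode->i_lock);
}
```

Code that never touches the hash lock can still treat i_lock as its
innermost lock; only paths like the two above nest anything under it.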

From: Dave Chinner on
On Wed, Jun 30, 2010 at 10:05:02PM +1000, Nick Piggin wrote:
> On Wed, Jun 30, 2010 at 05:27:02PM +1000, Dave Chinner wrote:
> > On Thu, Jun 24, 2010 at 01:02:41PM +1000, npiggin(a)suse.de wrote:
> > > Protect inode->i_count with i_lock, rather than having it atomic.
> > > Next step should also be to move things together (eg. the refcount increment
> > > into d_instantiate, which will remove a lock/unlock cycle on i_lock).
> > .....
> > > Index: linux-2.6/fs/inode.c
> > > ===================================================================
> > > --- linux-2.6.orig/fs/inode.c
> > > +++ linux-2.6/fs/inode.c
> > > @@ -33,14 +33,13 @@
> > > * inode_hash_lock protects:
> > > * inode hash table, i_hash
> > > * inode->i_lock protects:
> > > - * i_state
> > > + * i_state, i_count
> > > *
> > > * Ordering:
> > > * inode_lock
> > > * sb_inode_list_lock
> > > * inode->i_lock
> > > - * inode_lock
> > > - * inode_hash_lock
> > > + * inode_hash_lock
> > > */
> >
> > I thought that the rule governing the use of inode->i_lock was that
> > it can be used anywhere as long as it is the innermost lock.
> >
> > Hmmm, no references in the code or documentation. Google gives a
> > pretty good reference:
> >
> > http://www.mail-archive.com/linux-ext4(a)vger.kernel.org/msg02584.html
> >
> > Perhaps a different/new lock needs to be used here?
>
> Well I just changed the order (and documented it to boot :)). It's
> pretty easy to verify that LOR is no problem. inode hash is only
> taken in a very few places so other code outside inode.c is fine to
> use i_lock as an innermost lock.

It's not just the inode_hash_lock - you move four or five other
locks under inode->i_lock as the series progresses. IOWs, there are
now many paths and locking orders where the i_lock is not innermost.
If we go forward with this, it's only going to get more complex and
eventually somewhere we'll need a new lock for an innermost
operation because inode->i_lock is no longer safe to use....

Seriously: use a new lock for high level inode operations you are
optimising - don't repurpose an existing lock with different usage
rules just because it's convenient.

Cheers,

Dave.
--
Dave Chinner
david(a)fromorbit.com
From: Nick Piggin on
On Thu, Jul 01, 2010 at 12:36:18PM +1000, Dave Chinner wrote:
> On Wed, Jun 30, 2010 at 10:05:02PM +1000, Nick Piggin wrote:
> > On Wed, Jun 30, 2010 at 05:27:02PM +1000, Dave Chinner wrote:
> > > On Thu, Jun 24, 2010 at 01:02:41PM +1000, npiggin(a)suse.de wrote:
> > > > Protect inode->i_count with i_lock, rather than having it atomic.
> > > > Next step should also be to move things together (eg. the refcount increment
> > > > into d_instantiate, which will remove a lock/unlock cycle on i_lock).
> > > .....
> > > > Index: linux-2.6/fs/inode.c
> > > > ===================================================================
> > > > --- linux-2.6.orig/fs/inode.c
> > > > +++ linux-2.6/fs/inode.c
> > > > @@ -33,14 +33,13 @@
> > > > * inode_hash_lock protects:
> > > > * inode hash table, i_hash
> > > > * inode->i_lock protects:
> > > > - * i_state
> > > > + * i_state, i_count
> > > > *
> > > > * Ordering:
> > > > * inode_lock
> > > > * sb_inode_list_lock
> > > > * inode->i_lock
> > > > - * inode_lock
> > > > - * inode_hash_lock
> > > > + * inode_hash_lock
> > > > */
> > >
> > > I thought that the rule governing the use of inode->i_lock was that
> > > it can be used anywhere as long as it is the innermost lock.
> > >
> > > Hmmm, no references in the code or documentation. Google gives a
> > > pretty good reference:
> > >
> > > http://www.mail-archive.com/linux-ext4(a)vger.kernel.org/msg02584.html
> > >
> > > Perhaps a different/new lock needs to be used here?
> >
> > Well I just changed the order (and documented it to boot :)). It's
> > pretty easy to verify that LOR is no problem. inode hash is only
> > taken in a very few places so other code outside inode.c is fine to
> > use i_lock as an innermost lock.
>
> It's not just the inode_hash_lock - you move four or five other
> locks under inode->i_lock as the series progresses. IOWs, there's
> now many paths and locking orders where the i_lock is not innermost.
> If we go forward with this, it's only going to get more complex and
> eventually somewhere we'll need a new lock for an innermost
> operation because inode->i_lock is no longer safe to use....

OK yes it's more than one lock, but I don't quite see the problem.
The locks are mostly confined to inode.c and fs-writeback.c, and
filesystems can basically use i_lock as innermost for their purposes.
If they get it wrong, lockdep will tell them pretty quickly. And it's
documented to boot.
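
To give a feel for the kind of check meant here, below is a much
simplified userspace sketch. It is not how lockdep works internally
(lockdep learns lock dependencies at runtime); this version assumes a
fixed rank per lock, taken from the ordering comment in the patch, and
assumes locks are released in reverse order of acquisition. All names
(ranked_lock, toy_order_ok, the RANK_* values) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Ranks follow the documented order: a larger rank nests further inside. */
enum toy_rank {
    RANK_INODE_LOCK = 1,
    RANK_SB_INODE_LIST_LOCK = 2,
    RANK_I_LOCK = 3,
    RANK_INODE_HASH_LOCK = 4
};

struct ranked_lock {
    pthread_mutex_t mutex;
    int rank;
};

#define TOY_MAX_HELD 8

/* Per-thread stack of the ranks currently held. */
static _Thread_local int held_ranks[TOY_MAX_HELD];
static _Thread_local int held_depth;

/* Returns 1 if a lock of this rank may be taken given what is held. */
static int toy_order_ok(int rank)
{
    return held_depth == 0 || held_ranks[held_depth - 1] < rank;
}

/* Returns 0 on success, -1 (without locking) on an ordering violation. */
static int ranked_lock_acquire(struct ranked_lock *l)
{
    if (held_depth == TOY_MAX_HELD || !toy_order_ok(l->rank))
        return -1;
    pthread_mutex_lock(&l->mutex);
    held_ranks[held_depth++] = l->rank;
    return 0;
}

/* Assumes LIFO release: the lock being dropped is the innermost held. */
static void ranked_lock_release(struct ranked_lock *l)
{
    assert(held_depth > 0 && held_ranks[held_depth - 1] == l->rank);
    held_depth--;
    pthread_mutex_unlock(&l->mutex);
}
```

Taking the hash lock under i_lock succeeds here, while trying to take
i_lock with the hash lock already held returns -1, which is the kind
of mistake a filesystem would get told about.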


> Seriously: use a new lock for high level inode operations you are
> optimising - don't repurpose an existing lock with different usage
> rules just because it's convenient.

That's what scalability development is all about, I'm afraid. Just
adding more and more locks is what makes things more complex, so
you have to juggle around or change locks when possible. If a
locking difficulty pops up in future, I'd prefer to look at it
then.

I don't think any filesystems cared at all when I converted them.

From: Nick Piggin on
On Thu, Jul 01, 2010 at 05:54:26PM +1000, Nick Piggin wrote:
> On Thu, Jul 01, 2010 at 12:36:18PM +1000, Dave Chinner wrote:
> If a
> locking difficulty pops up in future, I'd prefer to look at it
> then.
>
> I don't think any filesystems cared at all when I converted them.

What I mean by this is that _today_ no filesystems seemed to have
any problems with how I did it. I did touch quota and notify code,
which iterates inode sb lists, but it was pretty trivial. Not many
others care about inode locking details enough to be affected by any
of the locks in fs/inode.c.

And so instead of adding another lock now, when I already have
(IMO) nice, working code, I would prefer to wait until some fs
development runs into a problem with locking.

There are several things that can be done. Using RCU for more of
the inode lists is a possibility, and can improve lock order problems
while actually reducing the amount of locking rather than adding
locks.
