From: Arnd Bergmann on
On Sunday 28 March 2010, Stefan Richter wrote:
> Arnd Bergmann wrote:
> > Your patches look good, but it would be helpful to also set .llseek = no_llseek
> > in the file operations, because that is much easier to grep for than
> > only the nonseekable_open. While it's technically a NOP on the presence of
> > nonseekable_open, it will help that I don't accidentally apply my patch on
> > top of yours.
>
> Sounds like a plan, but (a) if my .llseek = no_llseek and your .llseek =
> default_llseek are not within diff context range, you (or whoever else
> merges mine and yours) only get a compiler warning (Initializer entry
> defined twice) rather than a merge conflict which couldn't be missed,
> (b) there won't be a merge conflict in "BKL removal: mark remaining
> users as 'depends on BKL'". (c) While I don't mind adding more visual
> clutter to ieee1394/*, I prefer terse coding in firewire/*.
>
> How about I put my nonseekable_open additions into a release branch and
> send you a pull request after a few days exposure in linux-next? If you
> do not plan to respin your patch queue soon or at all, I could even let
> you pull a for-arnd branch with a semantically correct merge of yours
> and mine.

I can probably remember this specific one now, but for other people
doing the same on their subsystems, adding no_llseek may help reduce
the need for coordination.

> General thoughts:
>
> ".llseek = NULL," so far meant "do the Right Thing on lseek() and
> friends, as far as the fs core can tell". Shouldn't we keep it that
> way? It's as close to other ".method = NULL," as it can get, which
> either mean "silently skip this method if it doesn't matter" (e.g.
> .flush) or "fail attempts to use this method with a fitting errno" (e.g.
> .write).

My series changes the default from 'default_llseek' to 'generic_file_llseek',
which is almost identical, except for taking the inode mutex instead of the
BKL. Another option that has been discussed before is to make no_llseek
the default, but that might cause more serious problems wiht drivers that
really require seeking.

Since using default_llseek can only ever make a difference if the driver
actually uses the BKL in any other function, I could go through the
patches again and revert those that do no use the BKL anywhere else.

> Of course, as we have already seen with infiniband, firewire, ieee1394,
> .llseek = NULL is ambiguous in practice. Does the driver really want to
> use default_llseek, or should it rather use no_llseek and/or
> nonseekable_open, or should it even implement a dummy_llseek() { return
> 0; } which avoids the BKL but preserves ABI behaviour? This needs to be
> resolved for each and every case eventually, regardless of whether or
> when your addition of .llseek = default_llseek enters mainline.

Yes, that also sounds like a good idea. I believe that Jan actually posted
a patch to do that at some point.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Frederic Weisbecker on
On Wed, Mar 24, 2010 at 10:40:54PM +0100, Arnd Bergmann wrote:
> I've spent some time continuing the work of the people on Cc and many others
> to remove the big kernel lock from Linux and I now have bkl-removal branch
> in my git tree at git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git
> that lets me run a kernel on my quad-core machine with the only users of the BKL
> being mostly obscure device driver modules.
>
> The oldest patch in this series is roughly eight years old and is Willy's patch
> to remove the BKL from fs/locks.c, and I took a series of patches from Jan that
> removes it from most of the VFS.
>
> The other non-obvious changes are:
>
> - all file operations that either have an .ioctl method or do not have their
> own .llseek method used to implicitly require the BKL. I've changed that
> so they need to explicitly set .llseek = default_llseek, .unlocked_ioctl =
> default_ioctl, and changed all the code that either has supplied a .ioctl
> method or looks like it needs the BKL somewhere else, meaning the
> default_llseek function might actually do something.
>
> - The block layer now has a global bkldev_mutex that is used in all block
> drivers in place of the BKL. The only recursive instance of the BKL was
> __blkdev_get(), which is now called with the blkdev_mutex held instead of
> grabbing the BKL. This has some possible performance implications that
> need to be looked into.
>
> - The init/main.c code no longer take the BKL. I figured that this was
> completely unnecessary because there is no other code running at the
> same time that takes the BKL.
>
> - The most invasive change is in the TTY layer, which has a new global
> mutex (sorry!). I know that Alan has plans of his own to remove the BKL
> from this subsystem, so my patches may not go anywhere, but they seem
> to work fine for me.
> I've called the new lock the 'Big TTY Mutex' (BTM), a name that probably
> makes more sense if you happen to speak German.
> The basic idea here is to make recursive locking and the release-on-sleep
> explicit, so every mutex_lock, wait_event, workqueue_flush and schedule
> in the TTY layer now explicitly releases the BTM before blocking.
>
> - All drivers that still require the BKL are now listed as 'depends on BKL'
> in Kconfig, and you can set that symbol to 'y', 'm' or 'n'. If the lock
> itself is a module, only other modules can use it, and /proc/modules
> will tell you exactly which ones those are. I've thought about adding
> a module_init function in that module that will taint the kernel, but so
> far I haven't done that.
>
> - Included is a debugfs file that gives statistics over the BKL usage from
> early boot on. This is now obsolete and will not get merged, but I'm
> including it for reference.
>
> Frederic has volunteered to help merging all of this upstream, which I
> very much welcome. The shape that the tree is in now is very inconsistent,
> especially some of the bits at the end are a bit dodgy and all of it needs
> more testing.
>
> I've built-tested an allmodconfig kernel with CONFIG_BKL disabled
> on x86_64, i386, powerpc64, powerpc32, s390 and arm to make sure I
> catch all the modules that depend on BKL, and I've been running
> various versions of this tree on my desktop machine over the last few
> weeks while adding stuff.
>
> Arnd
>
> ---
>
> Arnd Bergmann (44):
> input: kill BKL, fix input_open_file locking
> ptrace: kill BKL
> procfs: kill BKL in llseek
> random: forbid llseek on random chardev
> x86/microcode: use nonseekable_open
> perf_event: use nonseekable_open



I just queued the perf_event one. It looks pretty good. I'm also
looking at some of the most trivials (ehm..less hards) in the list
and see which we can submit right away.

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Frederic Weisbecker on
On Wed, Mar 24, 2010 at 10:40:54PM +0100, Arnd Bergmann wrote:
> Arnd Bergmann (44):
> input: kill BKL, fix input_open_file locking


This one is upstream now.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Frederic Weisbecker on
On Sun, Mar 28, 2010 at 09:05:50PM +0100, Arnd Bergmann wrote:
> > General thoughts:
> >
> > ".llseek = NULL," so far meant "do the Right Thing on lseek() and
> > friends, as far as the fs core can tell". Shouldn't we keep it that
> > way? It's as close to other ".method = NULL," as it can get, which
> > either mean "silently skip this method if it doesn't matter" (e.g.
> > .flush) or "fail attempts to use this method with a fitting errno" (e.g.
> > .write).
>
> My series changes the default from 'default_llseek' to 'generic_file_llseek',
> which is almost identical, except for taking the inode mutex instead of the
> BKL.


What if another file operation changes the file pointer while holding the bkl?
You're not protected anymore in this case.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Frederic Weisbecker on
On Thu, Mar 25, 2010 at 11:26:56AM +0100, Arnd Bergmann wrote:
> On Wednesday 24 March 2010, Andrew Morton wrote:
> > On Wed, 24 Mar 2010 22:40:54 +0100 Arnd Bergmann <arnd(a)arndb.de> wrote:
> >
> > > I've spent some time continuing the work of the people on Cc and many others
> > > to remove the big kernel lock from Linux
> >
> > <looks inside ptrace.c>
> >
> > Seems that there might be a few tricksy bits in here. Please do push
> > at least the non-obvious parts out to the relevant people.
>
> Sure, that is certainly the plan.
>
> Regarding the ptrace bits, this is one of a handful of places where the BKL
> was put in by someone a really long time ago but with the rest of the
> series applied, it becomes evident that there is nothing whatsoever that
> it serializes with, so removing the BKL here does not make the situation
> worse. It could still be a bug that needs to be fixed by adding a new
> serialization method no matter if the BKL is there or not.


Yeah, the comment gives this:

/*
* This lock_kernel fixes a subtle race with suid exec
*/

But there is no lock_kernel() in the exec path, may be I missed it...
so this may be an old lock_kernel() that doesn't exist anymore, and
the bkl in the ptrace path is not going to help in any way.

What remain to check are the possible unintended racy places that
this bkl might protect.

I'm going to check a first pass and if it looks fine, I'll just submit
to Oleg and Roland.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/