From: Arnd Bergmann on
On Wednesday 31 March 2010 22:21:23 Arnd Bergmann wrote:
> Another crazy idea I had was to simply turn the BKL into a regular mutex
> as soon as we can show that all remaining users are of the non-recursive
> kind and don't rely on the autorelease-on-sleep. Doing that would be
> much easier without the pushdown into .unlocked_ioctl than it would be
> with it.

I just looked at all the users of lock_kernel remaining with my patch
series. For 90% of them, it is completely obvious that they don't rely
on nested locking, and they very much look like they don't need the
autorelease either, because the BKL was simply pushed down into the
open, ioctl and llseek functions.

There are a few file systems (udf, ncpfs, autofs, coda, ...) and some
network protocols (appletalk, ipx, irnet and x25) for which this is not
obvious, though still quite likely.

So we could actually remove the BKL recursion code soon, or even turn
all of it into a regular mutex, at least as an experimental option.
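
To make the distinction concrete, here is a minimal userspace sketch of
what "recursive" means for a lock like the BKL. It is a model under stated
assumptions, not the kernel implementation: big_mutex, lock_depth and the
model_* functions are hypothetical names, and a pthread mutex stands in
for the real lock. Only the outermost lock and unlock touch the underlying
mutex; nested calls just adjust a per-thread counter.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the underlying non-recursive lock. */
static pthread_mutex_t big_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread nesting count; 0 means this thread does not hold the lock. */
static _Thread_local int lock_depth;

/* Take the lock; only the outermost call acquires the mutex. */
void model_lock_kernel(void)
{
    if (lock_depth++ == 0)
        pthread_mutex_lock(&big_mutex);
}

/* Drop one nesting level; only the outermost call releases the mutex. */
void model_unlock_kernel(void)
{
    assert(lock_depth > 0);
    if (--lock_depth == 0)
        pthread_mutex_unlock(&big_mutex);
}

/* Report the calling thread's current nesting depth. */
int model_lock_depth(void)
{
    return lock_depth;
}
```

If no remaining caller ever reaches a depth above one, the counter is dead
weight and the plain mutex underneath behaves the same as far as nesting
is concerned. The autorelease-on-sleep behaviour is a separate property
that this sketch does not model.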

The recursive users that I've removed in my series are the block, tty,
input and sound subsystems, as well as the init code.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Frederic Weisbecker on
On Wed, Mar 31, 2010 at 10:21:23PM +0200, Arnd Bergmann wrote:
> On Wednesday 31 March 2010 19:22:11 Frederic Weisbecker wrote:
> > On Tue, Mar 30, 2010 at 11:33:40AM +0100, Arnd Bergmann wrote:
> > > I believe we can actually remove ioctl from file_operations. The patch I did
> > > to convert all users to ".unlocked_ioctl = default_ioctl," should really catch
> > > all cases, and I think we can enforce this by renaming fops->ioctl to locked_ioctl
> > > or old_ioctl to make sure we didn't miss any, and then mandate that this one
> > > is only used when unlocked_ioctl is set to default_ioctl.
> >
> > I just looked at the patch in question and noted that the diffstat
> > is pretty big, but how could it be otherwise.
> > Actually it's not that large, but widely spread:
> <snip>
> > 157 files changed, 372 insertions(+), 80 deletions(-)
> >
> >
> > I wonder if we should actually just turn all these into unlocked_ioctl
> > directly. And then bring a warn on ioctl, and finally schedule the removal
> > of this callback.
> >
> > What do you think?
>
> I don't think the warning helps all that much, at least not across an
> entire release. We could leave it in for the merge window and fix all
> users for -rc1, then submit a patch that kills everything that came
> in during the merge window and remove it completely in -rc2.
>
> Getting rid of ioctl completely is a lot of work though, covering the
> entire lot of ~150 device drivers. I think the patch as is (or the
> variant renaming .ioctl to .locked_ioctl) is far less work and has
> less potential of introducing regressions.
>
> > Your plan looks good but I fear this actually carries the problem forward
> > in that we won't be able to remove .ioctl after that.
> >
> > I can handle that if you agree.
>
> I don't think we really need to get rid of it this soon in the obsolete
> drivers, pushing down the BKL into an unlocked_ioctl function only slightly
> shifts the problem around, since the driver still depends on the BKL then
> and gets disabled if you build with CONFIG_BKL=n.


Hmm, yeah you're right actually. Since we have this CONFIG_BKL thing
plus a future check to prevent people from implementing a new ioctl
(checking for ioctl being set without default_ioctl), it's actually better
than a big pushdown as it's less invasive.
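
A rough userspace sketch of that check and of the default_ioctl wrapper
might look as follows. Everything here is a simplified model and an
assumption on my side: file_model, file_ops_model, big_lock_held,
fops_ioctl_is_consistent(), do_ioctl_model() and legacy_demo_ioctl() are
hypothetical names, only default_ioctl and locked_ioctl come from the
discussion above, and a flag stands in for lock_kernel()/unlock_kernel().

```c
#include <assert.h>
#include <errno.h>

struct file_model;
typedef long (*ioctl_fn)(struct file_model *filp, unsigned int cmd,
                         unsigned long arg);

/* Simplified stand-in for struct file_operations. */
struct file_ops_model {
    ioctl_fn unlocked_ioctl;   /* called without the big lock */
    ioctl_fn locked_ioctl;     /* legacy handler, expects the big lock */
};

struct file_model {
    const struct file_ops_model *f_op;
};

static int big_lock_held;      /* flag standing in for the BKL */

/* Wrapper that calls the legacy handler with the big lock held. */
long default_ioctl(struct file_model *filp, unsigned int cmd,
                   unsigned long arg)
{
    long ret;

    if (!filp->f_op->locked_ioctl)
        return -ENOTTY;
    assert(!big_lock_held);    /* this model does not nest */
    big_lock_held = 1;         /* lock_kernel() in real code */
    ret = filp->f_op->locked_ioctl(filp, cmd, arg);
    big_lock_held = 0;         /* unlock_kernel() in real code */
    return ret;
}

/* The check: a legacy handler is only valid together with default_ioctl. */
int fops_ioctl_is_consistent(const struct file_ops_model *fops)
{
    return !fops->locked_ioctl || fops->unlocked_ioctl == default_ioctl;
}

/* Dispatch as the core would: only unlocked_ioctl is ever called. */
long do_ioctl_model(struct file_model *filp, unsigned int cmd,
                    unsigned long arg)
{
    if (!filp->f_op->unlocked_ioctl)
        return -ENOTTY;
    return filp->f_op->unlocked_ioctl(filp, cmd, arg);
}

/* Example legacy handler: echoes cmd, fails if the lock is missing. */
long legacy_demo_ioctl(struct file_model *filp, unsigned int cmd,
                       unsigned long arg)
{
    (void)filp;
    (void)arg;
    return big_lock_held ? (long)cmd : -EIO;
}
```

A driver that sets locked_ioctl but forgets default_ioctl fails the
consistency check, and its handler would never be reached by the dispatch
path, which is exactly the mistake the check is meant to catch.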



> In the meantime, we can move the declaration of the .locked_ioctl callback
> into an #ifdef CONFIG_BKL, to make sure nobody builds a driver with an
> ioctl function that does not get called.


Ok, now how to get this all merged? A single monolithic patch is probably
not appropriate.

The simplest is to have a single branch that implements default_ioctl
and then assigns it to the drivers in a series of patches split by
subsystem/driver, and to push the whole branch for the next -rc1.

The other solution is to push default_ioctl for this release and get
the driver changes into each concerned tree. That said, I suspect a good
part of them are unmaintained, hence the first solution looks better
to me.


Hmm?

From: Alan Cox on
> The recursive users that I've removed in my series are the block, tty,
> input and sound subsystems, as well as the init code.

There are some very subtle recursive cases in the tty code - hangup
triggered close and consoles being an absolute gem I've just had to debug
in my lock removal bits so far...

Alan
From: Frederic Weisbecker on
On Wed, Mar 31, 2010 at 11:04:30PM +0200, Arnd Bergmann wrote:
> On Wednesday 31 March 2010 22:21:23 Arnd Bergmann wrote:
> > Another crazy idea I had was to simply turn the BKL into a regular mutex
> > as soon as we can show that all remaining users are of the non-recursive
> > kind and don't rely on the autorelease-on-sleep. Doing that would be
> > much easier without the pushdown into .unlocked_ioctl than it would be
> > with it.
>
> I just looked at all the users of lock_kernel remaining with my patch
> series. For 90% of them, it is completely obvious that they don't rely
> on nested locking, and they very much look like they don't need the
> autorelease either, because the BKL was simply pushed down into the
> open, ioctl and llseek functions.
>
> There are a few file systems (udf, ncpfs, autofs, coda, ...) and some
> network protocols (appletalk, ipx, irnet and x25) for which this is not
> obvious, though still quite likely.
>
> So we could actually remove the BKL recursion code soon, or even turn
> all of it into a regular mutex, at least as an experimental option.
>
> The recursive users that I've removed in my series are the block, tty,
> input and sound subsystems, as well as the init code.


This is a solution that has been tried more than once already. But Linus
has said he wouldn't pull something that turns the bkl into a mutex or a
semaphore.

Plus it's quite hard to tell whether or not it auto-releases somewhere.
This is often something you can only really spot at runtime or on small
paths.

The simple fact that the bkl is not always a leaf lock means it needs the
auto-release; otherwise you get very bad, unexpected lock
dependencies.
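
As a rough illustration of what the auto-release amounts to, here is a
userspace sketch, assuming a per-thread depth counter on top of a pthread
mutex. All names (sleepy_big_mutex, sleepy_depth and the sleepy_*
functions) are hypothetical and this is not the scheduler code: before
blocking, the holder drops the lock entirely and remembers its depth, and
after waking it takes the lock again and restores that depth.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the big lock. */
static pthread_mutex_t sleepy_big_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Nesting count of the calling thread; 0 means not held. */
static _Thread_local int sleepy_depth;

void sleepy_lock(void)
{
    if (sleepy_depth++ == 0)
        pthread_mutex_lock(&sleepy_big_mutex);
}

void sleepy_unlock(void)
{
    assert(sleepy_depth > 0);
    if (--sleepy_depth == 0)
        pthread_mutex_unlock(&sleepy_big_mutex);
}

/* Before blocking: drop the lock completely and return the saved depth. */
int sleepy_release_for_sleep(void)
{
    int saved = sleepy_depth;

    if (saved > 0) {
        sleepy_depth = 0;
        pthread_mutex_unlock(&sleepy_big_mutex);
    }
    return saved;
}

/* After waking: take the lock again and restore the saved depth. */
void sleepy_reacquire_after_sleep(int saved)
{
    if (saved > 0) {
        pthread_mutex_lock(&sleepy_big_mutex);
        sleepy_depth = saved;
    }
}

int sleepy_current_depth(void)
{
    return sleepy_depth;
}
```

With a regular mutex there is no such release step: a task that sleeps on
some other lock while holding the big one keeps it held, so every lock
taken underneath becomes ordered against it.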

From: Arnd Bergmann on
On Wednesday 31 March 2010, Alan Cox wrote:
> > The recursive users that I've removed in my series are the block, tty,
> > input and sound subsystems, as well as the init code.
>
> There are some very subtle recursive cases in the tty code - hangup
> triggered close and consoles being an absolute gem I've just had to debug
> in my lock removal bits so far...

Yes, I've seen some of them. What I meant above is that with
CONFIG_TTY_MUTEX=y, the TTY code no longer uses the BKL in a
nested way, and quite likely no other code does either.

The TTY code with my patch now has tty_lock() for all cases that
I concluded are never nested in another tty_lock, and tty_lock_nested()
for those I did not understand or that I know are nested (the
latter type usually comes with a comment). The only difference
between the two is a WARN_ON(tty_locked()) in tty_lock, so we can
see where the analysis was wrong.
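
Reduced to its bookkeeping, the idea can be sketched in userspace like
this. It is a simplified single-threaded model, not the code from the
patch: the *_model names, the depth counter and the warning counter are
assumptions made for illustration, and a message on stderr stands in for
WARN_ON().

```c
#include <assert.h>
#include <stdio.h>

static int tty_lock_depth;   /* hypothetical nesting count */
static int tty_warn_count;   /* how often the nesting check fired */

/* Model of tty_locked(): is the lock currently held? */
int tty_locked_model(void)
{
    return tty_lock_depth > 0;
}

/* Variant for callers that may already hold the lock: no check. */
void tty_lock_nested_model(void)
{
    tty_lock_depth++;
}

/* Variant for callers believed to be outermost: warn if that is wrong. */
void tty_lock_model(void)
{
    if (tty_locked_model()) {          /* WARN_ON(tty_locked()) */
        tty_warn_count++;
        fprintf(stderr, "WARN: tty_lock() with the lock already held\n");
    }
    tty_lock_nested_model();
}

/* Drop one nesting level. */
void tty_unlock_model(void)
{
    assert(tty_lock_depth > 0);
    tty_lock_depth--;
}
```

Both variants take the lock in the same way; the warning only records
where a caller assumed to be outermost turned out to be nested.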

Arnd