From: Arnd Bergmann on
On Wednesday 31 March 2010, Frederic Weisbecker wrote:
> This is a solution that has been tried more than once already. But Linus
> has told he wouldn't pull something that turns the bkl into a mutex or a
> semaphore.

Ok. Starting from the same observation of simplicity in the remaining code,
we should also be able to find a semi-automatic way of turning the BKL usage
in these drivers into a per-module mutex.

> Plus it's quite hard to tell that it does or not auto-release somewhere
> This is often something you can really spot on runtime or on small path
> only.

Well, the autorelease by itself is not needed anywhere. What is needed
is the consequence of autorelease avoiding AB-BA type deadlocks.

> The simple fact the bkl is not always a leaf lock makes it need the
> auto-release, otherwise you experience very bad unexpected lock
> dependencies.

I'm arguing that we can probably show the BKL to be the outermost
lock for the majority of the remaining drivers, which only get it
from their open(), ioctl() or llseek() functions, which are all
called without any locks held. If the BKL is a regular mutex, lockdep
should warn of the other cases.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Stefan Richter on
Frederic Weisbecker wrote:
> On Tue, Mar 30, 2010 at 11:33:40AM +0100, Arnd Bergmann wrote:
>> I believe we can actually remove ioctl from file_operations. The patch I did
>> to convert all users to ".unlocked_ioctl = default_ioctl," should really catch
>> all cases, and I think we can enforce this by renaming fops->ioctl to locked_ioctl
>> or old_ioctl to make sure we didn't miss any, and then mandate that this one
>> is only used when unlocked_ioctl is set to default_ioctl.
>
>
> I just looked at the patch in question and noted that the changelog
> is pretty high, but how could it be else.
> Actually it's not that large, but highly spread:
>
[Documentation/ arch/, drivers/, drivers/, and more drivers/, fs/,
include/, lib/, net/, sound/, virt/]
> 157 files changed, 372 insertions(+), 80 deletions(-)
>
>
> I wonder if we should actually just turn all these into unlocked_ioctl
> directly. And then bring a warn on ioctl, and finally schedule the removal
> of this callback.

A side note: A considerable portion of this particular commit in Arnd's
git actually does not deal with .ioctl->.unlocked_ioctl at all, but
purely with .llseek. Many(?) of these changes deal with .ioctl and
..llseek together. (Arnd also says so in the last paragraph of his
changelog.)

IOW there are less .ioctl implementations left than one could think from
a look at the diffstat.
--
Stefan Richter
-=====-==-=- -=-- ----=
http://arcgraph.de/sr/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Arnd Bergmann on
On Thursday 01 April 2010, Stefan Richter wrote:
> >
> > I wonder if we should actually just turn all these into unlocked_ioctl
> > directly. And then bring a warn on ioctl, and finally schedule the removal
> > of this callback.
>
> A side note: A considerable portion of this particular commit in Arnd's
> git actually does not deal with .ioctl->.unlocked_ioctl at all, but
> purely with .llseek. Many(?) of these changes deal with .ioctl and
> .llseek together. (Arnd also says so in the last paragraph of his
> changelog.)
>
> IOW there are less .ioctl implementations left than one could think from
> a look at the diffstat.

Given our recent discussions on the llseek topic, it's probably better to
revert most of the changes that purely deal with llseek. My current idea
is to use an explicit default_llseek only if one of the following is given:

- we convert ioctl to unlocked_ioctl in the same file_operations, or
- the module uses the big kernel lock explicitly elsewhere.

Even then, there may be a number of cases where we can show it not
to be necessary, e.g. when the driver does not care about f_pos.
Concurrent llseek is racy by nature, so in most drivers, using the
BKL in llseek does not gain anything over using i_mutex.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Arnd Bergmann on
On Wednesday 31 March 2010, Frederic Weisbecker wrote:
> On Wed, Mar 31, 2010 at 10:21:23PM +0200, Arnd Bergmann wrote:
>
> > In the meantime, we can move the declaration of the .locked_ioctl callback
> > into an #ifdef CONFIG_BKL, to make sure nobody builds a driver with an
> > ioctl function that does not get called.
>
>
> Ok, now how to get this all merged? A single monolithic patch is probably
> not appropriate.
>
> The simplest is to have a single branch with the default_ioctl implemented,
> and then attributed to drivers in a set cut by subsystems/drivers. And
> push the whole for the next -rc1.
>
> The other solution is to push default_ioctl for this release and get
> the driver changes to each concerned tree. That said, I suspect a good
> part of them are unmaintained, hence the other solution looks better
> to me.

I don't care much about the unmaintained parts, we can always have a
tree collecting all the patches for those drivers and merge it in -rc1.

I'd say the nicest way would be to get Linus to merge the patch
below now, so we can start queuing stuff in maintainer trees on top
of it, and check against linux-next what is still missing and push
all of them from our branch.

Arnd

---
Subject: [PATCH] introduce CONFIG_BKL and default_ioctl

This is a preparation for the removal of the big kernel lock that
introduces new interfaces for device drivers still using it.

We can start marking those device drivers as 'depends on CONFIG_BKL'
now, and make that symbol optional later, when the point has come
at which we are able to build a kernel without the BKL.

Similarly, device drivers that currently make use of the implicit
BKL locking around the ioctl function can now get annotated by
changing

.ioctl = foo_ioctl,

to

.locked_ioctl = foo_ioctl,
.unlocked_ioctl = default_ioctl,

As soon as no driver remains using the old ioctl callback, it can
get removed.

Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
---
fs/ioctl.c | 22 ++++++++++++++++++++++
include/linux/fs.h | 3 +++
include/linux/smp_lock.h | 4 ++++
lib/Kconfig.debug | 10 ++++++++++
4 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/fs/ioctl.c b/fs/ioctl.c
index 6c75110..52c2698 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -58,6 +58,28 @@ static long vfs_ioctl(struct file *filp, unsigned int cmd,
return error;
}

+#ifdef CONFIG_BKL
+/*
+ * default_ioctl - call unlocked_ioctl with BKL held
+ *
+ * Setting only the the ioctl operation but not unlocked_ioctl will become
+ * invalid in the future, all drivers that are not converted to unlocked_ioctl
+ * should set .unlocked_ioctl = default_ioctl now.
+ */
+long default_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+{
+ int error = -ENOTTY;
+ if (filp->f_op->locked_ioctl) {
+ lock_kernel();
+ error = filp->f_op->locked_ioctl(filp->f_path.dentry->d_inode,
+ filp, cmd, arg);
+ unlock_kernel();
+ }
+ return error;
+}
+EXPORT_SYMBOL_GPL(default_ioctl);
+#endif
+
static int ioctl_fibmap(struct file *filp, int __user *p)
{
struct address_space *mapping = filp->f_mapping;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 10b8ded..93afdb4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1492,6 +1492,9 @@ struct file_operations {
int (*readdir) (struct file *, void *, filldir_t);
unsigned int (*poll) (struct file *, struct poll_table_struct *);
int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
+#ifdef CONFIG_BKL
+ int (*locked_ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
+#endif
long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
int (*mmap) (struct file *, struct vm_area_struct *);
diff --git a/include/linux/smp_lock.h b/include/linux/smp_lock.h
index 2ea1dd1..9cec605 100644
--- a/include/linux/smp_lock.h
+++ b/include/linux/smp_lock.h
@@ -62,4 +62,8 @@ static inline void cycle_kernel_lock(void)
#define kernel_locked() 1

#endif /* CONFIG_LOCK_KERNEL */
+
+loff_t default_llseek(struct file *file, loff_t offset, int origin);
+long default_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
+
#endif /* __LINUX_SMPLOCK_H */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 1fafb4b..a4852d6 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -444,6 +444,16 @@ config DEBUG_MUTEXES
This feature allows mutex semantics violations to be detected and
reported.

+config BKL
+ def_bool y
+ help
+ This is the traditional lock that is used in old code instead
+ of proper locking. All drivers that use the BKL should depend
+ on this symbol.
+ This configuration option will become user-selectable in the
+ future, as soon as it is possible to build a kernel without
+ it.
+
config DEBUG_LOCK_ALLOC
bool "Lock debugging: detect incorrect freeing of live locks"
depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: John Kacur on

----- "Arnd Bergmann" <arnd(a)arndb.de> wrote:

> On Wednesday 31 March 2010 22:21:23 Arnd Bergmann wrote:
> > Another crazy idea I had was to simply turn the BKL into a regular
> mutex
> > as soon as we can show that all remaining users are of the
> non-recursive
> > kind and don't rely on the autorelease-on-sleep. Doing that would
> be
> > much easier without the pushdown into .unlocked_ioctl than it would
> be
> > with it.
>
> I just looked at all the users of lock_kernel remaining with my patch
> series. For 90% of them, it is completely obvious that they don't
> rely
> on nested locking, and they very much look like they don't need the
> autorelease either, because the BKL was simply pushed down into the
> open, ioctl and llseek functions.
>
> There are a few file systems (udf, ncpfs, autofs, coda, ...) and some
> network protocols (appletalk, ipx, irnet and x25) for which it is not
> obviously, though still quite likely, the case.
>
> So we could actually remove the BKL recursion code soon, or even turn
> all of it into a regular mutex, at least as an experimental option.

Well, that would be quite similar to what we do in real-time with the
"Big Kernel Semaphore". However, Linus didn't want that pushed into
mainstream. As an experimental tree it's fine, but we're really stuck
with removing the BKL one by one until it's gone.

>
> The recursive users that I've removed in my series are the block,
> tty,
> input and sound subsystems, as well as the init code.
>
> Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/