From: Arnd Bergmann on
On Monday 29 March 2010, Frederic Weisbecker wrote:
> On Mon, Mar 29, 2010 at 01:18:48AM +0200, Frederic Weisbecker wrote:
> > @@ -1943,7 +1949,7 @@ static ssize_t proc_fdinfo_read(struct file *file, char __user *buf,
> > }
> >
> > static const struct file_operations proc_fdinfo_file_operations = {
> > - .open = nonseekable_open,
> > + .llseek = generic_file_llseek,
> > .read = proc_fdinfo_read,
> > };
> >
> >
> > Replacing default_llseek() by generic_file_llseek() as you
> > did for most of the other parts is fine.
> >
> > But the above changes the semantics as it makes it seekable.
> > Why not just keeping it as is? It just ends up in no_llseek().

The default is default_llseek, which uses the BKL and cannot be
used if procfs is builtin and the BKL is a module.

> There is also the ioctl part that takes the bkl in procfs.
> I'll just check nothing weird happens there wrt file pos.
> We probably first need to pushdown the bkl in the procfs
> ioctl handlers.

The BKL in procfs is only for proc files that have registered
their own .ioctl instead of .unlocked_ioctl method. Converting
every file_operations instance to provide an unlocked_ioctl
(as one of the other patches does) makes sure that this path
is never taken. BTW, there are less than a handful of procfs files
that provide an ioctl operation, and those probably should never
have been merged.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: John Kacur on
On Wed, Mar 24, 2010 at 11:40 PM, Arnd Bergmann <arnd(a)arndb.de> wrote:
> I've spent some time continuing the work of the people on Cc and many others
> to remove the big kernel lock from Linux and I now have bkl-removal branch
> in my git tree at git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git
> that lets me run a kernel on my quad-core machine with the only users of the BKL
> being mostly obscure device driver modules.
>
> The oldest patch in this series is roughly eight years old and is Willy's patch
> to remove the BKL from fs/locks.c, and I took a series of patches from Jan that
> removes it from most of the VFS.
>
> The other non-obvious changes are:
>
> - all file operations that either have an .ioctl method or do not have their
> �own .llseek method used to implicitly require the BKL. I've changed that
> �so they need to explicitly set .llseek = default_llseek, .unlocked_ioctl =
> �default_ioctl, and changed all the code that either has supplied a .ioctl
> �method or looks like it needs the BKL somewhere else, meaning the
> �default_llseek function might actually do something.
>
> - The block layer now has a global bkldev_mutex that is used in all block
> �drivers in place of the BKL. The only recursive instance of the BKL was
> �__blkdev_get(), which is now called with the blkdev_mutex held instead of
> �grabbing the BKL. This has some possible performance implications that
> �need to be looked into.
>
> - The init/main.c code no longer take the BKL. I figured that this was
> �completely unnecessary because there is no other code running at the
> �same time that takes the BKL.
>
> - The most invasive change is in the TTY layer, which has a new global
> �mutex (sorry!). I know that Alan has plans of his own to remove the BKL
> �from this subsystem, so my patches may not go anywhere, but they seem
> �to work fine for me.
> �I've called the new lock the 'Big TTY Mutex' (BTM), a name that probably
> �makes more sense if you happen to speak German.
> �The basic idea here is to make recursive locking and the release-on-sleep
> �explicit, so every mutex_lock, wait_event, workqueue_flush and schedule
> �in the TTY layer now explicitly releases the BTM before blocking.
>
> - All drivers that still require the BKL are now listed as 'depends on BKL'
> �in Kconfig, and you can set that symbol to 'y', 'm' or 'n'. If the lock
> �itself is a module, only other modules can use it, and /proc/modules
> �will tell you exactly which ones those are. I've thought about adding
> �a module_init function in that module that will taint the kernel, but so
> �far I haven't done that.
>
> - Included is a debugfs file that gives statistics over the BKL usage from
> �early boot on. This is now obsolete and will not get merged, but I'm
> �including it for reference.
>
> Frederic has volunteered to help merging all of this upstream, which I
> very much welcome. The shape that the tree is in now is very inconsistent,
> especially some of the bits at the end are a bit dodgy and all of it needs
> more testing.
>
> I've built-tested an allmodconfig kernel with CONFIG_BKL disabled
> on x86_64, i386, powerpc64, powerpc32, s390 and arm to make sure I
> catch all the modules that depend on BKL, and I've been running
> various versions of this tree on my desktop machine over the last few
> weeks while adding stuff.
>
> � � � �Arnd
>
> ---
>
> Arnd Bergmann (44):
> � � �input: kill BKL, fix input_open_file locking
> � � �ptrace: kill BKL
> � � �procfs: kill BKL in llseek
> � � �random: forbid llseek on random chardev
> � � �x86/microcode: use nonseekable_open
> � � �perf_event: use nonseekable_open
> � � �dm: use nonseekable_open
> � � �vgaarb: use nonseekable_open
> � � �kvm: don't require BKL
> � � �nvram: kill BKL
> � � �do_coredump: do not take BKL
> � � �hpet: kill BKL, add compat_ioctl
> � � �proc/pci: kill BKL
> � � �autofs/autofs4: move compat_ioctl handling into fs
> � � �usb/mon: kill BKL usage
> � � �fat: push down BKL
> � � �sunrpc: push down BKL
> � � �pcmcia: push down BKL
> � � �vfs: kill BKL in default_llseek
> � � �BKL: introduce CONFIG_BKL.
> � � �bkl-removal: make fops->ioctl and default_llseek optional
> � � �x86: update defconfig to CONFIG_BKL=m
> � � �bkl removal: make unlocked_ioctl mandatory
> � � �bkl removal: use default_llseek in code that uses the BKL
> � � �BKL removal: mark remaining users as 'depends on BKL'
> � � �tty: replace BKL with a new tty_lock
> � � �tty: make atomic_write_lock release tty_lock
> � � �tty: make tty_port->mutex nest under tty_lock
> � � �tty: make termios mutex nest under tty_lock
> � � �tty: make ldisc_mutex nest under tty_lock
> � � �tty: never hold tty_lock() while getting tty_mutex
> � � �ppp: use big tty mutex
> � � �tty: release tty lock when blocking
> � � �tty: implement BTM as mutex instead of BKL
> � � �briq_panel: do not use BTM
> � � �affs: remove leftover unlock_kernel
> � � �kvm: don't require BKL
> � � �block: replace BKL with global mutex
> � � �init: kill BKL usage
> � � �debug: instrument big kernel lock
> � � �BKL removal: make the BKL modular
>
> Matthew Wilcox (1):
> � � �[RFC] Remove BKL from fs/locks.c
>
> Jan Blunck (19):
> � � �JFS: Free sbi memory in error path
> � � �BKL: Explicitly add BKL around get_sb/fill_super
> � � �BKL: Remove outdated comment and include
> � � �BKL: Remove BKL from Amiga FFS
> � � �BKL: Remove BKL from BFS
> � � �BKL: Remove BKL from CifsFS
> � � �BKL: Remove BKL from ext3 fill_super()
> � � �BKL: Remove BKL from ext3_put_super() and ext3_remount()
> � � �BKL: Remove BKL from ext4 filesystem
> � � �BKL: Remove smp_lock.h from exofs
> � � �BKL: Remove BKL from HFS
> � � �BKL: Remove BKL from HFS+
> � � �BKL: Remove BKL from JFS
> � � �BKL: Remove BKL from NILFS2
> � � �BKL: Remove BKL from NTFS
> � � �BKL: Remove BKL from cgroup
> � � �BKL: Remove BKL from do_new_mount()
> � � �ext2: Add ext2_sb_info s_lock spinlock
> � � �BKL: Remove BKL from ext2 filesystem
> --

Great, Arnd, I like this.

I also have a private but stale tree where I have collected some
remove bkl patches (which I will review against your tree.)
I think that it is important that we keep chipping away at it though,
and that we all keep sending stuff upstream when it is ready.

Thanks
John
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Frederic Weisbecker on
On Mon, Mar 29, 2010 at 12:04:24PM +0100, Arnd Bergmann wrote:
> On Monday 29 March 2010, Frederic Weisbecker wrote:
> > On Mon, Mar 29, 2010 at 01:18:48AM +0200, Frederic Weisbecker wrote:
> > > @@ -1943,7 +1949,7 @@ static ssize_t proc_fdinfo_read(struct file *file, char __user *buf,
> > > }
> > >
> > > static const struct file_operations proc_fdinfo_file_operations = {
> > > - .open = nonseekable_open,
> > > + .llseek = generic_file_llseek,
> > > .read = proc_fdinfo_read,
> > > };
> > >
> > >
> > > Replacing default_llseek() by generic_file_llseek() as you
> > > did for most of the other parts is fine.
> > >
> > > But the above changes the semantics as it makes it seekable.
> > > Why not just keeping it as is? It just ends up in no_llseek().
>
> The default is default_llseek, which uses the BKL and cannot be
> used if procfs is builtin and the BKL is a module.



Yeah, but you removed the nonseekable_open and made generic_file_llseek
in llseek on this one.
This makes it seekable while it wasn't, changing its ABI.
It wasn't taking the bkl before that as it was calling
no_llseek().

May be its non seekable property is irrelevant, I don't know,
but if this behaviour must be changed, it should be in a
separate patch as that dosn't deal with the bkl.


> > There is also the ioctl part that takes the bkl in procfs.
> > I'll just check nothing weird happens there wrt file pos.
> > We probably first need to pushdown the bkl in the procfs
> > ioctl handlers.
>
> The BKL in procfs is only for proc files that have registered
> their own .ioctl instead of .unlocked_ioctl method. Converting
> every file_operations instance to provide an unlocked_ioctl
> (as one of the other patches does) makes sure that this path
> is never taken. BTW, there are less than a handful of procfs files
> that provide an ioctl operation, and those probably should never
> have been merged.


There are three of them. I'm going to make them .unlocked_ioctl
and push the bkl inside, and warn on further uses of .ioctl,
without applying the bkl there anymore.

That plus your bkl removal in proc seek, should totally remove the
bkl from procfs.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Arnd Bergmann on
On Monday 29 March 2010 19:59:39 Frederic Weisbecker wrote:
> On Mon, Mar 29, 2010 at 12:04:24PM +0100, Arnd Bergmann wrote:
> > On Monday 29 March 2010, Frederic Weisbecker wrote:
> > > On Mon, Mar 29, 2010 at 01:18:48AM +0200, Frederic Weisbecker wrote:
> > > > @@ -1943,7 +1949,7 @@ static ssize_t proc_fdinfo_read(struct file *file, char __user *buf,
> > > > }
> > > >
> > > > static const struct file_operations proc_fdinfo_file_operations = {
> > > > - .open = nonseekable_open,
> > > > + .llseek = generic_file_llseek,
> > > > .read = proc_fdinfo_read,
> > > > };
> > > >
> > > >
> > > > Replacing default_llseek() by generic_file_llseek() as you
> > > > did for most of the other parts is fine.
> > > >
> > > > But the above changes the semantics as it makes it seekable.
> > > > Why not just keeping it as is? It just ends up in no_llseek().
> >
> > The default is default_llseek, which uses the BKL and cannot be
> > used if procfs is builtin and the BKL is a module.
>
> Yeah, but you removed the nonseekable_open and made generic_file_llseek
> in llseek on this one.
> This makes it seekable while it wasn't, changing its ABI.
> It wasn't taking the bkl before that as it was calling
> no_llseek().

Ah, I see what you mean. That change was certainly not intentional
an should be reverted. Thanks for pointing this out.

> > The BKL in procfs is only for proc files that have registered
> > their own .ioctl instead of .unlocked_ioctl method. Converting
> > every file_operations instance to provide an unlocked_ioctl
> > (as one of the other patches does) makes sure that this path
> > is never taken. BTW, there are less than a handful of procfs files
> > that provide an ioctl operation, and those probably should never
> > have been merged.
>
>
> There are three of them. I'm going to make them .unlocked_ioctl
> and push the bkl inside, and warn on further uses of .ioctl,
> without applying the bkl there anymore.
>
> That plus your bkl removal in proc seek, should totally remove the
> bkl from procfs.

Ok

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Roland Dreier on
OK, I added the following to my tree, currently queued in my for-next
branch for 2.6.35:


IB: Explicitly rule out llseek to avoid BKL in default_llseek()

Several RDMA user-access drivers have file_operations structures with
no .llseek method set. None of the drivers actually do anything with
f_pos, so this means llseek is essentially a NOP, instead of returning
an error as leaving other file_operations methods unimplemented would
do. This is mostly harmless, except that a NULL .llseek means that
default_llseek() is used, and this function grabs the BKL, which we
would like to avoid.

Since llseek does nothing useful on these files, we would like it to
return an error to userspace instead of silently grabbing the BKL and
succeeding. For nearly all of the file types, we take the
belt-and-suspenders approach of setting the .llseek method to
no_llseek and also calling nonseekable_open(); the exception is the
uverbs_event files, which are created with anon_inode_getfile(), which
already sets f_mode the same way as nonseekable_open() would.

This work is motivated by Arnd Bergmann's bkl-removal tree.

Signed-off-by: Roland Dreier <rolandd(a)cisco.com>
---
drivers/infiniband/core/ucm.c | 3 ++-
drivers/infiniband/core/ucma.c | 4 +++-
drivers/infiniband/core/user_mad.c | 12 ++++++++----
drivers/infiniband/core/uverbs_main.c | 11 +++++++----
4 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c
index 017d6e2..7ef3954 100644
--- a/drivers/infiniband/core/ucm.c
+++ b/drivers/infiniband/core/ucm.c
@@ -1180,7 +1180,7 @@ static int ib_ucm_open(struct inode *inode, struct file *filp)
file->filp = filp;
file->device = container_of(inode->i_cdev, struct ib_ucm_device, cdev);

- return 0;
+ return nonseekable_open(inode, filp);
}

static int ib_ucm_close(struct inode *inode, struct file *filp)
@@ -1228,6 +1228,7 @@ static const struct file_operations ucm_fops = {
.release = ib_ucm_close,
.write = ib_ucm_write,
.poll = ib_ucm_poll,
+ .llseek = no_llseek,
};

static ssize_t show_ibdev(struct device *dev, struct device_attribute *attr,
diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index b2e16c3..09d4a3b 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -1219,7 +1219,8 @@ static int ucma_open(struct inode *inode, struct file *filp)

filp->private_data = file;
file->filp = filp;
- return 0;
+
+ return nonseekable_open(inode, filp);
}

static int ucma_close(struct inode *inode, struct file *filp)
@@ -1249,6 +1250,7 @@ static const struct file_operations ucma_fops = {
.release = ucma_close,
.write = ucma_write,
.poll = ucma_poll,
+ .llseek = no_llseek,
};

static struct miscdevice ucma_misc = {
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index 04b585e..2bb9703 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -780,7 +780,7 @@ static int ib_umad_open(struct inode *inode, struct file *filp)
{
struct ib_umad_port *port;
struct ib_umad_file *file;
- int ret = 0;
+ int ret;

port = container_of(inode->i_cdev, struct ib_umad_port, cdev);
if (port)
@@ -813,6 +813,8 @@ static int ib_umad_open(struct inode *inode, struct file *filp)

list_add_tail(&file->port_list, &port->file_list);

+ ret = nonseekable_open(inode, filp);
+
out:
mutex_unlock(&port->file_mutex);
return ret;
@@ -865,7 +867,8 @@ static const struct file_operations umad_fops = {
.compat_ioctl = ib_umad_compat_ioctl,
#endif
.open = ib_umad_open,
- .release = ib_umad_close
+ .release = ib_umad_close,
+ .llseek = no_llseek,
};

static int ib_umad_sm_open(struct inode *inode, struct file *filp)
@@ -902,7 +905,7 @@ static int ib_umad_sm_open(struct inode *inode, struct file *filp)

filp->private_data = port;

- return 0;
+ return nonseekable_open(inode, filp);

fail:
kref_put(&port->umad_dev->ref, ib_umad_release_dev);
@@ -932,7 +935,8 @@ static int ib_umad_sm_close(struct inode *inode, struct file *filp)
static const struct file_operations umad_sm_fops = {
.owner = THIS_MODULE,
.open = ib_umad_sm_open,
- .release = ib_umad_sm_close
+ .release = ib_umad_sm_close,
+ .llseek = no_llseek,
};

static struct ib_client umad_client = {
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 1444f95..a16a91e 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -385,7 +385,8 @@ static const struct file_operations uverbs_event_fops = {
.read = ib_uverbs_event_read,
.poll = ib_uverbs_event_poll,
.release = ib_uverbs_event_close,
- .fasync = ib_uverbs_event_fasync
+ .fasync = ib_uverbs_event_fasync,
+ .llseek = no_llseek,
};

void ib_uverbs_comp_handler(struct ib_cq *cq, void *cq_context)
@@ -639,7 +640,7 @@ static int ib_uverbs_open(struct inode *inode, struct file *filp)

filp->private_data = file;

- return 0;
+ return nonseekable_open(inode, filp);

err_module:
module_put(dev->ib_dev->owner);
@@ -667,7 +668,8 @@ static const struct file_operations uverbs_fops = {
.owner = THIS_MODULE,
.write = ib_uverbs_write,
.open = ib_uverbs_open,
- .release = ib_uverbs_close
+ .release = ib_uverbs_close,
+ .llseek = no_llseek,
};

static const struct file_operations uverbs_mmap_fops = {
@@ -675,7 +677,8 @@ static const struct file_operations uverbs_mmap_fops = {
.write = ib_uverbs_write,
.mmap = ib_uverbs_mmap,
.open = ib_uverbs_open,
- .release = ib_uverbs_close
+ .release = ib_uverbs_close,
+ .llseek = no_llseek,
};

static struct ib_client uverbs_client = {
--
1.7.0.3


--
Roland Dreier <rolandd(a)cisco.com> || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/