From: Andy Whitcroft on
On Fri, Aug 6, 2010 at 9:22 PM, Greg KH <gregkh(a)suse.de> wrote:
> On Fri, Aug 06, 2010 at 09:58:45PM +0200, Arnd Bergmann wrote:
>> On Friday 06 August 2010 21:38:36 Linus Torvalds wrote:
>> > I also don't want the BKL to show up in grep, even if the config
>> > option for it were to be impossible to enable. If we've gotten far
>> > enough that we can get rid of the BKL in the tty layer, we really
>> > should get rid of it. Not leave it in some zombie limbo state.
>>
>> Ok, I've sent the replacement patch that does this, and I'm much
>> happier with the outcome.
>>
>> If you apply this version, the only stronghold left for the BKL
>> will be fs/lockd, everything else that's left is either simple
>> to fix or highly obscure.
>
> Ok, I got that one now, and am building the tree to test here before
> sending to Linus.

I have been looking at the proposed change to the TTY locking here.
The conversion of the BKL into a real non-preemptable lock for the TTY
layer has had an interesting semantic change on racing open/closes on
the same TTY. Prior to the changes we would:

- grab the BLK and tty_mutex
- mark the device TTY_CLOSING
- drop the tty_mutex
- call device shutdown
- prevent the device from being found
- drop the BKL

The use of the BLK to cover the device shutdown has an interesting
effect. Where the shutdown processing is simple the lock is
maintained and the whole processing is atomic with respect to racing
opens and closes, such that an open would never see TTY_CLOSING. For
any shutdown processing which resulted in a sleep then the BKL would
be implicitly dropped and reaquired allowing any racing opens to
continue and triggering an open failure errno=EIO, something POSIX at
least hints at as reasonable. As shutdown can include hardware
interactions this seems like appropriate behaviour, and cirtinly the
git log documents this as expected and desirable semantics.

With the new locking the BTM is held in place of the BKL, which
effectivly means throughout open and close processing, which
effectively means _all_ TTY open/closes are serialised throughout
regardless of the length of the shutdown processing. The EIO appears
to be no longer returnable.

Is this change in semantics expected? If not, it is likely something
which could be addressed separately after merging, as long as we are
aware.

-apw
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Arnd Bergmann on
On Monday 09 August 2010, Andy Whitcroft wrote:
> With the new locking the BTM is held in place of the BKL, which
> effectivly means throughout open and close processing, which
> effectively means all TTY open/closes are serialised throughout
> regardless of the length of the shutdown processing. The EIO appears
> to be no longer returnable.
>
> Is this change in semantics expected? If not, it is likely something
> which could be addressed separately after merging, as long as we are
> aware.

I audited (or tried to) all code that is able to sleep while holding the
BTM. In the cases that looked like they might sleep long, I added explicit
tty_unlock()/tty_lock() pairs around the sleeping code.

Code that potentially sleeps but should not do so for an indefinite
time (e.g. kmalloc(GFP_KERNEL)), I did not do this and still hold the BKL.

This means the code shifted to holding the BTM more often than it used
to hold the BKL, but for cases where we wait on user or hardware interaction,
it should still behave as it used to. If you found a place where this
is not true, that is probably an oversight and we should add explicit
tty_unlock() statements around it.

The problem is that we cannot easily give up the BTM while already holding
tty_port->mutex, tty_port->buf_mutex, console_semaphore, tty->termios_mutex
or tty->atomic_write_lock, which would give an AB-BA deadlock without
the autorelease semantics of the BKL. Similarly, any wait_event or
flush_work_queue potentially has the same problem (besides the fact that
it can self-deadlock what it waits for tried to take the BTM). Auditing
all these was the bulk of the work for the conversion.

I did not specifically audit for the case where tty_open() returns -EIO,
but I'd hope that it would be covered by the method I used for the
conversion.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Alan Cox on
> With the new locking the BTM is held in place of the BKL, which
> effectivly means throughout open and close processing, which
> effectively means _all_ TTY open/closes are serialised throughout
> regardless of the length of the shutdown processing. The EIO appears
> to be no longer returnable.

The specific case of waiting for queues to empty could be a slow one and
probably wants to addressed nicely. The other causes like memory
allocator waits are probably actually better the new way. What exactly is
an application supposed to do when it gets a -EIO on open, how does it
decide when to try again, when does it retry, how many aps get it right ?

In fact I don't think its unreasonable to treat an open without O_NDELAY
that blocks on a close flushing output from buffers as correct - unless
there is some specific POSIX reason why we cannot.

I'd venture we'd effective fix/hide/cover over a huge range of extremely
obscure and near untestable app failure cases/bugs by doing this.

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Andy Whitcroft on
On Mon, Aug 09, 2010 at 08:24:53PM +0100, Alan Cox wrote:
> > With the new locking the BTM is held in place of the BKL, which
> > effectivly means throughout open and close processing, which
> > effectively means _all_ TTY open/closes are serialised throughout
> > regardless of the length of the shutdown processing. The EIO appears
> > to be no longer returnable.

Arnd, thanks for your clear explanation. I see now you have actually
addressed the long waits in the main.

> The specific case of waiting for queues to empty could be a slow one and
> probably wants to addressed nicely. The other causes like memory
> allocator waits are probably actually better the new way. What exactly is
> an application supposed to do when it gets a -EIO on open, how does it
> decide when to try again, when does it retry, how many aps get it right ?

I know to my cost they break. For me, console_sem is one where we
commonly trigger this behaviour (before these changes), so affecting bog
standard VTs.

> In fact I don't think its unreasonable to treat an open without O_NDELAY
> that blocks on a close flushing output from buffers as correct - unless
> there is some specific POSIX reason why we cannot.
>
> I'd venture we'd effective fix/hide/cover over a huge range of extremely
> obscure and near untestable app failure cases/bugs by doing this.

Yes that is very likely. That the longer waits are picked out and
release BTM seems a completely sane approach and covers my query
completely.

-apw
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/