From: Stephen Hemminger on
On Thu, 3 Jun 2010 15:33:23 -0700 (PDT)
Linus Torvalds <torvalds(a)linux-foundation.org> wrote:

>
>
> On Thu, 3 Jun 2010, Linus Torvalds wrote:
> >
> > > So still a race that shows up with KVM (fast floppy?) and manifests
> > > as floppy_ready or reset_interrupt OOPS.
> >
> > Yes, it's quite possible that the Linux floppy driver is simply broken by
> > any floppy device that basically responds immediately to a command with an
> > interrupt. And considering how few people use floppies, I do expect that
> > driver to get _worse_ rather than better in the future.
>
> Having looked at that driver some more, I can in fact pretty much
> guarantee it. The locking is rather baroque. It has a "floppy_lock", but
> that only protects certain small parts. In particular, it looks like the
> irq handler and the timers do _not_ take it, and that's where most of the
> real work is done.
>
> And in fact, that does look broken. The interrupt handler really does a
> "schedule_work()" to schedule the actual handler outside of irq context,
> and I don't see any serialization between the timers that fire and the
> handler running.
>
> That driver used to be this state machine that ran entirely from interrupt
> context, where one interrupt handler would set the state for the next one
> (that's what the "do_floppy" thing is for). But then it became bottom
> halves, and now it's using schedule_work() instead - and at the same time,
> the _timers_ haven't really changed. Those run in timer context, and can
> thus interrupt the work thing.
>
> It always was a disgusting driver. Now it's just even more so. And yes,
> I'm sure it's full of races that are largely hidden by the fact that real
> floppy hardware is so slow that you can never hit them.
>
> Looking too much at that driver will cause PTSD. I have to look away.

Thank you for confirming my suspicions. Given the state of destruction
there, bug fixing is like playing Jenga.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on


On Thu, 3 Jun 2010, Stephen Hemminger wrote:
>
> Thank you for confirming my suspicions. Given the state of destruction
> there, bug fixing is like playing Jenga.

I suspect it's fixable, but it would probably involve a lot of careful
moving around of that "floppy_lock" spinlock. Add various asserts to make
sure that it's held in all cases, and then for each warning you get, you
add the proper spinlock until it's all properly protected.
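
The assert-driven approach described above can be sketched in userspace C.
This is only an illustration under stated assumptions, not code from the
driver: the pthread mutex stands in for the "floppy_lock" spinlock, and
floppy_lock_acquire(), floppy_lock_release(), warn_unless_locked(),
set_fdc_state() and fdc_state are all hypothetical names. In the kernel
itself the check would more likely use an existing helper such as
assert_spin_locked() or lockdep_assert_held().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the driver's "floppy_lock" (assumption). */
static pthread_mutex_t floppy_lock = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread count: greater than zero while this thread holds the lock. */
static _Thread_local int floppy_lock_depth;

/* Number of accesses that were made without the lock held. */
static atomic_ulong lock_warnings;

void floppy_lock_acquire(void)
{
    pthread_mutex_lock(&floppy_lock);
    floppy_lock_depth++;
}

void floppy_lock_release(void)
{
    /* Releasing a lock this thread does not hold is itself a bug. */
    assert(floppy_lock_depth > 0);
    floppy_lock_depth--;
    pthread_mutex_unlock(&floppy_lock);
}

/* Warn instead of aborting, so one run can list every unprotected path. */
bool warn_unless_locked(const char *where)
{
    if (floppy_lock_depth > 0)
        return true;
    atomic_fetch_add(&lock_warnings, 1);
    fprintf(stderr, "WARNING: %s called without floppy_lock\n", where);
    return false;
}

/* Hypothetical piece of shared driver state. */
static int fdc_state;

void set_fdc_state(int next)
{
    warn_unless_locked(__func__);
    fdc_state = next;
}
```

Calling set_fdc_state() without the lock bumps lock_warnings and prints a
warning; wrapping the call in floppy_lock_acquire()/floppy_lock_release()
keeps it quiet. Each warning marks one call path that still needs the lock.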

The _original_ protection was just from irqs being atomic (UP, remember),
and the block layer queueing happening from irq-safe context. You're still
running it on UP, but we've even lost the irq-handler protection (and then
later, the bottom-half mutual exclusion).

Linus
From: Stephen Hemminger on

Maybe putting all back together in a threaded_irq would be safest.
From: Linus Torvalds on


On Thu, 3 Jun 2010, Stephen Hemminger wrote:
>
> Maybe putting all back together in a threaded_irq would be safest.

Yes. That floppy driver could easily be a good case for using those
threaded irq's. The problem, of course, is to find somebody motivated
enough. The code-base really is pretty dang ugly, and it might be hard to
do it incrementally, I think.

(And starting from scratch is likely not a great idea either - while
_some_ of the ugliness comes from the odd irq-driven state machine code, a
lot of it also comes from trying to handle all those floppy formats etc)
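
The serialization that a single handler thread would give can be sketched
in userspace C. This is an analogy under stated assumptions, not driver
code: in the kernel the handler would be registered with
request_threaded_irq(), whereas here a pthread plays that role, and
fd_machine, fd_post(), fd_handler_thread() and the event names are all
hypothetical. Interrupts and timer expiries only post events; one thread
consumes them, so two state-machine steps never run at the same time.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

enum fd_event { EV_IRQ, EV_TIMEOUT, EV_STOP };

#define QUEUE_LEN 16

struct fd_machine {
    pthread_mutex_t lock;         /* protects the queue fields only */
    pthread_cond_t nonempty;
    enum fd_event items[QUEUE_LEN];
    unsigned head, count;
    int irqs_handled;             /* touched only by the handler thread */
    int timeouts_handled;         /* touched only by the handler thread */
};

void fd_machine_init(struct fd_machine *m)
{
    int rc = pthread_mutex_init(&m->lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&m->nonempty, NULL);
    assert(rc == 0);
    (void)rc;
    m->head = m->count = 0;
    m->irqs_handled = m->timeouts_handled = 0;
}

/* Safe to call from any thread; returns false if the queue is full. */
bool fd_post(struct fd_machine *m, enum fd_event ev)
{
    bool ok = false;

    pthread_mutex_lock(&m->lock);
    if (m->count < QUEUE_LEN) {
        m->items[(m->head + m->count) % QUEUE_LEN] = ev;
        m->count++;
        ok = true;
        pthread_cond_signal(&m->nonempty);
    }
    pthread_mutex_unlock(&m->lock);
    return ok;
}

/* Block until an event is available, then remove and return it. */
static enum fd_event fd_wait(struct fd_machine *m)
{
    enum fd_event ev;

    pthread_mutex_lock(&m->lock);
    while (m->count == 0)
        pthread_cond_wait(&m->nonempty, &m->lock);
    ev = m->items[m->head];
    m->head = (m->head + 1) % QUEUE_LEN;
    m->count--;
    pthread_mutex_unlock(&m->lock);
    return ev;
}

/* The only place where state-machine steps run, one event at a time. */
void *fd_handler_thread(void *arg)
{
    struct fd_machine *m = arg;

    for (;;) {
        switch (fd_wait(m)) {
        case EV_IRQ:
            m->irqs_handled++;
            break;
        case EV_TIMEOUT:
            m->timeouts_handled++;
            break;
        case EV_STOP:
            return NULL;
        }
    }
}
```

To try it, call fd_machine_init(), start fd_handler_thread() with
pthread_create(), post events with fd_post(), and finish with EV_STOP
followed by pthread_join(). The counters need no lock of their own because
only the handler thread writes them, which is the property the threaded
handler is meant to provide.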

Linus
From: Nick Bowler on
On 15:40 Thu 03 Jun, Linus Torvalds wrote:
> Although one comment says it all:
>
> Cons: I ordered 5. After 45 days 3 of them have failed. Too late to return.
>
> so apparently you do need to order a lot of them to keep them going ;)

I actually still have a real floppy drive in my primary desktop. Bought
it new in 2001, and it still worked when I used it (once!) in fall 2008.

That being said, it would have been quite frustrating if Linux oopsed
when I tried to use this piece of hardware. It was frustrating enough
to even find a single disk to put in it :).

--
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)