From: Andrew Morton on
On Tue, 26 Jan 2010 14:09:45 GMT tip-bot for Thomas Gleixner <tglx(a)linutronix.de> wrote:

> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -343,7 +343,19 @@ static void clocksource_resume_watchdog(void)
> {
> unsigned long flags;
>
> - spin_lock_irqsave(&watchdog_lock, flags);
> + /*
> + * We use trylock here to avoid a potential dead lock when
> + * kgdb calls this code after the kernel has been stopped with
> + * watchdog_lock held. When watchdog_lock is held we just
> + * return and accept, that the watchdog might trigger and mark
> + * the monitored clock source (usually TSC) unstable.
> + *
> + * This does not affect the other caller clocksource_resume()
> + * because at this point the kernel is UP, interrupts are
> + * disabled and nothing can hold watchdog_lock.
> + */
> + if (!spin_trylock_irqsave(&watchdog_lock, flags))
> + return;
> clocksource_reset_watchdog();
> spin_unlock_irqrestore(&watchdog_lock, flags);
> }
>
> @@ -458,8 +470,8 @@ void clocksource_resume(void)
> * clocksource_touch_watchdog - Update watchdog
> *
> * Update the watchdog after exception contexts such as kgdb so as not
> - * to incorrectly trip the watchdog.
> - *
> + * to incorrectly trip the watchdog. This might fail when the kernel
> + * was stopped in code which holds watchdog_lock.
> */
> void clocksource_touch_watchdog(void)
> {

It would be neater and a shade more reliable to add a separate
clocksource_try_touch_watchdog() for kgdb's use. Which could be inside
#ifdef CONFIG_KGDB.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jason Wessel on
Andrew Morton wrote:
> On Tue, 26 Jan 2010 14:09:45 GMT tip-bot for Thomas Gleixner <tglx(a)linutronix.de> wrote:
>
>
>> --- a/kernel/time/clocksource.c
>> +++ b/kernel/time/clocksource.c
>> @@ -343,7 +343,19 @@ static void clocksource_resume_watchdog(void)
>> {
>> unsigned long flags;
>>
>> - spin_lock_irqsave(&watchdog_lock, flags);
>> + /*
>> + * We use trylock here to avoid a potential dead lock when
>> + * kgdb calls this code after the kernel has been stopped with
>> + * watchdog_lock held. When watchdog_lock is held we just
>> + * return and accept, that the watchdog might trigger and mark
>> + * the monitored clock source (usually TSC) unstable.
>> + *
>> + * This does not affect the other caller clocksource_resume()
>> + * because at this point the kernel is UP, interrupts are
>> + * disabled and nothing can hold watchdog_lock.
>> + */
>> + if (!spin_trylock_irqsave(&watchdog_lock, flags))
>> + return;
>> clocksource_reset_watchdog();
>> spin_unlock_irqrestore(&watchdog_lock, flags);
>> }
>>
>> @@ -458,8 +470,8 @@ void clocksource_resume(void)
>> * clocksource_touch_watchdog - Update watchdog
>> *
>> * Update the watchdog after exception contexts such as kgdb so as not
>> - * to incorrectly trip the watchdog.
>> - *
>> + * to incorrectly trip the watchdog. This might fail when the kernel
>> + * was stopped in code which holds watchdog_lock.
>> */
>> void clocksource_touch_watchdog(void)
>> {
>>
>
> It would be neater and a shade more reliable to add a separate
> clocksource_try_touch_watchdog() for kgdb's use. Which could be inside
> #ifdef CONFIG_KGDB.
>
>

The kernel debugger is the only user of this API function so
realistically you do not need a new function, and could simply rename
this one, if the name does not fit.

-- more notes on the problem --

I further diagnosed the problem, and it is reasonably rare that it
happens in the first place. I had an instrumented version of the kernel
debugger which showed the problem happening quite frequently, while
trying to fix hw breakpoints.

The problem is not actually an interprocessor deadlock, but that of
re-entering the spin_lock() when it is already held by the interrupted
task on the same cpu. There is no obvious way that I could see to check
for this specific condition and to abort.

For now the patch Thomas created is sufficient, and given some time
we'll decide how to best live with the side effects or to consider any
further changes.

Jason.

Tested-by: Jason Wessel <jason.wessel(a)windriver.com>


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/