From: Norbert van Bolhuis on

The problem occurs when e.g. drivers use time_after(jiffes, timeout).

CONFIG_NO_HZ could make jiffies advance by more than 1.
This is done by: tick_nohz_update_jiffies->tick_do_update_jiffies64->do_timer

If drivers use a timeout value of jiffies+1,
"time_after(jiffies, timeout)" will be true after 1 interrupt
(given that it advances jiffies by at least 2).

This is exactly what happens in cfi_cmdset_0002.c:do_write_buffer
for our case (Powerpc MPC8313, linux-2.6.28, CONFIG_HZ=250, CONFIG_NO_HZ=y).

do_write_buffer does the following:
unsigned long uWriteTimeout = ( HZ / 1000 ) + 1;
...
timeo = jiffies + uWriteTimeout;
...
for (;;) {
...
if (time_after(jiffies, timeo) && !chip_ready(map, adr))
break;
if (chip_ready(map, adr)) {
xip_enable(map, chip, adr);
goto op_done;
}
UDELAY(map, chip, adr, 1);
}
/* software timeout */
ret = -EIO;
opdone:
...

I've seen a few software timeouts after the for-loop
looped only 13 times (= 13 us delay, i.s.o. the expected 1 ms). Typically
our NOR flash (S29GL01GP) may need upto ~ 200 us to be ready.

disabling CONFIG_NO_HZ fixes the problem.
replacing time_after by a for-loop counter to loop max 1000 times
also fixes the problem.

the latest kernel seems to have the same problem.

do I miss something here or is this a known problem of CONFIG_NO_HZ ?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Corrado Zoccolo on
Is XIP enabled? It seems that the xip_udelay is not very accurate, and
may be the cause of what you are seeing.

On Thu, Sep 3, 2009 at 12:50 PM, Norbert van
Bolhuis<nvbolhuis(a)aimvalley.nl> wrote:
>
> The problem occurs when e.g. drivers use time_after(jiffes, timeout).
>
> CONFIG_NO_HZ could make jiffies advance by more than 1.
> This is done by:
> tick_nohz_update_jiffies->tick_do_update_jiffies64->do_timer
>
> If drivers use a timeout value of jiffies+1,
> "time_after(jiffies, timeout)" will be true after 1 interrupt
> (given that it advances jiffies by at least 2).
>
> This is exactly what happens in cfi_cmdset_0002.c:do_write_buffer
> for our case (Powerpc MPC8313, linux-2.6.28, CONFIG_HZ=250, CONFIG_NO_HZ=y).
>
> do_write_buffer does the following:
>  unsigned long uWriteTimeout = ( HZ / 1000 ) + 1;
>  ...
>  timeo = jiffies + uWriteTimeout;
>  ...
>  for (;;) {
>  ...
>  if (time_after(jiffies, timeo) && !chip_ready(map, adr))
>   break;
>  if (chip_ready(map, adr)) {
>   xip_enable(map, chip, adr);
>   goto op_done;
>  }
>  UDELAY(map, chip, adr, 1);
>  }
>  /* software timeout */
>  ret = -EIO;
> opdone:
>  ...
>
> I've seen a few software timeouts after the for-loop
> looped only 13 times (= 13 us delay, i.s.o. the expected 1 ms). Typically
> our NOR flash (S29GL01GP) may need upto ~ 200 us to be ready.
>
> disabling CONFIG_NO_HZ fixes the problem.
> replacing time_after by a for-loop counter to loop max 1000 times
> also fixes the problem.
>
> the latest kernel seems to have the same problem.
>
> do I miss something here or is this a known problem of CONFIG_NO_HZ ?
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo(a)vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>



--
__________________________________________________________________________

dott. Corrado Zoccolo mailto:czoccolo(a)gmail.com
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------
The self-confidence of a warrior is not the self-confidence of the average
man. The average man seeks certainty in the eyes of the onlooker and calls
that self-confidence. The warrior seeks impeccability in his own eyes and
calls that humbleness.
Tales of Power - C. Castaneda
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Norbert van Bolhuis on
Corrado Zoccolo wrote:
> Is XIP enabled? It seems that the xip_udelay is not very accurate, and
> may be the cause of what you are seeing.
>

no it isn't. CONFIG_MTD_XIP has been disabled.
nor did someone supsended the write (chip->state remains FL_WRITING).
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Marcin Slusarz on
Norbert van Bolhuis wrote:
>
> The problem occurs when e.g. drivers use time_after(jiffes, timeout).
>
> CONFIG_NO_HZ could make jiffies advance by more than 1.
> This is done by:
> tick_nohz_update_jiffies->tick_do_update_jiffies64->do_timer
>
> If drivers use a timeout value of jiffies+1,
> "time_after(jiffies, timeout)" will be true after 1 interrupt
> (given that it advances jiffies by at least 2).
>
> This is exactly what happens in cfi_cmdset_0002.c:do_write_buffer
> for our case (Powerpc MPC8313, linux-2.6.28, CONFIG_HZ=250,
> CONFIG_NO_HZ=y).
>
> do_write_buffer does the following:
> unsigned long uWriteTimeout = ( HZ / 1000 ) + 1;
> ...
> timeo = jiffies + uWriteTimeout;
> ...
> for (;;) {
> ...
> if (time_after(jiffies, timeo) && !chip_ready(map, adr))
> break;
> if (chip_ready(map, adr)) {
> xip_enable(map, chip, adr);
> goto op_done;
> }
> UDELAY(map, chip, adr, 1);
> }
> /* software timeout */
> ret = -EIO;
> opdone:
> ...
>
> I've seen a few software timeouts after the for-loop
> looped only 13 times (= 13 us delay, i.s.o. the expected 1 ms). Typically

Are you sure? UDELAY may call schedule(), which can return to this thread
after much longer time than 13us...


Marcin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Pavel Machek on
On Thu 2009-09-03 12:50:12, Norbert van Bolhuis wrote:
>
> The problem occurs when e.g. drivers use time_after(jiffes, timeout).
>
> CONFIG_NO_HZ could make jiffies advance by more than 1.
> This is done by: tick_nohz_update_jiffies->tick_do_update_jiffies64->do_timer

jiffies really should be up-to-date as long as you are running
anything else than idle loop. If not, that's a bug ... and nasty one.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/