From: Linus Torvalds on


On Fri, 2 Apr 2010, Jason Wessel wrote:
>
> A cpu_relax() does not mandate that there is an smp memory barrier.
> As a result on the arm smp architecture the kernel debugger can hang
> on entry from time to time, as shown by the kgdb regression tests.
>
> The solution is simply to use the atomic operators which include a
> proper smp memory barrier, instead of using atomic_set() and
> atomic_read().

Hmm. While I absolutely agree that 'cpu_relax()' does not imply a memory
barrier, I disagree that this change should be needed. If ARM has odd
semantics where it will never see changes in a busy loop, then ARM is
buggy, and that has _nothing_ to do with the Linux notion of memory
barriers.

The _whole_ point of "cpu_relax()" is to have busy loops. And the point of
busy loops is that they are waiting for something to change. So if this
loop:

> for_each_online_cpu(i) {
> - while (atomic_read(&cpu_in_kgdb[i]))
> + while (atomic_add_return(0, &cpu_in_kgdb[i]))
> cpu_relax();
> }

can somehow lock up because "cpu_relax()" doesn't work with an infinite
"while (atomic_read(..))" loop, then the ARM implementation of cpu_relax()
is buggy.

Here's a simple example of exactly these kinds of busy loops waiting for
something to change using cpu_relax() from generic kernel code:

ipc/mqueue.c- while (ewp->state == STATE_PENDING)
ipc/mqueue.c: cpu_relax();

ipc/msg.c- while (msg == NULL) {
ipc/msg.c: cpu_relax();

kernel/sched.c- while (task_is_waking(p))
kernel/sched.c: cpu_relax();

kernel/smp.c- while (data->flags & CSD_FLAG_LOCK)
kernel/smp.c: cpu_relax();

so I'd like to understand what the ARM issue is.

Does ARM have some broken cache coherency model where writes by other
CPU's _never_ show up unless the reading CPU does some memory sync thing?
If so, then cpu_relax() obviously does need to do that syncing
instruction.

And no, that does NOT mean that "cpu_relax()" has any memory barrier
semantics. All it means is that cpu_relax() obviously is some
architecture-specific way of saying "I'm in a busy loop, waiting for
something".

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jason Wessel on
On 04/02/2010 02:12 PM, Linus Torvalds wrote:
>
>
> Hmm. While I absolutely agree that 'cpu_relax()' does not imply a memory
> barrier, I disagree that this change should be needed. If ARM has odd
> semantics where it will never see changes in a busy loop, then ARM is
> buggy, and that has _nothing_ to do with the Linux notion of memory
> barriers.
>
> The _whole_ point of "cpu_relax()" is to have busy loops. And the point of
> busy loops is that they are waiting for something to change. So if this
> loop:
>
>
>> for_each_online_cpu(i) {
>> - while (atomic_read(&cpu_in_kgdb[i]))
>> + while (atomic_add_return(0, &cpu_in_kgdb[i]))
>> cpu_relax();
>> }
>>
>
>

So this part might be overkill, but I don't actually have the hardware,
schematics or reference manuals to ascertain what is going on. The
other changes in this patch should be correct because we really do want
memory barriers which come along with the inc and dec operators.

> can somehow lock up because "cpu_relax()" doesn't work with an infinite
> "while (atomic_read(..))" loop, then the ARM implementation of cpu_relax()
> is buggy.
>
>

Will originally proposed a patch for cpu_relax:
http://lists.infradead.org/pipermail/linux-arm-kernel/2010-March/011076.html
--- The patch ---

+#if __LINUX_ARM_ARCH__ == 6
+#define cpu_relax() smp_mb()
+#else
#define cpu_relax() barrier()
+#endif

---

Russell had this thread:
http://permalink.gmane.org/gmane.linux.ports.arm.kernel/75717

--- clip from url ---

cpu_relax() is also defined to be a compiler barrier so that the compiler
reloads the variable on every iteration.

> This patch changes the definition of cpu_relax() to smp_mb() for ARMv6 cores,
> forcing a flushing of the write buffer on SMP systems. If the Kernel is not
> compiled for SMP support, this will expand to a barrier() as before.

I don't think this is correct. You're making a macro do something on ARM
which no other platform, apart from blackfin (which I believe is wrong)
makes it do.

---

> Does ARM have some broken cache coherency model where writes by other
> CPU's _never_ show up unless the reading CPU does some memory sync thing?
> If so, then cpu_relax() obviously does need to do that syncing
> instruction.
>
>

Given your statements, I can just keep the atomic reads as they were
previously, but keep the inc and dec parts. And we can wait for a
further response from either Will or Russell.

Thanks,
Jason.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on


On Fri, 2 Apr 2010, Jason Wessel wrote:
>
> Russell had this thread:
> http://permalink.gmane.org/gmane.linux.ports.arm.kernel/75717

Russell is wrong.

Yes, originally it was about P4's overheating. But let me repeat: the fact
is, this _is_ valid kernel code:

kernel/sched.c- while (task_is_waking(p))
kernel/sched.c: cpu_relax();

(where that "task_is_waking()" is simply doing two regular reads, and
expects another CPU to be changing them).

This has _nothing_ to do with memory barriers, or with overheating.

The fact that maybe some ARM6 cache coherency implementation is pure and
utter sh*t and never sees the changes without the same instruction that
happens to be a memory barrier on that architecture does not make that
cpu_relax() any more about memory barriers.

Similarly, the fact that P4's wanted cpu_relax() in order to not overheat
and cause slowdowns has _nothing_ to do with anything.

All that matters is that the above kind of while loop must work. The
architecture needs to do whatever it needs to do to make it work. End of
discussion. If on ARM6 that means "smp_mb()", then that's an ARM6
implementation issue.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on


On Fri, 2 Apr 2010, Linus Torvalds wrote:
>
> All that matters is that the above kind of while loop must work. The
> architecture needs to do whatever it needs to do to make it work. End of
> discussion. If on ARM6 that means "smp_mb()", then that's an ARM6
> implementation issue.

Put another way: from a kernel standpoint, cpu_relax() in _no_ way implies
a memory barrier. That has always been true, and that continues to be
true.

But Linux does expect that if some other CPU modifies a memory location,
then we _will_ see that modification eventually. If the CPU needs help to
do so, then cpu_relax() needs to do that. Again - this has nothing to do
with memory barriers. It's just a basic requirement.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on


On Fri, 2 Apr 2010, Linus Torvalds wrote:
>
> But Linux does expect that if some other CPU modifies a memory location,
> then we _will_ see that modification eventually. If the CPU needs help to
> do so, then cpu_relax() needs to do that. Again - this has nothing to do
> with memory barriers. It's just a basic requirement.

Btw, this is not necessarily just limited to cpu_relax() either. The
assumption that we'll eventually see changes to memory is pretty common.

If some architecture doesn't see updates from other CPU's (or maybe DMA)
without extra help, I suspect that it might be a good idea to not just
have IO instructions but also things like 'udelay()' have that "extra
helping sauce" in them.

For the exact same reason: we have drivers that depend on things like
jiffies updates happening automatically, eg:

drivers/scsi/u14-34f.c: while ((jiffies - time) < (10 * HZ) && limit++ < 200000) udelay(100L);

or

drivers/isdn/hisax/elsa_ser.c- while(timeout-- && cs->hw.elsa.transcnt)
drivers/isdn/hisax/elsa_ser.c: udelay(1000);

or

drivers/serial/68328serial.c: while (!(uart->utx.w & UTX_TX_AVAIL)) udelay(5);

where those variables we read may be updated from another CPU taking an
interrupt (or by DMA - I didn't look at how UTX_TX_AVAIL gets set, for
example).

Now, in practice, I suspect the above kind of busy loop is _way_ less
common. It's harder to grep for (I picked 'udelay()' to look for, but
the result is full of noise that does IO etc that would presumably fix
things anyway, so the above two are just random examples of some drivers
basically reading variables in a busy loop).

And for something like ARM, random drivers probably don't much matter. So
I doubt that this udelay() thing is at _all_ as important as cpu_relax()
is, and ARM maintainers can probably happily just ignore it.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/