From: Linus Torvalds on 8 Apr 2010 11:00
On Thu, 8 Apr 2010, Will Deacon wrote:
> I simply used smp_mb() as a way to solve this ARM-specific problem. I think
> Russell objects to this largely because this problem affects a particular
> scenario of busy-wait loops and changing the definition of cpu_relax() adds
> barriers to code that doesn't necessarily require them.
How expensive is a smp_mb() on arm?
And by "expensive" I don't mean so much performance of the instruction
itself (after all, we _are_ just busy-looping), but more about things like
power and perhaps secondary effects (does it cause memory traffic, for
example)?
Also, I have to say that _usually_ the problem with non-timely cache
updates is not on the reading side, but on the writing side - ie the other
CPU may be buffering writes indefinitely and the writes will go out only
as a response to bus cycles or the write buffers filling up. In which case
the reader can't really do much about it.
But your comment for the "smp_mb()" patch seems to imply that it's
literally a matter of cache access priorities:
"On the ARM11MPCore processor [where loads are prioritised over stores],
spinning in such a loop will prevent the write buffer from draining."
and in that case I would say that the correct thing _definitely_ is to
make sure that the loop simply is never that tight. Maybe you can do
that without an smp_mb(), by just making whatever "cpu_relax()" does slow
enough (something that stalls the pipeline or whatever?)
But if smp_mb() is cheap, then that sounds like the right solution.
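
For concreteness, the kind of loop being discussed looks roughly like the
userspace sketch below. It is only an illustration under stated assumptions:
cpu_relax_sketch() and spin_until_set() are hypothetical names, and a C11
sequentially consistent fence stands in for smp_mb(). It is not the ARM
kernel's cpu_relax(), and whether a fence actually lets the write buffer
drain on a given core is a hardware question this code cannot answer.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for cpu_relax(). Assumption: a full fence plays
 * the role that smp_mb() would play in the kernel. This is NOT the
 * kernel's definition on any architecture.
 */
static inline void cpu_relax_sketch(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/*
 * Poll *flag until it is non-zero or max_spins polls have been made.
 * Returns true if the flag was seen set, false if we gave up.
 * The relax step runs between polls, so the loop is never just a
 * back-to-back stream of loads.
 */
static bool spin_until_set(atomic_int *flag, unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        if (atomic_load_explicit(flag, memory_order_acquire))
            return true;
        cpu_relax_sketch();
    }
    return false;
}
```

The spin bound is there only so the sketch terminates when nobody ever sets
the flag; a real busy-wait would normally have some other CPU or thread
doing the store. Swapping the fence for something that merely stalls the
pipeline would be the "without an smp_mb()" variant mentioned above.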