sequence lock in Linux [Kernel]

Prev: Btrfs updates for 2.6.35
Next: drivers/net/e1000/e1000_main.c: Fix message logging defect

From: Mathieu Desnoyers on 11 Jun 2010 17:10

* Linus Torvalds (torvalds(a)linux-foundation.org) wrote:
> On Fri, Jun 11, 2010 at 12:40 PM, Mathieu Desnoyers
> <mathieu.desnoyers(a)efficios.com> wrote:
> >
> > Is it just me, or the following code:
> >
> > static __always_inline unsigned read_seqbegin(const seqlock_t *sl)
> > {
> > � � � �unsigned ret;
> >
> > repeat:
> > � � � �ret = sl->sequence;
> > � � � �smp_rmb();
> > � � � �if (unlikely(ret & 1)) {
> > � � � � � � � �cpu_relax();
> > � � � � � � � �goto repeat;
> > � � � �}
> >
> > � � � �return ret;
> > }
> >
> > could use a ACCESS_ONCE() around the sl->sequence read ? I'm concerned about the
> > compiler generating code that reads the sequence number chunkwise.
>
> What compiler would do that? That would seem to be a compiler bug, or
> a compiler that is just completely crazy.
>
> But it wouldn't be _wrong_ to make it do ACCESS_ONCE(). I just suspect
> that any compiler that cares is not a compiler worth worrying about,
> and the compiler should be shot in the head rather than us necessarily
> worrying about it.
>
> There is no way a sane compiler can do anything but one read anyway.
> We do end up using all the bits (for the "return ret") part, so a
> compiler that reads the low bit separately is just being a totally
> moronic one - we wouldn't want to touch such a stupid compiler with a
> ten-foot pole.

If for some reason it is better (faster for -O2, smaller code for -Os) on a
given architecture to read the low bits separately from the rest and populate
two different registers, one for the test and the other for the return value,
then a not-so-moronic compiler might actually do this. One reason I could see
for generating this kind of code is compiling with -Os, if the kind of behavior
I describe above generates smaller code.

>
> But at the same time, ACCESS_ONCE() ends up being a reasonable hint to
> programmers, so I wouldn't object to it. I just don't think we should
> pander to "compilers can be crazy". If compilers are crazy, we
> shouldn't use them.

I'm just afraid about the possibility that non-crazy compilers might actually
have a good reason to do this we haven't thought of yet.

FWIW, in the userspace RCU library, I wrapped all non-protected accesses to
shared word-sized aligned variables (e.g. rcu_assign_pointer(),
rcu_dereference_pointer(), ..) with LOAD_SHARED() and STORE_SHARED() accessors,
which are basically just volatile loads and stores.

That's just me making absolutely sure the compiler won't perform anything
chunkwise nor perform multiple loads. It also ensures we are somewhat ready for
upcoming architectures with non-coherent caches. Having all these accesses in a
convenient macro makes it much easier to insert cache flushes whenever needed.

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: H. Peter Anvin on 11 Jun 2010 17:10

On 06/11/2010 01:36 PM, Paul E. McKenney wrote:
>
> The reason that the C standard permits this is to allow for things like
> 8-bit CPUs, which are simply unable to load or store 32-bit quantities
> except by doing it chunkwise. But I don't expect the Linux kernel to
> boot on these, and certainly not on any of the ones that I have used!
>
> I most definitely remember seeing a gcc guarantee that loads and stores
> would be done in one instruction whenever the hardware supported this,
> but I am not finding it today. :-(
>

What gcc does not -- and should not -- guarantee is that accessing a
non-volatile member is done exactly once. As Mathieu pointed out, it
can choose to drop it due to register pressure and load it again.

What is possibly a much bigger risk -- since this is an inline -- is
that the value is cached from a previous piece of code, *or* that since
the structure is const(!) that the second read in the repeat loop is
elided. Presumably current versions of gcc don't do that across a
memory clobber, but that doesn't seem entirely out of the question.

-hpa
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: H. Peter Anvin on 11 Jun 2010 17:40

On 06/11/2010 02:36 PM, Paul E. McKenney wrote:
> Memory barriers in the sequence-lock code prevent this, assuming, as
> you point out, that memory clobber works (but if it doesn't, it should
> be fixed):

The constness is my main concern. It's not clear to me that "memory" is
meant to imply that const memory areas without volatile can be clobbered.

-hpa
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Paul E. McKenney on 11 Jun 2010 17:40

On Fri, Jun 11, 2010 at 02:06:01PM -0700, H. Peter Anvin wrote:
> On 06/11/2010 01:36 PM, Paul E. McKenney wrote:
> >
> > The reason that the C standard permits this is to allow for things like
> > 8-bit CPUs, which are simply unable to load or store 32-bit quantities
> > except by doing it chunkwise. But I don't expect the Linux kernel to
> > boot on these, and certainly not on any of the ones that I have used!
> >
> > I most definitely remember seeing a gcc guarantee that loads and stores
> > would be done in one instruction whenever the hardware supported this,
> > but I am not finding it today. :-(
>
> What gcc does not -- and should not -- guarantee is that accessing a
> non-volatile member is done exactly once. As Mathieu pointed out, it
> can choose to drop it due to register pressure and load it again.
>
> What is possibly a much bigger risk -- since this is an inline -- is
> that the value is cached from a previous piece of code, *or* that since
> the structure is const(!) that the second read in the repeat loop is
> elided. Presumably current versions of gcc don't do that across a
> memory clobber, but that doesn't seem entirely out of the question.

Memory barriers in the sequence-lock code prevent this, assuming, as
you point out, that memory clobber works (but if it doesn't, it should
be fixed):

o write_seqlock() and write_tryseqlock() each have an smp_wmb()
following the increment. Ditto for write_seqcount_begin().

o write_sequnlock() has an smp_wmb() preceding the increment,
and ditto for write_seqcount_end(). There are thus two smp_wmb()
calls between the increments in the usual code sequence:

write_seqlock(&l);
do_something();
write_sequnlock();

o read_seqbegin() has an smp_rmb() following its read from
->sequence. Ditto for read_seqcount_begin().

o read_seqretry() has an smp_rmb() preceding its read from
->sequence, and ditto for read_seqcount_retry(). There are thus
two smp_wmb() calls between the reads in the usual code sequence:

do {
s = read_seqbegin(&l);
read_something();
} while read_seqretry(&l, s);

So sequence locks should be pretty safe, at least as far as this
vulnerability is concerned. ;-)

Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Paul E. McKenney on 11 Jun 2010 18:10

On Fri, Jun 11, 2010 at 02:38:59PM -0700, H. Peter Anvin wrote:
> On 06/11/2010 02:36 PM, Paul E. McKenney wrote:
> > Memory barriers in the sequence-lock code prevent this, assuming, as
> > you point out, that memory clobber works (but if it doesn't, it should
> > be fixed):
>
> The constness is my main concern. It's not clear to me that "memory" is
> meant to imply that const memory areas without volatile can be clobbered.

Ah! I was assuming that gcc treated "memory" as it would an call to
a function in some other compilation unit. In that case, the compiler
could not count on the "const" on the argument, given the possibility
that the called function might gain a reference to the same memory
locations in a non-const manner, right?

Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

First | Prev | Next | Last
Pages: 1 2 3
Prev: Btrfs updates for 2.6.35
Next: drivers/net/e1000/e1000_main.c: Fix message logging defect