introduce sys_membarrier(): process-wide memory barrier (v9) [Kernel]


From: Nick Piggin on 24 Feb 2010 04:20

On Mon, Feb 22, 2010 at 04:23:21PM -0500, Mathieu Desnoyers wrote:
> * Chris Friesen (cfriesen(a)nortel.com) wrote:
> > On 02/12/2010 04:46 PM, Mathieu Desnoyers wrote:
> >
> > > Editorial question:
> > >
> > > This synchronization only takes care of threads using the current process memory
> > > map. It should not be used to synchronize accesses performed on memory maps
> > > shared between different processes. Is that a limitation we can live with ?
> >
> > It makes sense for an initial version. It would be unfortunate if this
> > were a permanent limitation, since using separate processes with
> > explicit shared memory is a useful way to mitigate memory trampler issues.
> >
> > If we were going to allow that, it might make sense to add an address
> > range such that only those processes which have mapped that range would
> > execute the barrier. Come to think of it, it might be possible to use
> > this somehow to avoid having to execute the barrier on *all* threads
> > within a process.
>
> The extensible system call mandatory and optional flags will allow this kind of
> improvement later on if this appears to be needed. It will also allow user-space
> to detect if later kernels support these new features or not. But meanwhile I
> think it's good to start with this implementation that covers 99.99% of
> use-cases I can currently think of (ok, well, maybe I'm just unimaginative) ;)

It's a good point, I think having at least the ability to do
process-shared or process-private in the first version of the API might
be a good idea. That matches glibc's synchronisation routines so it
would probably be a desirable feature even if you don't implement it in
your library initially.

When writing multiprocessor scalable software, threads should often be
avoided. They share so much state that it is easy to run into
scalability issues in the kernel. So yes it would be really nice to
have userspace RCU available in a process-shared mode.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Nick Piggin on 25 Feb 2010 00:30

On Wed, Feb 24, 2010 at 09:29:46AM -0800, Darren Hart wrote:
> Nick Piggin wrote:
>
> >When writing multiprocessor scalable software, threads should often be
> >avoided. They share so much state that it is easy to run into
> >scalability issues in the kernel. So yes it would be really nice to
> >have userspace RCU available in a process-shared mode.
>
> A bit off topic, but I'm interested in what you feel some of these
> scalability issues are. Is it mostly bouncing this shared context
> from one CPU to the next and the related cache effects, or is there
> something more you are referring to?

Just in general shared state is almost always going to be more costly in
SMP than non-shared.

From VM to files and fs state to signals and timers and process
accounting. And this also carries up to libc, and critical user code
like the heap allocator.

Linux is usually pretty good, a lot due to RCU, but there are still
contention points.

Andrew had investigated this a lot (in relation to samba) and had a good
talk on it, but the slides don't really do it justice.
http://www.samba.org/~tridge/talks/threads.pdf


From: Nick Piggin on 25 Feb 2010 00:40

On Wed, Feb 24, 2010 at 10:22:52AM -0500, Mathieu Desnoyers wrote:
> * Nick Piggin (npiggin(a)suse.de) wrote:
> > On Mon, Feb 22, 2010 at 04:23:21PM -0500, Mathieu Desnoyers wrote:
> > > * Chris Friesen (cfriesen(a)nortel.com) wrote:
> > > > On 02/12/2010 04:46 PM, Mathieu Desnoyers wrote:
> > > >
> > > > > Editorial question:
> > > > >
> > > > > This synchronization only takes care of threads using the current process memory
> > > > > map. It should not be used to synchronize accesses performed on memory maps
> > > > > shared between different processes. Is that a limitation we can live with ?
> > > >
> > > > It makes sense for an initial version. It would be unfortunate if this
> > > > were a permanent limitation, since using separate processes with
> > > > explicit shared memory is a useful way to mitigate memory trampler issues.
> > > >
> > > > If we were going to allow that, it might make sense to add an address
> > > > range such that only those processes which have mapped that range would
> > > > execute the barrier. Come to think of it, it might be possible to use
> > > > this somehow to avoid having to execute the barrier on *all* threads
> > > > within a process.
> > >
> > > The extensible system call mandatory and optional flags will allow this kind of
> > > improvement later on if this appears to be needed. It will also allow user-space
> > > to detect if later kernels support these new features or not. But meanwhile I
> > > think it's good to start with this implementation that covers 99.99% of
> > > use-cases I can currently think of (ok, well, maybe I'm just unimaginative) ;)
> >
> > It's a good point, I think having at least the ability to do
> > process-shared or process-private in the first version of the API might
> > be a good idea. That matches glibc's synchronisation routines so it
> > would probably be a desirable feature even if you don't implement it in
> > your library initially.
>
> I am tempted to say that we should probably wait for users of this API feature
> to manifest themselves before we go on and implement it. This will ensure that
> we don't end up maintaining an unused feature and this provides a minimum
> testability. For now, returning -EINVAL seems like an appropriate response for
> this system call feature.

It would be very trivial compared to the process-private case. Just IPI
all CPUs. It would allow older kernels to work with newer process based
apps as they get implemented. But... not a really big deal I suppose.

> As I said above, given the extensible nature of the sys_membarrier flags, we can
> assign a MEMBARRIER_SHARED_MEM or something like that to a mandatory flag bit
> later on. So when userspace starts using this flag on old kernels that do not
> support it, -EINVAL will be returned, and then the application will know it must
> use a fallback. So, basically, we don't even need to define this flag now.
>
> >
> > When writing multiprocessor scalable software, threads should often be
> > avoided. They share so much state that it is easy to run into
> > scalability issues in the kernel. So yes it would be really nice to
> > have userspace RCU available in a process-shared mode.
> >
>
> Agreed, although some major modifications would also be needed in the userspace
> RCU library to do that, because it currently relies on being able to access other
> threads' TLS.

OK. It would be a good feature to keep in mind, I believe.
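
The -EINVAL fallback Mathieu describes in the quoted text can be sketched in user space as follows. This is only an illustration: the value of MEMBARRIER_SHARED_MEM is made up (the thread names the flag but never assigns it a bit), and the two *_kernel_membarrier() functions are hypothetical stand-ins for the system call, not the real interface.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical bit value; the thread only proposes the name. */
#define MEMBARRIER_SHARED_MEM (1u << 0)

/* Which synchronization scheme the application ends up using. */
enum rcu_mode {
    RCU_MODE_MEMBARRIER, /* kernel issues the barriers for the readers */
    RCU_MODE_READER_MB   /* fallback: readers issue their own barriers */
};

/* Stand-in for an older kernel that knows no mandatory flags:
 * any set bit is rejected with -EINVAL. */
int old_kernel_membarrier(unsigned int flags)
{
    return flags ? -EINVAL : 0;
}

/* Stand-in for a later kernel that accepts the shared-memory flag. */
int new_kernel_membarrier(unsigned int flags)
{
    return (flags & ~MEMBARRIER_SHARED_MEM) ? -EINVAL : 0;
}

typedef int (*membarrier_fn)(unsigned int flags);

/* Probe once at startup: try the mandatory flag and fall back to
 * reader-side barriers if the kernel refuses it. */
enum rcu_mode probe_shared_mode(membarrier_fn sys_membarrier)
{
    int ret = sys_membarrier(MEMBARRIER_SHARED_MEM);

    if (ret == 0)
        return RCU_MODE_MEMBARRIER;
    /* In this model an unknown mandatory flag is the only failure. */
    assert(ret == -EINVAL);
    return RCU_MODE_READER_MB;
}
```

An application would run the probe once and choose its read-side primitives from the result, so the same binary works on kernels with and without the flag.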


From: Steven Rostedt on 25 Feb 2010 12:30

On Thu, 2010-02-25 at 11:53 -0500, Mathieu Desnoyers wrote:

> > It would be very trivial compared to the process-private case. Just IPI
> > all CPUs. It would allow older kernels to work with newer process based
> > apps as they get implemented. But... not a really big deal I suppose.
>
> This is actually what I did in v1 of the patch, but this implementation met
> resistance from the RT people, who were concerned about the impact on RT tasks
> of a lower priority process doing lots of sys_membarrier() calls. So if we want
> to do other-process-aware sys_membarrier(), we would have to iterate on all
> cpus, for every running process shared memory maps and see if there is something
> shared with all shm of the current process. This is clearly not as trivial as
> just broadcasting the IPI to all cpus.

Right, it may require another syscall or parameter to let the tasks
register a shared page. Then have some mechanism to find a way to
quickly check if a CPU is running a process with that page.

-- Steve
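
The two targeting strategies discussed above (broadcasting to every CPU versus interrupting only the CPUs running the caller's memory map) can be pictured with a toy user-space model. Nothing here is kernel code: struct cpu_state, mm_id, NR_CPUS and both functions are assumptions made up for the illustration.

```c
#include <stdint.h>

#define NR_CPUS 8 /* arbitrary size for the toy model */

/* Toy model: identifier of the memory map currently running on a CPU. */
struct cpu_state {
    int mm_id;
};

/* Process-private case: select only the CPUs whose current task
 * uses the caller's memory map. */
uint32_t private_targets(const struct cpu_state cpus[NR_CPUS], int caller_mm)
{
    uint32_t mask = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (cpus[cpu].mm_id == caller_mm)
            mask |= UINT32_C(1) << cpu;
    }
    return mask;
}

/* Broadcast case: every CPU is interrupted, whatever it is running.
 * Simple, but it is what disturbs unrelated (e.g. RT) tasks. */
uint32_t broadcast_targets(void)
{
    return (UINT32_C(1) << NR_CPUS) - 1;
}
```

A shared-memory-aware variant would need a per-CPU test that is more expensive than the mm_id comparison above, which is the cost Mathieu and Steven are pointing at.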


From: Steven Rostedt on 25 Feb 2010 13:10

On Thu, 2010-02-25 at 12:51 -0500, Mathieu Desnoyers wrote:

> But... either way we choose, we can extend the system call flags and parameters
> as needed, so I think it really should not be part of this initial
> implementation.

I agree here too.

If you have two different tasks doing lockless RCU or what not on shared
memory, it's best to stick with the mb() on the reader side. Yeah, it
makes the performance go down, but heck, I'm really worried about the
crazy complexity that would need to go into the kernel to prevent this.

-- Steve
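
For reference, "mb() on the reader side" could look roughly like the following C11 sketch. It is a deliberately simplified model and not the userspace RCU library's actual implementation: struct reader_slot and the three functions are hypothetical names, and a real scheme would track grace periods rather than a single flag.

```c
#include <stdatomic.h>

/* Hypothetical per-reader state, placed in memory shared by the tasks. */
struct reader_slot {
    atomic_uint active; /* nonzero while inside a read-side section */
};

/* Enter a read-side section. The full fence is the per-read cost that
 * sys_membarrier() is meant to move to the writer. */
void reader_lock(struct reader_slot *r)
{
    atomic_store_explicit(&r->active, 1, memory_order_relaxed);
    /* Order the flag store before any later load of shared data. */
    atomic_thread_fence(memory_order_seq_cst);
}

/* Leave a read-side section. */
void reader_unlock(struct reader_slot *r)
{
    /* Order all accesses in the section before clearing the flag. */
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(&r->active, 0, memory_order_relaxed);
}

/* Writer side: after publishing an update, check whether this reader
 * is outside any read-side section. */
int reader_is_quiescent(struct reader_slot *r)
{
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&r->active, memory_order_relaxed) == 0;
}
```

Because both sides issue their own fences, this works across processes sharing the slot without any help from the kernel, at the price of a full barrier on every read-side entry and exit.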

