From: Borislav Petkov on
On Wed, Feb 03, 2010 at 07:42:51AM -0800, Andrew Morton wrote:
> We didn't deal with it on every architecture, which is something which
> the compiler extension takes care of.
>
> In fact I can't find anywhere where we dealt with it on x86.

Yeah, we talked briefly about using hardware popcnt, see thread
beginning at

http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-06/msg00245.html

for example. I did an ftrace of the cpumask_weight() calls in sched.c to
see whether there would be a measurable performance gain but it didn't
seem so at the time. My numbers said something like ca. 170 hweight
calls per second and since the <lib/hweight.c> implementations roughly
translate to something like ~20 isns (hweight64 to about ~30), the whole
thing wasn't worth the trouble considering checking binutils versions
and slapping opcodes or using gcc intrinsics which involves gcc version
checking.

An alternatives solution which is based on CPUID flag could add the
popcnt opcode without checking any toolchain versions but how is the
replaced instruction going to look like? Something like

alternative("call hweightXX", "popcnt", X86_FEATURE_POPCNT)

by making sure the arg is in some register first?

Hmm..

--
Regards/Gruss,
Boris.

--
Advanced Micro Devices, Inc.
Operating Systems Research Center
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on
On Wed, 2010-02-03 at 19:14 +0100, Borislav Petkov wrote:

> alternative("call hweightXX", "popcnt", X86_FEATURE_POPCNT)

Make sure to apply a 0xff bitmask to the popcnt r16 call for hweight8(),
and hweight64() needs a bit of magic for 32bit, but yes, something like
that ought to work nicely.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on
On Thu, 2010-02-04 at 16:10 +0100, Borislav Petkov wrote:
> On Wed, Feb 03, 2010 at 11:49:54AM -0800, H. Peter Anvin wrote:
> > On 02/03/2010 10:47 AM, Peter Zijlstra wrote:
> > > On Wed, 2010-02-03 at 19:14 +0100, Borislav Petkov wrote:
> > >
> > >> alternative("call hweightXX", "popcnt", X86_FEATURE_POPCNT)
> > >
> > > Make sure to apply a 0xff bitmask to the popcnt r16 call for hweight8(),
> > > and hweight64() needs a bit of magic for 32bit, but yes, something like
> > > that ought to work nicely.
> > >
> >
> > Arguably the "best" option is to have the alternative being a jump to an
> > out-of-line stub which does the necessary parameter marshalling before
> > calling a stub. This technique is already used in a few other places.
>
> Ok, here's a first alpha prototype and completely untested. The asm
> output looks ok though. I've added separate 32-bit and 64-bit helpers in
> order to dispense with the if-else tests. The hw-popcnt versions are the
> opcodes for "popcnt %eax, %eax" and "popcnt %rax, %rax", respectively,
> so %rAX has to be preloaded with the bitmask and the computed value
> has to be retrieved from there afterwards. And yes, it looks not that
> elegant so I'm open for suggestions.
>
> The good thing is, this should work on any toolchain since we don't rely
> on the compiler to know about popcnt and we're protected by CPUID flag
> so that the hw-popcnt version is used only on processors which support
> it.
>
> Please take a good look and let me know what do you guys think.

> +int arch_hweight_long(unsigned long w)
> +{
> + if (sizeof(w) == 4) {
> + asm volatile("movl %[w], %%eax" :: [w] "r" (w));
> + alternative("call _hweight32",
> + "call _popcnt32",
> + X86_FEATURE_POPCNT);
> + asm volatile("" : "=a" (w));
> +
> + } else {
> + asm volatile("movq %[w], %%rax" :: [w] "r" (w));
> + alternative("call _hweight64",
> + "call _popcnt64",
> + X86_FEATURE_POPCNT);
> + asm volatile("" : "=a" (w));
> + }
> + return w;
> +}

hweight_long() isn't an arch primitive, only __arch_hweight{8,16,32,64}
are.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Borislav Petkov on
On Wed, Feb 03, 2010 at 11:49:54AM -0800, H. Peter Anvin wrote:
> On 02/03/2010 10:47 AM, Peter Zijlstra wrote:
> > On Wed, 2010-02-03 at 19:14 +0100, Borislav Petkov wrote:
> >
> >> alternative("call hweightXX", "popcnt", X86_FEATURE_POPCNT)
> >
> > Make sure to apply a 0xff bitmask to the popcnt r16 call for hweight8(),
> > and hweight64() needs a bit of magic for 32bit, but yes, something like
> > that ought to work nicely.
> >
>
> Arguably the "best" option is to have the alternative being a jump to an
> out-of-line stub which does the necessary parameter marshalling before
> calling a stub. This technique is already used in a few other places.

Ok, here's a first alpha prototype and completely untested. The asm
output looks ok though. I've added separate 32-bit and 64-bit helpers in
order to dispense with the if-else tests. The hw-popcnt versions are the
opcodes for "popcnt %eax, %eax" and "popcnt %rax, %rax", respectively,
so %rAX has to be preloaded with the bitmask and the computed value
has to be retrieved from there afterwards. And yes, it looks not that
elegant so I'm open for suggestions.

The good thing is, this should work on any toolchain since we don't rely
on the compiler to know about popcnt and we're protected by CPUID flag
so that the hw-popcnt version is used only on processors which support
it.

Please take a good look and let me know what do you guys think.

Thanks.

--
arch/x86/include/asm/bitops.h | 4 ++
arch/x86/lib/Makefile | 2 +-
arch/x86/lib/popcnt.c | 62 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 67 insertions(+), 1 deletions(-)
create mode 100644 arch/x86/lib/popcnt.c

diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index 02b47a6..deb5013 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -434,6 +434,10 @@ static inline int fls(int x)
#endif
return r + 1;
}
+
+
+extern int arch_hweight_long(unsigned long);
+
#endif /* __KERNEL__ */

#undef ADDR
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index cffd754..c03fe2d 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -22,7 +22,7 @@ lib-y += usercopy_$(BITS).o getuser.o putuser.o
lib-y += memcpy_$(BITS).o
lib-$(CONFIG_KPROBES) += insn.o inat.o

-obj-y += msr.o msr-reg.o msr-reg-export.o
+obj-y += msr.o msr-reg.o msr-reg-export.o popcnt.o

ifeq ($(CONFIG_X86_32),y)
obj-y += atomic64_32.o
diff --git a/arch/x86/lib/popcnt.c b/arch/x86/lib/popcnt.c
new file mode 100644
index 0000000..179a6e8
--- /dev/null
+++ b/arch/x86/lib/popcnt.c
@@ -0,0 +1,62 @@
+#include <linux/kernel.h>
+#include <linux/bitops.h>
+
+int _hweight32(void)
+{
+ unsigned long w;
+
+ asm volatile("" : "=a" (w));
+
+ return hweight32(w);
+}
+
+int _hweight64(void)
+{
+ unsigned long w;
+
+ asm volatile("" : "=a" (w));
+
+ return hweight64(w);
+}
+
+int _popcnt32(void)
+{
+
+ unsigned long w;
+
+ asm volatile(".byte 0xf3\n\t.byte 0x0f\n\t.byte 0xb8\n\t.byte 0xc0\n\t"
+ : "=a" (w));
+
+ return w;
+}
+
+int _popcnt64(void)
+{
+
+ unsigned long w;
+
+ asm volatile(".byte 0xf3\n\t.byte 0x48\n\t.byte 0x0f\n\t."
+ "byte 0xb8\n\t.byte 0xc0\n\t"
+ : "=a" (w));
+
+ return w;
+}
+
+int arch_hweight_long(unsigned long w)
+{
+ if (sizeof(w) == 4) {
+ asm volatile("movl %[w], %%eax" :: [w] "r" (w));
+ alternative("call _hweight32",
+ "call _popcnt32",
+ X86_FEATURE_POPCNT);
+ asm volatile("" : "=a" (w));
+
+ } else {
+ asm volatile("movq %[w], %%rax" :: [w] "r" (w));
+ alternative("call _hweight64",
+ "call _popcnt64",
+ X86_FEATURE_POPCNT);
+ asm volatile("" : "=a" (w));
+ }
+ return w;
+}
--
1.6.6

--
Regards/Gruss,
Boris.

--
Advanced Micro Devices, Inc.
Operating Systems Research Center
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Borislav Petkov on
On Thu, Feb 04, 2010 at 04:13:52PM +0100, Peter Zijlstra wrote:
> hweight_long() isn't an arch primitive, only __arch_hweight{8,16,32,64}
> are.

Yeah, I'm still looking for the proper location. hweight_long() is the
generic version so do we want to do the

#ifndef __HAVE_ARCH_POPCNT
static inline unsigned long hweight_long(unsigned long w)
....

#endif

thing and define a x86-specific version?

--
Regards/Gruss,
Boris.

--
Advanced Micro Devices, Inc.
Operating Systems Research Center
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/