|
Prev: [PATCH] x86: cleanup machine_specific_memory_setup v2
Next: Oops report for the week preceding June 16th, 2008
From: Linus Torvalds on 18 Jun 2008 20:30 On Wed, 18 Jun 2008, Jeremy Fitzhardinge wrote: > > Along the lines of: Hell no. There's a reason we have a special set_wrprotect() thing. We can do it more efficiently on native hardware by just clearing the bit atomically. No need to do the cmpxchg games. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on 18 Jun 2008 21:00 On Wed, 18 Jun 2008, Jeremy Fitzhardinge wrote: > > It's not cmpxchg, just xchg. > In other words, is: > > lock btr $_PAGE_BIT_RW, (%rbx) Well, we can actually do it as lock andl $~_PAGE_RW,(%rbx) although we haven't bothered (I've wanted several times to make clear_bit() do that, but have never gotten around to it - mainly because old gcc versions didn't work with __builtin_constant_p() in inline functions - so you have to do the macro from hell) And yes, the "lock andl" should be noticeably faster than the xchgl. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on 19 Jun 2008 00:10 On Wed, 18 Jun 2008, Linus Torvalds wrote: > > And yes, the "lock andl" should be noticeably faster than the xchgl. I dunno. Here's a untested (!!) patch that turns constant-bit set/clear_bit ops into byte mask ops (lock orb/andb). It's not exactly pretty. The reason for using the byte versions is that a locked op is serialized in the memory pipeline anyway, so there are no forwarding issues (that could slow down things when we access things with different sizes), and the byte ops are a lot smaller than 32-bit and particularly 64-bit ops (big constants, and the 64-bit ops need the REX prefix byte too). [ Side note: I wonder if we should turn the "test_bit()" C version into a "char *" version too.. It could actually help with alias analysis, since char pointers can alias anything. So it might be the RightThing(tm) to do for multiple reasons. I dunno. It's a separate issue. ] It does actually shrink the kernel image a bit (a couple of hundred bytes on the text segment for my everything-compiled-in image), and while it's totally untested the (admittedly few) code generation points I looked at seemed sane. And "lock orb" should be noticeably faster than "lock bts". If somebody wants to play with it, go wild. I didn't do "change_bit()", because nobody sane uses that thing anyway. I guarantee nothing. And if it breaks, nobody saw me do anything. You can't prove this email wasn't sent by somebody who is good at forging smtp. This does require a gcc that is recent enough for "__builtin_constant_p()" to work in an inline function, but I suspect our kernel requirements are already higher than that. And if you do have an old gcc that is supported, the worst that would happen is that the optimization doesn't trigger. Linus --- include/asm-x86/bitops.h | 27 ++++++++++++++++++++++----- 1 files changed, 22 insertions(+), 5 deletions(-) diff --git a/include/asm-x86/bitops.h b/include/asm-x86/bitops.h index ee4b3ea..c1b7f91 100644 --- a/include/asm-x86/bitops.h +++ b/include/asm-x86/bitops.h @@ -23,11 +23,22 @@ #if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 1) /* Technically wrong, but this avoids compilation errors on some gcc versions. */ -#define ADDR "=m" (*(volatile long *) addr) +#define BITOP_ADDR(x) "=m" (*(volatile long *) (x)) #else -#define ADDR "+m" (*(volatile long *) addr) +#define BITOP_ADDR(x) "+m" (*(volatile long *) (x)) #endif +#define ADDR BITOP_ADDR(addr) + +/* + * We do the locked ops that don't return the old value as + * a mask operation on a byte. + */ +#define IS_IMMEDIATE(nr) \ + (__builtin_constant_p(nr)) +#define CONST_MASK_ADDR BITOP_ADDR(addr + (nr>>3)) +#define CONST_MASK (1 << (nr & 7)) + /** * set_bit - Atomically set a bit in memory * @nr: the bit to set @@ -43,9 +54,12 @@ * Note that @nr may be almost arbitrarily large; this function is not * restricted to acting on a single-word quantity. */ -static inline void set_bit(int nr, volatile void *addr) +static inline void set_bit(unsigned int nr, volatile void *addr) { - asm volatile(LOCK_PREFIX "bts %1,%0" : ADDR : "Ir" (nr) : "memory"); + if (IS_IMMEDIATE(nr)) + asm volatile(LOCK_PREFIX "orb %1,%0" : CONST_MASK_ADDR : "i" (CONST_MASK) : "memory"); + else + asm volatile(LOCK_PREFIX "bts %1,%0" : ADDR : "Ir" (nr) : "memory"); } /** @@ -74,7 +88,10 @@ static inline void __set_bit(int nr, volatile void *addr) */ static inline void clear_bit(int nr, volatile void *addr) { - asm volatile(LOCK_PREFIX "btr %1,%0" : ADDR : "Ir" (nr)); + if (IS_IMMEDIATE(nr)) + asm volatile(LOCK_PREFIX "andb %1,%0" : CONST_MASK_ADDR : "i" (~CONST_MASK)); + else + asm volatile(LOCK_PREFIX "btr %1,%0" : ADDR : "Ir" (nr)); } /* -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on 19 Jun 2008 12:40 On Thu, 19 Jun 2008, Ingo Molnar wrote: > > Below is the commit, it needed a small amount of massaging to apply the > void * -> unsigned long * change in the x86/bitops topic. Well, that's your bug right there. The macros very much depended on the pointers being "void *", due to the pointer arithmetic (which is a gcc extension that we use extensively - "void *" arithmetic works as if it was a byte pointer). When you changed "addr" to "volatile unsigned long *", now the arithmetic ended up multiplying the offset by the size of "long" before adding it. That said, the _original_ patch wasn't tested either, but quite frankly, looking over the patch one more time, I can't really see how it could break. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on 20 Jun 2008 15:20 On Fri, 20 Jun 2008, Jeremy Fitzhardinge wrote: > > Blows up on "gcc version 3.4.4 20050314 (prerelease) (Debian 3.4.3-13)": Yeah, I was a bit worried about that. Gcc sometimes does insane things. We literally just tested that the asm should only _ever_ be generated with a constant value, but if some gcc dead-code removal thing doesn't work, it will then screw up and try to generate the asm even for a non-constant thing. The fairly trivial fix is probably to just change the "i" to "ir", safe in the knowledge that any _sane_ case will never use the "r" possibility. I suspect even your insane case will end up then killing the bad choice later. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
|
Next
|
Last
Pages: 1 2 Prev: [PATCH] x86: cleanup machine_specific_memory_setup v2 Next: Oops report for the week preceding June 16th, 2008 |