From: Linus Torvalds on


On Fri, 26 Mar 2010, David Howells wrote:
>
> fls(N), ffs(N) and fls64(N) can be optimised on x86/x86_64. Currently they
> perform checks against N being 0 before invoking the BSR/BSF instruction, or
> use a CMOV instruction afterwards. Either the check involves a conditional
> jump which we'd like to avoid, or a CMOV, which we'd also quite like to avoid.
>
> Instead, we can make use of the fact that BSR/BSF doesn't modify its output
> register if its input is 0. By preloading the output with -1 and incrementing
> the result, we achieve the desired result without the need for a conditional
> check.

This is totally incorrect.

Where did you find that "doesn't modify its output" thing? It's not true.
The truth is that the destination is undefined. Just read the dang Intel
documentation, it's very clearly stated right there.

If you can show otherwise, feel free. But I'm pretty sure there are
actually x86 chips out there that _do_ modify the destination. I have a
pretty strong memory of us trying this at some point, and it not working.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
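For reference, the conservative behaviour Linus is defending (test the input for 0 explicitly instead of relying on what BSR/BSF leaves in the destination) can be sketched in portable C. This is only an illustrative sketch, not the kernel's implementation: the names sketch_fls() and sketch_ffs() are hypothetical, and it assumes a GCC- or Clang-compatible compiler that provides __builtin_clz() and __builtin_ctz(). Those builtins are themselves undefined for a 0 argument, which mirrors the documented state of the BSR/BSF destination.

```c
#include <limits.h>

/*
 * Hypothetical helper names, for illustration only.
 * Assumes a GCC/Clang-compatible compiler: __builtin_clz() and
 * __builtin_ctz() have undefined results for an argument of 0,
 * so the zero case is handled before calling them.
 */

/* 1-based index of the most significant set bit; 0 if x == 0. */
static inline int sketch_fls(unsigned int x)
{
	if (x == 0)
		return 0;
	return (int)(sizeof(x) * CHAR_BIT) - __builtin_clz(x);
}

/* 1-based index of the least significant set bit; 0 if x == 0. */
static inline int sketch_ffs(unsigned int x)
{
	if (x == 0)
		return 0;
	return __builtin_ctz(x) + 1;
}
```

The explicit test costs a compare and a branch (or a CMOV, at the compiler's discretion), which is exactly the overhead the patch wanted to remove, but the result does not depend on undocumented behaviour.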
From: David Howells on
Linus Torvalds <torvalds(a)linux-foundation.org> wrote:

> Where did you find that "doesn't modify its output" thing? It's not true.
> The truth is that the destination is undefined. Just read the dang Intel
> documentation, it's very clearly stated right there.

Hmmm... My ancient Borland Assembler dead-tree manual doesn't mention that.

David
From: Linus Torvalds on


On Fri, 26 Mar 2010, Scott Lurndal wrote:
>
> I wonder if Intel's EM64T stuff makes this more deterministic, perhaps
> David's implementation would work for x86_64 only?

Limiting it to x86-64 would certainly remove all the worries about all the
historical x86 clones.

I'd still worry about it for future Intel chips, though. I absolutely
_detest_ relying on undocumented features - it pretty much always ends up
biting you eventually. And conditional writeback is actually pretty nasty
from a microarchitectural standpoint.

Linus
From: Ralf Baechle on
On Fri, Mar 26, 2010 at 10:45:05AM -0700, Linus Torvalds wrote:

> I went back and checked the old Intel 386 docs from -92 or something, and
> it was "undefined" in there too. So at least Intel seems to have been very
> consistent on this.
>
> That said, maybe all implementations actually do the "don't touch" thing.
>
> But I do have this memory of us doing this ten+ years ago, though, and
> having to check the ZF after all. Which is why I reacted to the patch in
> the first place and checked the documentation.

My trusty old 486 book [1] says in its remarks about the BSF instruction:

"The documentation on the 80386 and 80486 states that op1 is undefined if
op2 is 0. In reality the 80386 will leave the value in op1 unchanged.
The first versions of the 80486 will change op1 to an undefined value.
Later versions will again leave it unchanged."

[1] "Die Intel Familie" (in German), by Robert Hummel, 1992
From: Matthew Wilcox on
On Fri, Mar 26, 2010 at 05:42:05PM +0000, David Howells wrote:
> Linus Torvalds <torvalds(a)linux-foundation.org> wrote:
>
> > Where did you find that "doesn't modify its output" thing? It's not true.
> > The truth is that the destination is undefined. Just read the dang Intel
> > documentation, it's very clearly stated right there.
>
> Hmmm... My ancient Borland Assembler dead-tree manual doesn't mention that.

The only x86 manuals I have are encased in the Itanium SDM from 2002,
and they agree with Linus that BSF and BSR undefine the contents of the
destination operand if the source operand contains zero.

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."