From: Scott Lurndal on
On Fri, Mar 26, 2010 at 10:23:46AM -0700, Linus Torvalds wrote:
>
>
> On Fri, 26 Mar 2010, David Howells wrote:
> >
> > fls(N), ffs(N) and fls64(N) can be optimised on x86/x86_64. Currently they
> > perform checks against N being 0 before invoking the BSR/BSF instruction, or
> > use a CMOV instruction afterwards. Either the check involves a conditional
> > jump which we'd like to avoid, or a CMOV, which we'd also quite like to avoid.
> >
> > Instead, we can make use of the fact that BSR/BSF doesn't modify its output
> > register if its input is 0. By preloading the output with -1 and incrementing
> > the result, we achieve the desired result without the need for a conditional
> > check.
>
> This is totally incorrect.
>
> Where did you find that "doesn't modify its output" thing? It's not true.
> The truth is that the destination is undefined. Just read the dang Intel
> documentation, it's very clearly stated right there.

While this is true for the current (253666-031US) Intel documentation,
the AMD documentation (rev 3.14) for the same instruction states that the
destination register is unchanged (as opposed to Intel's undefined).

I wonder if Intel's EM64T stuff makes this more deterministic; perhaps
David's implementation would work for x86_64 only?

scott
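
For reference, the trick David describes can be sketched roughly as
follows. This is a minimal sketch and not his patch: the name fls_sketch
is made up here, and the x86 path assumes the AMD-documented behaviour
(destination left unchanged when the source is 0), which Intel's manual
lists as undefined.

```c
/*
 * Branch-free fls() sketch: preload the result with -1, let BSR
 * overwrite it for non-zero input, then add 1.
 *
 * ASSUMPTION: BSR leaves its destination register untouched when the
 * source operand is 0. AMD documents this; Intel documents the
 * destination as undefined in that case.
 */
static inline int fls_sketch(unsigned int x)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
	int r = -1;		/* survives only if x == 0 */

	__asm__("bsrl %1, %0"
		: "+r" (r)	/* read-write: the preloaded value must stay live */
		: "rm" (x)
		: "cc");
	return r + 1;		/* bit index 0..31 becomes 1..32; -1 becomes 0 */
#else
	/* Portable fallback with the same results, for non-x86 builds. */
	int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
#endif
}
```

The "+r" constraint matters: with a plain output constraint the compiler
would be free to drop the -1 preload, since it could not know the
instruction might leave the register alone.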
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on


On Fri, 26 Mar 2010, Ralf Baechle wrote:
>
> My trusty old 486 book [1] in the remarks about the BSF instruction:
>
> "The documentation on the 80386 and 80486 states that op1 is undefined if
> op2 is 0. In reality the 80386 will leave the value in op1 unchanged.
> The first versions of the 80486 will change op1 to an undefined value.
> Later versions again will leave it unchanged."
>
> [1] Die Intel Familie in German language, by Robert Hummel, 1992

Ok, that explains my memory of us having tried this, at least.

But I do wonder if any of the people working for Intel could ask the CPU
architects whether we could depend on the "don't write" for 64-bit mode.
If AMD already documents the don't-touch semantics, and if Intel were to
be ok with documenting it for their 64-bit capable CPU's, we wouldn't then
need to rely on undefined behavior.

Linus
From: Matthew Wilcox on
On Fri, Mar 26, 2010 at 11:03:09AM -0700, Linus Torvalds wrote:
> On Fri, 26 Mar 2010, Ralf Baechle wrote:
> >
> > My trusty old 486 book [1] in the remarks about the BSF instruction:
> >
> > "The documentation on the 80386 and 80486 states that op1 is undefined if
> > op2 is 0. In reality the 80386 will leave the value in op1 unchanged.
> > The first versions of the 80486 will change op1 to an undefined value.
> > Later versions again will leave it unchanged."
> >
> > [1] Die Intel Familie in German language, by Robert Hummel, 1992
>
> Ok, that explains my memory of us having tried this, at least.
>
> But I do wonder if any of the people working for Intel could ask the CPU
> architects whether we could depend on the "don't write" for 64-bit mode.
> If AMD already documents the don't-touch semantics, and if Intel were to
> be ok with documenting it for their 64-bit capable CPU's, we wouldn't then
> need to rely on undefined behavior.

I'll drop one of them a note.

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
From: Matthew Wilcox on
On Fri, Mar 26, 2010 at 11:03:09AM -0700, Linus Torvalds wrote:
> On Fri, 26 Mar 2010, Ralf Baechle wrote:
> > "The documentation on the 80386 and 80486 states that op1 is undefined if
> > op2 is 0. In reality the 80386 will leave the value in op1 unchanged.
> > The first versions of the 80486 will change op1 to an undefined value.
> > Later versions again will leave it unchanged."
> >
> > [1] Die Intel Familie in German language, by Robert Hummel, 1992
>
> Ok, that explains my memory of us having tried this, at least.
>
> But I do wonder if any of the people working for Intel could ask the CPU
> architects whether we could depend on the "don't write" for 64-bit mode.
> If AMD already documents the don't-touch semantics, and if Intel were to
> be ok with documenting it for their 64-bit capable CPU's, we wouldn't then
> need to rely on undefined behavior.

I don't know whether we can get it /documented/, but the architect I
asked said "We'll never get away with reverting to the older behavior,
so in essence the architecture is set to not overwrite."

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
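
If the no-overwrite behaviour is only trusted on 64-bit capable CPUs, as
discussed above, one way to limit the trick is a compile-time split. The
following is a rough sketch under that assumption; ffs_sketch is a
hypothetical name, and the 32-bit path simply keeps an explicit zero
check.

```c
/*
 * ffs() sketch that uses the BSF preload trick on x86_64 only.
 *
 * ASSUMPTION: on 64-bit capable CPUs, BSF does not write its
 * destination when the source is 0. This is documented by AMD and was
 * only informally confirmed for Intel in this thread.
 */
static inline int ffs_sketch(int x)
{
#if defined(__GNUC__) && defined(__x86_64__)
	int r = -1;		/* kept as-is when x == 0 */

	__asm__("bsfl %1, %0"
		: "+r" (r)
		: "rm" (x)
		: "cc");
	return r + 1;
#else
	/* Conservative path: check for zero, then scan from bit 0. */
	unsigned int u = (unsigned int)x;
	int r = 0;

	if (u == 0)
		return 0;
	while (!(u & 1)) {
		u >>= 1;
		r++;
	}
	return r + 1;
#endif
}
```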
From: Jamie Lokier on
Linus Torvalds wrote:
> On Fri, 26 Mar 2010, Scott Lurndal wrote:
> >
> > I wonder if Intel's EM64T stuff makes this more deterministic; perhaps
> > David's implementation would work for x86_64 only?
>
> Limiting it to x86-64 would certainly remove all the worries about all the
> historical x86 clones.
>
> I'd still worry about it for future Intel chips, though. I absolutely
> _detest_ relying on undocumented features - it pretty much always ends up
> biting you eventually. And conditional writeback is actually pretty nasty
> from a microarchitectural standpoint.

On the same subject of relying on undocumented features:

/* If SMP and !X86_PPRO_FENCE. */
#define smp_rmb() barrier()

I've seen documentation, links posted to lkml ages ago, which implies
this is fine on 64-bit for both Intel and AMD.

But it appears to be relying on undocumented behaviour on 32-bit...

Are you sure it is ok? Has anyone from Intel/AMD ever confirmed it is
ok? Has it been tested? Clones?

-- Jamie
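
To make the concern concrete: with the definition quoted above,
smp_rmb() emits no instruction at all and only stops the compiler from
reordering memory accesses. Whether that is enough depends on the CPU
never reordering two loads, which is the behaviour being questioned.
A minimal sketch, assuming GCC-style inline asm; struct mailbox and
mailbox_read are made-up names for illustration.

```c
/* Compiler-only barrier: no fence instruction is generated. */
#define barrier()	__asm__ __volatile__("" : : : "memory")

/* The definition under discussion (SMP and !X86_PPRO_FENCE). */
#define smp_rmb()	barrier()

/* Hypothetical producer/consumer slot, for illustration only. */
struct mailbox {
	int data;
	volatile int ready;
};

/*
 * Reader side: load the flag, then the payload. The barrier keeps the
 * compiler from hoisting the data load above the flag load. It relies
 * on the hardware not reordering the two loads by itself.
 */
static int mailbox_read(struct mailbox *m, int *out)
{
	if (!m->ready)
		return 0;
	smp_rmb();
	*out = m->data;
	return 1;
}
```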