From: Robert Myers on
On May 27, 4:02 pm, Seebs <usenet-nos...(a)seebs.net> wrote:
> On 2010-05-27, James Harris <james.harri...(a)googlemail.com> wrote:
>
> > I'm not going to try and defend him but having seen his posts for some
> > time I don't think he's trolling.
>
> Could be.
>
> > His interests are or have been video
> > processing. He puts a lot of effort into getting the best x86
> > instruction sequences from his Delphi compiler. The rest is mainly
> > communication style.
>
> Could be, but it's a communications style which seems rude to me, and
> the outrage at the idea that a processor might require you to give it
> only semantically valid instructions strikes me as a bad sign.
>

Rudeness on Usenet? What *is* the world coming to?

Main Entry: 1loll
Pronunciation: \ˈläl\
Function: verb
Etymology: Middle English
Date: 14th century
intransitive verb
1 : to hang loosely or laxly : droop
2 : to act or move in a lax, lazy, or indolent manner : lounge
transitive verb
: to let droop or dangle
synonyms see idle
— loll·er \ˈlä-lər\ noun

In any case, computers know what programmers really mean. If not,
they're just being difficult, like a recalcitrant child. A very
stubborn and defiant recalcitrant child.

Robert.

From: MitchAlsup on
On May 27, 11:06 am, "Skybuck Flying" <IntoTheFut...(a)hotmail.com>
wrote:
> Would you rather write:
>
> // 1.
> Z := X shl Y;
>
> // or
>
> // 2.
> if Y < 32 then
> begin
>     Z := X shl Y;
> end else
> begin
>     Z := X;
> end;

Why not:

z = x << (y & 31);

?
From: James Dow Allen on
On May 28, 9:59 am, "Skybuck Flying" <IntoTheFut...(a)hotmail.com>
wrote:
> I was thinking, maybe a "bitcount mod 32" might help ...

From the (useless?) factoids department:

When Intel x86 shifts a 32-bit register with a shift-count
coming from reg, all but the low-order 5 bits are ignored;
i.e. shift count is mod 32.

*However* the Motorola 68xxx uses 6 bits; i.e. shift
count is mod 64. A large count will set the 32-bit
reg to all 0's (or all 1's).

I wrote a Huffman decoder once. I didn't want it to
be just another FAST decoder; I wanted it to be the
fastest decoder possible! Config tested shifts and
took a short-cut when possible.

I am *not* here to defend anal-retentive coding.
("Hello, my name is James and I indulge in silly
pointless micro-optimizations.") But surely everyone
will admit that micro-optimizations are a cheaper
and safer *recreation* than many alternatives. :-)

James
From: Terje Mathisen "terje.mathisen at on
James Dow Allen wrote:
> On May 28, 9:59 am, "Skybuck Flying"<IntoTheFut...(a)hotmail.com>
> wrote:
>> I was thinking, maybe a "bitcount mod 32" might help ...
>
> From the (useless?) factoids department:
>
> When Intel x86 shifts a 32-bit register with a shift-count
> coming from reg, all but the low-order 5 bits are ignored;
> i.e. shift count is mod 32.
>
> *However* the Motorola 68xxx uses 6 bits; i.e. shift
> count is mod 64. A large count will set the 32-bit
> reg to all 0's (or all 1's).
>
> I wrote a Huffman decoder once. I didn't want it to
> be just another FAST decoder; I wanted it to be the
> fastest decoder possible! Config tested shifts and
> took a short-cut when possible.

The fastest Huffman decoders don't use that many shifts, they average
less than one shift_right per decoded token:

My fastest code uses N-bit wide lookups that can return multiple tokens
per iteration, along with a count of the total number of bits used and
the number of tokens copied, i.e. how much the token_count has been
incremented.

It also returns a pointer to the next decoder to use, normally the same.

In the case of zero return tokens, it instead chains to a secondary decoder.

This means that as long as you can blindly decode a bunch of tokens,
i.e. nothing like the "Context-Adaptive" part of h.264 CABAC decoding,
you can do so in a totally branchless manner.

BTW, you also backfill the bit buffer in a branchless manner, and do it
for free by overlapping these operations with the core logic:

// Always try to load the next 8 (or 16/32) bits
bit_buffer |= *data << bits;
bits += 8;

oflow = bits >> 5; // Did we overflow the bit buffer?
bits &= 31;
data += 1-oflow;

>
> I am *not* here to defend anal-retentive coding.
> ("Hello, my name is James and I indulge in silly
> pointless micro-optimizations.") But surely everyone

It is never pointless: I learn something every time.

> will admit that micro-optimizations are a cheaper
> and safer *recreation* than many alternatives. :-)

Absolutely.

I've been helping a guy who crossposted an optimization problem over a
bunch of programming groups (I read it in comp.lang.asm.x86).

His original program took 26 minutes (albeit a number of years ago), he
has a third-party written version which runs in ~5 ms, but sometimes
fails to work (i.e. doesn't find all solutions).

My current code runs in 0.75 ms, and he's still complaining because I
wrote it in C(++) instead of his preferred Pascal. :-)

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
From: Skybuck Flying on
Ok,

I no longer believe the lookup table is best...

Correct formula's / calculations can probably solve the problem.

As long as the algorithm exists on BitCount = 0.

Good testing still has to be done but it seems ok.

The time went down from 1.32 seconds or so to 1.12 seconds or so...

So that's at least 15% more speed or so, that's nice.

Bye,
Skybuck.