From: Joe Chisolm on
On Mon, 07 Jun 2010 15:36:02 -0400, George Neuner wrote:

> On Mon, 7 Jun 2010 11:17:34 +0200, "Meindert Sprang"
> <ms(a)NOJUNKcustomORSPAMware.nl> wrote:
>
>>Unbelievable.....
>>
>>I'm playing around with the Microchip C18 compiler after a
>>hair-splitting experience with CCS. Apparently the optimizer of C18 is
>>not that good. For instance: LATF = addr >> 16; where addr is an
>>uint32, is compiled into a loop where 4 registers really get shifted 16
>>times in a loop. Any decent compiler should recognise that a shift by
>>16, stored to an 8 bit port could easily be done by simply accessing the
>>3rd byte.... sheesh....
>>
>>Meindert
>
> You're asking a lot.
>
> I've been programming since 1977 and I have never seen any compiler turn
> a long word shift (and/or mask) into a corresponding short word or byte
> access. Every compiler I have ever worked with would perform the shift.
>
> That said, something is wrong if it takes 4 registers. I don't know the
> PIC18, but I never encountered any chip that required more than 2
> registers to shift a value. Many chips have only a 1-bit shifter and
> require a loop to do larger shifts - but many such chips microcode the
> shift loop so the programmer sees only a simple instruction. But,
> occasionally, you do run into oddballs that need large shifts spelled
> out.
>
> Most likely you're somehow reading the (dis)assembly incorrectly: 4
> temporaries that are really mapped into the same register. If the
> compiler (or chip) really does need 4 registers to do a shift, then it's
> a piece of sh*t.
>
> George

You have an 8-bit architecture shifting a 32-bit value, shifting out of one
byte and into the next, thus 4 temps. You have 1 bit shifts. I suspect
the compiler is generating a right shift into carry so the code can
tell if a 1 needs to be moved into the most significant bit of the next
byte.
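
A minimal sketch of what such a loop amounts to, written in C rather than PIC18 assembly. This is not C18 output; the function names are made up for illustration, and the `carry` variable stands in for the hardware carry flag that a rotate-through-carry instruction (RRCF on the PIC18) would use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: shift a 32-bit value held as four bytes
 * (b[0] = least significant) right by one bit, one byte at a time. */
static void shift_right_1(uint8_t b[4])
{
    uint8_t carry = 0;                  /* logical shift: 0 enters the top bit */
    for (int i = 3; i >= 0; i--) {      /* start at the most significant byte */
        uint8_t out = b[i] & 1u;        /* bit that falls out of this byte */
        b[i] = (uint8_t)((b[i] >> 1) | (carry << 7));
        carry = out;                    /* becomes bit 7 of the next lower byte */
    }
}

/* Shift right by `count` bits by repeating the 1-bit step, which is the
 * "4 temps shifted 16 times" pattern described in the thread. */
uint32_t shift_right_loop(uint32_t value, unsigned count)
{
    assert(count < 32);

    uint8_t b[4];
    for (int i = 0; i < 4; i++)         /* split into the four "temps" */
        b[i] = (uint8_t)(value >> (8 * i));

    while (count--)
        shift_right_1(b);

    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

For a shift by 16 this touches all four bytes sixteen times, which is why the generated code looks so heavy for what ends up as a single byte store.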

--
Joe Chisolm
Marble Falls, Tx.
From: D Yuniskis on
Hi Joe,

Joe Chisolm wrote:
> On Mon, 07 Jun 2010 15:36:02 -0400, George Neuner wrote:
>
>> You're asking a lot.
>>
>> I've been programming since 1977 and I have never seen any compiler turn
>> a long word shift (and/or mask) into a corresponding short word or byte
>> access. Every compiler I have ever worked with would perform the shift.
>>
>> That said, something is wrong if it takes 4 registers. I don't know the
>> PIC18, but I never encountered any chip that required more than 2
>> registers to shift a value. Many chips have only a 1-bit shifter and
>> require a loop to do larger shifts - but many such chips microcode the
>> shift loop so the programmer sees only a simple instruction. But,
>> occasionally, you do run into oddballs that need large shifts spelled
>> out.
>>
>> Most likely you're somehow reading the (dis)assembly incorrectly: 4
>> temporaries that are really mapped into the same register. If the
>> compiler (or chip) really does need 4 registers to do a shift, then it's
>> a piece of sh*t.

It would be informative to know what sort of "helper routines"
the compiler calls on. E.g., it might (inelegantly) treat this
as "CALL SHIFT_LONG_RIGHT, repeat" -- in which case the
4 temp access is the canned representation of *any* "long int".

> You have an 8-bit architecture shifting a 32-bit value, shifting out of one
> byte and into the next, thus 4 temps. You have 1 bit shifts. I suspect
> the compiler is generating a right shift into carry so the code can
> tell if a 1 needs to be moved into the most significant bit of the next
> byte.

I think George is commenting that a *smart* compiler can
realize that an (e.g.) 8 bit shift is:
foo[0] = foo[1]
foo[1] = foo[2]
foo[2] = foo[3]
(if you are casting to a narrower data type and can discard foo[3])

and a *9* bit shift is the same as the above with a *single*
bit shift introduced (i.e., you operate on a byte at a time
instead of the entire "long")

(recall, the shift amount is a constant available at
compile time)
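
The decomposition can be sketched in C as follows. This is only an illustration of the idea, not what any particular compiler emits; the function names are hypothetical and the value is assumed to be stored with foo[0] as the least significant byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: right-shift four bytes (b[0] = least significant)
 * by n bits as n / 8 whole-byte moves plus n % 8 residual bit shifts. */
static void shift_bytes_then_bits(uint8_t b[4], unsigned n)
{
    unsigned bytes = n / 8;
    unsigned bits = n % 8;

    /* Whole-byte part: copy from low to high so that no source byte is
     * overwritten before it has been read. Vacated bytes become 0. */
    for (unsigned i = 0; i < 4; i++)
        b[i] = (i + bytes < 4) ? b[i + bytes] : 0;

    /* Residual part: each byte takes its low bits from the next higher one. */
    if (bits != 0) {
        for (unsigned i = 0; i < 4; i++) {
            uint8_t hi = (i < 3) ? b[i + 1] : 0;
            b[i] = (uint8_t)((b[i] >> bits) | (hi << (8 - bits)));
        }
    }
}

uint32_t shift_right_const(uint32_t value, unsigned n)
{
    assert(n < 32);

    uint8_t b[4];
    for (int i = 0; i < 4; i++)
        b[i] = (uint8_t)(value >> (8 * i));

    shift_bytes_then_bits(b, n);

    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

With n = 16 the residual loop never runs, so the whole operation is a handful of byte moves and clears.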
From: Grant Edwards on
On 2010-06-07, George Neuner <gneuner2(a)comcast.net> wrote:

> I've been programming since 1977 and I have never seen any compiler
> turn a long word shift (and/or mask) into a corresponding short word
> or byte access. Every compiler I have ever worked with would perform
> the shift.

Really?

I've seen quite a few compilers do that. For example, gcc for ARM
does:

------------------------------testit.c------------------------------
unsigned long ul;

unsigned char foo(void)
{
    return ul >> 8;
}

unsigned short bar(void)
{
    return ul >> 16;
}
------------------------------testit.c------------------------------

$ /home/nextgen/toolchain/bin/arm-linux-gcc -c -Os -S -fomit-frame-pointer testit.c

------------------------------testit.s------------------------------
        .arch armv5te
[...]
        .file   "testit.c"
        .text
        .align  2
        .global foo
        .type   foo, %function
foo:
        ldr     r3, .L3
        ldrb    r0, [r3, #1]    @ zero_extendqisi2
        bx      lr
.L4:
        .align  2
.L3:
        .word   ul
        .size   foo, .-foo
        .align  2
        .global bar
        .type   bar, %function
bar:
        ldr     r3, .L7
        ldrh    r0, [r3, #2]
        bx      lr
.L8:
        .align  2
.L7:
        .word   ul
        .size   bar, .-bar
        .comm   ul,4,4
[...]
------------------------------testit.s------------------------------


--
Grant Edwards               grant.b.edwards        Yow! I'm young ... I'm
                                  at               HEALTHY ... I can HIKE
                              gmail.com            THRU CAPT GROGAN'S LUMBAR
                                                   REGIONS!
From: David Brown on
D Yuniskis wrote:
> Hi Meindert,
>
> Meindert Sprang wrote:
>> Unbelievable.....
>>
>> I'm playing around with the Microchip C18 compiler after a hair-splitting
>> experience with CCS. Apparently the optimizer of C18 is not that good.
>> For
>> instance: LATF = addr >> 16; where addr is an uint32, is compiled into a
>> loop where 4 registers really get shifted 16 times in a loop. Any decent
>> compiler should recognise that a shift by 16, stored to an 8 bit port
>> could
>> easily be done by simply accessing the 3rd byte.... sheesh....
>
> Is LATF *defined* as a uint8_t? (i.e., does the compiler *know* it
> can discard all but the lowest 8 bits?)
>

That's irrelevant (or should be!) - expressions are evaluated in their
own right, and /then/ cast to the type of the LHS. The compiler should,
as it does, initially treat it as a 32-bit shift, but it's a poor
compiler that can't optimise a 32-bit shift by 16 to something better
than this. Optimising it to a single byte transfer comes logically at a
later stage.

> Is uint32_t *really* unsigned (and not a cheap hack to "long int")?
> I.e., can the compiler be confused (by the definition) to thinking
> it is signed and opting for a sign-preserving shift?
>

I believe that uint32_t /must/ be an unsigned 32-bit integer. If the
compiler cannot work with such a type, then no such type should exist in
<stdint.h>. A standards-compliant compiler is not allowed to cheat in
that way. Of course, I don't know if Microchip's compiler claims to be
standards compliant...

> How about:
>
> uint8_t *pointer;
>
> pointer = (uint8_t *) &addr;
> LATF = pointer[2];
>
> Clumsy, admittedly, but perhaps more obvious what's going on?
> (I would have added that this would be easy for an optimizer
> to reduce to an "addressing operation" but I also would have
> expected your shift to be recognized as an easy optimization!)
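
On the pointer suggestion, a minimal sketch of how it compares with the shift. It assumes a little-endian target (so byte 2 of the object holds bits 16..23) and that uint8_t is an alias for unsigned char, which makes the aliasing legal. The helper names are made up for illustration and nothing here is C18-specific.

```c
#include <assert.h>
#include <stdint.h>

/* Byte 2 of a 32-bit object, read through a byte pointer.
 * Equals (uint8_t)(addr >> 16) only on a little-endian target. */
static uint8_t byte2_via_pointer(const uint32_t *addr)
{
    const uint8_t *p = (const uint8_t *)addr;
    return p[2];
}

/* Run-time endianness probe: the low byte comes first on little-endian. */
static int is_little_endian(void)
{
    const uint32_t probe = 1;
    return *(const uint8_t *)&probe == 1;
}

/* Portable form: the shift is evaluated at 32 bits and then narrowed.
 * On a little-endian host, cross-check it against the pointer form. */
uint8_t byte2_checked(uint32_t addr)
{
    uint8_t by_shift = (uint8_t)(addr >> 16);
    if (is_little_endian())
        assert(byte2_via_pointer(&addr) == by_shift);
    return by_shift;
}
```

The shift form stays correct regardless of byte order, so the pointer form is only worth it as a workaround when the compiler cannot reduce the shift itself.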
From: Joe Chisolm on
On Mon, 07 Jun 2010 12:59:49 -0700, D Yuniskis wrote:

> Hi Joe,
>
> Joe Chisolm wrote:
>> On Mon, 07 Jun 2010 15:36:02 -0400, George Neuner wrote:
>>
>>> You're asking a lot.
>>>
>>> I've been programming since 1977 and I have never seen any compiler
>>> turn a long word shift (and/or mask) into a corresponding short word
>>> or byte access. Every compiler I have ever worked with would perform
>>> the shift.
>>>
>>> That said, something is wrong if it takes 4 registers. I don't know
>>> the PIC18, but I never encountered any chip that required more than 2
>>> registers to shift a value. Many chips have only a 1-bit shifter and
>>> require a loop to do larger shifts - but many such chips microcode the
>>> shift loop so the programmer sees only a simple instruction. But,
>>> occasionally, you do run into oddballs that need large shifts spelled
>>> out.
>>>
>>> Most likely you're somehow reading the (dis)assembly incorrectly: 4
>>> temporaries that are really mapped into the same register. If the
>>> compiler (or chip) really does need 4 registers to do a shift, then
>>> it's a piece of sh*t.
>
> It would be informative to know what sort of "helper routines" the
> compiler calls on. E.g., it might (inelegantly) treat this as "CALL
> SHIFT_LONG_RIGHT, repeat" -- in which case the 4 temp access is the
> canned representation of *any* "long int".
>

I agree with your statement. The C18 suite has some canned libraries like
32 bit division and such. There are other helper routines for doing
delays and such.

>> You have an 8-bit architecture shifting a 32-bit value, shifting out of
>> one byte and into the next, thus 4 temps. You have 1 bit shifts. I
>> suspect the compiler is generating a right shift into carry so the code
>> can tell if a 1 needs to be moved into the most significant bit of the
>> next byte.
>
> I think George is commenting that a *smart* compiler can realize that an
> (e.g.) 8 bit shift is:
> foo[0] = foo[1]
> foo[1] = foo[2]
> foo[2] = foo[3]
> (if you are casting to a narrower data type and can discard foo[3])
>
> and a *9* bit shift is the same as the above with a *single* bit shift
> introduced (i.e., you operate on a byte at a time instead of the entire
> "long")
>
> (recall, the shift amount is a constant available at compile time)

I just did a test using C18. I chose an 18F86J10 (for no particular
reason other than that I remember it has a port F and thus a LATF).

For:
static unsigned long addr;
LATF = addr >> 16;

I get results similar to what you have above. The compiler "shifts"
addr into a 32 bit temp by doing two byte moves and two clear byte
instructions. It then does a 1 byte move into LATF from the temp.
I'm not sure what version the OP is using or what else might be going
on behind the scenes with addr. I agree a compiler should be smarter
but for the price (free) C18 is not bad for smaller projects.

BTW: I did a quick test with gcc 4.4.1 and it does a load, shift 16 and
a store byte.


--
Joe Chisolm
Marble Falls, Tx.