From: Meindert Sprang on
"John Temples" <usenet(a)xargs-spam.com> wrote in message
news:slrni0rit7.tc1.usenet(a)xargs-spam.com...
> On 2010-06-07, Meindert Sprang <ms(a)NOJUNKcustomORSPAMware.nl> wrote:
> > Apparently the optimizer of C18 is not that good. For
> > instance: LATF = addr >> 16; where addr is a uint32, is compiled into
> > a loop where 4 registers really get shifted 16 times.
>
> Here's what Hi-Tech's PIC18 compiler does:
>
> 853 ;t.c: 59: LATF = addr >> 16;
> 854 00FFFA C0FE FF8E movff _addr+2,3982 ;volatile

And that's how I expected it to be!
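
Where a compiler does not make this transformation itself, the byte can be fetched by hand. A minimal sketch, assuming little-endian storage (which I believe is what the PIC18 compilers use); `byte2_via_shift` and `byte2_via_memory` are hypothetical helper names, not anything supplied by C18 or Hi-Tech:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: bits 16..23 of addr, written the portable way.
   A good optimizer can reduce this to a single byte move. */
static uint8_t byte2_via_shift(uint32_t addr)
{
    return (uint8_t)(addr >> 16);
}

/* Hypothetical helper: fetch the same byte straight from memory.
   ASSUMPTION: little-endian storage, so byte index 2 holds bits 16..23.
   On a big-endian target the index would be 1 instead. */
static uint8_t byte2_via_memory(uint32_t addr)
{
    uint8_t bytes[sizeof addr];
    memcpy(bytes, &addr, sizeof addr);
    return bytes[2];
}
```

The shift form is the one to prefer in portable code; the memory form only shows what the single MOVFF above amounts to.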

Meindert


From: Joe Chisolm on
On Tue, 08 Jun 2010 11:24:58 +0200, Meindert Sprang wrote:

> "D Yuniskis" <not.going.to.be(a)seen.com> wrote in message
> news:hujj28$bn5$1(a)speranza.aioe.org...
>> It would be informative to know what sort of "helper routines" the
>> compiler calls on. E.g., it might (inelegantly) treat this as "CALL
>> SHIFT_LONG_RIGHT, repeat" -- in which case the 4 temp access is the
>> canned representation of *any* "long int".
>
> This is the code that does the shift:
>
> 0FCC8 0E10 MOVLW 0x10
> 0FCCA 90D8 BCF 0xfd8, 0, ACCESS
> 0FCCC 3203 RRCF 0x3, F, ACCESS
> 0FCCE 3202 RRCF 0x2, F, ACCESS
> 0FCD0 3201 RRCF 0x1, F, ACCESS
> 0FCD2 3200 RRCF 0, F, ACCESS
> 0FCD4 06E8 DECF 0xfe8, F, ACCESS
> 0FCD6 E1F9 BNZ 0xfcca
>
> The loop is executed 16 times (>>16) and 4 locations are shifted through
> the carry bit, if I understand this correctly.... yuck!
>
> Meindert

What version of C18 are you using and what is your target device?




--
Joe Chisolm
Marble Falls, Tx.
From: Grant Edwards on
On 2010-06-08, David Brown <david(a)westcontrol.removethisbit.com> wrote:

> Some compilers will use shifts, some will use byte or word movements.
>
> On the ARM, a compiler will often use shifts because shifts (especially
> by constants) are very cheap on the ARM architecture,

When combined with another arithmetic operation, they're free!

> while unaligned and non-32-bit memory accesses may be expensive or
> illegal (depending on the ARM variant).
>
> A quick test with avr-gcc shows that it uses byte register movements
> rather than shifts, although it's not optimal for 32-bit values (it is
> fine for 16-bit values, which are much more common in an 8-bit world).
> For your example below of "((ul& 0xFFFFFF)>> 8)" it is close to perfect.

IIRC gcc for both msp430 and H300 does byte/word operations instead of
shifts as well.

>> ldr r0, [r3, #0]
>> mov r0, r0, asl #8
>> mov r0, r0, lsr #16
>>
>>> If it recognizes the last as wanting just the middle word then that
>>> would be impressive.
>>
>> Recognizing the last two as wanting just the middle word is moot because
>> that 16-bit word is misaligned and can't be accessed using a 16-bit load
>> instruction.
>
> That's very nice code generation - faster (on an ARM anyway) than using
> masking.

Though it does look a bit odd at first glance. ;)
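
For anyone who finds the shift pair odd, here is a small sketch of why it matches both source expressions. The function names are made up for illustration, and `uint32_t` stands in for the 32-bit `ul` in the quoted code:

```c
#include <stdint.h>

/* The two source forms discussed in the thread. */
static uint32_t middle_mask_then_shift(uint32_t ul)
{
    return (ul & 0xFFFFFFu) >> 8;
}

static uint32_t middle_shift_then_mask(uint32_t ul)
{
    return (ul >> 8) & 0xFFFFu;
}

/* What the "asl #8" / "lsr #16" pair computes: the left shift discards
   the top byte, the logical right shift discards the bottom byte and
   zero-fills, leaving bits 8..23 in the low half of the register. */
static uint32_t middle_shift_pair(uint32_t ul)
{
    return (uint32_t)(ul << 8) >> 16;
}
```

All three return the same value for any 32-bit input, so no mask constant ever has to be loaded.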

--
Grant Edwards               grant.b.edwards        Yow! You can't hurt me!!
                                  at               I have an ASSUMABLE
                              gmail.com            MORTGAGE!!
From: Grant Edwards on
On 2010-06-08, Meindert Sprang <ms(a)NOJUNKcustomORSPAMware.nl> wrote:
> "D Yuniskis" <not.going.to.be(a)seen.com> wrote in message
> news:hujj28$bn5$1(a)speranza.aioe.org...
>> It would be informative to know what sort of "helper routines"
>> the compiler calls on. E.g., it might (inelegantly) treat this
>> as "CALL SHIFT_LONG_RIGHT, repeat" -- in which case the
>> 4 temp access is the canned representation of *any* "long int".
>
> This is the code that does the shift:
>
> 0FCC8 0E10 MOVLW 0x10
> 0FCCA 90D8 BCF 0xfd8, 0, ACCESS
> 0FCCC 3203 RRCF 0x3, F, ACCESS
> 0FCCE 3202 RRCF 0x2, F, ACCESS
> 0FCD0 3201 RRCF 0x1, F, ACCESS
> 0FCD2 3200 RRCF 0, F, ACCESS
> 0FCD4 06E8 DECF 0xfe8, F, ACCESS
> 0FCD6 E1F9 BNZ 0xfcca
>
> The loop is executed 16 times (>>16) and 4 locations are shifted through the
> carry bit, if I understand this correctly.... yuck!

In my experience, "yuck!" is what anybody trying to use C on a PIC
ought to expect. [IMO, "yuck!" is what you get using asm on a PIC as
well, but that's probably a little more subjective.]
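
For reference, the quoted loop can be modeled in C to confirm that reading of it. This is only a sketch: `shift_right_16_bytewise` is a hypothetical name, and I am assuming 0xfd8 is STATUS (bit 0 = carry) and 0xfe8 is WREG, as on the PIC18 parts I know of.

```c
#include <stdint.h>

/* Model of the quoted C18 loop: four byte-wide locations (r[0] is the
   least significant) are rotated right through the carry flag, and the
   whole sequence is repeated 16 times. */
static uint32_t shift_right_16_bytewise(uint32_t value)
{
    uint8_t r[4];
    for (int i = 0; i < 4; i++)
        r[i] = (uint8_t)(value >> (8 * i));

    for (uint8_t count = 0x10; count != 0; count--) { /* MOVLW 0x10 ... DECF, BNZ */
        uint8_t carry = 0;                            /* BCF STATUS, 0 */
        for (int i = 3; i >= 0; i--) {                /* RRCF 0x3, 0x2, 0x1, 0 */
            uint8_t out = r[i] & 1u;                  /* bit rotated out */
            r[i] = (uint8_t)((r[i] >> 1) | (carry << 7));
            carry = out;                              /* feeds the next byte */
        }
    }

    return (uint32_t)r[0] | ((uint32_t)r[1] << 8) |
           ((uint32_t)r[2] << 16) | ((uint32_t)r[3] << 24);
}
```

That is 64 single-bit rotates plus loop overhead to produce what amounts to copying two bytes.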

--
Grant Edwards               grant.b.edwards        Yow! Loni Anderson's hair
                                  at               should be LEGALIZED!!
                              gmail.com
From: George Neuner on
On Tue, 08 Jun 2010 10:03:58 +0200, David Brown
<david(a)westcontrol.removethisbit.com> wrote:

>On 08/06/2010 04:47, Grant Edwards wrote:
>> On 2010-06-08, George Neuner<gneuner2(a)comcast.net> wrote:
>>> On Mon, 7 Jun 2010 20:18:35 +0000 (UTC), Grant Edwards
>>> <invalid(a)invalid.invalid> wrote:
>>>
>>>> On 2010-06-07, George Neuner<gneuner2(a)comcast.net> wrote:
>>>>
>>>>> I've been programming since 1977 and I have never seen any compiler
>>>>> turn a long word shift (and/or mask) into a corresponding short word
>>>>> or byte access. Every compiler I have ever worked with would perform
>>>>> the shift.
>>>>
>>>> Really?
>>>>
>>>> I've seen quite a few compilers do that. For example, gcc for ARM
>>>> does:
>>>
>
>Some compilers will use shifts, some will use byte or word movements.
>
>On the ARM, a compiler will often use shifts because shifts (especially
>by constants) are very cheap on the ARM architecture, while unaligned
>and non-32-bit memory accesses may be expensive or illegal (depending on
>the ARM variant).
>
>A quick test with avr-gcc shows that it uses byte register movements
>rather than shifts, although it's not optimal for 32-bit values (it is
>fine for 16-bit values, which are much more common in an 8-bit world).
>For your example below of "((ul& 0xFFFFFF)>> 8)" it is close to perfect.
>
>>> Interesting. But now that I think about it, I almost never use shift with a
>>> constant count - it's almost always a computed shift - and even when
>>> the shift is constant, the value is often in a variable anyway due to
>>> surrounding processing.
>>>
>>> - What version of GCC is it?
>>
>> 4.4.3
>>
>>> - What does it do if the shift count is a variable?
>>
>> It uses a shift instruction. There's not really anything else it could
>> do with a variable shift count.
>>
>>> - What does it do for ((ul& 0xFFFFFF)>> 8)
>>
>> ldr r0, [r3, #0]
>> mov r0, r0, asl #8
>> mov r0, r0, lsr #16
>>
>>> or ((ul>> 8)& 0xFFFF)?
>>
>> ldr r0, [r3, #0]
>> mov r0, r0, asl #8
>> mov r0, r0, lsr #16
>>
>>> If it recognizes the last as wanting just the middle word then that
>>> would be impressive.
>>
>> Recognizing the last two as wanting just the middle word is moot because
>> that 16-bit word is misaligned and can't be accessed using a 16-bit load
>> instruction.
>>
>
>That's very nice code generation - faster (on an ARM anyway) than using
>masking.


Yes. It seems that recent versions of GCC do some interesting shift
optimizations ... which is revising upward my opinion of GCC (which
I've only ever considered an adequate compiler).


I've worked with a number of older versions of GCC and with Intel,
Microsoft and Sun compilers over the years. What I would normally
expect to see from a good compiler is:
 - if the source value and the shift count can be statically
   determined, I expect the compiler to compute the result
   and inline it,
 - otherwise I expect to see the shift essentially as coded.

Good compilers can often statically determine the values through value
tracking and/or constant propagation, so under high optimization it
isn't unusual to see something like

    unsigned short get_middle( unsigned long ul )
    {
        return ((ul >> 8) & 0xFFFF);
    }

        :
    unsigned short bleh;
    unsigned long blah = 0xBABE;
        :
    blah |= (0xCAFEUL << 16);
    bleh = get_middle( blah );
        :

reduce the get_middle() call to a short constant load of 0xFEBA.
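
The arithmetic behind that constant can be checked at compile time. A sketch, assuming C11 for `_Static_assert` and using fixed-width types in place of `unsigned long` / `unsigned short`; `folded_example` is a made-up name. Whether a particular compiler actually folds the call depends on its version and optimization level; the code only shows the value it would have to produce.

```c
#include <stdint.h>

static uint16_t get_middle(uint32_t ul)
{
    return (uint16_t)((ul >> 8) & 0xFFFFu);
}

/* Every input is a compile-time constant, so value tracking plus
   inlining can replace the whole body with "return 0xFEBA". */
static uint16_t folded_example(void)
{
    uint32_t blah = 0xBABEu;
    blah |= (uint32_t)0xCAFEu << 16;   /* blah is now 0xCAFEBABE */
    return get_middle(blah);
}

/* The same computation written as a constant expression. */
_Static_assert((((((uint32_t)0xCAFEu << 16) | 0xBABEu) >> 8) & 0xFFFFu) == 0xFEBAu,
               "middle 16 bits of 0xCAFEBABE are 0xFEBA");
```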

But until I saw GCC 4.4 do it (partly: see my other post), I had never
seen a compiler change a shift into a word (or byte) load from memory.
I have, on occasion, seen shifts changed into register bit field
extraction on chips that have such instructions, but never before into
partial loads from memory.

Seems like I have to start paying more attention to GCC.
George