From: Tim Frink on
Hi,

what are typical heuristics for loop unrolling, i.e. which constraints
must be satisfied to have a compiler perform unrolling on a particular
loop?

I assume that the loop must not exceed a particular size to avoid a
code size explosion. But are there any other heuristics?

Regards,
Tim
From: Ian Collins on
Tim Frink wrote:
> Hi,
>
> what are typical heuristics for loop unrolling, i.e. which constraints
> must be satisfied to have a compiler perform unrolling on a particular
> loop?
>
> I assume that the loop must not exceed a particular size to avoid a
> code size explosion. But are there any other heuristics?
>
They will be specific to the compiler and loops don't have to be
completely unrolled. For example a loop of 100 may be expanded to a
loop of 20 blocks of 5 repeats.

Some compilers provide a means for tuning loop unrolling as a
space/performance trade off.

--
Ian Collins.
From: Walter Banks on


Ian Collins wrote:

> > I assume that the loop must not exceed a particular size to avoid a
> > code size explosion. But are there any other heuristics?
> >
> They will be specific to the compiler and loops don't have to be
> completely unrolled. For example a loop of 100 may be expanded to a
> loop of 20 blocks of 5 repeats.
>
> Some compilers provide a means for tuning loop unrolling as a
> space/performance trade off.

We have looked at a lot of customer embedded system code related
to loop unrolling and eventually concluded that best thing that we
could do in our compilers was to compile the code as written.

Application code usually was very clear and almost always right
in their timing choice of straight line vs loop implementation. Rolling
up straight line code and unrolling loops more often than not
interfered with a conscience choice a developer made.



Regards,

--
Walter Banks
Byte Craft Limited
Tel. (519) 888-6911
http://www.bytecraft.com
walter(a)bytecraft.com




From: jebblue on
On Wed, 23 Apr 2008 08:51:41 -0400, Walter Banks wrote:

> We have looked at a lot of customer embedded system code related to loop
> unrolling and eventually concluded that best thing that we could do in
> our compilers was to compile the code as written.
>
> Application code usually was very clear and almost always right in their
> timing choice of straight line vs loop implementation. Rolling up
> straight line code and unrolling loops more often than not interfered
> with a conscience choice a developer made.

Refreshing viewpoint. Maybe like in Java and C# the user could add
comment tags to provide hints, compiler specific, if the compiler used
didn't recognize the tags, they just get ignored so the code remains
compilable across vendor's compilers, but the ones that do recognize them
enables the compiler to attempt to zing the code.

This lets the user become a more active participant, helping guide the
compiler's decisions rather than a less granular command line option
indicating an optimization preference.

--
// This is my opinion.
From: user923005 on
On Apr 23, 2:34 am, Tim Frink <plfr...(a)yahoo.de> wrote:
> Hi,
>
> what are typical heuristics for loop unrolling, i.e. which constraints
> must be satisfied to have a compiler perform unrolling on a particular
> loop?
>
> I assume that the loop must not exceed a particular size to avoid a
> code size explosion. But are there any other heuristics?

Every compiler will be different. You can always unroll it yourself
and compare. E.g.:


void loopcopy(long *to, long *from, long count);
void duffcopy(long *to, long *from, long count);

void loopcopy(long *to, long *from, long count)
{
do
*to = *from++;
while (--count > 0);
return;
}

void duffcopy(long *to, long *from, long count)
{
register n = (count + 7) / 8;
switch (count % 8)
{
case 0:
do
{
*to = *from++;
case 7:
*to = *from++;
case 6:
*to = *from++;
case 5:
*to = *from++;
case 4:
*to = *from++;
case 3:
*to = *from++;
case 2:
*to = *from++;
case 1:
*to = *from++;
}
while (--n > 0);
}
return;
}

#include <stdlib.h>

#define SIZE 4096

long from[SIZE];
long to[SIZE];

long main()
{
long i;

for (i = 0; i < SIZE; i++)
from[i] = rand();
for (i = 0; i < 10000; i++)
loopcopy(to, from, SIZE);
for (i = 0; i < 10000; i++)
duffcopy(to, from, SIZE);
return 0;
}
/*
Profile: Function timing, sorted by time
Date: Wed Apr 16 23:38:59 1997


Program Statistics
------------------
Command line at 1997 Apr 16 23:38: "c:\tmp\Release\Duff"
Total time: 1187.011 millisecond
Time outside of functions: 2.139 millisecond
Call depth: 2
Total functions: 3
Total hits: 20001
Function coverage: 100.0%
Overhead Calculated 9
Overhead Average 9

Module Statistics for duff.exe
------------------------------
Time in module: 1184.872 millisecond
Percent of time in module: 100.0%
Functions in module: 3
Hits in module: 20001
Module function coverage: 100.0%

Func Func+Child Hit
Time % Time % Count Function
---------------------------------------------------------
719.321 60.7 719.321 60.7 10000 _loopcopy (duff.obj)
444.002 37.5 444.002 37.5 10000 _duffcopy (duff.obj)
21.550 1.8 1184.872 100.0 1 _main (duff.obj)

*/