From: Noob on
[ NB: CROSS-POSTED to comp.arch, comp.arch.embedded, comp.programming ]
[ Followup-To: set to comp.programming ]

Hello,

I imagine the following problem has been efficiently solved
over a million times in the past.

I need to rotate a picture clockwise 90 degrees.

Conceptually, my picture is represented by a matrix
(W columns, H rows) of pixels, with each pixel
represented by 3 octets.

pixel original_picture[W][H];
pixel rotated_picture[H][W];

At the implementation level, I'm dealing with 1D integer arrays.

int src[W*H*3];
int dest[W*H*3];

Conceptually, the rotation operation comes down to

rotated_picture[j][H-1-i] = original_picture[i][j]

My actual code is

for (i = 0; i < H; ++i)
for (j = 0; j < W; ++j)
memcpy(dest+(H*j+H-1-i)*3, src+(W*i+j)*3, 3);

Consider as an example,
W = 1644
H = 1164
size = 5.74 MB

On my target platform, the rotation takes 1.65 seconds.

I'm trying to get this down under half a second.

I'm using the GNU tool chain. For some weird reason, we
compile everything -O0. The first thing I'll try is crank
gcc's optimization level.

I'm hoping gcc can perform some strength reduction, as the
index calculation seems to be taking a non-negligible fraction
of the total run-time.

Changing the loop to

for (i = 0; i < H; ++i)
for (j = 0; j < W; ++j)
{
memcpy(dest+(H*j+H-i-1)*3, src, 3);
src += 3;
}

brings a 5% improvement.

I thought changing the memcpy call to 3 explicit assignments
might help, but it actually slowed things down.

Perhaps I need to perform some tiling magic... I don't think
gcc performs automatic tiling?

Comments and insight would be greatly appreciated.

Regards.
From: Andrew Smallshaw on
On 2010-04-07, Noob <root(a)127.0.0.1> wrote:
>
> My actual code is
>
> for (i = 0; i < H; ++i)
> for (j = 0; j < W; ++j)
> memcpy(dest+(H*j+H-1-i)*3, src+(W*i+j)*3, 3);

I haven't looked at it in detail but that immediately caught my
eye. Surely:

memcpy(dest+(H*j+H-1-i)*3, src+(W*i+j)*3, 3 * sizeof(int));

--
Andrew Smallshaw
andrews(a)sdf.lonestar.org
From: Terje Mathisen "terje.mathisen at on
Follow-up restored to include a group I read)
Noob wrote:
> [ NB: CROSS-POSTED to comp.arch, comp.arch.embedded, comp.programming ]
> [ Followup-To: set to comp.programming ]
>
> Hello,
>
> I imagine the following problem has been efficiently solved
> over a million times in the past.
>
> I need to rotate a picture clockwise 90 degrees.
>
> Conceptually, my picture is represented by a matrix
> (W columns, H rows) of pixels, with each pixel
> represented by 3 octets.
>
> pixel original_picture[W][H];
> pixel rotated_picture[H][W];
>

This is indeed a well-known problem, the key to make it fast is to
realize that it is very similar to a transpose operation!

I.e. first copy everything to the target array (in a single block move),
then transpose it, finally you reverse each of the rows.

The difference between clock and counter-clock-wise rotation is in doing
the row reverse either before or after the transpose. Try both since I
don't remeber which is which!

The key here is that the transpose operation is quite cache-friendly,
much better than your current naive code.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
From: nmm1 on
In article <romu87-uk61.ln1(a)ntp.tmsw.no>,
Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>Noob wrote:
>>
>> I imagine the following problem has been efficiently solved
>> over a million times in the past.
>>
>> I need to rotate a picture clockwise 90 degrees.
>>
>> Conceptually, my picture is represented by a matrix
>> (W columns, H rows) of pixels, with each pixel
>> represented by 3 octets.
>>
>> pixel original_picture[W][H];
>> pixel rotated_picture[H][W];
>
>This is indeed a well-known problem, the key to make it fast is to
>realize that it is very similar to a transpose operation!
>
>I.e. first copy everything to the target array (in a single block move),
>then transpose it, finally you reverse each of the rows.
>
>The difference between clock and counter-clock-wise rotation is in doing
>the row reverse either before or after the transpose. Try both since I
>don't remeber which is which!
>
>The key here is that the transpose operation is quite cache-friendly,
>much better than your current naive code.

For extra marks, produce a much more cache-friendly version, using
blocks rather than rows ....

You can even do that in place, and make it efficient. But I suspect
that you (Terje) know all of this!


Regards,
Nick Maclaren.
From: Robert Myers on
On Apr 7, 1:10 pm, n...(a)cam.ac.uk wrote:
> In article <romu87-uk61....(a)ntp.tmsw.no>,
> Terje Mathisen  <"terje.mathisen at tmsw.no"> wrote:
>
>
>
>
>
> >Noob wrote:
>
> >> I imagine the following problem has been efficiently solved
> >> over a million times in the past.
>
> >> I need to rotate a picture clockwise 90 degrees.
>
> >> Conceptually, my picture is represented by a matrix
> >> (W columns, H rows) of pixels, with each pixel
> >> represented by 3 octets.
>
> >> pixel original_picture[W][H];
> >> pixel rotated_picture[H][W];
>
> >This is indeed a well-known problem, the key to make it fast is to
> >realize that it is very similar to a transpose operation!
>
> >I.e. first copy everything to the target array (in a single block move),
> >then transpose it, finally you reverse each of the rows.
>
> >The difference between clock and counter-clock-wise rotation is in doing
> >the row reverse either before or after the transpose. Try both since I
> >don't remeber which is which!
>
> >The key here is that the transpose operation is quite cache-friendly,
> >much better than your current naive code.
>
> For extra marks, produce a much more cache-friendly version, using
> blocks rather than rows ....
>
> You can even do that in place, and make it efficient.  But I suspect
> that you (Terje) know all of this!
>
What would happen if you changed the default storage order for arrays
of dimension two and greater, so that they were always stored in
blocks with some cache or node-friendly order? You could make it
nearly transparent in a language like c++ by defining appropriate
objects.

Robert.