From: AJ on
Hi,

I am trying to optimize gcc for power consumption. The technique I am
using is rescheduling instructions based on the Hamming distance between
two consecutive instructions. I now want to create a new pass after
register allocation and final jump optimization. I am new to gcc
development and am having trouble finding out how to create a new
pass. I also understand that my scheduler will have to operate on RTL
code. Which data structures would I need to create, and which existing
ones could be reused?
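
For reference, a minimal sketch of the metric in C. It assumes
fixed-width 32-bit instruction encodings (as on classic ARM); the names
hamming32 and switching_cost are hypothetical and not part of gcc.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bit positions in which two 32-bit instruction words differ.
 * Assumption: every instruction is encoded as one 32-bit word. */
unsigned hamming32(uint32_t a, uint32_t b)
{
    uint32_t diff = a ^ b;      /* set bits mark the differing positions */
    unsigned count = 0;
    while (diff != 0) {
        diff &= diff - 1;       /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Total number of bit toggles between consecutive words, in the given
 * order. A scheduler using this cost model would try to minimise it. */
unsigned long switching_cost(const uint32_t *words, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 1; i < n; i++)
        total += hamming32(words[i - 1], words[i]);
    return total;
}
```

For example, the words 0x00, 0xFF, 0x0F cost 8 + 4 = 12 bit toggles in
that order, but 0x00, 0x0F, 0xFF cost only 4 + 4 = 8, which is the kind
of difference a reordering pass would look for.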

How much code from schedule_insns() could be reused? I only need to
change the cost model in it.

Any kind of suggestion from you experts would be very beneficial for
me.

Alex
From: Terje Mathisen on
AJ wrote:
> Hi,
>
> I am trying to optimize gcc for power consumption. Technique I am
> using is rescheduling instruction based on hamming distance between
> two consecutive instructions. Now I want to create a new pass after

That's a very interesting idea, but I'm afraid that it would be totally
useless!

I.e. the actual operations taking place inside a modern CPU core have so
little to do with the bit patterns of the opcodes that those patterns
are (almost?) totally irrelevant.

OTOH it might make sense to compile code that really can save power, by
getting rid of wasted instructions, i.e. mispredicted branches.

One possible way of doing this would be to write code that executes a
little slower, but does so using less branching and/or fewer memory
operations (including instruction load/decode operations).
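
As a small illustration of trading a branch for straight-line
arithmetic, here is a sketch in C. The function names are hypothetical,
and a compiler may well turn either version into the other, so this only
shows the source-level idea.

```c
#include <stddef.h>

/* Branchy version: one conditional branch per element, which the CPU
 * may mispredict when the data is irregular. */
size_t count_below_branchy(const int *v, size_t n, int limit)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] < limit)
            count++;
    }
    return count;
}

/* Branch-free loop body: the comparison yields 0 or 1, which is added
 * unconditionally, so no data-dependent branch is needed. */
size_t count_below_branchless(const int *v, size_t n, int limit)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (size_t)(v[i] < limit);
    return count;
}
```

Both functions return the same result; the second always does the add,
which can be slightly more work per element but avoids the mispredicted
path.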

Terje

--
- <Terje.Mathisen(a)hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
From: nedbrek on
Hello all,

"Terje Mathisen" <terje.mathisen(a)hda.hydro.com> wrote in message
news:mbidnaRHLf3-RI7UnZ2dnUVZ8qXinZ2d(a)giganews.com...
> AJ wrote:
> That's a very interesting idea, but I'm afraid that it would be totally
> useless!

I agree. Look at your total power consumption. Outside the CPU, you have
displays, mass storage, chipset, etc. Plus, the CPU power is probably
dominated by leakage (constant power consumption - independent of dynamic
activity).

The best way to optimize for power is to optimize for speed. That way, your
program is done sooner, and the CPU can go to sleep faster. Additionally,
you might be able to turn the whole computer off (saving all the "other"
power drains).
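
A rough sketch of that arithmetic in C, assuming a simple two-state
model (one active power, one sleep power) over a fixed time window. The
function name and all numbers below are hypothetical, chosen only for
illustration.

```c
/* Energy in joules over a window of window_s seconds in which the CPU
 * is active for active_s seconds and asleep for the remainder.
 * Assumption: only two power states, each with a constant draw. */
double window_energy_joules(double active_watts, double sleep_watts,
                            double active_s, double window_s)
{
    double sleep_s = window_s - active_s;   /* time spent asleep */
    return active_watts * active_s + sleep_watts * sleep_s;
}
```

With 10 W active and 1 W asleep over a 10 s window, finishing in 4 s
costs 10*4 + 1*6 = 46 J, while finishing in 2 s costs 10*2 + 1*8 = 28 J.
This assumes the faster version draws no more power while running, which
will not always hold.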

HTH,
Ned


From: Morten Reistad on
In article <mbidnaRHLf3-RI7UnZ2dnUVZ8qXinZ2d(a)giganews.com>,
Terje Mathisen <terje.mathisen(a)hda.hydro.com> wrote:
>AJ wrote:
>> Hi,
>>
>> I am trying to optimize gcc for power consumption. Technique I am
>> using is rescheduling instruction based on hamming distance between
>> two consecutive instructions. Now I want to create a new pass after
>
>That's a very interesting idea, but I'm afraid that it would be totally
>useless!
>
>I.e. the actual operations taking place inside a modern cpu core has so
>little to do with the actual bit pattern of the opcodes, as to be
>(almost?) totally irrelevant.
>
>OTOH it might make sense to compile code that really can save power, by
>getting rid of wasted instructions, i.e. missed branches.
>
>One possible way of doing this would be to write code that executes a
>little slower, but does so using less branching and/or fewer memory
>operations (including instruction load/decode operations).

Also, general optimisation for space seems to save power, and
works out very well on the class of machines like the Atom or the
eee-style CPUs. These are very sensitive to cache hit ratios, and
small is beautiful. I tested doing a build of OpenBSD with -Os everywhere,
and it had a markedly better feel on one of the slower eees.

And powertop is your friend.

-- mrr

From: Wilco Dijkstra on

"Terje Mathisen" <terje.mathisen(a)hda.hydro.com> wrote in message news:mbidnaRHLf3-RI7UnZ2dnUVZ8qXinZ2d(a)giganews.com...
> AJ wrote:
>> Hi,
>>
>> I am trying to optimize gcc for power consumption. Technique I am
>> using is rescheduling instruction based on hamming distance between
>> two consecutive instructions. Now I want to create a new pass after
>
> That's a very interesting idea, but I'm afraid that it would be totally useless!

It has been done before on ARM many years ago, and the conclusion was
it only gave a few percent gain on the simplest ARM. On more complex CPUs
the gain would be far less. So on a modern x86 it is indeed totally useless.

> I.e. the actual operations taking place inside a modern cpu core has so little to do with the actual bit pattern of
> the opcodes, as to be (almost?) totally irrelevant.
>
> OTOH it might make sense to compile code that really can save power, by getting rid of wasted instructions, i.e.
> missed branches.
>
> One possible way of doing this would be to write code that executes a little slower, but does so using less branching
> and/or fewer memory operations (including instruction load/decode operations).

For branchy code, compiling for space is usually best (Windows is compiled
-Os); for code with lots of loops, compiling for performance is best.

But your choice of CPU, and the voltage/frequency setting you run it
at, will save an order of magnitude more than changing compiler options.
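
To see why the voltage setting matters so much, here is a sketch in C of
the usual first-order model for dynamic CMOS power, P = a * C * V^2 * f.
The function name and example values are hypothetical, and leakage is
not modelled at all.

```c
/* First-order dynamic power in watts: activity factor times switched
 * capacitance times voltage squared times clock frequency.
 * Assumption: leakage and short-circuit power are ignored. */
double dynamic_power_watts(double activity, double cap_farads,
                           double volts, double hz)
{
    return activity * cap_farads * volts * volts * hz;
}
```

Under this model, dropping the supply from 1.2 V to 0.9 V at the same
frequency scales dynamic power by (0.9/1.2)^2 = 0.5625, and lowering the
frequency along with it reduces power further.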

Wilco