From: Terje Mathisen <"terje.mathisen at tmsw.no"> on
nmm1(a)cam.ac.uk wrote:
> In article<0b40dbdb-53c0-4c5c-a19b-e68316f3d9c4(a)p17g2000vbl.googlegroups.com>,
> Larry<lstewart2(a)gmail.com> wrote:
>> By the way, once your applications get to large scale (over 1000
>> cores), problems of synchronization and load balancing start to
>> dominate, and in that regime, I suspect variable speed clocks make the
>> situation worse. Better to turn off cores to save power than to let
>> them run at variable speed.
>
> Oh, gosh, YES! The more I think about tuning parallel codes in a
> variable clock context, the more I think that I don't want to go
> there. And that's independent of whether I have an application or
> an implementor hat on.

But this is already happening!

Current leading-edge power-optimization schemes have to consider exactly
these scenarios: e.g. running one core at slightly higher speed vs. two
cores at somewhat lower speed, or merging and gang-scheduling all
interrupt handling onto a single CPU so that it can spend as much time
as possible in very low-power modes, while all the others get to sleep
all the time.
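The one-fast-core vs. two-slow-cores trade-off can be sketched with the
common cube-law approximation for dynamic power (voltage tracks
frequency, so power scales roughly as f^3). This is an assumed model
for illustration, not something stated in the thread:

```python
# Sketch (assumed model): energy to finish a fixed workload on
# num_cores cores, each running at frequency freq (arbitrary units).
# Power per core ~ freq**3; time = work_per_core / freq.

def energy(num_cores, freq, work=1.0):
    """Total energy = cores * power_per_core * time."""
    work_per_core = work / num_cores
    time = work_per_core / freq
    power_per_core = freq ** 3
    return num_cores * power_per_core * time

one_fast = energy(num_cores=1, freq=1.0)   # one core at full speed
two_slow = energy(num_cores=2, freq=0.5)   # two cores at half speed
print(one_fast, two_slow)  # 1.0 0.25 -- quarter the energy under this model
```

Under this (idealized) model the two slower cores win by 4x for the
same work, which is why the schedulers Terje describes have to weigh
these options at all.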

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
From: nmm1 on
In article <1il537-kci1.ln1(a)ntp.tmsw.no>,
Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>nmm1(a)cam.ac.uk wrote:
>> In article<0b40dbdb-53c0-4c5c-a19b-e68316f3d9c4(a)p17g2000vbl.googlegroups.com>,
>> Larry<lstewart2(a)gmail.com> wrote:
>>> By the way, once your applications get to large scale (over 1000
>>> cores), problems of synchronization and load balancing start to
>>> dominate, and in that regime, I suspect variable speed clocks make the
>>> situation worse. Better to turn off cores to save power than to let
>>> them run at variable speed.
>>
>> Oh, gosh, YES! The more I think about tuning parallel codes in a
>> variable clock context, the more I think that I don't want to go
>> there. And that's independent of whether I have an application or
>> an implementor hat on.
>
>But this is already happening!
>
>Current leading-edge power-optimization schemes have to consider exactly
>these scenarios: e.g. running one core at slightly higher speed vs. two
>cores at somewhat lower speed, or merging and gang-scheduling all
>interrupt handling onto a single CPU so that it can spend as much time
>as possible in very low-power modes, while all the others get to sleep
>all the time.

I am aware of that, but the perpetrators of such designs haven't
thought things through.

That sort of fiendish complexity is incompatible with most existing
collective designs and implementations, and is very probably more
or less incompatible with debugging (and, worse, tuning) any even
remotely efficient ones. Not because it's theoretically impossible,
but because it is too complicated for mere mortals to achieve.

What they are targeting is the existing case of separate, independent,
rarely communicating processes. Fine. But there is no way that
design can be made to work when using parallelism to speed up most
existing (serial) applications. I.e. when you have exhausted the
natural parallelism, you have nowhere to go.

While such usage is the sole province of the HPC people, that doesn't
matter; but it's a catastrophic idea to move 'general purpose' systems
to being even more incompatible with HPC without any plan for
introducing parallelism INTO existing (serial) applications.


Regards,
Nick Maclaren.
From: Andrew Reilly on
On Wed, 27 Jan 2010 12:22:00 +0000, nmm1 wrote:

> That sort of fiendish complexity is incompatible with most existing
> collective designs and implementations, and is very probably more or
> less incompatible with debugging (and, worse, tuning) any even remotely
> efficient ones. Not because it's theoretically impossible, but because
> it is too complicated for mere mortals to achieve.

Aside from the obvious CPU scaling issue just discussed, it seems to me
that another major driver for this kind of thinking is the desire for
"clockless" OSes (or OS modes) to improve efficiency of idle VM
instances. The whole notion of virtualizing processor instances like
that throws clock synchronization out of the window, or at least
makes it a lot less tractable.

> What they are targeting is the existing case of separate, independent,
> rarely communicating processes. Fine. But there is no way that design
> can be made to work when using parallelism to speed up most existing
> (serial) applications. I.e. when you have exhausted the natural
> parallelism, you have nowhere to go.

Is there anywhere much to go when the natural parallelism has been
exhausted?

> While such usage is the sole province of the HPC people, that doesn't
> matter; but it's a catastrophic idea to move 'general purpose' systems
> to being even more incompatible with HPC without any plan for
> introducing parallelism INTO existing (serial) applications.

I received a semi-spam from Sun this morning, and before binning it my
retinas registered something about doing HPC "in the cloud". It seems
that some HPC folk aren't terribly concerned about tight
synchronization. Or perhaps I'm missing the point?

Cheers,

--
Andrew
From: nmm1 on
In article <7saq5vFefhU1(a)mid.individual.net>,
Andrew Reilly <areilly---(a)bigpond.net.au> wrote:
>
>> That sort of fiendish complexity is incompatible with most existing
>> collective designs and implementations, and is very probably more or
>> less incompatible with debugging (and, worse, tuning) any even remotely
>> efficient ones. Not because it's theoretically impossible, but because
>> it is too complicated for mere mortals to achieve.
>
>Aside from the obvious CPU scaling issue just discussed, it seems to me
>that another major driver for this kind of thinking is the desire for
>"clockless" OSes (or OS modes) to improve efficiency of idle VM
>instances. The whole notion of virtualizing processor instances like
>that throws clock synchronization out of the window, or at least
>makes it a lot less tractable.

Actually, no, it doesn't. It does only when you are trying to force
parallel applications into the separate-independent-process model.
Consider a system where a process is made up of LIGHTWEIGHT threads
(i.e. what they were always supposed to be): the system schedules the
process, and the application schedules the threads within it.
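The two-level model Nick describes can be sketched with generators
standing in for lightweight threads: the OS schedules the process as a
whole, while inside it the application round-robins its own threads with
no kernel involvement. This is a loose illustrative analogy, not a real
threading implementation:

```python
# Sketch (illustrative analogy): application-level scheduling of
# lightweight "threads", here modeled as Python generators. Each
# yield is a voluntary reschedule point.
from collections import deque

def worker(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"

def run(threads):
    """Round-robin scheduler run entirely inside the application."""
    ready = deque(threads)
    trace = []
    while ready:
        t = ready.popleft()
        try:
            trace.append(next(t))   # run the thread until it yields
            ready.append(t)         # still runnable: requeue it
        except StopIteration:
            pass                    # thread finished, drop it
    return trace

print(run([worker("a", 2), worker("b", 2)]))  # ['a:0', 'b:0', 'a:1', 'b:1']
```

The point is that the scheduler above never enters the kernel: the OS
sees one schedulable process, and the switching cost between "threads"
is a function call, which is what lightweight was supposed to mean.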

>> What they are targeting is the existing case of separate, independent,
>> rarely communicating processes. Fine. But there is no way that design
>> can be made to work when using parallelism to speed up most existing
>> (serial) applications. I.e. when you have exhausted the natural
>> parallelism, you have nowhere to go.
>
>Is there anywhere much to go when the natural parallelism has been
>exhausted?

In a great many cases, yes. It's harder, but often feasible.

>> While such usage is the sole province of the HPC people, that doesn't
>> matter; but it's a catastrophic idea to move 'general purpose' systems
>> to being even more incompatible with HPC without any plan for
>> introducing parallelism INTO existing (serial) applications.
>
>I received a semi-spam from Sun this morning, and before binning it my
>retinas registered something about doing HPC "in the cloud". It seems
>that some HPC folk aren't terribly concerned about tight
>synchronization. Or perhaps I'm missing the point?

Yes, and so are they.

"Embarrassingly parallel" or "farmable" applications are not really
HPC, used not to be classified as that, and it has been a stupid
idea to use specialist parallel computers for them for three decades.
The users may use a lot of resources, but that's not the point.

That sort of use (think Monte-Carlo, parameter space search etc.)
works perfectly well "in the cloud", on a roomful of el cheapo
workstations, or whatever. It's been a solved problem since time
immemorial, which is why it used not to be classified as HPC.
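The "farmable" style Nick describes can be sketched with a Monte-Carlo
pi estimate: the workers share nothing and never communicate until a
single final reduction, which is exactly why such jobs run equally well
in a cloud or on a roomful of cheap workstations. A minimal sketch
using Python's standard process pool:

```python
# Sketch of an embarrassingly parallel ("farmable") job: independent
# workers, one reduction at the end, no synchronization in between.
import random
from concurrent.futures import ProcessPoolExecutor

def count_hits(args):
    """Count random points in the unit square that land inside the
    quarter circle of radius 1."""
    seed, n = args
    rng = random.Random(seed)
    return sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0
               for _ in range(n))

def estimate_pi(workers=4, samples_per_worker=100_000):
    jobs = [(seed, samples_per_worker) for seed in range(workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        hits = sum(pool.map(count_hits, jobs))  # the only communication
    return 4.0 * hits / (workers * samples_per_worker)

if __name__ == "__main__":
    print(estimate_pi())  # prints an estimate near 3.14
```

Nothing here would change if `pool.map` were replaced by jobs submitted
to a batch queue or a cloud service, which is Nick's point: this class
of problem was solved long before anyone called it HPC.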

Real HPC is about problems that are inherently infeasible without a
lot of computing power, and almost always involves quite a lot of
communication. It includes the case of taking a basically serial
design and improving its algorithms to run in parallel.


Regards,
Nick Maclaren.
From: Michel Hack on
On Jan 27, 7:34 am, Andrew Reilly <areilly...(a)bigpond.net.au> wrote:

> Aside from the obvious CPU scaling issue just discussed, it seems to me
> that another major driver for this kind of thinking is the desire for
> "clockless" OSes (or OS modes) to improve efficiency of idle VM
> instances.  The whole notion of virtualizing processor instances like
> that throws clock synchronization out of the window, or at least
> makes it a lot less tractable.

S/370's designers were aware of virtualized processors over forty
years ago: the architecture has distinguished CPU time from elapsed
time since day 1, and has provided a TOD clock and clock comparator to
permit deadline scheduling, making the whole notion of tick-based
timekeeping look silly. I would have expected other processors to
pick up on this at least fifteen years ago...

As a result, reading the time, or scheduling work based on CPU time,
is not affected by virtualization, and if a virtual machine has
nothing to do, it sits in (virtual) wait state until an external event
happens or the clock comparator trips at the next scheduled event.
This consumes zero cycles; 100% of the cycles are available to run
other guests. It allows z/VM to support thousands of sleeping virtual
machines at very little cost (a few structures in the hypervisor's
memory).
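The difference between comparator-driven and tick-based timekeeping can
be sketched by counting wakeups. This is a toy model of the idea, not
S/370 code: a deadline scheduler arms one wakeup per scheduled event,
while a tick-based kernel wakes HZ times per second whether or not any
guest has work:

```python
# Sketch (toy model): wakeups needed to serve three sleeping guests
# over a 30-second horizon, deadline-driven vs. tick-driven.
import heapq

def wakeups_with_comparator(deadlines):
    """Deadline scheduling: arm the comparator for the earliest event,
    wake exactly once per event."""
    heap = list(deadlines)
    heapq.heapify(heap)
    wakeups = []
    while heap:
        wakeups.append(heapq.heappop(heap))
    return wakeups

def wakeups_with_ticks(deadlines, hz, horizon):
    """Tick-based: wake every 1/hz seconds and poll, regardless of
    whether any deadline is due."""
    return [t / hz for t in range(int(horizon * hz))]

events = [5.0, 12.5, 30.0]  # next deadlines of three sleeping guests
print(len(wakeups_with_comparator(events)))                   # 3
print(len(wakeups_with_ticks(events, hz=100, horizon=30.0)))  # 3000
```

Three wakeups instead of three thousand is the "zero cycles while
sleeping" property Michel describes, and it is what lets a hypervisor
keep thousands of idle guests around for the cost of a few structures
in memory.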

Michel.