From: Robert Myers on
jacko wrote:
> On 21 July, 19:13, Robert Myers <rbmyers...(a)gmail.com> wrote:

>>
>> The actual problem -> accurate representation of a nonlinear free field
>> + non-trivial geometry == bureaucrats apparently prefer to pretend that
>> the problem doesn't exist, or at least not to scrutinize too closely
>> what's behind the plausible-looking pictures that come out.
>>
>> Robert.
>
> Umm, I think a need for up to cubic fields is reasonable in
> modelling. Certain effects do not show up in the quadratic or linear
> approximations. This can be done by tripling the variable count, at
> the cost of a lot more computation, but surely there must be better
> ways.
>
> Quartic modelling may not serve much of an extra purpose, as a cusp
> catastrophe is already within the cubic. Mapping the field to x and
> performing an inverse map to find the applied force can linearize
> certain problems.

I don't want to alienate the computer architects here by turning this
into a forum on computational mathematics.

Maybe you know something about nonlinear equations that I don't. If
you know enough, maybe you want to look into the million-dollar prize
on offer from the Clay Mathematics Institute for answering some very
fundamental questions about either the Navier-Stokes or the Euler
equations.

Truncation of the hierarchy of equations for turbulence by assuming that
the fourth cumulant is zero leads to unphysical results, like negative
energies in the spectral energy distribution. I'm a tad muddy on the
actual history now, but I knew that result decades ago.
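
If memory serves, the closure in question amounts to assuming the
velocity field is near enough to Gaussian that the fourth moment
factors into products of second moments:

\[
\langle u_i u_j u_k u_l \rangle =
\langle u_i u_j \rangle \langle u_k u_l \rangle +
\langle u_i u_k \rangle \langle u_j u_l \rangle +
\langle u_i u_l \rangle \langle u_j u_k \rangle ,
\]

which closes the equations at third order. Nothing in that assumption
keeps the spectral energy non-negative, and in practice it doesn't
stay non-negative.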

There is, as far as I know, no ab initio or even natural truncation of
the infinite hierarchy of conserved quantities that isn't problematic.
There are various hacks that work--sort of. Every single plot that
you see that purports to represent the calculation of a fluid flow at a
reasonable Reynolds number depends on some kind of hack.

For the Navier-Stokes equations, nature provides a natural cut-off scale
in length, the turbulent dissipation scale, and ab initio calculations
at interesting turbulent Reynolds numbers do exist up to Re~10,000.
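
The usual back-of-the-envelope estimate for why that's hard: the
dissipation (Kolmogorov) scale shrinks relative to the large scale
roughly like Re^(-3/4), so resolving both ends in three dimensions
takes on the order of

\[
N \sim \left( \frac{L}{\eta} \right)^{3} \sim \mathrm{Re}^{9/4}
\]

grid points--about 10^9 at Re~10^4--before you even count the time
steps, which push the total work toward Re^3.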

As I've tried (unsuccessfully) to explain here, the interaction between
the longest and shortest scales in a problem that is more than weakly
non-linear (problems for which expansions in the linear free-field
propagator do not converge) is not some arcane mathematical nit, but
absolutely fundamental to the understanding of lots of questions that
one would really like the answer to.

Even if people continue to build careers based on calculations that
blithely ignore a fundamental reality of the governing equations, and
even if Al Gore could go through another ten reincarnations without
understanding what I'm talking about, the reality won't go away because
the computers to address it are inconveniently expensive.

Robert.
From: jacko on
Navier-Stokes is one of the hardest systems to model, and one of the
most useful.

My own attempt went only as far as phrasing the Re expression as an
inequality against a constant, and applying the initial steps of
Uncertain Geometry to sketch a possible strong 'turbulence as
uncertainty' idea. See http://sites.google.com/site/jackokring for
uncertain geometry.

But yes, more emphasis should be placed on nonlinear fluid modelling
as a test benchmark for GPU-style arrays.
From: George Neuner on
On Tue, 20 Jul 2010 15:41:13 +0100 (BST), nmm1(a)cam.ac.uk wrote:

>In article <04cb46947eo6mur14842fqj45pvrqp61l1(a)4ax.com>,
>George Neuner <gneuner2(a)comcast.net> wrote:
>>
>>ISTM bandwidth was the whole point behind pipelined vector processors
>>in the older supercomputers. ...
>> ... the staging data movement provided a lot of opportunity to
>>overlap with real computation.
>>
>>YMMV, but I think pipeline vector units need to make a comeback.
>
>NO chance! It's completely infeasible - they were dropped because
>the vendors couldn't make them for affordable amounts of money any
>longer.

Hi Nick,

Actually I'm a bit skeptical of the cost argument ... obviously it's
not feasible to make large banks of vector registers fast enough for
multiple GHz FPUs to fight over, but what about a vector FPU with a
few dedicated registers?

There are a number of (relatively) low-cost DSPs in the up to ~300MHz
range that have large (32KB and up, i.e. 4K double floats) 1ns
dual-ported SRAM, are able to sustain 1 or more FLOPs per SRAM cycle,
and which match or exceed the sustainable FP performance of much
faster CPUs. Some of these DSPs are $5-$10 in industrial quantities,
and some are even cheap in hobby quantities.
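
To put rough numbers on that (my arithmetic, not any particular
datasheet):

\[
\frac{32\,\mathrm{KB}}{8\,\mathrm{B/double}} = 4\mathrm{K\ doubles},
\qquad
\frac{1\ \mathrm{FLOP}}{1\,\mathrm{ns}} \approx 1\ \mathrm{GFLOP/s\ sustained},
\]

so even at $10 a part that is something like $10 per sustained
GFLOP/s, plus whatever the glue logic costs.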

Given the economics of mass production, it would seem that creating
some kind of vector coprocessor combining an FPU, address units and a
few banks of SRAM with host DMA access should be relatively cheap if
the FPU is kept under 500MHz.

Obviously, it could not have the peak performance of the GHz host FPU,
but a suitable problem could easily keep several such processors
working. Crays were a b*tch, but when the problem suited them ...
With several vector coprocessors on a plug-in board, this isn't very
different from the GPU model other than having more flexibility in
staging data.
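
As a sketch of the kind of staging I have in mind -- purely
illustrative, with memcpy() standing in for the DMA engine and the
tile size and function names invented for the example -- a daxpy
staged through the coprocessor's SRAM might look like:

#include <stddef.h>
#include <string.h>
#include <stdio.h>

#define TILE 512                 /* doubles per SRAM tile (invented size) */

/* daxpy on one tile held in "SRAM": y := a*x + y */
static void daxpy_tile(double a, const double *x, double *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Process a long vector in tiles.  memcpy() stands in for host<->SRAM   */
/* DMA; on real hardware the transfer of tile k+1 would overlap the      */
/* compute on tile k (ping-pong buffers), which a serial sketch can't    */
/* show -- the data movement pattern is the point.                       */
static void daxpy_staged(double a, const double *x, double *y, size_t n)
{
    double sram_x[TILE], sram_y[TILE];   /* the coprocessor's local banks */

    for (size_t off = 0; off < n; off += TILE) {
        size_t len = (n - off < TILE) ? (n - off) : TILE;
        memcpy(sram_x, x + off, len * sizeof *x);   /* DMA in             */
        memcpy(sram_y, y + off, len * sizeof *y);
        daxpy_tile(a, sram_x, sram_y, len);         /* FPU works from SRAM */
        memcpy(y + off, sram_y, len * sizeof *y);   /* DMA out            */
    }
}

int main(void)
{
    enum { N = 2000 };
    double x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = i; y[i] = 1.0; }
    daxpy_staged(2.0, x, y, N);
    printf("y[0]=%g y[N-1]=%g\n", y[0], y[N - 1]);  /* 1 and 3999         */
    return 0;
}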


The other issue is this: what exactly are we talking about in this
thread ... are we trying to have the fastest FPUs possible, or do we
want a low-cost machine with (very|extremely) high throughput?

No doubt I've overlooked something (or many things 8) pertaining to
economics or politics or programming - I don't think there is any
question that there are plenty of problems (or subproblems) suitable
for solving on vector machines. So please feel free to enlighten me.

George
From: Alex McDonald on
On 20 July, 22:31, "David L. Craig" <dlc....(a)gmail.com> wrote:
> On Jul 20, 2:49 pm, Robert Myers <rbmyers...(a)gmail.com> wrote:
>

>
> > I doubt if mass-market x86 hypervisors ever crossed the
> > imagination at IBM, even as the barbarians were at the
> > gates.
>
> You'd be wrong.  A lot of IBMers and customer VMers were
> watching what Intel was going to do with the 80386 next
> generations to support machine virtualization.  While
> Intel claimed it was coming, by mainframe standards, they
> showed they just weren't serious.  Not only can x86 not
> fully virtualize itself, it has known design flaws that
> can be exploited to compromise the integrity of its
> guests and the hypervisor.  That it is used widely as a
> consolidation platform boggles the minds of those in the
> know.  We're waiting for the eventual big stories.
>

Can you be more explicit on this? I understand the lack of complete
virtualization is an issue with the x86, but I'm fascinated by your
claim of exploitable design flaws; what are they?

From: jacko on
> No doubt I've overlooked something (or many things 8) pertaining to
> economics or politics or programming - I don't think there is any
> question that there are plenty of problems (or subproblems) suitable
> for solving on vector machines.  So please feel free to enlighten me.

I think it's that FPU speed is not the bottleneck at present. It's
keeping the FPU fed with data, and shifting data around memory in
suitably ordered patterns. Maybe not fetching data as a linear
cache-line unit, but with a generic stride n (not just powers of 2) as
a generic scatter/gather, with n changeable on the virtual cache line
before a write-back, say.

Maybe it's about what an address is, and whether it can specify
processing in smart memories on read and write.

It's definitely about reducing latency where that is possible, or
about how it might be made possible.

And it's about cache structures that may help with any or all of the
above by preventing the onset of thrashing.

SIMD is part of this, as the program size drops. But even vector units
have to be kept fed with data.
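
A software analogue of the stride-n 'virtual cache line' idea, just to
make the data movement concrete (names and sizes are made up for
illustration; in hardware the stride would live in the memory
controller, not in a loop):

#include <stddef.h>
#include <stdio.h>

#define VLINE 8   /* elements per virtual cache line (illustrative only) */

/* Gather VLINE doubles from mem starting at base with an arbitrary      */
/* stride (any n, not just powers of 2) into a contiguous line buffer.   */
static void vline_gather(double *line, const double *mem,
                         size_t base, size_t stride)
{
    for (size_t i = 0; i < VLINE; i++)
        line[i] = mem[base + i * stride];
}

/* Scatter the line back with a possibly different stride before "saving". */
static void vline_scatter(double *mem, const double *line,
                          size_t base, size_t stride)
{
    for (size_t i = 0; i < VLINE; i++)
        mem[base + i * stride] = line[i];
}

int main(void)
{
    double mem[64];
    for (int i = 0; i < 64; i++) mem[i] = i;

    double line[VLINE];
    vline_gather(line, mem, 0, 3);      /* pull every 3rd element         */
    for (int i = 0; i < VLINE; i++)     /* prints 0 3 6 ... 21            */
        printf("%g ", line[i]);
    printf("\n");

    vline_scatter(mem, line, 32, 1);    /* write it back contiguously     */
    return 0;
}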