From: Robert Myers on
On Jul 19, 11:54 am, Thomas Womack <twom...(a)chiark.greenend.org.uk>
wrote:
> In article <QSK0o.10246$Zp1.7...(a)newsfe15.iad>,
> Robert Myers  <rbmyers...(a)gmail.com> wrote:
>
> >I have lamented, at length, the proliferation of flops at the expense of
> >bytes-per-flop in what are currently styled as supercomputers.
>
> >This subject came up recently on the Fedora User's Mailing List when
> >someone claimed that GPUs are just what the doctor ordered to make
> >high-end computation pervasively available.  Even I have fallen into
> >that trap, in this forum, and I was quickly corrected.  In the most
> >general circumstance, GPUs seem practically to have been invented to
> >expose bandwidth starvation.
>
> Yes, they've got a very low peak bandwidth:peak flops ratio; but the
> peak bandwidth is reasonably high in absolute terms - the GeForce 480's
> peak bandwidth is about that of a Cray T916.
>
> (the chip has about 2000 balls on the bottom, 384 of which are memory
> I/O running at 4 GHz)
>
> I don't think it makes sense to complain about low bw:flops ratios;
> you could always make the ratio higher by removing ALUs, getting you a
> machine which is less capable at the many jobs that can be made to
> need flops but not bytes.
>
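
Let me put those numbers in bytes-per-flop terms. A back-of-envelope
sketch in Python (the 384 pins at ~4 Gb/s come from your post; the
~1.35 single-precision TFLOPS peak for the GTX 480 is my assumption):

    # Rough machine balance for a GeForce GTX 480.
    # Pin count and per-pin rate are from the post above; the flops
    # figure is an assumed spec, not something quoted in this thread.
    pins = 384                  # memory I/O balls
    gbits_per_pin = 4.0         # ~4 Gb/s per pin
    peak_bw_gbs = pins * gbits_per_pin / 8    # ~192 GB/s
    peak_sp_gflops = 1345.0     # assumed single-precision peak
    print("%.0f GB/s / %.0f GFLOPS = %.2f bytes/flop"
          % (peak_bw_gbs, peak_sp_gflops, peak_bw_gbs / peak_sp_gflops))

On those assumptions the balance comes out around 0.14 bytes/flop,
well below the several bytes per flop the classic vector machines
delivered, and that gap is exactly the starvation I keep harping on.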

As I have repeatedly been reminded, it doesn't make much sense simply
to complain, if that's all you ever do.

If nothing else, I've been on a one-man crusade to stop the
misrepresentation of current "supercomputers." The designs are *not*
scalable, except with respect to a set of problems that are
embarrassingly parallel in the global sense, or so close to
embarrassingly parallel that the wimpy global bandwidth that's
available is not a serious handicap.

If you can't examine the most interesting questions about the
interaction between the largest and smallest scales the machine can
represent without making indefensible mathematical leaps, then why
bother building the machine at all? Because there are a bunch of
almost embarrassingly parallel problems that you *can* do?

I don't think we're ever going to agree on this. Your ongoing
annoyance has been noted. I'd like to explore what can and cannot be
done, so that everyone understands the consequences of the decisions
being made about computational frontiers that, on our present course,
will never be explored.

Maybe we've reached a brick wall. If so, I'm mostly the only one
talking about it, and I'd like to broaden the discussion without
annoying people who don't want to hear it.

Robert.
From: David L. Craig on
I am new to comp.arch and so am unclear on the pertinent history of
this discussion, so please bear with me and don't take offense at
anything I say, as none is intended.

Is the floating-point bandwidth issue only being applied to one
architecture, e.g., x86? If so, why? Is this not a problem with other
designs? Also, why single out floating-point bandwidth? For instance,
what about the maximum number of parallel RAM accesses an architecture
can support, which has major impacts on balancing the cores' use of
memory with I/O's use?

If everyone thinks a different group is called for, that's fine with
me. I just want to understand the reasons this type of discussion
doesn't fit here.
From: Robert Myers on
David L. Craig wrote:

> I am new to comp.arch and so am unclear on the pertinent history of
> this discussion, so please bear with me and don't take offense at
> anything I say, as none is intended.
>
> Is the floating-point bandwidth issue only being applied to one
> architecture, e.g., x86? If so, why? Is this not a problem with other
> designs?

Some of my harshest criticism has been aimed at computers built around
the Power architecture, one of which briefly owned the top spot on the
Top 500 list. The problem is not peculiar to any ISA.

> Also, why single out floating-point bandwidth? For instance, what
> about the maximum number of parallel RAM accesses an architecture can
> support, which has major impacts on balancing the cores' use of memory
> with I/O's use?
>

I have no desire to limit the figures of merit that deserve
consideration. I just want to provide some corrective to the "Wow! A
gazillion flops!" talk, which comes without even an asterisk.

Right now, people present, brag about, and plan for just one figure of
merit: Linpack flops. That makes sense to some, I gather, but it makes
no sense to me.
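
It's easy to quantify why Linpack flatters flops: LU factorization does
O(n^3) arithmetic on O(n^2) data, so the bytes-per-flop the benchmark
actually demands shrinks as the matrix grows. A small sketch (the
matrix size is arbitrary, chosen only for illustration):

    # Why Linpack rewards flops over bandwidth: compulsory memory
    # traffic grows as n^2 while the flop count grows as n^3.
    n = 100000                       # matrix dimension (arbitrary)
    flops = (2.0 / 3.0) * n ** 3     # LU factorization flop count
    min_bytes = 8.0 * n ** 2         # touch each double at least once
    print("compulsory traffic: %.1e bytes/flop" % (min_bytes / flops))
    # ~1.2e-4 bytes/flop: nearly any memory system can keep the ALUs fed.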

Computation is more or less a solved problem. Most of the challenges
left have to do with moving data around, with latency and not bandwidth
having gotten the lion's share of attention (for good reason). I
believe that moving data around will ultimately be the limiting factor
with regard to reducing power consumption.
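
A toy energy budget makes the point; the per-operation numbers below
are illustrative placeholders of roughly the right order for the
current generation of silicon, not measurements:

    # Illustrative only: energy of arithmetic vs. off-chip data movement.
    pj_per_flop = 20.0          # one double-precision flop (placeholder)
    pj_per_dram_word = 2000.0   # one 8-byte word from DRAM (placeholder)
    # For a streaming kernel doing one flop per word fetched:
    frac = pj_per_dram_word / (pj_per_dram_word + pj_per_flop)
    print("energy spent moving data: %.0f%%" % (100.0 * frac))
    # ~99%: the flops are nearly free; the wires are not.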

> If everyone thinks a different group is called for, that's fine with
> me. I just want to understand the reasons this type of discussion
> doesn't fit here.

The safest answer that I can think of to this question is that it is
really an interdepartmental problem.

The computer architects here have been relatively tolerant of my
excursions of thought as to why the computers currently being built
don't really cut it, but a proper discussion of all the pros and cons
would carry the discussion, and perhaps the list, far beyond any normal
definition of computer architecture.

Even setting aside the task of justifying why expensive bandwidth is
not optional, there is little precedent here for in-depth explorations
of blue-sky
proposals. A fair fraction of the blue-sky propositions brought here
can't be taken seriously, and my sense of this group is that it wants to
keep the thinking mostly inside the box, not for want of imagination,
but to avoid science fiction and rambling, uninformed discussion.

Robert.
From: jgd on
In article <QSK0o.10246$Zp1.7167(a)newsfe15.iad>, rbmyersusa(a)gmail.com
(Robert Myers) wrote:

> Since I have talked most about the subject here and gotten the most
> valuable feedback here, I thought to solicit advice as to what kind
> of forum would seem most plausible/attractive to pursue such a
> subject.

A mailing list seems the most plausible to me. When the subject doesn't
have a well-defined structure (as yet), a wiki or web BBS tends to get
in the way of communication.

--
John Dallman, jgd(a)cix.co.uk, HTML mail is treated as probable spam.
From: nik Simpson on
On 7/19/2010 10:36 AM, MitchAlsup wrote:
>
> d) high end PC processors can afford 2 memory channels

Not quite as screwed as that: the top-end Xeon & Opteron parts have 4
DDR3 memory channels, though they're still screwed. In the 2-socket
space, it's 3 DDR3 memory channels for typical server processors. Of
course, the move to on-chip memory controllers means that the scope for
additional memory channels is pretty much zero, but that's the price
you pay for commodity parts: they are designed to meet the needs of the
majority of customers, and it's hard to justify the cost of additional
memory channels, at both the processor and the board-layout level, just
to satisfy the needs of bandwidth-crazy HPC apps ;-)
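
For scale, a rough sketch of what those channels buy (the DDR3-1333
rate and the CPU figures below are my assumptions, not anything from
Mitch's post):

    # Assumed figures for a 4-channel DDR3 server socket.
    channels = 4
    transfers_per_s = 1.333e9            # DDR3-1333 (assumed)
    bytes_per_transfer = 8               # 64-bit channel
    peak_bw_gbs = channels * transfers_per_s * bytes_per_transfer / 1e9
    cores, ghz, flops_per_clock = 8, 2.4, 4        # assumed CPU
    peak_gflops = cores * ghz * flops_per_clock
    print("%.1f GB/s vs %.1f GFLOPS -> %.2f bytes/flop"
          % (peak_bw_gbs, peak_gflops, peak_bw_gbs / peak_gflops))
    # ~0.56 bytes/flop at peak, and it only gets worse as cores multiply.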

--
Nik Simpson