From: nedbrek on
Hello all,

"Robert Myers" <rbmyersusa(a)gmail.com> wrote in message
news:37734c23-5748-4f4b-9013-1d1c60cb3d94(a)d8g2000yqf.googlegroups.com...
> On Jul 20, 1:49 pm, "David L. Craig" <dlc....(a)gmail.com> wrote:
>> If we're talking about custom, never-mind-the-cost
>> designs, then that's the stuff that should make this
>> a really fun group.
>
> Moving the discussion to some place slightly less visible than
> comp.arch might not produce more productive flights of fancy, but I,
> for one, am interested in what is physically possible and not just
> what can be built with the consent of Sen. Mikulski--a lady I have
> always admired, to be sure, from her earliest days in politics, just
> not the person I'd cite as intellectual backup for technical
> decisions.

If we are only limited by physics, a lot is possible...

Can you summarize the problem space here?
1) Amount of data - is it fixed (SPEC-style), or does it grow with
performance (TPC-style)?
2) Style of access - you touched on this: regular (not random), but not
really suited to sequential (or cache-line) structures. Is it sparse
arrays? Linked lists? What percentage is pointers vs. FMAC inputs?
(A sketch of the distinction I have in mind follows below.)
3) How branchy is it?

I think that should be enough to get some juices going...
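
For item 2, here is a hypothetical, untuned C strawman of the distinction
I have in mind: pointer chasing, where every load waits on the previous
one, versus a streamed FMAC loop that mostly wants bandwidth. The sizes
and names are made up.

    /* Hypothetical microbenchmark sketch -- names and sizes made up.
       Contrasts two access styles: chasing pointers through a linked
       list (latency-bound) versus a streamed multiply-add over dense
       arrays (bandwidth/FLOP-bound). */
    #include <stdio.h>
    #include <stdlib.h>

    #define N (1L << 20)

    struct node { struct node *next; double val; };

    /* Latency-bound: each load depends on the previous one. */
    static double chase(struct node *head, long steps) {
        double sum = 0.0;
        for (long i = 0; i < steps; i++) {
            sum += head->val;
            head = head->next;
        }
        return sum;
    }

    /* Bandwidth/FLOP-bound: independent streaming multiply-adds. */
    static double stream_fma(const double *a, const double *b, long n) {
        double sum = 0.0;
        for (long i = 0; i < n; i++)
            sum += a[i] * b[i];
        return sum;
    }

    int main(void) {
        /* Build a circular list; a real test would randomize the link
           order so the prefetcher cannot follow it. */
        struct node *nodes = malloc(N * sizeof *nodes);
        double *a = malloc(N * sizeof *a), *b = malloc(N * sizeof *b);
        for (long i = 0; i < N; i++) {
            nodes[i].next = &nodes[(i + 1) % N];
            nodes[i].val = a[i] = b[i] = (double)i;
        }
        printf("%f %f\n", chase(nodes, N), stream_fma(a, b, N));
        free(nodes); free(a); free(b);
        return 0;
    }

Which of the two your codes look like, and in what mix, changes the
machine you want.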

Ned


From: David L. Craig on
On Jul 20, 7:11 pm, Robert Myers <rbmyers...(a)gmail.com> wrote:

> Maybe quantum entanglement is the answer to moving data around.

Sigh... I wonder how many decades we are from that being standard in
COTS hardware (assuming the global underpinnings of R&D hold up that
long). Probably more than I've got (unless the medical R&D also grows
by leaps and bounds and society deems me worthy of being kept around).

I do like the idea of simultaneous backups 180 degrees around the
planet and on the Moon, that's for sure.
From: Jeremy Linton on
On 7/21/2010 5:26 AM, nmm1(a)cam.ac.uk wrote:
> In article<8ant0rFf0gU1(a)mid.individual.net>,
> Andrew Reilly<areilly---(a)bigpond.net.au> wrote:
>> On Tue, 20 Jul 2010 11:49:03 -0700, Robert Myers wrote:
>>
>>> (90%+ efficiency for Linpack, 10% for anything even slightly more
>>> interesting).
>>
>> Have you, or anyone else here, ever read any studies of the sensitivities
>> of the latter performance figure to differences in interconnect bandwidth/
>> expense? I.e., does plugging another fat IB tree into every node in
>> parallel, doubling cross section bandwidth, raise the second figure to
>> 20%?
>
> A little, and I have done a bit of testing. It does help, sometimes
> considerably, but the latency is at least as important as the bandwidth.

With regard to latency, I've wondered for a while why no one has built a
large InfiniBand(-like?) switch with a large, closely attached memory.
It probably won't help the MPI guys, but those beasts are only used in
the HPC market anyway. Why not modify them to shave a hop off and admit
that some segment of the HPC market could use it? Is the HPC market so
cost sensitive that it cannot afford a slight improvement, at a
disproportionate cost, for one component in the system?
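
A ping-pong test is roughly how I'd check whether a saved hop is worth
anything. A minimal MPI sketch (my own strawman; it assumes two ranks
and a working MPI installation, nothing specific to any switch) that
separates small-message latency from large-message bandwidth:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size < 2) {                 /* need a pair of ranks */
            if (rank == 0) fprintf(stderr, "run with 2 ranks\n");
            MPI_Finalize();
            return 1;
        }

        const int iters = 1000;
        for (long bytes = 8; bytes <= (1L << 22); bytes *= 8) {
            char *buf = malloc(bytes);
            MPI_Barrier(MPI_COMM_WORLD);
            double t0 = MPI_Wtime();
            for (int i = 0; i < iters; i++) {
                if (rank == 0) {
                    MPI_Send(buf, (int)bytes, MPI_BYTE, 1, 0,
                             MPI_COMM_WORLD);
                    MPI_Recv(buf, (int)bytes, MPI_BYTE, 1, 0,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                } else if (rank == 1) {
                    MPI_Recv(buf, (int)bytes, MPI_BYTE, 0, 0,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                    MPI_Send(buf, (int)bytes, MPI_BYTE, 0, 0,
                             MPI_COMM_WORLD);
                }
            }
            double t = MPI_Wtime() - t0;
            if (rank == 0)
                printf("%8ld bytes: %.2f us one-way, %.1f MB/s\n",
                       bytes, 1e6 * t / (2.0 * iters),
                       2.0 * iters * bytes / t / 1e6);
            free(buf);
        }
        MPI_Finalize();
        return 0;
    }

If a saved hop matters at all, it should show up in the 8-byte numbers,
not the multi-megabyte ones.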

From: Robert Myers on
Andrew Reilly wrote:
> On Tue, 20 Jul 2010 11:49:03 -0700, Robert Myers wrote:
>
>> (90%+ efficiency for Linpack, 10% for anything even slightly more
>> interesting).
>
> Have you, or anyone else here, ever read any studies of the sensitivities
> of the latter performance figure to differences in interconnect bandwidth/
> expense? I.e., does plugging another fat IB tree into every node in
> parallel, doubling cross section bandwidth, raise the second figure to
> 20%?

I have read such studies, yes, and I've even posted some of what I've
found here on comp.arch, where there has been past discussion of just
those kinds of questions.

That's an argument for why this material shouldn't be limited to being
scattered through comp.arch. I have a hard time finding even my own
posts with Google groups search.

A place has generously been offered to host what will probably be a
mailing list and a wiki. I'll gladly continue to pursue the conversation
here, to generate as wide an interest as possible, but, since I've
already worn some people's patience thin by repeating myself, I'd rather
focus on finding a relatively quiet gathering place for those who are
really interested.

I have neither interest in nor intention of moderating a group or
limiting the membership, so whatever is done should be available to
whoever is interested. Whatever I do will be clearly announced here.

Robert.
From: nmm1 on
In article <i274h0$hqs$1(a)speranza.aioe.org>,
Jeremy Linton <reply-to-list(a)nospam.org> wrote:
>>>
>>>> (90%+ efficiency for Linpack, 10% for anything even slightly more
>>>> interesting).
>>>
>>> Have you, or anyone else here, ever read any studies of the sensitivities
>>> of the latter performance figure to differences in interconnect bandwidth/
>>> expense? I.e., does plugging another fat IB tree into every node in
>>> parallel, doubling cross section bandwidth, raise the second figure to
>>> 20%?
>>
>> A little, and I have done a bit of testing. It does help, sometimes
>> considerably, but the latency is at least as important as the bandwidth.
>
>With regard to latency, I've wondered for a while why no one has built a
>large InfiniBand(-like?) switch with a large, closely attached memory.
>It probably won't help the MPI guys, but those beasts are only used in
>the HPC market anyway. Why not modify them to shave a hop off and admit
>that some segment of the HPC market could use it? Is the HPC market so
>cost sensitive that it cannot afford a slight improvement, at a
>disproportionate cost, for one component in the system?

It's been done, but network attached memory isn't really viable, as
local memory is so cheap and so much faster.

It sounds as if you think it would reduce latency, but of what?
I.e. what would you use it for?


Regards,
Nick Maclaren.