From: matt.reilly on
On Apr 9, 12:20 am, "Chris Thomasson" <cris...(a)comcast.net> wrote:
>
>
> What do you think about my initial idea on how to program your systems? That
> is, using advanced shared-memory multi-threading techniques for intra-node
> communication, and MPI for inter-node communication... I think it should
> work very well. I am always interested in being able to create and play
> around with my own algorithms. I appreciate that your DMA engine microcode
> is available; have you applied for any patents?

There are folks who use shared memory for the intra-node comms and
MPI between nodes. With care, that can work on SiCortex systems and
deliver good performance. Shared memory programming mixed with
message passing works well for some; for others, it brings the
problems of both worlds. SiCortex is happy to see both.

Personally, I tend to do all the communications with MPI. That way I
don't need to worry about processor assignments, task mapping, or even
what platform I'm running on. Most of the code that I see from
customers and prospects follows a similar model. We have, however,
seen a few hybrid codes.
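To make the "task mapping" point concrete: in a hybrid code, each rank has to know which other ranks share its node before it can talk to them through shared memory. The sketch below assumes, hypothetically, simple block placement of ranks onto six-processor nodes (ranks 0-5 on node 0, and so on); the helper names are made up for illustration. Real placement depends on the launcher, and an MPI-3 code would more likely ask the library (e.g. `MPI_Comm_split_type` with `MPI_COMM_TYPE_SHARED`) than assume a layout.

```python
from __future__ import annotations

CORES_PER_NODE = 6  # six processors per SiCortex node

def node_of(rank: int, cores_per_node: int = CORES_PER_NODE) -> int:
    """Node index for a rank, ASSUMING block placement (hypothetical)."""
    return rank // cores_per_node

def node_peers(rank: int, nranks: int, cores_per_node: int = CORES_PER_NODE) -> list[int]:
    """Other ranks on the same node: candidates for shared-memory exchange."""
    first = node_of(rank, cores_per_node) * cores_per_node
    last = min(first + cores_per_node, nranks)
    return [r for r in range(first, last) if r != rank]

def needs_mpi(src: int, dst: int, cores_per_node: int = CORES_PER_NODE) -> bool:
    """In the hybrid model, only traffic between different nodes uses MPI."""
    return node_of(src, cores_per_node) != node_of(dst, cores_per_node)

if __name__ == "__main__":
    print(node_peers(7, 72))                  # [6, 8, 9, 10, 11]
    print(needs_mpi(4, 5), needs_mpi(5, 6))   # False True
```

With the all-MPI style described above, none of this bookkeeping appears in the application: every peer is just a rank, and the MPI library is free to use shared memory underneath for on-node transfers.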
From: already5chosen on
On Apr 9, 6:17 pm, matt.rei...(a)sicortex.com wrote:
> On Apr 6, 1:19 pm, already5cho...(a)yahoo.com wrote:
>
> > Matt,
> > Could you comment on Paul Gotch's speculations above (about MIPS
> > 20Kc) ?
>
> We looked at the 20Kc and liked it. However, it was a hard macro
> (that is, MIPS supplies completed masks, not synthesizable verilog)
> and designed for 130nm. Our technology target was 90nm. The team
> had lots of experience in doing design shrinks and felt the cost
> of shrinking the 20Kc was prohibitive. Further, the 20Kc as it stood
> was not designed for a cache coherent SMP. Fitting the necessary
> changes into a hard macro made it even more problematic.
>
> We chose the 5Kf, a 64 bit soft IP block from MIPS. Then we
> worked hard at the synthesis flow to make it run at 500 MHz
> and stay within a sub-watt power budget.
>
> matt

Thanks for the information, Matt.
I vaguely remember Byte articles from the mid-90s saying that the 5Kf
FPU was optimized for single-precision performance. Did you change
this part of the core?

From: Del Cecchi on

<matt.reilly(a)sicortex.com> wrote in message
news:3e072521-34e1-4a8c-a7aa-98afafe75711(a)8g2000hse.googlegroups.com...
> On Apr 9, 12:20 am, "Chris Thomasson" <cris...(a)comcast.net> wrote:
>>
>>
>> What do you think about my initial idea on how to program your
>> systems? That is, using advanced shared-memory multi-threading
>> techniques for intra-node communication, and MPI for inter-node
>> communication... I think it should work very well. I am always
>> interested in being able to create and play around with my own
>> algorithms. I appreciate that your DMA engine microcode is
>> available; have you applied for any patents?
>
> There are folks who use shared memory for the intra-node comms and
> MPI between nodes. With care, that can work on SiCortex systems and
> deliver good performance. Shared memory programming mixed with
> message passing works well for some, brings the problems of both
> worlds to others. SiCortex is happy to see both.
>
> Personally, I tend to do all the communications with MPI. That way I
> don't need to worry about processor assignments, task mapping, or even
> what platform I'm running on. Most of the code that I see from
> customers and prospects follows a similar model. We have, however seen
> a few hybrid codes.

This sounds a lot like a Blue Gene, only of course with MIPS taking the
place of PowerPC as the processor. Would you comment on the differences?

del


From: matt.reilly on
On Apr 9, 12:51 pm, already5cho...(a)yahoo.com wrote:
>
> Thanks for information, Matt.
> I vaguely remember from the Byte articles from the mid 90s that 5Kf
> FPU was optimized toward single-precision performance. Did you change
> this part of the core?

We did goose the FP unit a bit. We rebuilt the FP pipeline to support
double precision at 2 FLOPs per cycle (MADD.D, a double-precision
multiply-add), so the double-precision FP rate is the same as the
single-precision FP rate.
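As a back-of-the-envelope check, combining the 500 MHz clock mentioned earlier in the thread with 2 FLOPs per cycle gives the per-core peak; the node and system figures simply multiply by six processors per node and by the 5832-processor maximum configuration. These are theoretical peaks derived from those numbers, not measured rates.

```python
CLOCK_HZ = 500e6        # core clock quoted for the SiCortex 5Kf-based core
FLOPS_PER_CYCLE = 2     # one MADD.D (multiply + add) issued per cycle
CORES_PER_NODE = 6
MAX_PROCESSORS = 5832   # largest SiCortex configuration

def peak_flops(processors: int) -> float:
    """Theoretical peak double-precision FLOP/s for a processor count."""
    return processors * CLOCK_HZ * FLOPS_PER_CYCLE

core = peak_flops(1)
node = peak_flops(CORES_PER_NODE)
system = peak_flops(MAX_PROCESSORS)
print(f"core: {core / 1e9:.1f} GFLOP/s, node: {node / 1e9:.1f} GFLOP/s, "
      f"system: {system / 1e12:.3f} TFLOP/s")
# core: 1.0 GFLOP/s, node: 6.0 GFLOP/s, system: 5.832 TFLOP/s
```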

Other than that, we added cache coherence to the L1 and full
single-bit-correct/double-bit-detect (SECDED) ECC to the L1 D-cache.
(The I-cache is parity protected.)
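Single-bit-correct/double-bit-detect protection is usually built from a Hamming code plus one overall parity bit. The toy sketch below uses an extended Hamming(8,4) code over 4 data bits purely to show the mechanism; a real D-cache protects much wider words (commonly 64 data bits with 8 check bits), and nothing here describes SiCortex's actual encoding.

```python
from __future__ import annotations

def encode(data: int) -> list[int]:
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Positions 1..7 form a classic Hamming(7,4) code (check bits at 1, 2, 4);
    position 0 holds overall parity so double errors can be detected.
    """
    d = [(data >> i) & 1 for i in range(4)]
    word = [0] * 8
    word[3], word[5], word[6], word[7] = d
    word[1] = word[3] ^ word[5] ^ word[7]
    word[2] = word[3] ^ word[6] ^ word[7]
    word[4] = word[5] ^ word[6] ^ word[7]
    word[0] = sum(word[1:]) % 2
    return word

def decode(word: list[int]) -> tuple[str, int | None]:
    """Return (status, data) where status is 'ok', 'corrected' or 'double'."""
    w = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if w[pos]:
            syndrome ^= pos          # XOR of positions holding a 1
    overall = sum(w) % 2             # 1 means an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Single error: the syndrome names the bad position
        # (syndrome 0 means the overall parity bit itself flipped).
        w[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity but nonzero syndrome: two bits flipped, uncorrectable.
        return "double", None
    data = w[3] | (w[5] << 1) | (w[6] << 2) | (w[7] << 3)
    return status, data

cw = encode(0b1011)
cw[5] ^= 1           # single-bit upset
print(decode(cw))    # ('corrected', 11)
cw[6] ^= 1           # second upset in the same word
print(decode(cw))    # ('double', None)
```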

There were Byte articles in the mid-90s on the 5Kf? Who knew?
From: matt.reilly on
On Apr 9, 7:49 pm, "Del Cecchi" <delcecchioftheno...(a)gmail.com> wrote:
> This sounds a lot like a Blue Gene, only of course with Mips taking the
> place of PowerPC as the processor. Would you comment on the differences?
>
> del

It doesn't sound anything like a Blue Gene; it is much, much
quieter. ;)

The major differences relative to BG/L are:

1. Higher-performance inter-node communication: higher bandwidth and
lower end-to-end latency (under 2 µs average MPI ping-pong).

2. Design centered on 972 nodes and smaller. Our ambitions are
to fill needs in day-to-day production environments, not to beat
the Earth Simulator or occupy slots in the Top500. (Somebody
needs to do that, but we chose to focus elsewhere.)

3. Full Linux/POSIX environment on all processors. All system software
and SiCortex libraries are open source.

4. Up to 8 GB of DRAM per six-processor node.

5. SiCortex has configurations from 72 processors (the deskside
development system) to 5832 (the cabinet with the gull-wing doors).
In addition to the SC648 (648 processors) and the SC1458 (1458
processors; how DID we come up with this naming scheme?), there are
incremental versions in between that involve replacing processor
modules with "placeholder modules."

6. Kautz graph topology for all traffic, vs. mesh/torus and tree
networks. The graph diameter is 6 for the largest SiCortex system.

7. "Generic" I/O, as long as you think of PCI Express modules as
"generic." Specifically, all systems support GigE, InfiniBand, and
Fibre Channel. We support others, but I don't have the supported I/O
list in front of me right now.

8. BG/L does a better job of managing the processor-memory path:
BG/L STREAM triad results are 6x better than SiCortex's. Sigh.
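For readers who haven't run it: the STREAM "triad" kernel computes a[i] = b[i] + q*c[i] over arrays much larger than cache, and the reported bandwidth is the bytes moved divided by elapsed time. The minimal pure-Python sketch below shows the kernel and STREAM's byte accounting (two 8-byte reads and one 8-byte write per element, with q = 3.0, the scalar STREAM itself uses). Interpreted Python measures the interpreter, not the memory system, so treat this only as an illustration of how the figure is computed; the real benchmark is C or Fortran.

```python
import time
from array import array

def triad(a: array, b: array, c: array, q: float) -> None:
    """STREAM triad kernel: a[i] = b[i] + q * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]

def triad_bandwidth(n: int, q: float = 3.0) -> float:
    """Bytes/s for one triad pass, using STREAM-style byte counting."""
    a = array("d", [0.0]) * n
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n
    start = time.perf_counter()
    triad(a, b, c, q)
    elapsed = time.perf_counter() - start
    bytes_moved = 3 * a.itemsize * n   # read b, read c, write a
    return bytes_moved / elapsed

print(f"{triad_bandwidth(1_000_000) / 1e6:.0f} MB/s")
```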



BG/P will probably improve on a few of these, but I haven't seen
results from the BG/P installations yet.

There are probably other differences, but I'm more versed in the
SiCortex side of things than in BG/L.
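Where the diameter of 6 in item 6 comes from: a Kautz graph of out-degree d and string length k has one vertex per length-k string over d+1 symbols with no two adjacent symbols equal, giving (d+1)*d^(k-1) vertices, and an edge from s1...sk to s2...sk x for each x that differs from sk. Any vertex reaches any other by shifting in at most k new symbols, so the diameter is k. Assuming degree 3 (consistent with the numbers in this thread, but not stated in it), k = 6 gives 4*3^5 = 972 vertices, matching the 972-node, 5832-processor maximum. The sketch builds the graph and checks both numbers by breadth-first search.

```python
from __future__ import annotations

from collections import deque
from itertools import product

def kautz_vertices(degree: int, length: int) -> list[tuple[int, ...]]:
    """All strings of `length` symbols from 0..degree with no two adjacent equal."""
    return [s for s in product(range(degree + 1), repeat=length)
            if all(x != y for x, y in zip(s, s[1:]))]

def successors(s: tuple[int, ...], degree: int) -> list[tuple[int, ...]]:
    """Shift left by one and append any symbol differing from the current last one."""
    return [s[1:] + (x,) for x in range(degree + 1) if x != s[-1]]

def eccentricity(src: tuple[int, ...], degree: int) -> int:
    """Largest shortest-path distance from src, found by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for w in successors(v, degree):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return max(dist.values())

def kautz_diameter(degree: int, length: int) -> int:
    """Diameter = maximum eccentricity over all vertices."""
    return max(eccentricity(v, degree) for v in kautz_vertices(degree, length))

if __name__ == "__main__":
    print(len(kautz_vertices(3, 6)), kautz_diameter(3, 6))  # 972 6
```

Under the same degree-3 assumption, length 4 gives 4*3^3 = 108 vertices, which is exactly 648/6, suggesting a 648-processor system would be a diameter-4 Kautz graph.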