From: matt.reilly on 9 Apr 2008 12:22

On Apr 9, 12:20 am, "Chris Thomasson" <cris...(a)comcast.net> wrote:
>
> What do you think about my initial idea on how to program your systems? That
> is, using advanced shared-memory multi-threading techniques for intra-node
> communication, and MPI for inter-node communication... I think it should
> work very well. I am always interested in being able to create and play
> around with my own algorithms. I appreciate that your DMA engine microcode
> is available; have you applied for any patents?

There are folks who use shared memory for the intra-node comms and MPI between nodes. With care, that can work on SiCortex systems and deliver good performance. Shared memory programming mixed with message passing works well for some, but brings the problems of both worlds to others. SiCortex is happy to see both.

Personally, I tend to do all the communications with MPI. That way I don't need to worry about processor assignments, task mapping, or even what platform I'm running on. Most of the code that I see from customers and prospects follows a similar model. We have, however, seen a few hybrid codes.
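The "processor assignments, task mapping" point can be made concrete. The Python sketch below assumes a hypothetical block placement of 6 consecutive MPI ranks per node (one per processor); the function names are illustrative only. Under the hybrid model the code has to know that placement to choose a transport, while the pure-MPI path never consults it.

```python
# Hypothetical sketch, not SiCortex's placement policy: assumes MPI ranks
# are placed in blocks of 6 consecutive ranks per node, one per processor.
PROCS_PER_NODE = 6  # "6 processor node", per the thread


def node_of(rank: int, procs_per_node: int = PROCS_PER_NODE) -> int:
    """Node index hosting `rank` under a block (consecutive) mapping."""
    return rank // procs_per_node


def hybrid_transport(src: int, dst: int, procs_per_node: int = PROCS_PER_NODE) -> str:
    """Hybrid model: shared memory inside a node, MPI messages between nodes.
    Correct only if the block-mapping assumption in node_of() holds."""
    if node_of(src, procs_per_node) == node_of(dst, procs_per_node):
        return "shared-memory"
    return "mpi"


def pure_mpi_transport(src: int, dst: int) -> str:
    """Pure-MPI model: every exchange is a message, so placement never matters."""
    return "mpi"


if __name__ == "__main__":
    for src, dst in [(0, 5), (5, 6), (12, 71)]:
        print(src, dst, hybrid_transport(src, dst), pure_mpi_transport(src, dst))
```

If the launcher used some other mapping (round-robin, say), `node_of` would be wrong and the hybrid path would pick the wrong transport; the pure-MPI path is unaffected, which is the portability argument in the post.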
From: already5chosen on 9 Apr 2008 12:51

On Apr 9, 6:17 pm, matt.rei...(a)sicortex.com wrote:
> On Apr 6, 1:19 pm, already5cho...(a)yahoo.com wrote:
>
> > Matt,
> > Could you comment on Paul Gotch's speculations above (about MIPS
> > 20Kc) ?
>
> We looked at the 20Kc and liked it. However, it was a hard macro
> (that is, MIPS supplies completed masks, not synthesizable verilog)
> and designed for 130nm. Our technology target was 90nm. The team
> had lots of experience in doing design shrinks and felt the cost
> of shrinking the 20Kc was prohibitive. Further, the 20Kc as it stood
> was not designed for a cache coherent SMP. Fitting the necessary
> changes into a hard macro made it even more problematic.
>
> We chose the 5Kf, a 64 bit soft IP block from MIPS. Then we
> worked hard at the synthesis flow to make it run at 500 MHz
> and stay within a sub-watt power budget.
>
> matt

Thanks for the information, Matt.
I vaguely remember from the Byte articles from the mid 90s that the 5Kf
FPU was optimized toward single-precision performance. Did you change
this part of the core?
From: Del Cecchi on 9 Apr 2008 19:49

<matt.reilly(a)sicortex.com> wrote in message news:3e072521-34e1-4a8c-a7aa-98afafe75711(a)8g2000hse.googlegroups.com...
> On Apr 9, 12:20 am, "Chris Thomasson" <cris...(a)comcast.net> wrote:
>>
>> What do you think about my initial idea on how to program your
>> systems? That is, using advanced shared-memory multi-threading
>> techniques for intra-node communication, and MPI for inter-node
>> communication... I think it should work very well. I am always
>> interested in being able to create and play around with my own
>> algorithms. I appreciate that your DMA engine microcode is
>> available; have you applied for any patents?
>
> There are folks who use shared memory for the intra-node comms and
> MPI between nodes. With care, that can work on SiCortex systems and
> deliver good performance. Shared memory programming mixed with
> message passing works well for some, brings the problems of both
> worlds to others. SiCortex is happy to see both.
>
> Personally, I tend to do all the communications with MPI. That way I
> don't need to worry about processor assignments, task mapping, or
> even what platform I'm running on. Most of the code that I see from
> customers and prospects follows a similar model. We have, however
> seen a few hybrid codes.

This sounds a lot like a Blue Gene, only of course with Mips taking the
place of PowerPC as the processor. Would you comment on the differences?

del
From: matt.reilly on 10 Apr 2008 13:13

On Apr 9, 12:51 pm, already5cho...(a)yahoo.com wrote:
>
> Thanks for information, Matt.
> I vaguely remember from the Byte articles from the mid 90s that 5Kf
> FPU was optimized toward single-precision performance. Did you change
> this part of the core?

We did goose the FP unit a bit. We rebuilt the FP pipeline to support double precision at 2 FLOPs per cycle (MADD.D, a double-precision multiply-add), so the double-precision FP rate is the same as the single-precision FP rate. Other than that, we added cache coherence to the L1, and full single-bit-error correction/double-bit-error detection to the L1 D-cache. (The I-cache is parity protected.)

There were Byte articles in the mid 90's on the 5Kf? Who knew?
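Combined with the 500 MHz clock mentioned earlier in the thread, one MADD.D per cycle gives a simple peak figure. This back-of-the-envelope Python arithmetic assumes every processor issues one MADD.D on every cycle, so it is a theoretical peak, not a measured rate; the 6-processors-per-node and 972-node figures come from Matt's post on the Blue Gene comparison.

```python
# Peak double-precision rate implied by the thread: one MADD.D
# (multiply + add = 2 FLOPs) issued per cycle at 500 MHz, 6 processors
# per node, 972 nodes in the largest system. Peak only; sustained rates
# depend on memory bandwidth and on the code.
FLOPS_PER_CYCLE = 2      # one multiply-add per cycle
CLOCK_HZ = 500e6         # 500 MHz core clock
PROCS_PER_NODE = 6
NODES_LARGEST = 972

per_core = FLOPS_PER_CYCLE * CLOCK_HZ
per_node = per_core * PROCS_PER_NODE
largest = per_node * NODES_LARGEST

print(f"per core : {per_core / 1e9:.1f} GFLOPS")
print(f"per node : {per_node / 1e9:.1f} GFLOPS")
print(f"largest  : {largest / 1e12:.3f} TFLOPS")
```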
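"Single bit correction/double bit detect" is usually called SECDED. As a minimal Python illustration, here is a toy extended Hamming (8,4) code; the thread doesn't give the word width or check-bit layout SiCortex used in the L1 D-cache, so this shows only the principle, and the function names are illustrative.

```python
# Toy SECDED code: extended Hamming (8,4). Real caches protect wider words
# (e.g. 64 data bits + 8 check bits); this only demonstrates the idea.
from typing import List, Tuple

DATA_POS = (3, 5, 6, 7)  # codeword positions that carry the 4 data bits


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into 8 bits; index 0 holds the overall parity bit."""
    c = [0] * 8
    for pos, bit in zip(DATA_POS, data):
        c[pos] = bit
    c[1] = c[3] ^ c[5] ^ c[7]  # checks positions whose index has bit 0 set
    c[2] = c[3] ^ c[6] ^ c[7]  # checks positions whose index has bit 1 set
    c[4] = c[5] ^ c[6] ^ c[7]  # checks positions whose index has bit 2 set
    c[0] = c[1] ^ c[2] ^ c[3] ^ c[4] ^ c[5] ^ c[6] ^ c[7]  # overall parity
    return c


def decode(word: List[int]) -> Tuple[str, List[int]]:
    """Return (status, data bits); status is 'ok', 'corrected' or 'double-error'."""
    c = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos        # XOR of the indices of all set bits
    overall = 0
    for bit in c:
        overall ^= bit             # parity over the whole 8-bit word
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        c[syndrome] ^= 1           # single error; syndrome 0 means bit 0 itself
        status = "corrected"
    else:
        status = "double-error"    # parity looks even but syndrome is nonzero
    return status, [c[pos] for pos in DATA_POS]


if __name__ == "__main__":
    cw = encode([1, 0, 1, 1])
    one = cw.copy(); one[6] ^= 1
    two = cw.copy(); two[2] ^= 1; two[5] ^= 1
    print(decode(cw), decode(one), decode(two)[0])
```

One flipped bit changes the overall parity and the syndrome names its position; two flipped bits leave the parity even but the syndrome nonzero, which is detectable but not correctable.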
From: matt.reilly on 10 Apr 2008 13:37
On Apr 9, 7:49 pm, "Del Cecchi" <delcecchioftheno...(a)gmail.com> wrote:
> This sounds a lot like a Blue Gene, only of course with Mips taking the
> place of PowerPC as the processor. Would you comment on the differences?
>
> del

It doesn't sound anything like a Blue Gene: it is much, much quieter. ;)

The major differences relative to BG/L are:

1. Higher-performance inter-node communication: higher bandwidth and lower end-to-end latency (average MPI ping-pong latency under 2 µs).
2. Design centered on 972 nodes and smaller. Our ambition is to fill needs in day-to-day production environments, not to beat the Earth Simulator or occupy slots in the Top500. (Somebody needs to do that, but we chose to focus elsewhere.)
3. Full Linux/POSIX environment on all processors. All system software and SiCortex libraries are open source.
4. Up to 8GB of DRAM per 6-processor node.
5. SiCortex has configurations from 72 processors (the deskside development system) to 5832 (the cabinet with the gull-wing doors). In addition to the SC648 (648 processors) and the SC1458 (1458 processors; how DID we come up with this naming scheme?) there are incremental versions in between that involve replacing processor modules with "placeholder modules."
6. Kautz graph topology for all traffic, versus mesh/torus and trees. The graph diameter is 6 for the largest SiCortex system.
7. "Generic" I/O, as long as you think of PCI Express modules as "generic." Specifically, all systems support GigE, InfiniBand, and Fibre Channel. We support others, but I don't have the supported I/O list in front of me right now.
8. BG/L does a better job of managing the processor-memory path: BG/L STREAM triad results are 6x better than SiCortex. Sigh.

BG/P will probably improve on a few of these, but I haven't seen results from the BG/P installations yet.

There are probably other differences, but I'm more versed on the SiCortex side of things than the BG/L.
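A Kautz graph can be generated straight from its string definition. The Python sketch below assumes out-degree 3 (an alphabet of 4 symbols) and strings of length 6. The post doesn't state the degree, so that is an assumption here, but it is the choice that matches the numbers: (3 + 1)·3^5 = 972 vertices, and a Kautz graph's diameter equals its string length, 6.

```python
# Build a Kautz graph and measure its diameter by breadth-first search.
# Vertices: strings of length L over d+1 symbols with no two equal adjacent
# symbols. Edge u -> v when v is u shifted left by one, with a new symbol
# (different from u's last) appended, so every vertex has out-degree d.
from collections import deque
from itertools import product


def kautz_vertices(d: int, length: int):
    return [s for s in product(range(d + 1), repeat=length)
            if all(s[i] != s[i + 1] for i in range(length - 1))]


def successors(v, d: int):
    return [v[1:] + (a,) for a in range(d + 1) if a != v[-1]]


def eccentricity(src, adj) -> int:
    """Longest shortest-path distance from src, via BFS over adjacency lists."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    if len(dist) != len(adj):
        raise ValueError("graph is not strongly connected")
    return max(dist.values())


def diameter(d: int, length: int) -> int:
    verts = kautz_vertices(d, length)
    adj = {v: successors(v, d) for v in verts}  # build adjacency once
    return max(eccentricity(v, adj) for v in verts)


if __name__ == "__main__":
    print(len(kautz_vertices(3, 6)), diameter(3, 6))
```

Routing falls out of the same definition: to reach v from u, find the longest suffix of u that is also a prefix of v and shift in the rest of v's symbols, so no route is longer than the string length.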
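Items 2, 4 and 5 fit together with a little arithmetic: at 6 processors per node, the 5832-processor cabinet is exactly the 972-node design point. A small Python sizing helper, assuming only the per-node figures given in the post (the configuration labels are illustrative, not official product names apart from SC648 and SC1458):

```python
# Sizing sketch for the configurations named in the post, assuming
# 6 processors per node and at most 8 GB of DRAM per node.
PROCS_PER_NODE = 6
MAX_DRAM_GB_PER_NODE = 8

# Labels are illustrative except SC648 and SC1458, which the post names.
CONFIGS = {"deskside": 72, "SC648": 648, "SC1458": 1458, "largest": 5832}


def sizing(procs: int) -> dict:
    """Node count and maximum DRAM for a given processor count."""
    nodes, rem = divmod(procs, PROCS_PER_NODE)
    if rem:
        raise ValueError("processor count must be a whole number of nodes")
    return {"nodes": nodes, "max_dram_gb": nodes * MAX_DRAM_GB_PER_NODE}


if __name__ == "__main__":
    for name, procs in CONFIGS.items():
        print(name, procs, sizing(procs))
```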
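The STREAM triad behind the 6x comparison is a one-line loop. The Python sketch below shows the kernel and the way STREAM turns run time into bandwidth (3 arrays × 8 bytes per element: two reads, one write). The scalar 3.0 and the array length are arbitrary choices here, and a pure-Python loop is interpreter-bound, so the number it prints illustrates the accounting, not a machine's memory bandwidth.

```python
# STREAM-style triad, a[i] = b[i] + q * c[i], with STREAM's byte accounting.
# Interpreter-bound: the printed MB/s is NOT a real memory-bandwidth figure.
import time
from array import array


def triad(a: array, b: array, c: array, q: float) -> None:
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def triad_bytes(n: int, elem_size: int = 8) -> int:
    """Bytes STREAM counts for one triad pass: read b, read c, write a."""
    return 3 * elem_size * n


if __name__ == "__main__":
    n = 1_000_000
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n
    a = array("d", [0.0]) * n
    start = time.perf_counter()
    triad(a, b, c, 3.0)
    elapsed = time.perf_counter() - start
    print(f"a[0] = {a[0]}, {triad_bytes(n) / elapsed / 1e6:.1f} MB/s (interpreter-bound)")
```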