From: matt.reilly on 12 Apr 2008 20:10

On Apr 11, 5:19 pm, "Paul A. Clayton" <paaronclay...(a)earthlink.net>
wrote:
>
> I would also consider the 2-wide FPU of Blue Gene processor a
> significant difference (4 FLOPs per cycle vs. 2 FLOPs per cycle
> for the SiCortex processor). I was a bit disappointed that
> the SiCortex did not exploit such SIMD. Perhaps the targeted
>
> Paul A. Clayton
> just a technophile
> reachable as 'paaronclayton'
> at "embarqmail.com"

Early on, BG/L users had trouble coaxing the compilers to use the second
FP pipe, as I recall. Del? The issue rules and exclusions around the second
pipe took some adapting.

Adding more FP without commensurate memory bandwidth
and communications bandwidth is often a waste of power.

But getting 4FLOPS/cycle of single precision would be handy. sigh.

matt
From: Del Cecchi on 12 Apr 2008 23:43

<matt.reilly(a)sicortex.com> wrote in message
news:adc0a048-6ad5-4030-b02b-5b51ddecc842(a)c65g2000hsa.googlegroups.com...
> On Apr 11, 5:19 pm, "Paul A. Clayton" <paaronclay...(a)earthlink.net>
> wrote:
>>
>> I would also consider the 2-wide FPU of Blue Gene processor a
>> significant difference (4 FLOPs per cycle vs. 2 FLOPs per cycle
>> for the SiCortex processor). I was a bit disappointed that
>> the SiCortex did not exploit such SIMD. Perhaps the targeted
>>
>> Paul A. Clayton
>> just a technophile
>> reachable as 'paaronclayton'
>> at "embarqmail.com"
>
> Early on, BG/L users had trouble coaxing the compilers to use the second
> FP pipe, as I recall. Del? The issue rules and exclusions around the second
> pipe took some adapting.
>
> Adding more FP without commensurate memory bandwidth
> and communications bandwidth is often a waste of power.
>
> But getting 4FLOPS/cycle of single precision would be handy. sigh.
>
>
> matt

Don't look at me. I was a circuit designer. Well, I still am, but not
actively at the moment. So all that software stuff is out of my field.
Holes and electrons and femtofarads and picoseconds, now you're talking
my language.... :-)
From: Nick Maclaren on 13 Apr 2008 03:40

In article <adc0a048-6ad5-4030-b02b-5b51ddecc842(a)c65g2000hsa.googlegroups.com>,
matt.reilly(a)sicortex.com writes:
|> On Apr 11, 5:19 pm, "Paul A. Clayton" <paaronclay...(a)earthlink.net>
|> wrote:
|> >
|> Adding more FP without commensurate memory bandwidth
|> and communications bandwidth is often a waste of power.

Yes.

|> But getting 4FLOPS/cycle of single precision would be handy. sigh.

Not really, except for image and audio work. While there are a few
applications for which single precision is enough on a modern
high-performance computer (please note), there aren't many. There are a
fair number where it would be enough if all parts of it were programmed
by a numerical expert - and there are damn few of those still active.

You can't even solve a large, well-conditioned set of linear equations
in single precision if the matrix is banded - there is a limit of about
a million on the number of equations, at which point ALL accuracy is lost.
See Wilkinson and Reinsch. Solving a mere 10,000 (which can be done easily
even for unbanded matrices) won't give you more than about 1% accuracy.
And those are the BEST cases - any ill-conditioning and forget it!

Regards,
Nick Maclaren.
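Nick's argument is about rounding error accumulating as the problem grows. A banded solver is too much for a short example, so the sketch below uses the simplest stand-in, naive summation of 0.1, with single precision emulated in Python by round-tripping through `struct` after every add. The function names are hypothetical and this is an illustration under those assumptions, not a reproduction of the Wilkinson and Reinsch analysis.

```python
import struct


def to_f32(x: float) -> float:
    """Round a double to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def naive_sum(term: float, n: int, single: bool) -> float:
    """Add `term` to an accumulator n times; if `single`, round the term and
    every partial sum to float32, emulating a single-precision accumulator."""
    t = to_f32(term) if single else term
    acc = 0.0
    for _ in range(n):
        # For these magnitudes acc + t is exact in double, so rounding the
        # result to float32 gives the same answer as a real float32 add.
        acc += t
        if single:
            acc = to_f32(acc)
    return acc


def relative_error(n: int, single: bool) -> float:
    """Relative error of the n-term sum of 0.1 against the double product n * 0.1."""
    exact = n * 0.1
    return abs(naive_sum(0.1, n, single) - exact) / exact


# float32 has a 24-bit significand, so 2**24 + 1 is not representable.
print(to_f32(2.0**24 + 1.0))  # 16777216.0

for n in (1_000, 100_000, 1_000_000):
    print(f"n={n:>9,}  float32 rel. err={relative_error(n, True):.1e}  "
          f"float64 rel. err={relative_error(n, False):.1e}")
```

On an IEEE 754 machine the float32 error grows with n and ends up on the order of 1% at a million terms: once the accumulator is large, every addition of 0.1 is rounded the same way, so the errors stop cancelling. The float64 error stays many orders of magnitude smaller.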
From: Bernd Paysan on 13 Apr 2008 09:23

Nick Maclaren wrote:
> You can't even solve a large, well-conditioned set of linear equations
> in single precision, if the matrix is banded - there is a limit of
> about a million on the number of equations at which point ALL accuracy
> is lost. See Wilkinson and Reinsch. Solving a mere 10,000 (which can
> be done easily even for unbanded matrices) won't give you more than
> about 1% accuracy. And those are the BEST cases - any ill-conditioning
> and forget it!

Question: How do you get the coefficients for such a matrix to be more
accurate than SP? I mean for a real-world problem. Most measurements give
you something between 8 and 24 integer bits; on some rare occasions, you
get a few more.

E.g. if you do your 14-day weather forecast with "the required precision",
you forget that your actual data is not that precise, either. You may not
lose any precision during the calculation process, but if you change one
of the inputs by just one bit, you end up with completely different
weather.

It's a shame that random rounding is not supported by common number
crunching hardware. When you want to know if your algorithm actually works
with imprecise input data, you can just try with slight changes on the
input. But then, you still need to have enough headroom in the actual
calculation, and in all non-trivial iterative calculations, you can't
really know. So I'd rather like to have random rounding, and when the
result is quite different each time, I just know that it's not stable.

-- 
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
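Bernd's "random rounding" is what is usually called stochastic rounding: round to the upper or lower neighbour with probability proportional to proximity, so rounding is unbiased on average and repeated runs scatter. The sketch below is a software emulation under stated assumptions: float32 is emulated with `struct`, Python's `random` module supplies the randomness, and all function names are hypothetical. As a test case it uses a textbook example not mentioned in the thread: I_k, the integral of x^k e^(x-1) over [0, 1], satisfies I_k = 1 - k*I_{k-1} with I_0 = 1 - 1/e. Running that recurrence forward multiplies each rounding error by k (unstable); running it backward divides by k (stable).

```python
import math
import random
import struct


def to_f32(x: float) -> float:
    """Round a double to the nearest float32 (round-to-nearest-even)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def f32_step(f: float, up: bool) -> float:
    """Return the adjacent float32 above (up=True) or below f; f must be a float32."""
    bits = struct.unpack("<I", struct.pack("<f", f))[0]
    if f == 0.0:
        bits = 0x00000001 if up else 0x80000001  # smallest subnormals
    elif (f > 0.0) == up:
        bits += 1  # moving away from zero
    else:
        bits -= 1  # moving toward zero
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def stochastic_round_f32(x: float, rng: random.Random) -> float:
    """Round x to one of its two float32 neighbours, picking the upper one
    with probability proportional to how close x is to it."""
    r = to_f32(x)
    if r == x:
        return r  # exactly representable: nothing to decide
    lo, hi = (r, f32_step(r, True)) if r < x else (f32_step(r, False), r)
    p_up = (x - lo) / (hi - lo)
    return hi if rng.random() < p_up else lo


def forward(n: int, rng: random.Random) -> float:
    """I_n via I_k = 1 - k*I_{k-1}; each rounding error is multiplied by k."""
    i = stochastic_round_f32(1.0 - math.exp(-1.0), rng)
    for k in range(1, n + 1):
        i = stochastic_round_f32(1.0 - stochastic_round_f32(k * i, rng), rng)
    return i


def backward(n: int, start: int, rng: random.Random) -> float:
    """I_n via I_{k-1} = (1 - I_k)/k from the crude guess I_start = 0; errors shrink."""
    i = 0.0
    for k in range(start, n, -1):
        i = stochastic_round_f32(stochastic_round_f32(1.0 - i, rng) / k, rng)
    return i


rng = random.Random(2008)
fwd = [forward(25, rng) for _ in range(20)]
bwd = [backward(25, 60, rng) for _ in range(20)]
print("forward  spread:", max(fwd) - min(fwd))
print("backward spread:", max(bwd) - min(bwd))
```

The true I_25 lies between 1/27 and 1/26. The forward runs disagree with each other wildly, which is exactly the "quite different each time" signal Bernd describes, while the backward runs agree to within a few float32 ulps.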
From: Del Cecchi on 14 Apr 2008 10:55

"Bernd Paysan" <bernd.paysan(a)gmx.de> wrote in message
news:44e9d5-b3j.ln1(a)vimes.paysan.nom...
> (snip)
> It's a shame that random rounding is not supported by common number
> crunching hardware. When you want to know if your algorithm actually
> works
> with imprecise input data, you can just try with slight changes on the
> input. But then, you still need to have enough headroom in the actual
> calculation, and in all non-trivial iterative calculations, you can't
> really know. So I'd rather like to have random rounding, and when the
> result is quite different each time, I just know that it's not stable.
>
> --
> Bernd Paysan
> "If you want it done right, you have to do it yourself"
> http://www.jwdt.com/~paysan/

True random numbers are a real pain to generate on a chip. Would
pseudo-random work?
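One common cheap on-chip pseudo-random source is a linear-feedback shift register. The Python model below is a hypothetical sketch, not RTL and not anything proposed in the thread: a 16-bit Galois LFSR with the well-known maximal-length tap mask 0xB400 supplies bits that are added below the rounding point of a non-negative fixed-point value before truncation, so the carry into the kept bits happens with probability equal to the discarded fraction.

```python
from itertools import islice


def lfsr16_states(seed: int = 0xACE1):
    """Yield successive states of a 16-bit Galois LFSR (taps 16,14,13,11, mask 0xB400).
    A maximal-length LFSR cycles through all 65535 non-zero states."""
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("LFSR seed must be non-zero")  # zero is a fixed point
    while True:
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB400
        yield state


def round_fixed_stochastic(value: int, frac_bits: int, rand: int) -> int:
    """Drop `frac_bits` fractional bits from a non-negative fixed-point `value`,
    adding the low `frac_bits` bits of `rand` first (hypothetical helper)."""
    mask = (1 << frac_bits) - 1
    return (value + (rand & mask)) >> frac_bits


# Period check: step until the seed state comes back.
gen = lfsr16_states()
period = 1
while next(gen) != 0xACE1:
    period += 1
print("period:", period)

# Round 3.5 (8 fractional bits) once per state over one full period.
states = list(islice(lfsr16_states(), 65535))
value = (3 << 8) | 0x80
rounded = [round_fixed_stochastic(value, 8, s) for s in states]
ups = rounded.count(4)
print("rounded up:", ups, "rounded down:", len(rounded) - ups)
```

Over a full period every non-zero state appears once, so 3.5 rounds up 32768 times and down 32767 times; the missing all-zero state is a small bias. Successive LFSR states are shifted copies of each other, so a real design would likely need to take more care about correlation between consecutive rounding decisions.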