From: matt.reilly on
On Apr 11, 5:19 pm, "Paul A. Clayton" <paaronclay...(a)earthlink.net>
wrote:
>
> I would also consider the 2-wide FPU of Blue Gene processor a
> significant difference (4 FLOPs per cycle vs. 2 FLOPs per cycle
> for the SiCortex processor). I was a bit disappointed that the SiCortex did not exploit such SIMD. Perhaps the targeted
>
> Paul A. Clayton
> just a technophile
> reachable as 'paaronclayton'
> at "embarqmail.com"

Early on, BG/L users had trouble coaxing the compilers to use the
second FP pipe, as I recall. Del? The issue rules and exclusions
around the second pipe took some adapting.

Adding more FP without commensurate memory bandwidth
and communications bandwidth is often a waste of power.
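To make the bandwidth point concrete, here is a sketch using the roofline model. That framing is swapped in here; nobody in the thread names it. Attainable throughput is the lower of peak FLOP rate and memory bandwidth times arithmetic intensity (FLOPs per byte moved). The clock, bandwidth and kernel are hypothetical round numbers, not SiCortex or BG/L figures.

```python
def attainable_gflops(peak_gflops, bw_gb_per_s, flops_per_byte):
    """Roofline model: the lower of the compute peak and the rate at
    which memory bandwidth can deliver operands."""
    return min(peak_gflops, bw_gb_per_s * flops_per_byte)

CLOCK_GHZ = 0.5      # hypothetical core clock
BW_GB_PER_S = 1.0    # hypothetical sustained memory bandwidth per core
# daxpy-like kernel (y = a*x + y in double): 2 FLOPs per element,
# 24 bytes moved per element (load x, load y, store y).
INTENSITY = 2 / 24

for flops_per_cycle in (2, 4):
    peak = flops_per_cycle * CLOCK_GHZ
    print(flops_per_cycle, "FLOPs/cycle:",
          attainable_gflops(peak, BW_GB_PER_S, INTENSITY), "GFLOP/s")
```

Both widths print the same number. A low-intensity kernel never reaches either peak, so the extra FP pipe only burns power unless bandwidth grows with it.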

But getting 4FLOPS/cycle of single precision would be handy. sigh.


matt
From: Del Cecchi on

<matt.reilly(a)sicortex.com> wrote in message
news:adc0a048-6ad5-4030-b02b-5b51ddecc842(a)c65g2000hsa.googlegroups.com...
> On Apr 11, 5:19 pm, "Paul A. Clayton" <paaronclay...(a)earthlink.net>
> wrote:
>>
>> I would also consider the 2-wide FPU of Blue Gene processor a
>> significant difference (4 FLOPs per cycle vs. 2 FLOPs per cycle
>> for the SiCortex processor). I was a bit disappointed that
>> the SiCortex did not exploit such SIMD. Perhaps the targeted
>
> Early on, BG/L users had trouble coaxing the compilers to use the
> second FP pipe, as I recall. Del? The issue rules and exclusions
> around the second pipe took some adapting.
>
> Adding more FP without commensurate memory bandwidth
> and communications bandwidth is often a waste of power.
>
> But getting 4FLOPS/cycle of single precision would be handy. sigh.
>
>
> matt

Don't look at me. I was a circuit designer. Well, I still am but not
actively at the moment. So all that software stuff is out of my field.
Holes and electrons and femtofarads and picoseconds, now you're talking
my language.... :-)


From: Nick Maclaren on

In article <adc0a048-6ad5-4030-b02b-5b51ddecc842(a)c65g2000hsa.googlegroups.com>,
matt.reilly(a)sicortex.com writes:
|> On Apr 11, 5:19 pm, "Paul A. Clayton" <paaronclay...(a)earthlink.net>
|> wrote:
|> >
|> Adding more FP without commensurate memory bandwidth
|> and communications bandwidth is often a waste of power.

Yes.

|> But getting 4FLOPS/cycle of single precision would be handy. sigh.

Not really, except for image and audio work. While there are a few
applications for which single precision is enough on a modern
high-performance computer (please note), there aren't many. There are a
fair number where it would be enough, if all parts of them were programmed
by a numerical expert - and there are damn few of those still active.

You can't even solve a large, well-conditioned set of linear equations
in single precision, if the matrix is banded - there is a limit of
about a million on the number of equations at which point ALL accuracy
is lost. See Wilkinson and Reinsch. Solving a mere 10,000 (which can
be done easily even for unbanded matrices) won't give you more than
about 1% accuracy. And those are the BEST cases - any ill-conditioning
and forget it!
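The gap between single and double precision on even a well-conditioned system can be shown with a toy experiment. This is a small dense sketch, not a reproduction of the banded-matrix bounds from Wilkinson and Reinsch or of the 10,000 / million figures above. It uses plain Gaussian elimination with partial pivoting and rounds every stored intermediate to IEEE single via `struct`, which should closely emulate genuine single-precision arithmetic. The test matrix is a hypothetical diagonally dominant one with a known solution of all ones.

```python
import random
import struct

def to_f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]

def identity(x: float) -> float:
    """No extra rounding: plain double-precision arithmetic."""
    return x

def solve(a, b, rnd):
    """Gaussian elimination with partial pivoting; every stored
    intermediate passes through `rnd` to emulate the working precision."""
    n = len(b)
    a = [[rnd(v) for v in row] for row in a]
    b = [rnd(v) for v in b]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = rnd(a[i][k] / a[k][k])
            for j in range(k, n):
                a[i][j] = rnd(a[i][j] - rnd(m * a[k][j]))
            b[i] = rnd(b[i] - rnd(m * b[k]))
    x = [0.0] * n
    for i in range(n - 1, -1, -1):           # back substitution
        s = b[i]
        for j in range(i + 1, n):
            s = rnd(s - rnd(a[i][j] * x[j]))
        x[i] = rnd(s / a[i][i])
    return x

def max_error(n, rnd, seed=1):
    """Largest componentwise error when solving A x = b with x_true = 1."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    for i in range(n):
        a[i][i] += n                          # diagonally dominant: well-conditioned
    b = [sum(row) for row in a]               # A times the all-ones vector
    x = solve(a, b, rnd)
    return max(abs(xi - 1.0) for xi in x)

print("single:", max_error(60, to_f32))
print("double:", max_error(60, identity))
```

Even at n = 60 the single-precision error is many orders of magnitude larger than the double-precision one. Since the error bounds grow with n and with the condition number, larger or less friendly systems leave correspondingly less headroom.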


Regards,
Nick Maclaren.
From: Bernd Paysan on
Nick Maclaren wrote:
> You can't even solve a large, well-conditioned set of linear equations
> in single precision, if the matrix is banded - there is a limit of
> about a million on the number of equations at which point ALL accuracy
> is lost. See Wilkinson and Reinsch. Solving a mere 10,000 (which can
> be done easily even for unbanded matrices) won't give you more than
> about 1% accuracy. And those are the BEST cases - any ill-conditioning
> and forget it!

Question: How do you get the coefficients for such a matrix to be more
accurate than SP? I mean for a real-world problem. Most measurements give
you something between 8 and 24 integer bits; in some rare occasions, you
get a few more. E.g. if you do your 14 days weather forecast with "the
required precision", you forget that your actual data is not that precise,
either. You may not lose any precision during the calculation process, but
if you change one of the inputs by just one bit, you end up with completely
different weather.
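The "change one input by one bit" effect is easy to reproduce with a chaotic toy system. The sketch below uses the logistic map x -> 4x(1-x) as a stand-in for a weather model; that substitution is an assumption made for illustration. It starts one run at 0.3 and another at the next representable double above it, then reports the step at which the two runs differ by more than 0.1. It needs Python 3.9+ for `math.nextafter`.

```python
import math

def divergence_step(x0: float, steps: int = 200, threshold: float = 0.1):
    """Iterate the logistic map from x0 and from the next double above x0;
    return the first step at which the two runs differ by more than
    `threshold`, or None if they never do within `steps` iterations."""
    a, b = x0, math.nextafter(x0, 1.0)   # inputs differ in the last bit only
    for n in range(1, steps + 1):
        a = 4.0 * a * (1.0 - a)
        b = 4.0 * b * (1.0 - b)
        if abs(a - b) > threshold:
            return n
    return None

print(divergence_step(0.3))
```

In this map an input difference roughly doubles each step on average. A one-ulp change in a double (about 5e-17 here) therefore grows to order one within a few dozen iterations, no matter how carefully each step is computed.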

It's a shame that random rounding is not supported by common number
crunching hardware. When you want to know if your algorithm actually works
with imprecise input data, you can just try with slight changes on the
input. But then, you still need to have enough headroom in the actual
calculation, and in all non-trivial iterative calculations, you can't
really know. So I'd rather like to have random rounding, and when the
result is quite different each time, I just know that it's not stable.
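Without hardware support, random rounding can be emulated in software to run the experiment described above. In this sketch, `stochastic_round` is a hypothetical helper. It rounds to a chosen number of significand bits and picks the upper or lower neighbour with probability proportional to proximity, so the rounding is unbiased on average. It is applied to a textbook recurrence, I_n = integral of x^n e^(x-1) over [0, 1], computed in two ways: forward (I_k = 1 - k I_(k-1), unstable) and backward (I_(k-1) = (1 - I_k)/k, stable).

```python
import math
import random

def stochastic_round(x: float, bits: int, rng: random.Random) -> float:
    """Round x to `bits` significand bits, choosing the upper neighbour
    with probability equal to the fractional distance past the lower one."""
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)              # x = m * 2**e, 0.5 <= |m| < 1
    scaled = m * (1 << bits)          # exact: only shifts the exponent
    lo = math.floor(scaled)
    frac = scaled - lo                # in [0, 1)
    q = lo + 1 if rng.random() < frac else lo
    return math.ldexp(q, e - bits)

def forward(n, bits, rng):
    """Unstable: an error in I_0 is multiplied by n! by the time it reaches I_n."""
    def r(v):
        return stochastic_round(v, bits, rng)
    i = r(1.0 - math.exp(-1.0))       # I_0 = 1 - 1/e
    for k in range(1, n + 1):
        i = r(1.0 - r(k * i))         # I_k = 1 - k*I_(k-1)
    return i

def backward(n, bits, rng, start=40):
    """Stable: start from a crude guess I_start = 0 and recur downwards;
    the guess's error is divided by start*(start-1)*...*(n+1)."""
    def r(v):
        return stochastic_round(v, bits, rng)
    i = 0.0
    for k in range(start, n, -1):
        i = r(r(1.0 - i) / k)         # I_(k-1) = (1 - I_k)/k
    return i

def spread(method, trials=10, n=20, bits=24):
    """Range of results over several independently seeded runs."""
    vals = [method(n, bits, random.Random(seed)) for seed in range(trials)]
    return max(vals) - min(vals)

print("forward spread: ", spread(forward))
print("backward spread:", spread(backward))
```

The forward runs disagree wildly with one another, while the backward runs agree to within a few units in the last place. That disagreement is exactly the "quite different each time" instability signal described above, obtained without having to know the error analysis in advance.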

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
From: Del Cecchi on

"Bernd Paysan" <bernd.paysan(a)gmx.de> wrote in message
news:44e9d5-b3j.ln1(a)vimes.paysan.nom...
>
(snip)
> It's a shame that random rounding is not supported by common number
> crunching hardware. When you want to know if your algorithm actually works
> with imprecise input data, you can just try with slight changes on the
> input. But then, you still need to have enough headroom in the actual
> calculation, and in all non-trivial iterative calculations, you can't
> really know. So I'd rather like to have random rounding, and when the
> result is quite different each time, I just know that it's not stable.
>

True random numbers are a real pain to generate on a chip. Would
pseudo-random work?
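One cheap on-chip answer to the pseudo-random question is a linear-feedback shift register. The sketch below is a hypothetical illustration, not a proposal from the thread. It uses a 16-bit Galois LFSR with taps 0xB400 (a maximal-length polynomial, so it visits all 65535 non-zero states) to drive stochastic rounding to an integer, and checks that over one full period the rounding is unbiased to within about 1e-5. Whether a short, correlated sequence like this is "random enough" to expose instability is a separate and open question.

```python
import math

def lfsr16(state: int):
    """16-bit Galois LFSR, taps 0xB400 (x^16 + x^14 + x^13 + x^11 + 1);
    yields successive states, cycling through all 65535 non-zero values."""
    while True:
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB400
        yield state

def period(seed: int = 0xACE1) -> int:
    """Number of steps until the register returns to its seed."""
    gen = lfsr16(seed)
    n = 1
    while next(gen) != seed:
        n += 1
    return n

def lfsr_round(x: float, gen) -> int:
    """Round x up with probability ~frac(x), using LFSR state as the dice."""
    lo = math.floor(x)
    u = next(gen) / 65536.0           # pseudo-uniform in (0, 1)
    return lo + 1 if u < x - lo else lo

gen = lfsr16(0xACE1)
trials = 65535                        # exactly one full period
mean = sum(lfsr_round(0.3, gen) for _ in range(trials)) / trials
print(period(), mean)
```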

