From: "Andy "Krazy" Glew" on
[Skybuck posting about doing GPGPU computations, using SW error detection and correction]
>>> I have 0% experience with error correcting codes I am afraid ! ;) :)
mpm wrote:
>> Have you considered paying a consultant?

Don't discourage the kid! If Skybuck follows through on this, he's on the way to becoming an expert. Perhaps not in
the academic theory, but perhaps on how to do SW ECC on GPU hardware.

I have more fun reading Skybuck's posts.

Hackers rule!


For Skybuck:

a) Try googling "residue error detection". I haven't vetted these papers - they give an idea, but they are probably not
the papers that talk best about how to use residues for error detection in ALUs and other computations.

Scholarly articles for residue error detection
Redundant residue number systems for error detection ... - Etzel - Cited by 56
Detection and tracking of point features - Tomasi - Cited by 851
Concurrent error detection using watchdog processors ... - Mahmood - Cited by 315

An Algorithm for Scaling and Single Residue Error Correction in ...
by CC Su - 1990 - Cited by 11 - Related articles - All 4 versions
{10} R. W. Watson, "Error detection and correction and other residue-interacting operations in a redundant
residue number system," Ph.D. dissertation, ...
portal.acm.org/citation.cfm?id=101793.101802 -

a') The basic idea is, for every, say, 32 bit floating point computation, compute a, say, 3 bit residue, and check the
residues.
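
A minimal sketch of the residue idea in a'), to make it concrete. This is my illustration, not code from any of the
papers above. It uses integer multiply-add rather than floating point, because residues carry through integer
arithmetic exactly; how to define a residue for an FP result is not addressed here. The modulus 7 (which fits in 3 bits)
and the names checked_t, make_checked, checked_mul_add and check are all assumptions made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define RES_MOD 7u  /* assumed modulus: 2^3 - 1, so a residue fits in 3 bits */

/* Residue of a value modulo RES_MOD. */
static uint32_t residue(uint64_t v) { return (uint32_t)(v % RES_MOD); }

/* A value carried together with its separately computed residue. */
typedef struct { uint64_t value; uint32_t res; } checked_t;

static checked_t make_checked(uint32_t v) {
    checked_t c = { v, residue(v) };
    return c;
}

/* Compute a*b + c on the main path and, independently, on the residues.
   Inputs must fit in 32 bits so the 64-bit main path cannot wrap around. */
static checked_t checked_mul_add(checked_t a, checked_t b, checked_t c) {
    assert(a.value <= UINT32_MAX && b.value <= UINT32_MAX && c.value <= UINT32_MAX);
    checked_t r;
    r.value = a.value * b.value + c.value;       /* main computation */
    r.res = (a.res * b.res + c.res) % RES_MOD;   /* same computation on residues */
    return r;
}

/* Returns 1 if the value still agrees with its residue, 0 if an error is detected. */
static int check(checked_t x) { return residue(x.value) == x.res; }
```

A single flipped bit changes the value by a power of two, which is never a multiple of 7, so check() catches any
single-bit error in the result. Errors that happen to change the value by a multiple of 7 go undetected.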


b) CRCs are not so good for checking ALU integrity. CRCs are okay for checking data stored, e.g. in memory. You might
also use a CRC if you do the computation twice, calculating a CRC at various intermediate points, and compare the CRCs
as opposed to comparing all computation results. (I've used exactly this in simulators, comparing a hash of the simple
in-order and full out-of-order simulations.)

i.e. CRCs are for error detection when you do the computation twice, or for storage. Residues are the best known
method for doing a computation once, and detecting if that single computation has an error.
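
A rough sketch of the "do it twice and compare CRCs" approach in b). The CRC-32 routine is the standard reflected
polynomial 0xEDB88320, done bit by bit for brevity. Everything else (the toy computation, CHECKPOINT, run_with_crc, the
fault_at parameter used to simulate a transient error) is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard CRC-32 (reflected polynomial 0xEDB88320), bitwise; pass the
   previous return value as crc to continue a running CRC, or 0 to start. */
static uint32_t crc32_update(uint32_t crc, const void *data, size_t len) {
    const unsigned char *p = data;
    crc = ~crc;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

#define CHECKPOINT 4  /* hypothetical: fold state into the CRC every 4 steps */

/* Toy computation that folds its intermediate state into a running CRC.
   fault_at simulates a bit flip at that step; pass SIZE_MAX for a clean run. */
static uint32_t run_with_crc(const uint32_t *in, size_t n, size_t fault_at) {
    uint32_t acc = 0, crc = 0;
    for (size_t i = 0; i < n; i++) {
        acc = acc * 31u + in[i];
        if (i == fault_at)
            acc ^= 0x00010000u;                    /* simulated transient error */
        if ((i + 1) % CHECKPOINT == 0)
            crc = crc32_update(crc, &acc, sizeof acc);
    }
    return crc32_update(crc, &acc, sizeof acc);   /* fold in the final state */
}
```

Run the computation twice and compare only the two 32-bit CRCs instead of every intermediate result. A mismatch says
that the runs diverged somewhere, not which one is wrong.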


c) You talked about using xyz, leaving w for ECC.

On the GPU I am most familiar with, xyzw are corresponding 32 bit fields of a 128 bit wide SIMD vector.

On the GPU I am most familiar with, the instruction set is SIMD, so you tend to want to do all ops on all elements of a
vector, although you can mask. Doing residue or other ECC in the w element would be inefficient.

You really want to do the ECC in a separate computation. A separate vector. E.g. a 4x32 vector of FP, and a 4x4
vector of residues. Or, even better, a vector of 32 32-bit FP data items (8x128 bits), with a 128 bit vector that is 32
4-bit residues.

If you can do 32-wide SIMD, this would have the least overhead for residues.
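
A sketch of that layout in plain C, just to show the bookkeeping: 32 data words (8x128 bits) plus one 128-bit vector
holding 32 4-bit residues. I use raw 32-bit unsigned words instead of FP, and modulus 15 for the 4-bit residues; both
are assumptions, as are the names block_t, get_res, set_res, block_fill_residues and block_first_bad_lane.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 32

typedef struct {
    uint32_t data[LANES];     /* 32 x 32-bit data items (8 x 128 bits) */
    uint32_t res[LANES / 8];  /* 32 x 4-bit residues packed into 128 bits */
} block_t;

/* Read the 4-bit residue stored for lane i. */
static uint32_t get_res(const block_t *b, int i) {
    assert(i >= 0 && i < LANES);
    return (b->res[i / 8] >> (4 * (i % 8))) & 0xFu;
}

/* Store a 4-bit residue for lane i. */
static void set_res(block_t *b, int i, uint32_t r) {
    assert(i >= 0 && i < LANES);
    int sh = 4 * (i % 8);
    b->res[i / 8] = (b->res[i / 8] & ~(0xFu << sh)) | ((r & 0xFu) << sh);
}

/* Compute and store the mod-15 residue of every data lane. */
static void block_fill_residues(block_t *b) {
    for (int i = 0; i < LANES; i++)
        set_res(b, i, b->data[i] % 15u);
}

/* Return the first lane whose data disagrees with its residue, or -1. */
static int block_first_bad_lane(const block_t *b) {
    for (int i = 0; i < LANES; i++)
        if (b->data[i] % 15u != get_res(b, i))
            return i;
    return -1;
}
```

The residue vector here costs 128 bits per 1024 bits of data, which is where the low overhead comes from.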

However, if your computation is not this uniform...

ATI) on ATI's VLIW pipeline you may not lose so much by doing 3x32 + a residue. However, the residue ops in SW may
require more instructions than the FP.

NVDA) If the xyz and w are really different scalars within a single thread of a warp, and you can get a full warp width
of converged control flow, great.

But even here if, for example, you did 32 bit computations, you might still be able to do 8 4 bit residue computations in
the same 32 bits. Or, rather, within a thread in a warp you might be able to do 8 32 bit computations, and then a single
set of 4-bit wide residue computations packed 8 in a 32 bit number.
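
Here is one way the "8 residues packed in a 32 bit number" trick might look for addition, written as plain C that
should port to a GPU thread more or less as is. It is a sketch under assumptions: modulus 15 for the 4-bit residues,
main-path sums that do not wrap around 32 bits, and made-up names (NIB, reduce15, packed_add15, pack_residues).

```c
#include <stdint.h>

#define NIB 0x0F0F0F0Fu  /* selects the low nibble of every byte */

/* Reduce each byte lane (holding 0..30) to its canonical value mod 15 (0..14). */
static uint32_t reduce15(uint32_t s) {
    s = (s & NIB) + ((s >> 4) & NIB);          /* 16 == 1 (mod 15): lanes now 0..15 */
    uint32_t is15 = ((s + 0x01010101u) >> 4) & 0x01010101u;  /* 1 where lane == 15 */
    return (s + is15) & NIB;                   /* turn 15 into 0 */
}

/* Add eight packed 4-bit residues mod 15, all lanes at once.
   Even and odd nibbles are handled separately so each sum has a byte of room. */
static uint32_t packed_add15(uint32_t a, uint32_t b) {
    uint32_t even = reduce15((a & NIB) + (b & NIB));
    uint32_t odd  = reduce15(((a >> 4) & NIB) + ((b >> 4) & NIB));
    return even | (odd << 4);
}

/* Pack the mod-15 residues of eight 32-bit values into one 32-bit word. */
static uint32_t pack_residues(const uint32_t v[8]) {
    uint32_t p = 0;
    for (int i = 0; i < 8; i++)
        p |= (v[i] % 15u) << (4 * i);
    return p;
}
```

To check eight additions s[i] = x[i] + y[i], compare packed_add15(pack_residues(x), pack_residues(y)) against
pack_residues(s). The residue path then costs a handful of integer ops per 8 main-path additions.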