From: austin on 8 Apr 2008 11:15

Symon,

Well, Cypress, Xilinx, IBM, and many others have made it no secret that neutrons at sea level are causing upsets, and we have done something about it (and presented the papers, and shown our results).

Intel has also been working very quietly on this, with much less press.

I suggest that if you are not thinking about single event effects, you should be, and demanding your vendor show you the proof of their design efforts in this regard.

Virtex 5 is (as of today) 144 FIT/Mb for the config bits, 95% confidence interval from 100 to 200 FIT/Mb. This is from our 400 devices located on mountain tops in France (31.029 Giga-bit-years of test time, 35 events).

Compare this to a 65nm ASSP or ASIC, which is at least 1000 FIT/Mb or 1000 FIT/million gates(!). Do nothing, and it gets worse. Do something, and it gets back to where it should be.

These numbers from the SELSE II conference a few years back: the industry numbers are really a lot worse, but no one will admit it.

There is a reason why Xilinx FPGA devices are finding their way into many high availability and high reliability applications: we are the only choice -- there is no competition whatsoever.

Austin
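
The figures in the post above (144 FIT/Mb, a 95% interval of 100 to 200, 35 events in 31.029 Giga-bit-years) can be sanity-checked with a few lines of Python. This is only a sketch under assumptions that are not in the post: FIT is taken as failures per 10^9 device-hours, a year as 8760 hours, Mb and Gb as decimal units, and the interval as an exact Poisson interval on the event count scaled onto the quoted point estimate. All function names are hypothetical. Under those assumptions the raw division comes out somewhat below the quoted 144; the post does not say what normalization was applied, so the raw figure should be read as an illustration only.

```python
import math

HOURS_PER_YEAR = 8760  # assumption: non-leap year


def raw_fit_per_mb(events, gigabit_years):
    """Naive rate: events per 1e9 hours per megabit (decimal units assumed)."""
    megabit_hours = gigabit_years * 1000 * HOURS_PER_YEAR
    return events / megabit_hours * 1e9


def poisson_cdf(k, mean):
    """P(X <= k) for a Poisson variable with the given mean."""
    term = math.exp(-mean)
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return total


def mean_for_cdf(k, target):
    """Bisect for the mean at which P(X <= k) equals target (CDF falls as mean grows)."""
    lo, hi = 0.0, 10.0 * k + 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(k, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_poisson_interval(events, confidence=0.95):
    """Two-sided exact interval on the expected count, given an observed count > 0."""
    alpha = 1 - confidence
    lower = mean_for_cdf(events - 1, 1 - alpha / 2)
    upper = mean_for_cdf(events, alpha / 2)
    return lower, upper


def scaled_interval(point_estimate, events, confidence=0.95):
    """Scale the count interval onto a quoted point estimate."""
    lower, upper = exact_poisson_interval(events, confidence)
    return point_estimate * lower / events, point_estimate * upper / events


if __name__ == "__main__":
    print("raw FIT/Mb:", round(raw_fit_per_mb(35, 31.029), 1))
    print("interval around 144 FIT/Mb:", scaled_interval(144, 35))
```

With only 35 events, the counting statistics alone account for an interval of roughly 100 to 200 around 144, which is consistent with the interval quoted in the post.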
From: austin on 8 Apr 2008 11:58

Symon,

First of all, there is no such thing as a single particle detector.

Secondly, detecting the current spike (from a strike) requires a very complex circuit, itself subject to spikes (I know, we designed them for the USAF...)

Thirdly, Intel has done far more than this, and deserves better PR. Perhaps they should fire the PR firm?

Austin

Symon wrote:
> "austin" <austin(a)xilinx.com> wrote in message
> news:ftg25m$p2m2(a)cnn.xsj.xilinx.com...
>> Intel has also been working very quietly on this, with much less press.
>
> Hi Austin,
> I wondered what were your thoughts on their patent where "The cosmic ray
> detector [built into the device] is therefore designed to spot when rays
> have caused interference and then tell the chip to repeat the command." ? I
> guess in an FPGA it could trigger a readback to ensure the device was still
> correctly configured and/or issue a user logic reset.
> Cheers, Syms.
From: austin on 8 Apr 2008 12:07

And,

Yes, in S3A, S3AN, S3D, V4, V5 we are able to either reconfigure on detection of an upset, notify the user (and they decide what to do), or in V4 and V5, correct the flipped bit without having to reconfigure (or even go to the config flash/prom).

Basically, in our road show, it is detailed how the user needs to decide what to do, and at what levels, in order to meet their availability and reliability numbers.

Mitigation is part hardware, part system architecture, and part software. Depending on what you are doing, and how long you can tolerate being "off-line" there are different solutions. They are:

- just reconfigure, start fresh
- just fix the bit flip, continue on (as a flip does nothing 90% of the time, and seldom causes anything to 'crash')
- fix the bit flip and reset or go back to a check point/known states
- use dual redundancy, and check for agreement (if a fault is not tolerated - like in banking, accounting), repeat if no agreement
- use full triple modular redundancy (when it must be correct, and 100% available), also scrub to fix bits that may flip so flips are not allowed to accumulate

All methods are used by our customers, and they all work. We have reference designs and support for these models. And they can be tested by reconfiguring to flip bits while operating. One heck of a lot cheaper than using a proton beam, or neutron beam .... and more complete (we have folks who flip each bit, one by one, and prove their system meets its requirements).

Austin
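
The last mitigation option above (triple modular redundancy plus scrubbing) and the "flip each bit, one by one" test can be illustrated with a small software model. This is a toy sketch in Python, not Xilinx's reference design: the three "copies" are plain integers, and the names `majority`, `scrub`, and `all_single_flips_masked` are hypothetical.

```python
def majority(a, b, c):
    """Bitwise 2-of-3 vote: each output bit follows at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def scrub(copies):
    """Rewrite all three copies with the voted value so flips cannot accumulate."""
    good = majority(*copies)
    return [good, good, good]


def all_single_flips_masked(word, width=8):
    """Inject every possible single-bit flip, one at a time, and check the vote."""
    for copy_index in range(3):
        for bit in range(width):
            copies = [word, word, word]
            copies[copy_index] ^= 1 << bit  # inject one upset
            if majority(*copies) != word:
                return False
    return True


if __name__ == "__main__":
    word = 0b10110010
    print("every single flip masked:", all_single_flips_masked(word))

    # Two flips of the same bit in different copies defeat the vote...
    unscrubbed = [word ^ 1, word ^ 1, word]
    print("accumulated flips voted correctly:", majority(*unscrubbed) == word)

    # ...unless a scrub runs between the first and second upset.
    copies = scrub([word ^ 1, word, word])
    copies[1] ^= 1
    print("with scrubbing voted correctly:", majority(*copies) == word)
```

The second case shows why scrubbing is paired with the voter: the vote tolerates one bad copy per bit, so upsets have to be repaired before a second one lands on the same bit of another copy.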
From: Jon Elson on 8 Apr 2008 15:15

Symon wrote:
> "austin" <austin(a)xilinx.com> wrote in message
> news:ftg25m$p2m2(a)cnn.xsj.xilinx.com...
>
>>Intel has also been working very quietly on this, with much less press.
>
> Hi Austin,
> I wondered what were your thoughts on their patent where "The cosmic ray
> detector [built into the device] is therefore designed to spot when rays
> have caused interference and then tell the chip to repeat the command." ? I
> guess in an FPGA it could trigger a readback to ensure the device was still
> correctly configured and/or issue a user logic reset.
> Cheers, Syms.

Boy, I saw that text, too, and really wondered how reliable such a procedure would be. If the state of flip-flops or dynamic memories is altered, repeating the previous instruction would be worthless. There is SO much more area in high-end CPUs devoted to memory, and much less to logic functions, that I would expect memory corruption to be the most probable fault.

Jon
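
Jon's objection can be made concrete with a toy model in Python. It is a sketch only: the register file, the `execute_add` helper, and the way the "detector" triggers a retry are all hypothetical and say nothing about how the patented mechanism actually works. A retry recovers from a one-off glitch in the logic, but not from a bit that has already flipped in stored state, because the repeated instruction reads the same corrupted operand.

```python
def execute_add(regs, dst, a, b, transient_mask=0):
    """dst = a + b; transient_mask models a one-off glitch on the adder output."""
    regs[dst] = ((regs[a] + regs[b]) ^ transient_mask) & 0xFFFFFFFF


def run_with_retry(upset_in_logic=False, upset_in_state=False):
    """Run one add, then repeat it once as if a strike detector had fired."""
    regs = {"r1": 100, "r2": 23, "r3": 0}
    if upset_in_state:
        regs["r1"] ^= 1 << 4  # stored operand corrupted before the instruction
    glitch = (1 << 7) if upset_in_logic else 0
    execute_add(regs, "r3", "r1", "r2", transient_mask=glitch)
    # Detector fired: repeat the instruction; the transient has passed.
    execute_add(regs, "r3", "r1", "r2")
    return regs["r3"]


if __name__ == "__main__":
    expected = 100 + 23
    print("logic upset recovered:", run_with_retry(upset_in_logic=True) == expected)
    print("state upset recovered:", run_with_retry(upset_in_state=True) == expected)
```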
From: Colin Paul Gloster on 10 Apr 2008 07:26

Austin posted:
> "[..]
>
> [..] they can be tested
> by reconfiguring to flip bits while operating. One heck of a lot cheaper
> than using a proton beam, or neutron beam .... and more complete (we
> have folks who flip each bit, one by one, and prove their system meets
> its requirements)."

Logical testing will not match checking whether real radiation respects your model of the system. One transient can defeat the outcome of clocked triply modularly redundant voters.

Sincerely,
Colin Paul Gloster, unemployed and cold