From: Baron on
Nix Inscribed thus:

> On 16 Jan 2010, Baron spake thusly:
>
>> Nix Inscribed thus:
>>
>>> On 10 Jan 2010, Baron verbalised:
>>>> 75% of software problems are hardware related, ignoring viruses and
>>>> PEBKAC.
>>>
>>> Speaking as someone who writes software for a living, boyoboy you
>>> couldn't be more wrong. I'd give the figure as well under 1%. Bugs
>>> in software are ubiquitous.
>>>
>>> (Even considering "you didn't read the hardware spec before you did
>>> $FOO and now it's failed" as a "hardware bug" and thus considering
>>> cache-coherency problems in multithreaded apps to be "hardware
>>> problems", the figure is likely still well under 5%).
>>
>> Ok ! So you are the only person that programs for all potential
>> hardware faults... I don't think so !
>
> No... but hardware has a lot of *potential* faults, but relatively few
> *actual* ones that aren't catastrophic. Bad RAM (or bad cells on
> flash) is probably the most common, and that surely isn't common: I've
> never seen it except on new machines.

Point taken !

I see far, far more faults, both on new and old machines, that are
related to hardware, for whatever reason, than can be attributed to
software failure.

> Software has myriad actual faults: the flow of bugs never stops.
> Compare to, say, the erratum list for a modern Intel CPU: a lot of
> faults, many of which sound horrible... but all of which are so
> obscure that either the OS can work around them or it never runs
> into them at all.

I wouldn't dispute that at all !
Where you have had a working machine suddenly start to have problems, I
rarely find that it's the software that's at fault.

From your point of view, you must make the assumption that the hardware
is what the manufacturer/designer says it is. From there your program
is designed to achieve a desired goal, and you will develop work
arounds to overcome problems that you face.

If the manufacturer makes a change to the hardware, you would code or
change your code accordingly. But how can you possibly account for a
gate failure, broken trace, bad joint or a thermal issue due to infant
mortality or aging? I don't believe that it's possible... Yet !

We both see the same problem but from different perspectives. I would
always make sure that the hardware was sound before condemning
software.

--
Best Regards:
Baron.
From: Nix on
On 18 Jan 2010, Baron said:

> Nix Inscribed thus:
>> No... but hardware has a lot of *potential* faults, but relatively few
>> *actual* ones that aren't catastrophic. Bad RAM (or bad cells on
>> flash) is probably the most common, and that surely isn't common: I've
>> never seen it except on new machines.
>
> Point taken !
>
> I see far, far more faults, both on new and old machines, that are
> related to hardware, for whatever reason, than can be attributed to
> software failure.

Amazing. You're almost unique in my experience: perhaps you buy really
cheap hardware, or never use anything non-heavily-tested?

> I wouldn't dispute that at all !
> Where you have had a working machine suddenly start to have problems, I
> rarely find that it's the software that's at fault.

You mean where something that worked before suddenly stops working
without apparent cause? Often that too is software, especially in opaque
systems such as Windows: something deep in the state of the system has
silently changed and broken something. Earlier versions of Windows were
so prone to this that it became widely known as 'Windows rot'.

> From your point of view, you must make the assumption that the hardware
> is what the manufacturer/designer says it is. From there your program
> is designed to achieve a desired goal, and you will develop work
> arounds to overcome problems that you face.

Er, yes. And very often you get it wrong (generally not because of the
hardware: generally because of mistakes in how parts of the software
relate to other parts). Hence, bugs.

> If the manufacturer makes a change to the hardware, you would code or
> change your code accordingly. But how can you possibly account for a
> gate failure, broken trace, bad joint or a thermal issue due to infant
> mortality or aging? I don't believe that it's possible... Yet !

Well, it sort of is. ECC RAM, for instance, especially combined with
Intel's latest bleeding-edge features to allow RAM failures to trigger
masking out of the physical page containing the damaged RAM, and
checksummed buses like PCIe... but CPU-layer faults, bad joints on the
motherboard, you're stuck.
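
To illustrate the general idea behind a checksummed link, here is a
minimal sketch in Python. It is not how PCIe or ECC RAM is actually
implemented (ECC RAM can also correct errors, which a plain checksum
cannot); it only shows how a checksum lets the receiving end notice that
a bit flipped in transit. The function names `send`, `receive` and
`flip_bit` are made up for this example.

```python
import zlib


def send(payload: bytes):
    """Return the payload together with its CRC-32 (hypothetical sender)."""
    return payload, zlib.crc32(payload)


def receive(payload: bytes, checksum: int) -> bool:
    """Return True if the payload still matches the checksum it was sent with."""
    return zlib.crc32(payload) == checksum


def flip_bit(payload: bytes, bit_index: int) -> bytes:
    """Simulate a single-bit hardware fault by inverting one bit."""
    corrupted = bytearray(payload)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


data, crc = send(b"some data crossing the bus")
print(receive(data, crc))               # intact payload is accepted
print(receive(flip_bit(data, 5), crc))  # corrupted payload is rejected
```

A checksum like this only detects the damage; the recovery step (a
retry, or masking out the bad page) has to be handled separately.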

> We both see the same problem but from different perspectives. I would
> always make sure that the hardware was sound before condemning
> software.

I always, always do the opposite, and am nearly always right (except
on new machines where the hardware has not been known to work before:
there, too, I assume hardware).
From: Baron on
Nix Inscribed thus:

> On 18 Jan 2010, Baron said:
>
>> Nix Inscribed thus:
>>> No... but hardware has a lot of *potential* faults, but relatively
>>> few *actual* ones that aren't catastrophic. Bad RAM (or bad cells on
>>> flash) is probably the most common, and that surely isn't common:
>>> I've never seen it except on new machines.
>>
>> Point taken !
>>
>> I see far, far more faults, both on new and old machines, that are
>> related to hardware, for whatever reason, than can be attributed to
>> software failure.
>
> Amazing. You're almost unique in my experience: perhaps you buy really
> cheap hardware, or never use anything non-heavily-tested?

It's nice to be unique ! I'm not really. :-)
45+ years in hardware makes you cynical.
Particularly where something that can't possibly happen, happens.

>> I wouldn't dispute that at all !
>> Where you have had a working machine suddenly start to have
>> problems, I rarely find that it's the software that's at fault.
>
> You mean where something that worked before suddenly stops working
> without apparent cause? Often that too is software, especially in
> opaque systems such as Windows: something deep in the state of the
> system has silently changed and broken something. Earlier versions of
> Windows were so prone to this that it became widely known as 'Windows
> rot'.

Windows is not/never was any good !

>> From your point of view, you must make the assumption that the
>> hardware is what the manufacturer/designer says it is. From there
>> your program is designed to achieve a desired goal, and you will
>> develop work arounds to overcome problems that you face.
>
> Er, yes. And very often you get it wrong (generally not because of the
> hardware: generally because of mistakes in how parts of the software
> relate to other parts). Hence, bugs.
>
>> If the manufacturer makes a change to the hardware, you would code or
>> change your code accordingly. But how can you possibly account for a
>> gate failure, broken trace, bad joint or a thermal issue due to
>> infant mortality or aging? I don't believe that it's possible...
>> Yet !
>
> Well, it sort of is. ECC RAM, for instance, especially combined with
> Intel's latest bleeding-edge features to allow RAM failures to trigger
> masking out of the physical page containing the damaged RAM, and
> checksummed buses like PCIe... but CPU-layer faults, bad joints on the
> motherboard, you're stuck.

The "bad caps" syndrome is a good example of software damaged by failing
hardware.

>> We both see the same problem but from different perspectives. I
>> would always make sure that the hardware was sound before condemning
>> software.
>
> I always, always do the opposite, and am nearly always right (except
> on new machines where the hardware has not been known to work before:
> there, too, I assume hardware).

--
Best Regards:
Baron.
From: Nix on
On 18 Jan 2010, Baron verbalised:

> Nix Inscribed thus:
>
>> Amazing. You're almost unique in my experience: perhaps you buy really
>> cheap hardware, or never use anything non-heavily-tested?
>
> It's nice to be unique ! I'm not really. :-)
> 45+ years in hardware makes you cynical.
> Particularly where something that can't possibly happen, happens.

Ah, I think that's the real difference. 45 years in hardware with
all the faults coming to you, and you assume the hardware is where
the faults lie. For me, twenty-plus years in software with all
the faults coming to me, and I assume the software is where the
faults lie :)

The difference is that, even on my own systems, I see software faults
*far* more often than hardware faults. (Perhaps part of that is because
I always buy top-of-the-line hardware because I know damn well that if
the hardware fails, I don't have a hope of fixing it, so it had better
not fail.)

>> opaque systems such as Windows: something deep in the state of the
>> system has silently changed and broken something. Earlier versions of
>> Windows were so prone to this that it became widely known as 'Windows
>> rot'.
>
> Windows is not/never was any good !

XP is actually a little better here. (A little.)

But the problem is present in all complex systems, not just Windows
(hell, not just computers, although in evolved systems such as biology
the effect is to freeze things that everything relies on in time,
because if they change at all they kill the organism).

>> Well, it sort of is. ECC RAM, for instance, especially combined with
>> Intel's latest bleeding-edge features to allow RAM failures to trigger
>> masking out of the physical page containing the damaged RAM, and
>> checksummed buses like PCIe... but CPU-layer faults, bad joints on the
>> motherboard, you're stuck.
>
> The "bad caps" syndrome is a good example of software damaged by failing
> hardware.

Oh, I'm not saying hardware failure doesn't happen (indeed right now I'm
trying to diagnose a bizarre problem with a system that has good RAM
when you boot it, but the longer it is powered on or the more RAM is in
it, the worse the RAM gets until after a day or so it won't even boot).
It's just that instability, particularly non-pervasive instability, is
generally software's fault. If e.g. *everything* starts going wrong at
once in inconsistent and constantly-changing ways, *then* I'd start to
suspect hardware, but not before. (Plus, obviously, some failures, like
disk failures, are obvious when they happen. But disks are weird: I
mean, *moving parts*? How primitive!)
From: Baron on
Nix Inscribed thus:

> On 18 Jan 2010, Baron verbalised:
>
>> Nix Inscribed thus:
>>
>>> Amazing. You're almost unique in my experience: perhaps you buy
>>> really cheap hardware, or never use anything non-heavily-tested?
>>
>> It's nice to be unique ! I'm not really. :-)
>> 45+ years in hardware makes you cynical.
>> Particularly where something that can't possibly happen, happens.
>
> Ah, I think that's the real difference. 45 years in hardware with
> all the faults coming to you, and you assume the hardware is where
> the faults lie. For me, twenty-plus years in software with all
> the faults coming to me, and I assume the software is where the
> faults lie :)

Yup. Two sides of the same coin !

> The difference is that, even on my own systems, I see software faults
> *far* more often than hardware faults. (Perhaps part of that is
> because I always buy top-of-the-line hardware because I know damn well
> that if the hardware fails, I don't have a hope of fixing it, so it
> had better not fail.)
>
>>> opaque systems such as Windows: something deep in the state of the
>>> system has silently changed and broken something. Earlier versions
>>> of Windows were so prone to this that it became widely known as
>>> 'Windows rot'.
>>
>> Windows is not/never was any good !
>
> XP is actually a little better here. (A little.)

XP is what W98 should have been. M$ has been too busy making money on
the back of a fundamentally bad design.

> But the problem is present in all complex systems, not just Windows
> (hell, not just computers, although in evolved systems such as biology
> the effect is to freeze things that everything relies on in time,
> because if they change at all they kill the organism).
>
>>> Well, it sort of is. ECC RAM, for instance, especially combined with
>>> Intel's latest bleeding-edge features to allow RAM failures to
>>> trigger masking out of the physical page containing the damaged RAM,
>>> and checksummed buses like PCIe... but CPU-layer faults, bad joints
>>> on the motherboard, you're stuck.
>>
>> The "bad caps" syndrome is a good example of software damaged by
>> failing hardware.
>
> Oh, I'm not saying hardware failure doesn't happen (indeed right now
> I'm trying to diagnose a bizarre problem with a system that has good
> RAM when you boot it, but the longer it is powered on or the more RAM
> is in it, the worse the RAM gets until after a day or so it won't even
> boot).

Look at the memory power supply. Particularly if it gets worse with
more ram.

> It's just that instability, particularly non-pervasive
> instability, is generally software's fault. If e.g. *everything*
> starts going wrong at once in inconsistent and constantly-changing
> ways, *then* I'd start to suspect hardware, but not before. (Plus,
> obviously, some failures, like disk failures, are obvious when they
> happen. But disks are weird: I mean, *moving parts*? How primitive!)

Solid-state disks will fall in price as take-up grows. A couple of
years, maybe. ;-)

--
Best Regards:
Baron.