From: nmm1 on
In article <7ivv9cF335jhvU1(a)mid.individual.net>,
Del Cecchi <delcecchinospamofthenorth(a)gmail.com> wrote:
>
>The Origin didn't have a service processor to handle things like power
>on and off? I am shocked and appalled.

Yes, it did. That wasn't the problem.

What I did was hammer it hard enough that the CPUs jammed solid in
the firmware, which was then no longer listening to the NMI channel.
It only happened a couple of times - normally, powering on and off
via the console worked.

Incidentally, I did something very similar to an IBM SP3, for very
different reasons. I am pretty sure that was a straight misdesign
in the controller software. That was very irritating, because I
wasn't stress-testing it at the time.

As far as I recall, I never managed it on the Hitachi SR2201 or
Sun F15K.


Regards,
Nick Maclaren.
From: Del Cecchi on
I think the modern thing to do is have the service processor interface
to the switches and control IPL and power on and off. So it shouldn't
matter if the main processor is wedged or vaporized. The service
processor can snapshot stuff by scanning the LSSD registers and handle
the power stuff. NMI, whatever. Turn the power off, or scan the
appropriate data in (equivalent of POR) and you are good to go.
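
To make the idea concrete, here is a toy sketch of out-of-band recovery. Every name in it (Node, ServiceProcessor, latches, reset_state) is hypothetical and invented for illustration; it is not any real machine's firmware interface, just a model of "snapshot over the scan path, then scan in the reset state" that never asks the main processor to cooperate.

```python
class Node:
    """Hypothetical main-processor node, seen only through its scan chain."""

    def __init__(self):
        self.reset_state = {"pc": 0, "status": 0}   # assumed power-on values
        self.latches = dict(self.reset_state)       # current scannable state
        self.wedged = False                         # True once the CPU stops responding


class ServiceProcessor:
    """Hypothetical out-of-band controller; it never calls into the main CPU."""

    def __init__(self, node):
        self.node = node
        self.log = []

    def recover(self):
        # Step 1: snapshot the latches over the scan path for later diagnosis.
        snapshot = dict(self.node.latches)
        self.log.append("snapshot")
        # Step 2: scan in the known-good reset state (the POR equivalent).
        self.node.latches = dict(self.node.reset_state)
        self.node.wedged = False
        self.log.append("scan-in reset")
        return snapshot


node = Node()
node.latches = {"pc": 0xDEAD, "status": 1}
node.wedged = True                    # main CPU is stuck; NMI would be ignored

sp = ServiceProcessor(node)
saved = sp.recover()                  # works whether or not the CPU is alive
```

The point of the design is that recover() reads and writes state directly and has no step that waits on the wedged processor.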

I guess someone was trying to save a few bucks or something.

del
From: nmm1 on
In article <7j2eqjF33tntoU1(a)mid.individual.net>,
Del Cecchi <delcecchinospamofthenorth(a)gmail.com> wrote:
>
>I think the modern thing to do is have the service processor interface
>to the switches and control IPL and power on and off. So it shouldn't
>matter if the main processor is wedged or vaporized. The service
>processor can snapshot stuff by scanning the LSSD registers and handle
>the power stuff. NMI, whatever. Turn the power off, or scan the
>appropriate data in (equivalent of POR) and you are good to go.

The Origin was some time ago now. It is also possible that, as with
the similar event on the SP3, I had managed to lock up the on-rack
service processor (assuming there was one).

A very common low-level hardware misdesign is for a 'message' between
two chips to hang, uninterruptibly, if the other chip is dead in the
water. That phenomenon may (or may not) have been involved in either
or both cases. As you can imagine, I didn't waste time trying to
find out exactly what had happened :-)
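
As a software analogy only (the failure described above is in hardware, and nothing here is taken from either machine), the usual cure for that kind of hang is to bound every wait on the other party. A minimal sketch, with Peer and send_with_timeout as made-up names:

```python
import time


class Peer:
    """Hypothetical stand-in for the chip at the other end of the link."""

    def __init__(self, alive):
        self.alive = alive

    def ack_ready(self):
        # A dead peer never raises its acknowledge line.
        return self.alive


def send_with_timeout(peer, timeout_s=0.05, poll_s=0.005):
    """Wait for the peer's acknowledge, but give up once the deadline passes.

    The misdesign is the same loop without a deadline: it spins forever
    if the peer is dead in the water.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if peer.ack_ready():
            return True          # handshake completed
        time.sleep(poll_s)
    return False                 # peer presumed dead; caller can recover


ok_live = send_with_timeout(Peer(alive=True))
ok_dead = send_with_timeout(Peer(alive=False))
```

With a live peer the call returns True at once; with a dead one it returns False after the timeout instead of hanging, which leaves the sender free to report the fault or reset the link.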


Regards,
Nick Maclaren.