From: Andi Kleen on
Hidetoshi Seto wrote:
> Mike Travis wrote:
>> Mike Travis wrote:
>>> Hi Roland,
>>>
>>> I've found that I'm getting one of these lines for every cpu:
>>>
>>> mce: CPU supports 0 MCE banks

That message can be just removed I think. I don't see much value in it
because the value is in sysfs and when you see the CPU type you can easily
determine it anyways.

I don't think the patch below really solves the problem because they
would have the same noise problem back once they switch from the simulator
to a real box which has banks.

> Hum, I suppose the line for CPU 0 was slightly different from others,
> because SHD means "this bank is shared bank and controlled by other".
> Maybe:
> CPU 0 MCA banks CMCI:0 CMCI:1 CMCI:2 CMCI:3 CMCI:5 ... CMCI:21
>
> But I agree that we could do some work on these messages...
> Is it better to change the message level to debug from info?

Can be made INFO yes, but I would prefer not removing them
from the dmesg for now.

Perhaps they could be also compressed a bit like SRAT.

-Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Hidetoshi Seto on
Andi Kleen wrote:
> Hidetoshi Seto wrote:
>> Mike Travis wrote:
>>> Mike Travis wrote:
>>>> Hi Roland,
>>>>
>>>> I've found that I'm getting one of these lines for every cpu:
>>>>
>>>> mce: CPU supports 0 MCE banks
>
> That message can be just removed I think. I don't see much value in it
> because the value is in sysfs and when you see the CPU type you can easily
> determine it anyways.
>
> I don't think the patch below really solves the problem because they
> would have the same noise problem back once they switch from the simulator
> to a real box which has banks.

If a box has more than 0 banks, the line above appears only once, for
CPU 0. Only on the simulator, with an MCE-capable processor that has no
banks, does this message become unacceptable noise, because it appears
for every CPU.
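
A minimal userspace sketch of the behavior described above. It assumes the bank count is kept in one shared variable whose zero value doubles as "not initialized yet"; the names model_cap_init() and count_banners() are hypothetical and are not taken from the kernel source.

```c
#include <assert.h>
#include <stdio.h>

/* Shared bank count for all CPUs; 0 doubles as "not initialized yet". */
static int banks;

/*
 * Hypothetical model of the per-CPU capability init.
 * Writes the banner into buf if one is emitted and returns 1,
 * otherwise leaves buf empty and returns 0.
 */
int model_cap_init(int hw_banks, char *buf, size_t len)
{
    int printed = 0;

    assert(hw_banks >= 0 && len > 0);
    buf[0] = '\0';

    /* The banner is tied to the shared count still being zero. */
    if (!banks) {
        snprintf(buf, len, "mce: CPU supports %d MCE banks", hw_banks);
        printed = 1;
        banks = hw_banks;   /* stays 0 when the hardware reports 0 banks */
    }
    return printed;
}

/* Count the banners emitted when ncpus CPUs each report hw_banks banks. */
int count_banners(int ncpus, int hw_banks)
{
    char line[64];
    int n = 0;

    banks = 0;              /* start from a fresh "boot" */
    for (int cpu = 0; cpu < ncpus; cpu++)
        n += model_cap_init(hw_banks, line, sizeof(line));
    return n;
}
```

In this model, count_banners(4, 6) yields one banner while count_banners(4, 0) yields four, because a count of 0 never moves the shared variable out of its "not initialized" state.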

Anyway, I think my patch is nice to have, to avoid unexpected behavior in
an uncertain environment.

Without disabling it, what can we do about MCE with no banks?
I found that do_machine_check() does nothing if banks==0 ... is it better
to let the system panic with "Machine check from unknown source"?


>> Hum, I suppose the line for CPU 0 was slightly different from others,
>> because SHD means "this bank is shared bank and controlled by other".
>> Maybe:
>> CPU 0 MCA banks CMCI:0 CMCI:1 CMCI:2 CMCI:3 CMCI:5 ... CMCI:21
>>
>> But I agree that we could do some work on these messages...
>> Is it better to change the message level to debug from info?
>
> Can be made INFO yes, but I would prefer not removing them
> from the dmesg for now.
>
> Perhaps they could be also compressed a bit like SRAT.

Like SRAT? I could not catch the meaning ... For example?


Thanks,
H.Seto

From: Andi Kleen on
Hidetoshi Seto wrote:

>
> Without disabling it, what can we do about MCE with no banks?

Nothing, but is it really worth adding a special case?

> I found that do_machine_check() does nothing if banks==0 ... is it better
> to let the system panic with "Machine check from unknown source"?

IMHO yes. In this case the system must be very confused and panic is the
best you can do. Otherwise it won't do anything interesting anyways.
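
A rough sketch of the policy being discussed, as a userspace model rather than the real do_machine_check(). It assumes each bank has a status word whose bit 63 marks a valid logged error; the function name and the enum are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In this model, bit 63 of a bank status word marks a valid logged error. */
#define MODEL_STATUS_VAL (1ULL << 63)

enum mc_outcome {
    MC_HANDLED,               /* at least one bank explained the event */
    MC_PANIC_UNKNOWN_SOURCE   /* "Machine check from unknown source" */
};

/*
 * Hypothetical handler model: scan every bank and fall back to the
 * panic path when no bank reports a valid error.
 */
enum mc_outcome model_machine_check(const uint64_t *status, size_t nbanks)
{
    size_t found = 0;

    assert(status != NULL || nbanks == 0);

    for (size_t i = 0; i < nbanks; i++)
        if (status[i] & MODEL_STATUS_VAL)
            found++;

    /* With nbanks == 0 the loop never runs, so this ends in the panic path. */
    return found ? MC_HANDLED : MC_PANIC_UNKNOWN_SOURCE;
}
```

With this shape, banks==0 needs no special case: it is treated the same as "banks exist but none of them logged anything".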

>
>>> Hum, I suppose the line for CPU 0 was slightly different from others,
>>> because SHD means "this bank is shared bank and controlled by other".
>>> Maybe:
>>> CPU 0 MCA banks CMCI:0 CMCI:1 CMCI:2 CMCI:3 CMCI:5 ... CMCI:21
>>>
>>> But I agree that we could do some work on these messages...
>>> Is it better to change the message level to debug from info?
>> Can be made INFO yes, but I would prefer not removing them
>> from the dmesg for now.
>>
>> Perhaps they could be also compressed a bit like SRAT.
>
> Like SRAT? I could not catch the meaning ... For example?

See the recent patches from David Rientjes in the same original thread.

-Andi

From: Hidetoshi Seto on
Andi Kleen wrote:
> Hidetoshi Seto wrote:
>> Without disabling it, what can we do about MCE with no banks?
>
> Nothing, but is it really worth adding a special case?

If the question were:
- is it really worth supporting this special environment,
"MCE-capable but no MCE banks"?
then I'd say no.

So I suggested disabling MCE in this uncertain environment.
Otherwise we will end up adding more code for special cases...

>> I found that do_machine_check() does nothing if banks==0 ... is it better
>> to let the system panic with "Machine check from unknown source"?
>
> IMHO yes. In this case the system must be very confused and panic is the
> best you can do. Otherwise it won't do anything interesting anyways.

Agreed, but this is also a special case.
Regardless of the real number of banks, a confused system could fail to
read the value from memory... Hmm, in theory the MCE handler must be
implemented carefully, but I bet the confused value will not always be 0
... is it worth doing?

>>>> Hum, I suppose the line for CPU 0 was slightly different from others,
>>>> because SHD means "this bank is shared bank and controlled by other".
>>>> Maybe:
>>>> CPU 0 MCA banks CMCI:0 CMCI:1 CMCI:2 CMCI:3 CMCI:5 ... CMCI:21
>>>>
>>>> But I agree that we could do some work on these messages...
>>>> Is it better to change the message level to debug from info?
>>> Can be made INFO yes, but I would prefer not removing them
>>> from the dmesg for now.
>>>
>>> Perhaps they could be also compressed a bit like SRAT.
>>
>> Like SRAT? I could not catch the meaning ... For example?
>
> See the recent patches from David Rientjes in the same original thread.

I found it, thanks.

So I suppose your idea is like:
CPU 0 MCA banks CMCI:{0-3,5-9,12-21} POLL:{4,10,11}
CPU 1 MCA banks SHD:{0,1,6-9,12-21} CMCI:{2,3,5} POLL:{4,10,11}
right?

IMHO the format I suggested is easier to read, as long as the number of
banks is not too large.
CPU 0 MCA banks map : CCCC PCCC CCPP CCCC CCCC CC
CPU 1 MCA banks map : ssCC PCss ssPP ssss ssss ss
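
To compare the two formats, here is a small sketch that produces both from the same input. It assumes bank ownership is given as one character per bank without spaces ('C' = CMCI, 'P' = polled, 's' = shared); format_ranges() and format_map() are hypothetical helpers, not existing kernel functions. Runs of two banks are written as "a,b" and longer runs as "a-b", to match the examples above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write e.g. "CMCI:{0-3,5-9,12-21}" for all banks marked `kind` in `map`. */
void format_ranges(const char *map, char kind, const char *label,
                   char *out, size_t len)
{
    int n = (int)strlen(map);
    size_t pos = 0;
    int w;

    assert(len > 0);
    out[0] = '\0';              /* stays empty if no bank matches */

    for (int i = 0; i < n; i++) {
        if (map[i] != kind)
            continue;
        int start = i;
        while (i + 1 < n && map[i + 1] == kind)
            i++;                /* extend the run to its last bank */

        /* Open the group on the first run, otherwise add a separator. */
        if (pos == 0)
            w = snprintf(out, len, "%s:{", label);
        else
            w = snprintf(out + pos, len - pos, ",");
        assert(w > 0 && (size_t)w < len - pos);
        pos += (size_t)w;

        if (i == start)
            w = snprintf(out + pos, len - pos, "%d", start);
        else if (i == start + 1)
            w = snprintf(out + pos, len - pos, "%d,%d", start, i);
        else
            w = snprintf(out + pos, len - pos, "%d-%d", start, i);
        assert(w > 0 && (size_t)w < len - pos);
        pos += (size_t)w;
    }

    if (pos > 0) {
        w = snprintf(out + pos, len - pos, "}");
        assert(w > 0 && (size_t)w < len - pos);
    }
}

/* Write the per-bank map with a space after every fourth bank. */
void format_map(const char *map, char *out, size_t len)
{
    size_t n = strlen(map);
    size_t pos = 0;

    assert(len >= n + n / 4 + 1);   /* room for banks, spaces and NUL */
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && i % 4 == 0)
            out[pos++] = ' ';
        out[pos++] = map[i];
    }
    out[pos] = '\0';
}
```

The range form gets shorter as runs get longer, while the map form always costs one character per bank but lets two CPUs be compared column by column.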


Thanks,
H.Seto

From: Valdis.Kletnieks on
On Wed, 28 Oct 2009 06:24:45 BST, Andi Kleen said:

> >>> mce: CPU supports 0 MCE banks
>
> That message can be just removed I think. I don't see much value in it
> because the value is in sysfs and when you see the CPU type you can easily
> determine it anyways.

Maybe it should only print a message if it finds an unexpected number of banks?
"Hey dood - we're on a Core3.5 and there should be 6 banks here, but the
hardware says there's only 4. What's up with that?"
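
A minimal sketch of that idea, assuming the expected bank count for a CPU model is available from somewhere. That lookup is not shown and is purely hypothetical, as are the function name and the message wording.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Produce a warning only when the hardware-reported bank count differs
 * from the expected one. Returns 1 if a warning was written to buf,
 * 0 if the counts match and buf is left empty.
 * `expected` would come from a per-model table (hypothetical, not shown).
 */
int bank_count_warning(int expected, int found, char *buf, size_t len)
{
    assert(len > 0);
    buf[0] = '\0';

    if (found == expected)
        return 0;           /* the common case stays silent */

    snprintf(buf, len, "mce: expected %d MCE banks, hardware reports %d",
             expected, found);
    return 1;
}
```

This keeps dmesg quiet on well-behaved hardware, at the cost of maintaining the table of expected counts.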