From: Andi Kleen on
Valdis.Kletnieks(a)vt.edu wrote:
> On Wed, 28 Oct 2009 06:24:45 BST, Andi Kleen said:
>
>>>>> mce: CPU supports 0 MCE banks
>> That message can be just removed I think. I don't see much value in it
>> because the value is in sysfs and when you see the CPU type you can easily
>> determine it anyways.
>
> Maybe it should only print a message if it finds an unexpected number of banks?
> "Hey dood - we're on a Core3.5 and there should be 6 banks here, but the
> hardware says there's only 4. What's up with that?"

The kernel doesn't know how many banks to expect; only humans do.

-Andi


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Mike Travis on


Hidetoshi Seto wrote:
> Andi Kleen wrote:
>> Hidetoshi Seto wrote:
>>> Without disabling, what can we do on MCE with no bank?
>> Nothing, but is it really worth adding a special case?
>
> If the question were:
> - is it really worth supporting this special environment,
> "MCE-capable but no MCE banks"?
> then I'd say no.
>
> So I suggested disabling MCE in this uncertain environment.
> Otherwise we will end up adding more code for special cases...
>
>>> I found that do_machine_check() does nothing if banks==0 ... is it better
>>> to let the system panic with "Machine check from unknown source"?
>> IMHO yes. In this case the system must be very confused and panic is the
>> best you can do. Otherwise it won't do anything interesting anyways.
>
> Agreed, but this is also a special case.
> Regardless of the real number of banks, a confused system could fail to
> read the value from memory... Hmm, in theory the MCE handler must be
> implemented carefully, but I bet the confused value will not always be 0,
> ... is it worth doing?
>
>>>>> Hmm, I suppose the line for CPU 0 was slightly different from the others,
>>>>> because SHD means "this bank is shared and controlled by another CPU".
>>>>> Maybe:
>>>>> CPU 0 MCA banks CMCI:0 CMCI:1 CMCI:2 CMCI:3 CMCI:5 ... CMCI:21
>>>>>
>>>>> But I agree that we could do some work on these messages...
>>>>> Is it better to change the message level to debug from info?
>>>> Can be made INFO yes, but I would prefer not removing them
>>>> from the dmesg for now.
>>>>
>>>> Perhaps they could be also compressed a bit like SRAT.
>>> Like SRAT? I could not catch the meaning ... For example?
>> See the recent patches from David Rientjes in the same original thread.
>
> I found it, thanks.
>
> So I suppose your idea is like:
> CPU 0 MCA banks CMCI:{0-3,5-9,12-21} POLL:{4,10,11}
> CPU 1 MCA banks SHD:{0,1,6-9,12-21} CMCI:{2,3,5} POLL:{4,10,11}
> right?
>
> IMHO the format I suggested is easier to read, as long as the number of
> banks is not too large.
> CPU 0 MCA banks map : CCCC PCCC CCPP CCCC CCCC CC
> CPU 1 MCA banks map : ssCC PCss ssPP ssss ssss ss
>
>
> Thanks,
> H.Seto

The problem comes up when you have a whole bunch of cpus, and the lines
become redundant. Can you compress the lines so that cpus with the
same given mappings are printed on one line?
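
One way to read this request, as a userspace sketch rather than kernel code: give each CPU a key (here a map string in the C/P/s style quoted above) and accumulate a CPU bitmask per distinct key, so that each distinct mapping is printed once. The names `struct bank_group` and `group_by_bank_map()` are hypothetical and exist only for this illustration, and the 64-CPU limit is an assumption that comes from using a single 64-bit mask.

```c
#include <assert.h>
#include <string.h>

#define MAX_CPUS 64   /* assumption: one 64-bit mask is enough for the sketch */

/* Hypothetical record: one distinct bank map and the CPUs that share it. */
struct bank_group {
    unsigned long long cpus;  /* bit n set: CPU n reported this map */
    const char *map;          /* map string of the first CPU in the group */
};

/*
 * Group CPUs whose bank-map strings are identical.
 * maps[cpu] is the map string for that CPU; out[] must have room for
 * ncpus entries. Returns the number of groups written to out[].
 */
static int group_by_bank_map(const char *const maps[], int ncpus,
                             struct bank_group out[])
{
    int ngroups = 0;

    assert(ncpus >= 0 && ncpus <= MAX_CPUS);

    for (int cpu = 0; cpu < ncpus; cpu++) {
        int g;

        /* Look for an existing group with the same map. */
        for (g = 0; g < ngroups; g++)
            if (strcmp(out[g].map, maps[cpu]) == 0)
                break;

        /* No match: open a new group keyed on this CPU's map. */
        if (g == ngroups) {
            out[g].cpus = 0;
            out[g].map = maps[cpu];
            ngroups++;
        }
        out[g].cpus |= 1ULL << cpu;
    }
    return ngroups;
}
```

The linear search is quadratic in the worst case, which seems acceptable for a sketch because the number of distinct mappings is expected to be small. Each resulting group would then be printed as one line: the CPU set followed by the shared map.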

Thanks,
Mike
From: Roland Dreier on

> Perhaps they could be also compressed a bit like SRAT.

Seems like a good idea... but I wonder what the best way to represent
things is. For example I have a 2-socket Nehalem system that shows:

2 times: MCA banks CMCI:2 CMCI:3 CMCI:5 CMCI:6 SHD:8
6 times: MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
8 times: MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8

presumably the first line is once per package, the next line is for the
first sibling in all the other cores in a package, and the last line is
for the SMT siblings of all the cores.

But would we want to accumulate all the different combinations of banks
along with a CPU mask and then print something like:

CPUs 0 4: MCA banks CMCI:2 CMCI:3 CMCI:5 CMCI:6 SHD:8
CPUs 1 2 3 5 6 7: MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
CPUs 8 9 10 11 12 13 14 15: MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8

of course output like that is going to lead to super-long lines on a
64-thread system.

Also I'm not sure of a clean way to implement this; unlike the SRAT
stuff, we need to deal with CPU hotplug so all this at best could be
__cpuinitdata, ie we can't discard it in most configs.

However the "MCA banks" output definitely is annoying on a 64-thread
system -- the amount of output is far greater than the utility of said
output. So ideas on the best way to reduce this would be appreciated.

Thanks,
Roland
From: Mike Travis on


Roland Dreier wrote:
> > Perhaps they could be also compressed a bit like SRAT.
>
> Seems like a good idea... but I wonder what the best way to represent
> things is. For example I have a 2-socket Nehalem system that shows:
>
> 2 times: MCA banks CMCI:2 CMCI:3 CMCI:5 CMCI:6 SHD:8
> 6 times: MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
> 8 times: MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8
>
> presumably the first line is once per package, the next line is for the
> first sibling in all the other cores in a package, and the last line is
> for the SMT siblings of all the cores.
>
> But would we want to accumulate all the different combinations of banks
> along with a CPU mask and then print something like:
>
> CPUs 0 4: MCA banks CMCI:2 CMCI:3 CMCI:5 CMCI:6 SHD:8
> CPUs 1 2 3 5 6 7: MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
> CPUs 8 9 10 11 12 13 14 15: MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8

Or use a cpumask and cpulist_scnprintf which condenses the cpu list nicely.
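
For readers who have not seen the condensed list form, the following is a userspace imitation of that style of output, where runs of consecutive CPUs collapse into ranges. It is a sketch under stated assumptions, not the kernel implementation: `format_cpu_ranges()` is a hypothetical name, and a plain 64-bit integer stands in for a real cpumask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (userspace only): write the set bits of mask into
 * buf as a comma-separated list of ranges, e.g. bits 1,2,3,5,6,7 give
 * "1-3,5-7". Only whole ranges are written; output stops early if the
 * next range would not fit. Returns the string length written.
 */
static size_t format_cpu_ranges(char *buf, size_t len, unsigned long long mask)
{
    size_t pos = 0;
    int cpu = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    while (cpu < 64) {
        char piece[32];
        size_t piece_len;
        int start, end;

        if (!(mask & (1ULL << cpu))) {
            cpu++;
            continue;
        }

        /* Extend the run while the next bit is also set. */
        start = cpu;
        while (cpu + 1 < 64 && (mask & (1ULL << (cpu + 1))))
            cpu++;
        end = cpu++;

        if (start == end)
            snprintf(piece, sizeof(piece), "%d", start);
        else
            snprintf(piece, sizeof(piece), "%d-%d", start, end);
        piece_len = strlen(piece);

        /* Need room for an optional ',' plus the piece plus the NUL. */
        if (pos + piece_len + (pos ? 1 : 0) >= len)
            break;
        if (pos)
            buf[pos++] = ',';
        memcpy(buf + pos, piece, piece_len + 1);
        pos += piece_len;
    }
    return pos;
}
```

Applied to the three CPU sets quoted above (masks 0x11, 0xEE and 0xFF00), this would yield "0,4", "1-3,5-7" and "8-15", which keeps each line short even when many CPUs share a mapping.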

>
> of course output like that is going to lead to super-long lines on a
> 64-thread system.
>
> Also I'm not sure of a clean way to implement this; unlike the SRAT
> stuff, we need to deal with CPU hotplug so all this at best could be
> __cpuinitdata, ie we can't discard it in most configs.
>
> However the "MCA banks" output definitely is annoying on a 64-thread
> system -- the amount of output is far greater than the utility of said
> output. So ideas on the best way to reduce this would be appreciated.
>
> Thanks,
> Roland
From: Roland Dreier on

> > But would we want to accumulate all the different combinations of banks
> > along with a CPU mask and then print something like:
> >
> > CPUs 0 4: MCA banks CMCI:2 CMCI:3 CMCI:5 CMCI:6 SHD:8
> > CPUs 1 2 3 5 6 7: MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
> > CPUs 8 9 10 11 12 13 14 15: MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8
>
> Or use a cpumask and cpulist_scnprintf which condenses the cpu list nicely.

Thanks! I didn't know about that API.

However with that said I think the real issue is whether that style of
output is a good idea, no matter how nicely the CPU list is formatted :)

- R.