From: Andreas Herrmann on
The patches don't properly work here.

(1) For instance, I got the following log entries when doing
suspend/resume, a CPU offline/online test, and reloading the
module:

microcode: original microcode versions...
microcode: CPU0-3: patch_level=0x1000065


platform microcode: firmware: requesting amd-ucode/microcode_amd.bin
...
microcode: CPU0-1,3: patch_level=0x1000083

microcode: CPU2-3: patch_level=0x1000065

Microcode Update Driver: v2.00 <tigran(a)aivazian.fsnet.co.uk>, Peter Oruba

The patch levels are:

# for i in `seq 0 3`; do lsmsr -c $i PATCH_LEVEL; done
PATCH_LEVEL = 0x0000000001000083
PATCH_LEVEL = 0x0000000001000083
PATCH_LEVEL = 0x0000000001000065
PATCH_LEVEL = 0x0000000001000065

(2) During suspend/resume the ucode is not updated:

hadburg linux # for i in `seq 0 3`; do lsmsr -c $i PATCH_LEVEL; done
PATCH_LEVEL = 0x0000000001000083
PATCH_LEVEL = 0x0000000001000083
PATCH_LEVEL = 0x0000000001000083
PATCH_LEVEL = 0x0000000001000083
hadburg linux # pm-suspend
hadburg linux # for i in `seq 0 3`; do lsmsr -c $i PATCH_LEVEL; done
PATCH_LEVEL = 0x0000000001000065
PATCH_LEVEL = 0x0000000001000065
PATCH_LEVEL = 0x0000000001000065
PATCH_LEVEL = 0x0000000001000065


That used to work w/o your patches. Didn't have time to look into why this
is now failing. You've changed mc_cpu_callback() -- most likely that
is causing this regression.


Regards,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Andreas Herrmann on
On Thu, Nov 05, 2009 at 07:40:53PM +0100, Dmitry Adamushko wrote:
> 2009/11/5 Andreas Herrmann <herrmann.der.user(a)googlemail.com>:
> > The patches don't properly work here.
> >
> > (1) For instance, I got the following log entries when doing
> >     suspend/resume, a CPU offline/online test, and reloading the
> >     module:
>
> To avoid possible misunderstandings, I'd like to clarify the output below.
>
> >  microcode: original microcode versions...
> >  microcode: CPU0-3: patch_level=0x1000065
>
> So this is the 1st time you have loaded a module.
>
> >  platform microcode: firmware: requesting amd-ucode/microcode_amd.bin
> >  ...
> >  microcode: CPU0-1,3: patch_level=0x1000083
>
> before or after loading a module? CPU2 is down, isn't it?

No, no CPU was offline at this moment. They all were brought back
online after some CPU hotplug and/or suspend/resume tests.

> >  microcode: CPU2-3: patch_level=0x1000065

Both messages showed up after the same ucode-update process.

> same question as above.

Same answer as above: all CPUs are online.

> Here, either CPUs 0 and 1 are down or have a
> different version. Both above messages don't make sense taken together

See, and that's the problem.

> (CPU3 belongs to both sets) unless summarize_cpu_info() is utterly
> broken.

I didn't check that yet.

> >  Microcode Update Driver: v2.00 <tigran(a)aivazian.fsnet.co.uk>, Peter Oruba
> >
> > The patch levels are:
> >
> >  # for i in `seq 0 3`; do lsmsr -c $i PATCH_LEVEL; done
> >  PATCH_LEVEL = 0x0000000001000083
> >  PATCH_LEVEL = 0x0000000001000083
> >  PATCH_LEVEL = 0x0000000001000065
> >  PATCH_LEVEL = 0x0000000001000065
>
> this is after your test has been stopped and all the CPUs are up, right?

Yes.

> > (2) During suspend/resume the ucode is not updated:
> >
> >  hadburg linux # for i in `seq 0 3`; do lsmsr -c $i PATCH_LEVEL; done
> >  PATCH_LEVEL = 0x0000000001000083
> >  PATCH_LEVEL = 0x0000000001000083
> >  PATCH_LEVEL = 0x0000000001000083
> >  PATCH_LEVEL = 0x0000000001000083
> >  hadburg linux # pm-suspend
> >  hadburg linux # for i in `seq 0 3`; do lsmsr -c $i PATCH_LEVEL; done
> >  PATCH_LEVEL = 0x0000000001000065
> >  PATCH_LEVEL = 0x0000000001000065
> >  PATCH_LEVEL = 0x0000000001000065
> >  PATCH_LEVEL = 0x0000000001000065
> >
> >
> > That used to work w/o your patches. Didn't have time to look into why this
> > is now failing. You've changed mc_cpu_callback() -- most likely that
> > is causing this regression.
>
> Hmm, cpu-event-callbacks seem to be working on my (Intel) setup. I
> have enabled pr_debug messages and also did a little trick to allow
> ucode of the same version to be loaded (my cpu is of the recent ucode
> by itself) and I can see cpu-callback events for both resuming and
> cpu-up cases.
>
> (firstly, upgraded with microcode_ctl as I only have a .dat file)
>
> suspend-resume
> ...
> [ 584.506371] microcode: CPU1 removed
> [ 584.516018] microcode: CPU0 updated to revision 0x57, date = 2007-03-15
> [ 584.597326] microcode: CPU1 updated upon resume
> [ 584.597562] microcode: CPU1 updated to revision 0x57, date = 2007-03-15
> [ 584.597565] microcode: CPU1 added
> ...
>
> and now cpu1 : down -> up
>
> [ 1616.932249] microcode: CPU1 removed
> [ 1633.942502] platform microcode: firmware: requesting intel-ucode/06-0f-02
> [ 1633.954638] microcode: data file intel-ucode/06-0f-02 load failed
> [ 1633.954642] microcode: CPU1 added
>
>
> as I understand, you don't see " platform microcode: firmware:
> requesting intel-ucode" messages upon 'upping' a cpu, do you?

Sure, no intel-ucode messages as I tested with AMD CPUs ;-)
But otherwise no, no messages.

> sure, my test is somewhat limited... anyway, first of all I'd like to
> get a clear understanding of your logs. Thanks for your test btw. :-))

I'll send you full logs asap.


Regards,
Andreas
From: Andreas Herrmann on
On Fri, Nov 06, 2009 at 01:56:31PM +0100, Dmitry Adamushko wrote:
> 2009/11/6 Andreas Herrmann <herrmann.der.user(a)googlemail.com>:

<snip>

> >> (CPU3 belongs to both sets) unless summarize_cpu_info() is utterly
> >> broken.
> >
> > I didn't check that yet.
>
> Yeah, this behavior is likely due to a missing cpumask_clear() in
> summarize_cpu_info().

Yeah, that fixes the wrong messages.
The other problem of not-updated CPU microcode after suspend/resume persists.

> should be as follows:
>
>         if (!alloc_cpumask_var(&cpulist, GFP_KERNEL))
>                 return;
>
> +       cpumask_clear(cpulist);

Better to use zalloc_cpumask_var() instead of alloc/clear.
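
To illustrate why the uncleared mask produces overlapping CPU lists, here
is a minimal user-space sketch. It is not the kernel code: group_cpus(),
NR_CPUS as a local constant, and the plain integer bitmask are stand-ins
invented for this example, and the stale bit for CPU3 is an assumed
leftover chosen to reproduce the "CPU0-1,3" message from the log.

```c
#include <stdint.h>

#define NR_CPUS 4

/* Patch levels of CPU0..CPU3 as reported by lsmsr earlier in the thread. */
static const uint32_t thread_revs[NR_CPUS] = {
    0x1000083, 0x1000083, 0x1000065, 0x1000065
};

/*
 * Hypothetical model of grouping CPUs by microcode revision.
 * `mask` plays the role of a freshly allocated cpumask: any bits it
 * already holds on entry survive unless `clear_first` is nonzero.
 */
static uint32_t group_cpus(uint32_t mask, const uint32_t *revs,
                           uint32_t rev, int clear_first)
{
    if (clear_first)
        mask = 0;               /* what cpumask_clear() / a zeroing alloc gives */

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (revs[cpu] == rev)
            mask |= 1u << cpu;  /* add every CPU running this revision */

    return mask;
}
```

With a stale bit 3 and no clearing, group_cpus(1u << 3, thread_revs,
0x1000083, 0) yields bits 0, 1 and 3, i.e. the bogus "CPU0-1,3" set.
With clear_first set, the same call yields only bits 0 and 1.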

> >> sure, my test is somewhat limited... anyway, first of all I'd like to
> >> get a clear understanding of your logs. Thanks for your test btw. :-))
> >
> > I'll send you full logs asap.
>
> Thanks. Maybe it's something about a particular sequence of actions
> that triggers this behavior. Or was it reproducible with the very
> first pm-suspend invocation after "modprobe microcode.ko"?

The sequence is:

1. loading microcode.ko
2. setting cpu2 offline
3. setting cpu2 online
4. suspend (pm-suspend)
5. resume

microcode of CPU2 is not updated:

# for i in `seq 0 3`; do lsmsr -c $i PATCH_LEVEL; done
PATCH_LEVEL = 0x0000000001000083
PATCH_LEVEL = 0x0000000001000083
PATCH_LEVEL = 0x0000000001000065
PATCH_LEVEL = 0x0000000001000083

dmesg attached.

As I've said, that test used to pass in the past, with all CPUs updated
to the new ucode (at least I think so ;-( ). But in contrast to my
previous mail, this doesn't seem to be related to your patch. I tested
latest mainline and the test fails as well ... it seems that I need to
do some debugging.


Regards,
Andreas

PS1: You should remove the needless newline from the patch level string:

 static int version_snprintf(char *buf, int len, struct cpu_signature *csig)
 {
-	return snprintf(buf, len, "patch_level=0x%x\n", csig->rev);
+	return snprintf(buf, len, "patch_level=0x%x", csig->rev);
 }
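
As a rough user-space sketch of why the newline belongs to the caller:
the struct below is a simplified stand-in for the kernel's
struct cpu_signature (only `rev` is modelled), and format_cpu_line() is a
hypothetical caller invented for this example, not a function from the
patch.

```c
#include <stdio.h>

/* Simplified stand-in for the kernel structure; only `rev` is used here. */
struct cpu_signature {
    unsigned int rev;
};

/* Version string without a trailing newline, as proposed in PS1. */
static int version_snprintf(char *buf, int len, const struct cpu_signature *csig)
{
    return snprintf(buf, len, "patch_level=0x%x", csig->rev);
}

/*
 * Hypothetical caller: prefixes the CPU list and terminates the line
 * itself, so the log entry ends with exactly one newline.
 */
static int format_cpu_line(char *buf, int len, const char *cpus,
                           const struct cpu_signature *csig)
{
    char ver[32];

    version_snprintf(ver, sizeof(ver), csig);
    return snprintf(buf, len, "microcode: CPU%s: %s\n", cpus, ver);
}
```

If version_snprintf() appended its own "\n", a caller like this would
emit an empty line after each message, which matches the blank lines
in the log excerpts at the start of the thread.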

PS2: I plan to remove further needless messages from the amd ucode driver asap.
From: Andreas Herrmann on
On Wed, Nov 11, 2009 at 05:07:22PM +0100, Dmitry Adamushko wrote:
> Andreas,
>
>
> any progress with this issue?

Yes

> You mentioned that the problem is also reproducible without my
> patches, right?

.... and yes.

Fixed with
http://git.kernel.org/?p=linux/kernel/git/x86/linux-2.6-tip.git;a=commitdiff;h=9f15226e75583547aaf542c6be4bdac1060dd425


Andreas
From: Ingo Molnar on

-tip testing found the following bug - there's a _long_ boot delay of
58.6 seconds if the CPU family is not supported:

[ 1.421761] calling microcode_init+0x0/0x137 @ 1
[ 1.426532] platform microcode: firmware: requesting amd-ucode/microcode_amd.bin
[ 61.433126] microcode: failed to load file amd-ucode/microcode_amd.bin
[ 61.439682] microcode: CPU0: AMD CPU family 0xf not supported
[ 61.445441] microcode: CPU1: AMD CPU family 0xf not supported
[ 61.451273] Microcode Update Driver: v2.00 <tigran(a)aivazian.fsnet.co.uk>, Peter Oruba
[ 61.459116] initcall microcode_init+0x0/0x137 returned 0 after 58625622 usecs

Where does this delay come from?

Ingo
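
For reference, the gap between the "requesting" line and the "failed to
load" line can be computed from the printk timestamps. The sketch below
is a user-space helper; parse_stamp() and seconds_between() are names
invented for this example. For the two lines above the gap comes out at
roughly 60 seconds, which would be consistent with a firmware-request
timeout expiring, but that interpretation is an assumption and is not
confirmed anywhere in this thread.

```c
#include <stdio.h>

/* Parse the "[ seconds.micros]" prefix of a printk line; 0 on success. */
static int parse_stamp(const char *line, double *out)
{
    return sscanf(line, " [ %lf]", out) == 1 ? 0 : -1;
}

/* Seconds elapsed between two log lines, or -1.0 if either lacks a stamp. */
static double seconds_between(const char *first, const char *second)
{
    double a, b;

    if (parse_stamp(first, &a) || parse_stamp(second, &b))
        return -1.0;
    return b - a;
}
```

Calling seconds_between() on the "requesting amd-ucode/microcode_amd.bin"
line and the "failed to load file" line gives 61.433126 - 1.426532, so
nearly all of the boot delay is spent waiting on the firmware request.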