From: H. Peter Anvin on
[Adding Borislav and Hans to the Cc: list]

Any objections? Otherwise I'm going to queue this up in the urgent queue.

-hpa


On 07/13/2010 11:59 AM, Michal Schmidt wrote:
> On my system with AMD Phenom II X6 I am seeing pauses at boot (usually during
> udev startup) which require a key press to continue. It only happens if C1E is
> enabled in the BIOS.
>
> It's caused by the APIC timer's inability to wake up the CPU from C1E (AMD
> erratum #400). Linux has a workaround for it, but it's not being applied
> correctly in this case. Though c1e_idle() detects C1E just fine, by the time
> acpi_idle ('processor.ko' module) takes over, it is forgotten.
>
> After AMD C1E is detected, it is not sufficient to flag it in boot_cpu_data,
> because the flag will get cleared in identify_cpu() when more CPUs are brought
> up later. The fix is to mark the flag as forced.
>
> The additional call to set_cpu_cap() is just to make sure the flag is set even
> on the CPUs that are already up and /proc/cpuinfo shows 'amdc1e' on all.
>
> Also fix indentation in the function.
>
> Signed-off-by: Michal Schmidt <mschmidt(a)redhat.com>
> ---
>
> arch/x86/kernel/process.c | 9 +++++----
> 1 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index e7e3521..f3520a8 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -589,7 +589,7 @@ static void c1e_idle(void)
> if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
> mark_tsc_unstable("TSC halt in AMD C1E");
> printk(KERN_INFO "System has AMD C1E enabled\n");
> - set_cpu_cap(&boot_cpu_data, X86_FEATURE_AMDC1E);
> + setup_force_cpu_cap(X86_FEATURE_AMDC1E);
> }
> }
>
> @@ -605,6 +605,7 @@ static void c1e_idle(void)
> &cpu);
> printk(KERN_INFO "Switch to broadcast mode on CPU%d\n",
> cpu);
> + set_cpu_cap(&current_cpu_data, X86_FEATURE_AMDC1E);
> }
> clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
>
> @@ -614,9 +615,9 @@ static void c1e_idle(void)
> * The switch back from broadcast mode needs to be
> * called with interrupts disabled.
> */
> - local_irq_disable();
> - clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
> - local_irq_enable();
> + local_irq_disable();
> + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
> + local_irq_enable();
> } else
> default_idle();
> }
>
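
The difference between the two calls in the patch can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the struct, the function names (model_set_cap, model_force_cap, model_identify_cpu), the bit number and the single 32-bit capability word are all hypothetical simplifications, not kernel code. It assumes the flag is lost because each secondary CPU's capabilities are re-derived at bring-up and ANDed into the boot-wide set, while forced capabilities are re-applied every time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model; not the kernel's data structures. */
#define NCPUS 6
#define FEATURE_AMDC1E 3u	/* arbitrary bit number chosen for this sketch */

struct cpu_model {
	uint32_t caps;		/* one capability word instead of the real array */
};

static struct cpu_model boot_cpu;	/* stands in for boot_cpu_data */
static struct cpu_model cpus[NCPUS];	/* per-CPU capability copies */
static uint32_t forced_caps;		/* stands in for the forced-capability mask */

static void model_reset(void)
{
	boot_cpu.caps = 0;
	forced_caps = 0;
	for (int i = 0; i < NCPUS; i++)
		cpus[i].caps = 0;
}

/* Plain flagging: sets the bit in one structure only. */
static void model_set_cap(struct cpu_model *c, unsigned int bit)
{
	assert(bit < 32);
	c->caps |= UINT32_C(1) << bit;
}

static int model_has_cap(const struct cpu_model *c, unsigned int bit)
{
	assert(bit < 32);
	return (int)((c->caps >> bit) & 1u);
}

/* Forced flagging: sets the bit and remembers it for later bring-ups. */
static void model_force_cap(unsigned int bit)
{
	model_set_cap(&boot_cpu, bit);
	forced_caps |= UINT32_C(1) << bit;
}

/*
 * Assumed bring-up behaviour: the CPU's capabilities are taken from what
 * the hardware reports (hw_caps), forced bits are ORed in, and the result
 * is ANDed into the boot-wide set.
 */
static void model_identify_cpu(int cpu, uint32_t hw_caps)
{
	assert(cpu >= 0 && cpu < NCPUS);
	cpus[cpu].caps = hw_caps | forced_caps;
	boot_cpu.caps &= cpus[cpu].caps;
}
```

In this model, calling model_set_cap(&boot_cpu, FEATURE_AMDC1E) and then model_identify_cpu(1, 0) leaves the bit cleared in boot_cpu, whereas model_force_cap(FEATURE_AMDC1E) followed by the same bring-up keeps it set in boot_cpu and sets it on CPU 1 as well.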

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Borislav Petkov on
From: "H. Peter Anvin" <hpa(a)zytor.com>
Date: Tue, Jul 13, 2010 at 01:05:06PM -0700

> [Adding Borislav and Hans to the Cc: list]
>
> Any objections? Otherwise I'm going to queue this up in the urgent queue.

Yeah, I was staring at it already, and it looks OK at first glance.
However, since you want to fast-track it into urgent, I'd like to give it
a run on our X6 boxes tomorrow before ACKing it.

> On 07/13/2010 11:59 AM, Michal Schmidt wrote:
> > On my system with AMD Phenom II X6 I am seeing pauses at boot (usually during
> > udev startup) which require a key press to continue. It only happens if C1E is
> > enabled in the BIOS.

... and this is strange: I didn't experience anything like that on our
X6 boxes here a couple of months ago. But maybe something changed in
later kernels to trigger that behavior. Michal, what chipset is that, and
do you have the latest BIOS on it?

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

From: Michal Schmidt on
----- "Borislav Petkov" <borislav.petkov(a)amd.com> wrote:
> ... and this is strange, I didn't experience anything like that on
> our X6 boxes here couple of months ago. But maybe something got changed
> in later kernels to trigger that behavior. Michal, what chipset is that
> and do you have the latest BIOS on it?

The chipset is 890FX. Motherboard Asus M4A89TD PRO/USB3.
BIOS is the latest version: 0901 (05/17/2010 according to dmidecode)

I found another person reporting the same symptoms on
GIGABYTE GA-MA770T-UD3P AthlonXII 620 (4 cores) kernel 2.6.33.5 (Mandriva)
(http://www.abclinuxu.cz/poradna/hardware/show/308799, in Czech)

Michal
From: john stultz on
On Tue, Jul 13, 2010 at 2:01 PM, Michal Schmidt <mschmidt(a)redhat.com> wrote:
> ----- "Borislav Petkov" <borislav.petkov(a)amd.com> wrote:
>> ... and this is strange, I didn't experience anything like that on
>> our X6 boxes here couple of months ago. But maybe something got changed
>> in later kernels to trigger that behavior. Michal, what chipset is that
>> and do you have the latest BIOS on it?
>
> The chipset is 890FX. Motherboard Asus M4A89TD PRO/USB3.
> BIOS is the latest version: 0901 (05/17/2010 according to dmidecode)
>
> I found another person reporting the same symptoms on
> GIGABYTE GA-MA770T-UD3P AthlonXII 620 (4 cores) kernel 2.6.33.5 (Mandriva)
> (http://www.abclinuxu.cz/poradna/hardware/show/308799, in Czech)

This also sounds like: https://bugzilla.kernel.org/show_bug.cgi?id=15289

thanks
-john
From: Borislav Petkov on
From: Michal Schmidt <mschmidt(a)redhat.com>
Date: Tue, Jul 13, 2010 at 08:59:58PM +0200

Hi,

> On my system with AMD Phenom II X6 I am seeing pauses at boot (usually during
> udev startup) which require a key press to continue. It only happens if C1E is
> enabled in the BIOS.
>
> It's caused by the APIC timer's inability to wake up the CPU from C1E (AMD
> erratum #400). Linux has a workaround for it, but it's not being applied
> correctly in this case. Though c1e_idle() detects C1E just fine, by the time
> acpi_idle ('processor.ko' module) takes over, it is forgotten.
>
> After AMD C1E is detected, it is not sufficient to flag it in boot_cpu_data,
> because the flag will get cleared in identify_cpu() when more CPUs are brought
> up later. The fix is to mark the flag as forced.

I don't think that the workaround is wrong, assuming I'm not missing
something. I'm seeing the following sequence on my machine here in which
the cores are brought up and checked for c1e:

The BSP does

start_kernel()
|->check_bugs()
   |->identify_boot_cpu()	# here we do select_idle_routine(), i.e. pm_idle = c1e_idle
....
|->rest_init()
   |->kernel_thread(kernel_init,... )

and kernel_init() does smp_init() where we init the rest of the cores.

Now, each core does

start_secondary()
|->smp_callin()
   |->smp_store_cpu_info	# here we copy boot_cpu_data for the starting AP
      |->identify_secondary_cpu
         |->identify_cpu
            |->select_idle_routine()	# here we exit early since pm_idle is set already


Now all the cores except the BSP do cpu_idle, but since bits 27 and 28 in
the interrupt pending MSR (see below) are not set yet, they spin a bit in
cpu_idle doing default_idle. You can see this with my debugging patch below.

Now here comes the key moment: the BSP enters cpu_idle _after_ all APs
have been initialized, and it is the one that sets X86_FEATURE_AMDC1E. We
haven't set the c1e_detected variable earlier because the hardware sets
bit 28 (C1eOnCmpHalt) in MSR_K8_INT_PENDING_MSG only after all cores have
entered halt. Once this bit is set, we set c1e_detected and switch to
broadcast mode on each core.
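
The ordering described above can be sketched as a small user-space state machine. This is a minimal model under assumptions, not the kernel's c1e_idle(): the struct, the enum and model_c1e_idle() are hypothetical names, the MSR value is passed in as a parameter instead of being read from hardware, and the mask value 0x18000000 (bits 27 and 28) is assumed to match the kernel's K8_INTP_C1E_ACTIVE_MASK.

```c
#include <assert.h>
#include <stdint.h>

#define NCPUS 6
/* Bits 27 and 28 of the low MSR word; value assumed from the kernel headers. */
#define K8_INTP_C1E_ACTIVE_MASK 0x18000000u

enum idle_path { IDLE_DEFAULT, IDLE_C1E_BROADCAST };

/* Hypothetical model state: one global detection flag plus per-CPU marks. */
struct c1e_model {
	int detected;		/* stands in for c1e_detected */
	int broadcast[NCPUS];	/* 1 once a CPU has switched to broadcast mode */
};

/*
 * One pass through the idle routine on 'cpu', given the low 32 bits of the
 * interrupt pending MSR as that CPU would see them at this moment.
 */
static enum idle_path model_c1e_idle(struct c1e_model *m, int cpu,
				     uint32_t msr_lo)
{
	assert(cpu >= 0 && cpu < NCPUS);

	/* Detection happens once, on whichever CPU first sees the bits set. */
	if (!m->detected && (msr_lo & K8_INTP_C1E_ACTIVE_MASK))
		m->detected = 1;

	/* Until then, every CPU falls back to the default idle path. */
	if (!m->detected)
		return IDLE_DEFAULT;

	/* After detection, each CPU switches to broadcast mode on entry. */
	m->broadcast[cpu] = 1;
	return IDLE_C1E_BROADCAST;
}
```

Feeding it an AP call with msr_lo == 0, then a BSP call with bit 28 set, then the same AP again reproduces the sequence: default idle first, detection on the BSP, and the AP switching to broadcast mode on its next pass even though it never saw the bit itself.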

Now the question is, why doesn't your system do that in that order?
And I don't think your patch is the right fix: it doesn't change
anything in the above sequence on my system except enabling the "amdc1e"
feature string in /proc/cpuinfo, which we don't actually need since dmesg
already contains that info.

And the more puzzling question is, how does your patch fix your
system...?

So, IMHO, what is more likely is that it has something to do with
https://bugzilla.kernel.org/show_bug.cgi?id=15289, as John pointed out
earlier (thanks John, Michal's situation looks quite similar).

So, please apply the debug patch below and send me your whole dmesg so I
can see what happens. Also, I'd like to see whether the SMI bit (27) in
that same MSR is set, so once the machine is up please run

for i in $(seq 0 5); do lsmsr -c $i Int -V 3; done

after installing the x86info tool.
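
If lsmsr is not at hand, roughly the same information can be obtained from the msr character device. The following is a hedged sketch, assuming a Linux system with the msr driver loaded, root privileges and a 64-bit off_t; the helper names (msr_path, read_msr, smi_on_cmp_halt) and the two bit macros are made up for illustration, and the register number 0xc0010055 is assumed to be the one behind MSR_K8_INT_PENDING_MSG.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define MSR_K8_INT_PENDING_MSG	0xc0010055u		/* assumed register number */
#define K8_INTP_SMI_ON_CMP_HALT	(UINT64_C(1) << 27)	/* hypothetical name, bit 27 */
#define K8_INTP_C1E_ON_CMP_HALT	(UINT64_C(1) << 28)	/* hypothetical name, bit 28 */

/* Build the device path for one CPU; returns the length snprintf reports. */
static int msr_path(char *buf, size_t len, int cpu)
{
	assert(cpu >= 0);
	return snprintf(buf, len, "/dev/cpu/%d/msr", cpu);
}

/*
 * Read one 64-bit MSR through the given device node. The msr driver uses
 * the file offset as the register number. Returns 0 on success, -1 on any
 * failure (missing device, no permission, short read).
 */
static int read_msr(const char *path, uint32_t reg, uint64_t *val)
{
	int fd = open(path, O_RDONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = pread(fd, val, sizeof(*val), (off_t)reg);
	close(fd);
	return n == (ssize_t)sizeof(*val) ? 0 : -1;
}

/* 1 if the SMI bit (27) is set in the raw MSR value, 0 otherwise. */
static int smi_on_cmp_halt(uint64_t val)
{
	return (val & K8_INTP_SMI_ON_CMP_HALT) ? 1 : 0;
}

/* 1 if the C1eOnCmpHalt bit (28) is set in the raw MSR value, 0 otherwise. */
static int c1e_on_cmp_halt(uint64_t val)
{
	return (val & K8_INTP_C1E_ON_CMP_HALT) ? 1 : 0;
}
```

A caller would loop over CPUs 0 to 5, build each path with msr_path(), call read_msr() with MSR_K8_INT_PENDING_MSG and print the two decoded bits per core.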

Thanks.

C1E dbg patch:

---
arch/x86/include/asm/acpi.h | 11 +++++++++--
arch/x86/kernel/process.c | 12 ++++++++++++
2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index aa2c39d..39b8348 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -123,6 +123,9 @@ extern void acpi_reserve_wakeup_memory(void);
*/
static inline unsigned int acpi_processor_cstate_check(unsigned int max_cstate)
{
+
+ pr_err("%s: enter\n", __func__);
+
/*
* Early models (<=5) of AMD Opterons are not supposed to go into
* C2 state.
@@ -134,10 +137,14 @@ static inline unsigned int acpi_processor_cstate_check(unsigned int max_cstate)
boot_cpu_data.x86_model <= 0x05 &&
boot_cpu_data.x86_mask < 0x0A)
return 1;
- else if (boot_cpu_has(X86_FEATURE_AMDC1E))
+ else if (boot_cpu_has(X86_FEATURE_AMDC1E)) {
+ pr_err("%s: C1E\n", __func__);
return 1;
- else
+ }
+ else {
+ pr_err("%s: max_cstate: %d\n", __func__, max_cstate);
return max_cstate;
+ }
}

static inline bool arch_has_acpi_pdc(void)
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index cfe109d..116c8bd 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -573,6 +573,18 @@ static void c1e_idle(void)
if (need_resched())
return;

+ if (!boot_cpu_has(X86_FEATURE_AMDC1E)) {
+ u32 lo, hi;
+
+ rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi);
+
+ pr_err("%s: bits 0x%08x\n",
+ __func__, lo & K8_INTP_C1E_ACTIVE_MASK);
+
+ pr_err("%s: cpu: %d, c1e_detected: %d\n",
+ __func__, raw_smp_processor_id(), c1e_detected);
+ }
+
if (!c1e_detected) {
u32 lo, hi;

--
1.7.0



--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
