From: Prarit Bhargava on


On 04/27/2010 02:34 PM, Konrad Rzeszutek Wilk wrote:
>>> Can you provide a short example of test scenario? As in what I should do
>>> to reproduce this problem?
>>>
>>>
>> Take the latest upstream (well ... to be honest, a bit older than that
>> because of some other bugs) -- take 2.6.33 and try to boot it as a PV
>>
> 2.6.34-rc5 PV boots under Xen for me (and pretty much since 2.6.33 +
> Suresh's fix for CONFIG_RODATA_MARK).
>
> Perhaps I am missing some of the .config options you have set that make it not work?
>
> The irqbalance daemon looks to be running - but I think you are hitting
> this during bootup? How long do you have to wait for this to trigger?
>
>

It happens during bootup. I don't have a 2.6.33 vanilla panic handy
but I do have one from an earlier 2.6.32...

rip: ffffffff81256f45 delay_tsc+0x45
rsp: ffff8800fac95a98
rax: fffffffff6ef46d0 rbx: 00000002 rcx: f6ef46d0 rdx: 0010850c
rsi: 002b3bb6 rdi: 002b3bcc rbp: ffff8800fac95ab8
r8: ffffffff r9: 00000002 r10: 00000002 r11: 00000000
r12: fffffffff6dec1c4 r13: 00000002 r14: 002b3bcc r15: 00000001
cs: 0000e033 ds: 00000000 fs: 00000000 gs: 00000000

Stack:
000000000002ef45 ffff8800fac95c88 0000000000000009 ffff8800fac93540
ffff8800fac95ac8 ffffffff81256ef6 ffff8800fac95b48 ffffffff814c6341
0000000000000010 ffff8800fac95b38 ffff880000000008 ffff8800fac95b58
ffff8800fac95b08 a22d306b065d4a66 0000000000000000 0000000000000000

Code:
f3 90 65 8b 1c 25 d8 e3 00 00 44 39 eb 75 23 66 66 90 0f ae e8 <e8> 46 3d dc ff
66 90 48 98 48 89

Call Trace:
[<ffffffff81256f45>] delay_tsc+0x45 <--
[<ffffffff81256ef6>] __const_udelay+0x46
[<ffffffff814c6341>] panic+0x135
[<ffffffff814ca23c>] oops_end+0xdc
[<ffffffff81042272>] no_context+0xf2
[<ffffffff8125946c>] __bitmap_weight+0x8c
[<ffffffff81042505>] __bad_area_nosemaphore+0x125
[<ffffffff8105fad4>] find_busiest_group+0x254
[<ffffffff810425d3>] bad_area_nosemaphore+0x13
[<ffffffff814cbccf>] do_page_fault+0x2ef
[<ffffffff814c9595>] page_fault+0x25
[<ffffffff810302f2>] irq_force_complete_move+0x12
[<ffffffff81015214>] fixup_irqs+0xa4
[<ffffffff8102ce59>] cpu_disable_common+0x1a9
[<ffffffff8100f9c2>] check_events+0x12
[<ffffffff810c2550>] __stop_machine+0x120
[<ffffffff8100ff75>] xen_cpu_disable+0x25
[<ffffffff814b0427>] take_cpu_down+0x17
[<ffffffff810c25f9>] stop_cpu+0xa9
[<ffffffff8108869d>] worker_thread+0x16d
[<ffffffff8100f19d>] xen_force_evtchn_callback+0xd
[<ffffffff8108dd00>] wake_up_bit+0x40
[<ffffffff814c90f6>] _spin_unlock_irqrestore+0x16
[<ffffffff81088530>] create_workqueue_thread+0xd0
[<ffffffff8108d9a6>] kthread+0x96
[<ffffffff8101418a>] child_rip+0xa
[<ffffffff81013351>] int_ret_from_sys_call+0x7
[<ffffffff81013add>] retint_restore_args+0x5
[<ffffffff81014180>] kernel_thread+0xe0


> How many CPUs did you assign to your guest?
>
>

It didn't matter, as long as vcpus > 1 and maxvcpus > vcpus.
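For reference, that condition maps onto just two knobs in a xm-style guest config; a minimal, illustrative fragment (the particular values are only an example) would be:

```
# Illustrative xm guest config fragment -- only the vcpu settings matter here.
vcpus    = 2   # vcpus > 1
maxvcpus = 4   # maxvcpus > vcpus
```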

> What are the "other bugs" you speak of?
>

I got a different panic (which I've yet to resolve).

>
>> guest. I'm using a RHEL5 Xen HV fwiw ...
>>
> OK, so your control domain is RHEL5. Mine is Jeremy's xen/next one
> (2.6.32). Let me try to compile RHEL5 under FC11 - any tricks necessary
> to do that?
>

I haven't tried it -- it might work :)

Also, did you try booting with maxvcpus > vcpus as drjones suggested?

P.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Greg KH on
On Wed, Apr 28, 2010 at 11:50:39AM -0700, Andrew Morton wrote:
> I worry that if the -stable maintainers see me drop a patch, but the
> patch in Linus's tree doesn't have the stable tag, they might not merge
> the fix into -stable. I bugged them about this scenario recently and
> the reply was a bit waffly ;)

It was?

I try my best, when I see you drop a patch, to go dig through Linus's
tree to find out if it landed there. If not, I leave it in my queue, and do
that for a few releases. After a long time (like 6 months) I either
ping someone, or just drop it from my queue, guessing that someone
dropped it for some reason.

If I miss one of these, please let me know.

> By far the safest thing to do is to include the stable tag in your
> changelog right at the outset.

Yes, that's the _easiest_ and will not get lost.

thanks,

greg k-h
From: Konrad Rzeszutek Wilk on
>> OK, so your control domain is RHEL5. Mine is Jeremy's xen/next one
>> (2.6.32). Let me try to compile RHEL5 under FC11 - any tricks necessary
>> to do that?
>>
>
> I haven't tried it -- it might work :)
>
> Also, did you try booting with maxvcpus > vcpus as drjones suggested?

Yes. No luck reproducing the crash/panic. I am just not seeing the failure you
guys are seeing.

Let me build 2.6.33 vanilla (+ CONFIG_DEBUG_MARK_RODATA=n) once more and check
this. And also install a vanilla RHEL5 dom0, as it looks impossible to
compile a 2.6.18-era kernel under FC11.

The Xen I am using is xen-unstable - so 4.0.1. I know that the IRQ balance
code in the Xen hypervisor was fixed in 4.0 (it used to run outside of IRQ
context - now it runs in IRQ context). Maybe this bug you are seeing
(and have the fix for) is just a red herring?
From: Prarit Bhargava on


On 05/03/2010 03:16 PM, Konrad Rzeszutek Wilk wrote:
>>> OK, so your control domain is RHEL5. Mine is Jeremy's xen/next one
>>> (2.6.32). Let me try to compile RHEL5 under FC11 - any tricks necessary
>>> to do that?
>>>
>>>
>> I haven't tried it -- it might work :)
>>
>> Also, did you try booting with maxvcpus > vcpus as drjones suggested?
>>
> Yes. No luck reproducing the crash/panic. I am just not seeing the failure you
> guys are seeing.
>
> Let me build 2.6.33 vanilla (+ CONFIG_DEBUG_MARK_RODATA=n) once more and check
> this. And also install a vanilla RHEL5 dom0, as it looks impossible to
> compile a 2.6.18-era kernel under FC11.
>

Let me try reproducing this on FC11 + 2.6.33.

P.

From: Konrad Rzeszutek Wilk on
On Mon, May 03, 2010 at 03:16:34PM -0400, Konrad Rzeszutek Wilk wrote:
> >> OK, so your control domain is RHEL5. Mine is Jeremy's xen/next one
> >> (2.6.32). Let me try to compile RHEL5 under FC11 - any tricks necessary
> >> to do that?
> >>
> >
> > I haven't tried it -- it might work :)
> >
> > Also, did you try booting with maxvcpus > vcpus as drjones suggested?
>
> Yes. No luck reproducing the crash/panic. I am just not seeing the failure you
> guys are seeing.
>
> Let me build 2.6.33 vanilla (+ CONFIG_DEBUG_MARK_RODATA=n) once more and check
> this. And also install a vanilla RHEL5 dom0, as it looks impossible to
> compile a 2.6.18-era kernel under FC11.

Rebuilding everything from scratch did it. I am seeing a similar
failure where xenctx reports:

Call Trace:
[<ffffffff8107f780>] stop_cpu+0xc6 <--
[<ffffffff8105520e>] worker_thread+0x15d
[<ffffffff8107f6ba>] __stop_machine+0x106
[<ffffffff81058afb>] wake_up_bit+0x25
[<ffffffff81038720>] spin_unlock_irqrestore+0x9
[<ffffffff810550b1>] spin_lock_irq+0xb
[<ffffffff810586cb>] kthread+0x7a
[<ffffffff8100a964>] kernel_thread_helper+0x4
[<ffffffff81009d61>] int_ret_from_sys_call+0x7
[<ffffffff814033dd>] retint_restore_args+0x5
[<ffffffff8100a960>] gs_change+0x13

With this guest file:

kernel = "/mnt/lab/vs11/vmlinuz"
ramdisk = "/mnt/lab/vs11/initramfs.cpio.gz"
memory = 2048
maxvcpus = 4
vcpus = 2
vif = [ 'mac=00:0F:4B:00:00:71, bridge=switch' ]
vfb = [ 'vnc=1, vnclisten=0.0.0.0,vncunused=1']
root = "debug loglevel=10 plymouth:splash=solar plymouth:debug norm console=hvc0 initcall_debug"

This is with the latest linux kernel:
d93ac51c7a129db7a1431d859a3ef45a0b1f3fc5 (Merge branch 'for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client)

With your patch the PV guest keeps on going.

So:

Tested-by: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
>
> The Xen I am using is xen-unstable - so 4.0.1. I know that the IRQ balance
> code in the Xen hypervisor was fixed in 4.0 (it used to run outside of IRQ
> context - now it runs in IRQ context). Maybe this bug you are seeing
> (and have the fix for) is just a red herring?

Interestingly enough, I couldn't reproduce this on my Intel box, but on
an AMD box with a very whacked TSC (cpu MHz: 2795681.405) I can
reproduce this.
