|
From: Andrey Panin on 5 May 2008 03:30 On 121, 04 30, 2008 at 12:54:21PM -0300, Glauber Costa wrote: > Hugh Dickins wrote: >> On Wed, 30 Apr 2008, J.A. Magall??n wrote: >>> I have a couple problems with latest git (-14): >>> >>> - It only recognises 2 processors out of 4 (dual Xeon HT) >>> - It oopses on the swapper process just on boot... >>> >>> Difference in dmesg is below. If full correct dmesg or config is >>> needed, please ask for them. The kernel was built copying old >>> 2.6.25 config to .config && make oldconfig. I filled the missing >>> gaps like PAT and others... >>> >>> -Brought up 4 CPUs >>> +native_cpu_up: bad cpu 2 >>> +native_cpu_up: bad cpu 3 >>> +Brought up 2 CPUs >> >> Yes, I've been getting this for some days too: only 2 processors on >> dual Xeon HT in 32-bit mode; whereas x86_64 finds all 4 just fine. >> Ran lots of testing on 2.6.25-mm1 before I noticed it there. >> >> I bisected for a while, but it got confusing (arrived at a bisect >> point which gave only 1 processor: smpboot code getting rearranged), >> so I was forced (quel horreur!) to investigate properly. I've just >> now had success with the patch below, please give it a try: >> but it'll need an Ack from Glauber before it can go in. >> >>> +WARNING: at include/linux/blkdev.h:427 blk_queue_init_tags+0x110/0x11f() >> >> I presume this warning and backtrace is what you report as an oops: >> I think you'll find Linus included a fix for this one overnight, and >> it should have gone away in 2.6.25-git15 (but I didn't see it myself). >> >> Hugh >> >> [PATCH] x86_32: fix HT cpu booting >> >> Since recent smpboot 32/64-bit merge, my dual Xeon with HT has been >> booting only 2 of its 4 cpus (when running an i386 kernel; but x86_64 >> is okay). J.A. Magall??n reports the same. >> >> native_cpu_up: bad cpu 2 >> native_cpu_up: bad cpu 3 >> >> The mach-default cpu_present_to_apicid() was just returning cpu number >> (2, 3) instead of apicid (6, 7): looks like we now need the x86_64 code >> even for the i386 case. >> >> Comparing with other versions of cpu_present_to_apicid(), it seems a >> good idea to include an NR_CPUS test too, since cpu_present() doesn't >> include that; but that wasn't a problem here, and may no problem at all. >> >> One point worth noting - is it a worry? Prior to that smpboot merge, >> my Xeon booted the two HT siblings on one physical first, then the >> two siblings on the other physical after - when i386, but alternated >> them when x86_64. Since the merge, the x86_64 sequence is unchanged, >> but the i386 sequence is now like x86_64. I prefer this consistency, >> and I prefer the new sequence: booting with maxcpus=2 then uses the >> independent physicals without HT sharing; but surprises in store? >> >> Signed-off-by: Hugh Dickins <hugh(a)veritas.com> >> >> --- 2.6.25-git/include/asm-x86/mach-default/mach_apic.h 2008-04-23 07:24:16.000000000 +0100 >> +++ linux/include/asm-x86/mach-default/mach_apic.h 2008-04-30 14:55:14.000000000 +0100 >> @@ -109,13 +109,8 @@ static inline int cpu_to_logical_apicid( >> static inline int cpu_present_to_apicid(int mps_cpu) >> { >> -#ifdef CONFIG_X86_64 >> - if (cpu_present(mps_cpu)) >> + if (mps_cpu < NR_CPUS && cpu_present(mps_cpu)) >> return (int)per_cpu(x86_bios_cpu_apicid, mps_cpu); >> -#else >> - if (mps_cpu < get_physical_broadcast()) >> - return mps_cpu; >> -#endif >> else >> return BAD_APICID; >> } > > Hugh, thanks for tracing this. The patch looks sane to me > > However, since this problem was raised, I'm still concerned that visws may > have the same problem, since it uses the same logic that i386 mach_default > used to. (and I did not touched, since x86_64 code went int o > mach_default). > > Could anyone with access to such hardware give it a try ? Unfortunately, no. Visws port is in non-bootable state currently, 2.6.25 hangs in calibrate_delay_direct() and I still do not know why :( -- Andrey Panin | Linux and UNIX system administrator pazke(a)donpac.ru | PGP key: wwwkeys.pgp.net
From: Thomas Gleixner on 5 May 2008 04:00 On Mon, 5 May 2008, Andrey Panin wrote: > > However, since this problem was raised, I'm still concerned that visws may > > have the same problem, since it uses the same logic that i386 mach_default > > used to. (and I did not touched, since x86_64 code went int o > > mach_default). > > > > Could anyone with access to such hardware give it a try ? > > Unfortunately, no. Visws port is in non-bootable state currently, 2.6.25 hangs > in calibrate_delay_direct() and I still do not know why :( What was the last kernel which booted on visws ? Thanks, tglx -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Jens Axboe on 6 May 2008 15:20 On Fri, May 02 2008, Nick Piggin wrote: > On Thu, May 01, 2008 at 01:11:51PM +0100, Hugh Dickins wrote: > > On Thu, 1 May 2008, J.A. Magall�n wrote: > > > On Wed, 30 Apr 2008 16:17:46 +0100 (BST), Hugh Dickins <hugh(a)veritas.com> wrote: > > > > On Wed, 30 Apr 2008, J.A. Magall�n wrote: > > > > > > > > > > I have a couple problems with latest git (-14): > > > > > > > > > > - It only recognises 2 processors out of 4 (dual Xeon HT) > > > > > - It oopses on the swapper process just on boot... > > > > > > > > > > Difference in dmesg is below. If full correct dmesg or config is > > > > > needed, please ask for them. The kernel was built copying old > > > > > 2.6.25 config to .config && make oldconfig. I filled the missing > > > > > gaps like PAT and others... > > ... > > > > > +WARNING: at include/linux/blkdev.h:427 blk_queue_init_tags+0x110/0x11f() > > > > > > > > I presume this warning and backtrace is what you report as an oops: > > > > I think you'll find Linus included a fix for this one overnight, and > > > > it should have gone away in 2.6.25-git15 (but I didn't see it myself). > > > > > > -git16 plus your patch gives me my 4 cpus again. > > > > I'm glad to hear this, thanks. > > > > > But I still get the warning: > > > > But sorry to hear this. That warning has undergone several revisions > > already: I expect yours is another false positive not to worry about, > > but it still needs to be fixed. I won't meddle in there, Cc'ed Jens > > and Nick who will know what's appropriate. > > Thanks for the heads up Hugh. I think we're OK at this point because > we're running in allocation/setup code so there should be no concurrency > on queue_flags. I remember following this call chain and making this > conclusion (hopefully correct, Jens?)... however I don't know how I > concluded that the warning would not fire. That's USUALLY correct, but not always. If blk_queue_init_tags() is called for resizing depth, then it's a running queue and we should not use _unlocked() for that. So basically they can all be _unlocked() due to lack of concurrency at init time, but not this one: } else if (q->queue_tags) { rc = blk_queue_resize_tags(q, depth); if (rc) return rc; queue_flag_set(QUEUE_FLAG_QUEUED, q); return 0; } ... So if a driver ever re-calls blk_queue_init_tags() with a tag map already set, then it needs to hold the queue lock. -- Jens Axboe -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
|
Pages: 1 Prev: Ability to limit or disable page caching? Next: core_pattern pipe documentation - draft 2 |