From: Andrey Panin on
On 121, 04 30, 2008 at 12:54:21PM -0300, Glauber Costa wrote:
> Hugh Dickins wrote:
>> On Wed, 30 Apr 2008, J.A. Magall??n wrote:
>>> I have a couple problems with latest git (-14):
>>>
>>> - It only recognises 2 processors out of 4 (dual Xeon HT)
>>> - It oopses on the swapper process just on boot...
>>>
>>> Difference in dmesg is below. If full correct dmesg or config is
>>> needed, please ask for them. The kernel was built copying old
>>> 2.6.25 config to .config && make oldconfig. I filled the missing
>>> gaps like PAT and others...
>>>
>>> -Brought up 4 CPUs
>>> +native_cpu_up: bad cpu 2
>>> +native_cpu_up: bad cpu 3
>>> +Brought up 2 CPUs
>>
>> Yes, I've been getting this for some days too: only 2 processors on
>> dual Xeon HT in 32-bit mode; whereas x86_64 finds all 4 just fine.
>> Ran lots of testing on 2.6.25-mm1 before I noticed it there.
>>
>> I bisected for a while, but it got confusing (arrived at a bisect
>> point which gave only 1 processor: smpboot code getting rearranged),
>> so I was forced (quel horreur!) to investigate properly. I've just
>> now had success with the patch below, please give it a try:
>> but it'll need an Ack from Glauber before it can go in.
>>
>>> +WARNING: at include/linux/blkdev.h:427 blk_queue_init_tags+0x110/0x11f()
>>
>> I presume this warning and backtrace is what you report as an oops:
>> I think you'll find Linus included a fix for this one overnight, and
>> it should have gone away in 2.6.25-git15 (but I didn't see it myself).
>>
>> Hugh
>>
>> [PATCH] x86_32: fix HT cpu booting
>>
>> Since recent smpboot 32/64-bit merge, my dual Xeon with HT has been
>> booting only 2 of its 4 cpus (when running an i386 kernel; but x86_64
>> is okay). J.A. Magall??n reports the same.
>>
>> native_cpu_up: bad cpu 2
>> native_cpu_up: bad cpu 3
>>
>> The mach-default cpu_present_to_apicid() was just returning cpu number
>> (2, 3) instead of apicid (6, 7): looks like we now need the x86_64 code
>> even for the i386 case.
>>
>> Comparing with other versions of cpu_present_to_apicid(), it seems a
>> good idea to include an NR_CPUS test too, since cpu_present() doesn't
>> include that; but that wasn't a problem here, and may no problem at all.
>>
>> One point worth noting - is it a worry? Prior to that smpboot merge,
>> my Xeon booted the two HT siblings on one physical first, then the
>> two siblings on the other physical after - when i386, but alternated
>> them when x86_64. Since the merge, the x86_64 sequence is unchanged,
>> but the i386 sequence is now like x86_64. I prefer this consistency,
>> and I prefer the new sequence: booting with maxcpus=2 then uses the
>> independent physicals without HT sharing; but surprises in store?
>>
>> Signed-off-by: Hugh Dickins <hugh(a)veritas.com>
>>
>> --- 2.6.25-git/include/asm-x86/mach-default/mach_apic.h 2008-04-23 07:24:16.000000000 +0100
>> +++ linux/include/asm-x86/mach-default/mach_apic.h 2008-04-30 14:55:14.000000000 +0100
>> @@ -109,13 +109,8 @@ static inline int cpu_to_logical_apicid(
>> static inline int cpu_present_to_apicid(int mps_cpu)
>> {
>> -#ifdef CONFIG_X86_64
>> - if (cpu_present(mps_cpu))
>> + if (mps_cpu < NR_CPUS && cpu_present(mps_cpu))
>> return (int)per_cpu(x86_bios_cpu_apicid, mps_cpu);
>> -#else
>> - if (mps_cpu < get_physical_broadcast())
>> - return mps_cpu;
>> -#endif
>> else
>> return BAD_APICID;
>> }
>
> Hugh, thanks for tracing this. The patch looks sane to me
>
> However, since this problem was raised, I'm still concerned that visws may
> have the same problem, since it uses the same logic that i386 mach_default
> used to. (and I did not touched, since x86_64 code went int o
> mach_default).
>
> Could anyone with access to such hardware give it a try ?

Unfortunately, no. Visws port is in non-bootable state currently, 2.6.25 hangs
in calibrate_delay_direct() and I still do not know why :(

--
Andrey Panin | Linux and UNIX system administrator
pazke(a)donpac.ru | PGP key: wwwkeys.pgp.net
From: Thomas Gleixner on
On Mon, 5 May 2008, Andrey Panin wrote:
> > However, since this problem was raised, I'm still concerned that visws may
> > have the same problem, since it uses the same logic that i386 mach_default
> > used to. (and I did not touched, since x86_64 code went int o
> > mach_default).
> >
> > Could anyone with access to such hardware give it a try ?
>
> Unfortunately, no. Visws port is in non-bootable state currently, 2.6.25 hangs
> in calibrate_delay_direct() and I still do not know why :(

What was the last kernel which booted on visws ?

Thanks,
tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jens Axboe on
On Fri, May 02 2008, Nick Piggin wrote:
> On Thu, May 01, 2008 at 01:11:51PM +0100, Hugh Dickins wrote:
> > On Thu, 1 May 2008, J.A. Magall�n wrote:
> > > On Wed, 30 Apr 2008 16:17:46 +0100 (BST), Hugh Dickins <hugh(a)veritas.com> wrote:
> > > > On Wed, 30 Apr 2008, J.A. Magall�n wrote:
> > > > >
> > > > > I have a couple problems with latest git (-14):
> > > > >
> > > > > - It only recognises 2 processors out of 4 (dual Xeon HT)
> > > > > - It oopses on the swapper process just on boot...
> > > > >
> > > > > Difference in dmesg is below. If full correct dmesg or config is
> > > > > needed, please ask for them. The kernel was built copying old
> > > > > 2.6.25 config to .config && make oldconfig. I filled the missing
> > > > > gaps like PAT and others...
> > ...
> > > > > +WARNING: at include/linux/blkdev.h:427 blk_queue_init_tags+0x110/0x11f()
> > > >
> > > > I presume this warning and backtrace is what you report as an oops:
> > > > I think you'll find Linus included a fix for this one overnight, and
> > > > it should have gone away in 2.6.25-git15 (but I didn't see it myself).
> > >
> > > -git16 plus your patch gives me my 4 cpus again.
> >
> > I'm glad to hear this, thanks.
> >
> > > But I still get the warning:
> >
> > But sorry to hear this. That warning has undergone several revisions
> > already: I expect yours is another false positive not to worry about,
> > but it still needs to be fixed. I won't meddle in there, Cc'ed Jens
> > and Nick who will know what's appropriate.
>
> Thanks for the heads up Hugh. I think we're OK at this point because
> we're running in allocation/setup code so there should be no concurrency
> on queue_flags. I remember following this call chain and making this
> conclusion (hopefully correct, Jens?)... however I don't know how I
> concluded that the warning would not fire.

That's USUALLY correct, but not always. If blk_queue_init_tags() is
called for resizing depth, then it's a running queue and we should not
use _unlocked() for that. So basically they can all be _unlocked() due
to lack of concurrency at init time, but not this one:

} else if (q->queue_tags) {
rc = blk_queue_resize_tags(q, depth);
if (rc)
return rc;
queue_flag_set(QUEUE_FLAG_QUEUED, q);
return 0;
} ...

So if a driver ever re-calls blk_queue_init_tags() with a tag map
already set, then it needs to hold the queue lock.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/