From: Bjorn Helgaas on
On Sunday, August 01, 2010 08:31:02 pm Donald Parsons wrote:
> 2.6.35 still fails to boot for me, as first reported here:
> http://lkml.indiana.edu/hypermail/linux/kernel/1007.3/01144.html
>
> I've manually bisected it down to around May 20 between
> 2.6.34-git4 (boots) and 2.6.34-git5 (boot fails)
> Also -git[23] boot, and -git8, -rc[126], rc6-git[136] all fail.
>
> Unfortunately, the first time I tried was with 2.6.35-rc6, and
> it failed to boot.
>
> Failure when switching from initramfs to real /root?
> Removing kernel "quiet" param appears to show several
> lines listing:
>
> usb drives/hubs? followed by
> dracut switching root (when booting works)
> or
> usb drives/hubs? followed by
> (missing dracut... line)
> No root device found
> Boot has failed, sleeping forever. (when it does not boot)
>
> Grub, typical entry:
> title Fedora (2.6.35)
> root (hd0,0)
> kernel /vmlinuz-2.6.35 ro
> root=UUID=686dc496-8814-4c36-8fb7-5ded2916e825 rhgb
> SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=us
> rdblacklist=nouveau init=/sbin/bootchartd
> initrd /initramfs-2.6.35.img
>
>
> My boot failure seems to be different from the other two reported
> in the thread "2.6.35-rc6-git6: Reported regressions from 2.6.34"
> under Bug #16173 and #16228

Will it boot with the "pci=nocrs" option? If so, please open a
report at https://bugzilla.kernel.org, mark it a regression, assign
it to me, and attach the complete dmesg log. And please respond to
this thread with a pointer to the bugzilla.

Otherwise, a complete console log should have a clue. The best
thing would be a log from a serial console or netconsole, with
"ignore_loglevel".

Thanks a lot for your report!

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Randy Dunlap on
On Sun, 01 Aug 2010 22:31:02 -0400 Donald Parsons wrote:

> 2.6.35 still fails to boot for me, as first reported here:
> http://lkml.indiana.edu/hypermail/linux/kernel/1007.3/01144.html
>
> I've manually bisected it down to around May 20 between
> 2.6.34-git4 (boots) and 2.6.34-git5 (boot fails)
> Also -git[23] boot, and -git8, -rc[126], rc6-git[136] all fail.
>
> Unfortunately, the first time I tried was with 2.6.35-rc6, and
> it failed to boot.
>
> Failure when switching from initramfs to real /root?
> Removing kernel "quiet" param appears to show several
> lines listing:
>
> usb drives/hubs? followed by
> dracut switching root (when booting works)
> or
> usb drives/hubs? followed by
> (missing dracut... line)
> No root device found
> Boot has failed, sleeping forever. (when it does not boot)
>
> Grub, typical entry:
> title Fedora (2.6.35)
> root (hd0,0)
> kernel /vmlinuz-2.6.35 ro
> root=UUID=686dc496-8814-4c36-8fb7-5ded2916e825 rhgb
> SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=us
> rdblacklist=nouveau init=/sbin/bootchartd
> initrd /initramfs-2.6.35.img
>
>
> My boot failure seems to be different from the other two reported
> in the thread "2.6.35-rc6-git6: Reported regressions from 2.6.34"
> under Bug #16173 and #16228
> http://lkml.indiana.edu/hypermail/linux/kernel/1008.0/00080.html
>
> System is up to date Fedora 12 on Asus P5B Deluxe, Core2 6600 2.4GHz
>
> 00:1f.2 SATA controller: Intel Corporation 82801HR/HO/HH (ICH8R/DO/DH)
> 6 port SATA AHCI Controller (rev 02)
> 01:00.0 VGA compatible controller: nVidia Corporation G70 [GeForce 7600
> GT] (rev a1)
> 03:00.0 SATA controller: JMicron Technologies, Inc. 20360/20363 Serial
> ATA Controller (rev 02)
>
> Using 2.6.34.1 shows
> # lsmod | grep ata
> ata_generic 3427 0
> pata_acpi 3227 0
> pata_jmicron 2547 0
> libata 157450 4 ata_generic,pata_acpi,pata_jmicron,ahci
> scsi_mod 147895 5 sg,sd_mod,sr_mod,usb_storage,libata
>
> The .config's were made from 2.6.34.1/.config using oldconfig and enter
> key (defaults for any questions).

Please post the 2.6.35 .config file.


> Updating BIOS from 1232 to 1238 (latest) gave no change.
> Tried gcc's 4.4.4, 4.5.0, and 4.5.1 with no change.
>
> Thanks for any help,
> Don

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
From: Donald Parsons on
On Sun, 2010-08-01 at 21:38 -0600, Bjorn Helgaas wrote:
> On Sunday, August 01, 2010 08:31:02 pm Donald Parsons wrote:
> > 2.6.35 still fails to boot for me, as first reported here:
> > http://lkml.indiana.edu/hypermail/linux/kernel/1007.3/01144.html
> >
> > I've manually bisected it down to around May 20 between
> > 2.6.34-git4 (boots) and 2.6.34-git5 (boot fails)
> > Also -git[23] boot, and -git8, -rc[126], rc6-git[136] all fail.
> >
> > Unfortunately, the first time I tried was with 2.6.35-rc6, and
> > it failed to boot.
> >
> > Failure when switching from initramfs to real /root?
> > Removing kernel "quiet" param appears to show several
> > lines listing:
> >
> > usb drives/hubs? followed by
> > dracut switching root (when booting works)
> > or
> > usb drives/hubs? followed by
> > (missing dracut... line)
> > No root device found
> > Boot has failed, sleeping forever. (when it does not boot)
> >
> > Grub, typical entry:
> > title Fedora (2.6.35)
> > root (hd0,0)
> > kernel /vmlinuz-2.6.35 ro
> > root=UUID=686dc496-8814-4c36-8fb7-5ded2916e825 rhgb
> > SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=us
> > rdblacklist=nouveau init=/sbin/bootchartd
> > initrd /initramfs-2.6.35.img
> >
> >
> > My boot failure seems to be different from the other two reported
> > in the thread "2.6.35-rc6-git6: Reported regressions from 2.6.34"
> > under Bug #16173 and #16228
>
> Will it boot with the "pci=nocrs" option? If so, please open a

No. I tried it a few times after seeing it mentioned under
bug #16228, but it had no effect. Sorry, I should have
mentioned this.

> report at https://bugzilla.kernel.org, mark it a regression, assign
> it to me, and attach the complete dmesg log. And please respond to
> this thread with a pointer to the bugzilla.
>
> Otherwise, a complete console log should have a clue. The best
> thing would be a log from a serial console or netconsole, with
> "ignore_loglevel".

Maybe I will try netconsole tomorrow. But is Ethernet up when
this boot failure happens? I think not, since initramfs should
not need networking.

Should I try building the SATA driver into the kernel? Oh, I am using
ext3, and fdisk -l shows:

Disk /dev/sda: 320.1 GB, 320072933376 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          50      401593+  83  Linux (/boot)
/dev/sda2              51        7051    56235532+  83  Linux (/)
/dev/sda3            7052        8377    10651095   82  Linux swap
/dev/sda4            8378       38913   245280420   83  Linux ()
Disk /dev/sdb: 750.2 GB, 750156374016 bytes
/dev/sdb1               1       91201   732572001   83  Linux (/home)

> Thanks a lot for your report!
>
> Bjorn

And thanks for your interest!
Don



From: Dave Chinner on
On Sun, Aug 01, 2010 at 07:50:02PM -0700, Linus Torvalds wrote:
> On Sun, Aug 1, 2010 at 7:33 PM, Dave Chinner <david(a)fromorbit.com> wrote:
> >
> > There hasn't been nearly enough review or testing of this patch
> > series yet. Before a merge, it needs to be split up into smaller,
> > more digestible chunks for more comprehensive review, regression
> > testing and behavioural analysis.
>
> I dunno. We merge _way_ scarier things in the VM and the block layer,
> for much less actual upside, and with less review.

Scary stuff outside of direct VFS/FS interfaces is generally hidden
from me by my +6 Blinkers of Blissful Ignorance. I make the
assumption that the experts involved know the risks and have weighed
them up appropriately. ;)

> The RCU pathname lookup has some rather impressive performance
> upsides, and I agree that it would be good to get a lot of review and
> testing, but the latter isn't going to happen without it being
> mainlined, and the former is sadly lacking. The person I'd like most
> to review it is Al,

Most definitely.

> but anybody in the filesystem world should
> basically see it as a #1 priority,

Agreed - I've actually looked at every patch, commented on some
of the more questionable things, got quoted by LWN for saying that
it "fell off the locking cliff", have run benchmarks on it and sent
patches fixing bugs back to Nick.

It's just really hard to digest it all in one lump and core VFS
changes on this scale scare me....

> because unlike all the masturbatory
> patches like xstat() that add new functionality that nobody will
> likely ever use, Nick's patchseries improves on the thing that
> everybody uses heavily every day without even thinking about it.
>
> Is it tough to review? Yes. It's core code, not just some random
> addition that adds a new feature and doesn't impact any old code. But
> that's also the thing that makes it meaningful, and makes me think it
> should get merged _much_ more eagerly than most code we ever see.

I agree with you for the pure locking changes.

But for the bits that change writeback, LRU ordering and reclaim
calculations, the benefits are not quite so obvious, nor is the
code/behaviour quite so provably correct. Maybe
I'm being a bit too paranoid, but generally it pays to be a bit
conservative as a filesystem developer because the cost of screwing
up can be pretty high...

Cheers,

Dave.
--
Dave Chinner
david(a)fromorbit.com
From: Nick Piggin on
On Mon, Aug 02, 2010 at 03:58:34PM +1000, Dave Chinner wrote:
> On Sun, Aug 01, 2010 at 07:50:02PM -0700, Linus Torvalds wrote:
> > On Sun, Aug 1, 2010 at 7:33 PM, Dave Chinner <david(a)fromorbit.com> wrote:
> > >
> > > There hasn't been nearly enough review or testing of this patch
> > > series yet. Before a merge, it needs to be split up into smaller,
> > > more digestible chunks for more comprehensive review, regression
> > > testing and behavioural analysis.

BTW, it has in fact had quite a bit of testing, in an earlier form, in
the -rt tree for a long time, and several fixes came from there. It
showed good performance results there too.


> > I dunno. We merge _way_ scarier things in the VM and the block layer,
> > for much less actual upside, and with less review.
>
> Scary stuff outside of direct VFS/FS interfaces is generally hidden
> from me by my +6 Blinkers of Blissful Ignorance. I make the
> assumption that the experts involved know the risks and have weighed
> them up appropriately. ;)
>
> > The RCU pathname lookup has some rather impressive performance
> > upsides, and I agree that it would be good to get a lot of review and
> > testing, but the latter isn't going to happen without it being
> > mainlined, and the former is sadly lacking. The person I'd like most
> > to review it is Al,
>
> Most definitely.

I hate to say it, but I would like to see it mature for another
release. It is also likely to clash a bit with Al's recent inode work
that he'll want to push.

What I can do is send some of the ground work patches this time around,
put the tree into linux-next, and put reviewers on notice.

I think it is all conceptually sound, but it will inevitably have some
bugs left to shake out, and things to be fixed on the review side. I
don't anticipate a problem that could not be fixed in the release cycle,
but I think aiming for post 2.6.36 is a bit fairer for vfs guys,
honestly. LSF is next week too, so most of them will be busy with travel
and such. But I do hope to discuss the vfs-scale patches there.


> > but anybody in the filesystem world should
> > basically see it as a #1 priority,
>
> Agreed - I've actually looked at every patch, commented on some
> of the more questionable things, got quoted by LWN for saying that
> it "fell off the locking cliff", have run benchmarks on it and sent
> patches fixing bugs back to Nick.
>
> It's just really hard to digest it all in one lump and core VFS
> changes on this scale scare me....

For filesystems developers, the dcache and inode locking changes
should be more or less just following simple steps as shown in the
patch series. If they're not abusing dcache_lock (and most except
autofs4 are not), then it should not be a big deal.

A couple of locking constraints have changed at the API level,
but I haven't run into any problems there yet. It should all be
documented in Documentation/filesystems/*, although I need to make a
few more passes over the series to ensure I caught everything.


> > because unlike all the masturbatory
> > patches like xstat() that add new functionality that nobody will
> > likely ever use, Nick's patchseries improves on the thing that
> > everybody uses heavily every day without even thinking about it.
> >
> > Is it tough to review? Yes. It's core code, not just some random
> > addition that adds a new feature and doesn't impact any old code. But
> > that's also the thing that makes it meaningful, and makes me think it
> > should get merged _much_ more eagerly than most code we ever see.
>
> I agree with you for the pure locking changes.
>
> But for the bits that change writeback, LRU ordering and reclaim
> calculations, the benefits are not quite so obvious, nor is the
> code/behaviour quite so provably correct. Maybe
> I'm being a bit too paranoid, but generally it pays to be a bit
> conservative as a filesystem developer because the cost of screwing
> up can be pretty high...

Writeback shouldn't be changed. LRU ordering is changed for two
reasons. The first is to make the lists per-zone instead of global.
This fits our whole reclaim model much better. It will inevitably
cause some small, random behavioural changes, but I think it is
agreed this is a good thing (a memory shortage in one zone or node
does not require global shrinking, and reclaim gains NUMA-level
parallelism).
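
To make the per-zone point concrete, here is a toy C sketch. It is
not the vfs-scale or mm code: `struct zone_lru`, `lru_add()`,
`shrink_zone_lru()`, `NR_ZONES` and the fixed-capacity arrays are all
hypothetical stand-ins. It only illustrates the property described
above: reclaim driven by a shortage in one zone shrinks that zone's
list and never touches the others.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, not kernel code: each zone keeps its own LRU
 * of cached object ids, oldest first, so reclaim can target one zone. */
#define NR_ZONES 3
#define LRU_CAP  16

struct zone_lru {
    int objs[LRU_CAP];   /* objs[0] is the least recently added */
    size_t count;
};

static struct zone_lru zone_lrus[NR_ZONES];

/* Append an object to the most-recently-used end of its zone's LRU.
 * Returns 0 on success, -1 if this toy fixed-size list is full. */
static int lru_add(int zone, int obj)
{
    struct zone_lru *z;

    assert(zone >= 0 && zone < NR_ZONES);
    z = &zone_lrus[zone];
    if (z->count == LRU_CAP)
        return -1;
    z->objs[z->count++] = obj;
    return 0;
}

/* Reclaim up to nr objects from one zone, oldest first.
 * Lists belonging to other zones are never examined.
 * Returns the number of objects reclaimed. */
static size_t shrink_zone_lru(int zone, size_t nr)
{
    struct zone_lru *z;
    size_t freed;

    assert(zone >= 0 && zone < NR_ZONES);
    z = &zone_lrus[zone];
    freed = nr < z->count ? nr : z->count;

    /* Shift the survivors down to the front of the array. */
    for (size_t i = freed; i < z->count; i++)
        z->objs[i - freed] = z->objs[i];
    z->count -= freed;
    return freed;
}
```

For example, after adding three objects to zone 0 and one to zone 1,
`shrink_zone_lru(0, 2)` frees the two oldest zone-0 objects and leaves
zone 1's list exactly as it was. A real version would use linked lists
with a lock per list; that per-list locking is what would let reclaim
in different zones or nodes proceed in parallel.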

The second is converting the last few bits of dcache refcounting,
and all of inode refcounting, over to this "lazy LRU" model. This can
have a bigger impact, but it really reduces locking on the per-zone
lists, so it definitely helps the speed and scalability of non-reclaim
fastpaths. I'm up for changing this if numbers show it hurts; it
would be rather easy to do, and compared with the overall patchset
it would rate as a minor tweak :)
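
A toy C sketch of the "lazy LRU" idea as described above, under
stated assumptions: `struct obj`, `struct lru`, `obj_get()`,
`obj_put()` and `lru_scan()` are hypothetical names, there is no
locking at all, and this is not the actual dcache/inode code. Taking a
reference only bumps the count and leaves the object on the LRU;
dropping the last reference re-adds the object only if reclaim has
already removed it; and the reclaim scan lazily unlinks in-use objects
it finds instead of freeing them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of "lazy LRU" refcounting (no locking). */
struct obj {
    int refcount;
    bool on_lru;
    bool freed;            /* stand-in for actually releasing memory */
    struct obj *lru_next;
};

struct lru {
    struct obj *head;      /* toy model: insert at head, scan from head */
    size_t nr;
};

/* Fast path: only the count changes; the LRU list is not touched,
 * so no LRU lock would be needed here. */
static void obj_get(struct obj *o)
{
    o->refcount++;
}

/* Dropping the last reference puts the object on the LRU, unless it
 * is still there from before (it was never removed on obj_get()). */
static void obj_put(struct lru *l, struct obj *o)
{
    assert(o->refcount > 0);
    if (--o->refcount == 0 && !o->on_lru) {
        o->on_lru = true;
        o->lru_next = l->head;
        l->head = o;
        l->nr++;
    }
}

/* Reclaim: unlink up to nr_to_scan objects. Unused ones are freed;
 * in-use ones are just dropped from the list (lazily), and their
 * final obj_put() will add them back. Returns the number freed. */
static size_t lru_scan(struct lru *l, size_t nr_to_scan)
{
    size_t freed = 0;

    while (nr_to_scan > 0 && l->head) {
        struct obj *o = l->head;

        nr_to_scan--;
        l->head = o->lru_next;
        l->nr--;
        o->on_lru = false;
        o->lru_next = NULL;
        if (o->refcount == 0) {
            o->freed = true;
            freed++;
        }
    }
    return freed;
}
```

In this toy version an object that is referenced again is not moved
on the list, so LRU ordering becomes approximate in exchange for a
lock-free get path; that is the kind of ordering change being
discussed above.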

