From: Mark Knecht on
On Sat, Jul 3, 2010 at 9:13 AM, Tejun Heo <tj(a)kernel.org> wrote:
> Hello,
>
> On 07/03/2010 06:06 PM, Mark Knecht wrote:
>>> Can you please *attach* full logs of a successful boot and several
>>> failing boots?
>>
>> Certainly. Which logs? dmesg or something else?
>
> dmesg output preferably with printk timestamp enabled.
>
> Thanks.
>
> --
> tejun
>

OK, I enabled printk timing.
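
Comparing a timestamped dmesg from a failing boot against a good one by eye is tedious. Below is a minimal Python sketch that pulls libata's "ataN: SATA link up/down" lines out of a saved log (e.g. one written with `dmesg > boot.txt`), so the ports that never came up stand out. It assumes the log lines look like the example in the comment. The function names are hypothetical and belong to no existing tool.

```python
import re

# Matches libata link messages as printed with printk timestamps enabled, e.g.
# "[    2.345678] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)"
LINK_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(ata\d+(?:\.\d+)?): SATA link (up|down)")


def link_events(dmesg_text):
    """Return (timestamp, port, state) tuples for every SATA link message."""
    events = []
    for line in dmesg_text.splitlines():
        m = LINK_RE.match(line)
        if m:
            events.append((float(m.group(1)), m.group(2), m.group(3)))
    return events


def down_ports(dmesg_text):
    """Return the ports whose most recent link message reports 'down'."""
    last_state = {}
    for _, port, state in link_events(dmesg_text):
        last_state[port] = state  # later messages overwrite earlier ones
    return sorted(port for port, state in last_state.items() if state == "down")
```

Running `down_ports()` over the good and bad logs and diffing the results gives a quick first look before reading the surrounding messages in full.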

Here are the /dev/sd* listings from two boots. In the first one, /dev/sde came up missing:

mark(a)c2stable ~ $ ls -al /dev/sd*
brw-rw---- 1 root disk 8, 0 Jul 3 2010 /dev/sda
brw-rw---- 1 root disk 8, 1 Jul 3 2010 /dev/sda1
brw-rw---- 1 root disk 8, 2 Jul 3 2010 /dev/sda2
brw-rw---- 1 root disk 8, 3 Jul 3 2010 /dev/sda3
brw-rw---- 1 root disk 8, 4 Jul 3 2010 /dev/sda4
brw-rw---- 1 root disk 8, 5 Jul 3 2010 /dev/sda5
brw-rw---- 1 root disk 8, 6 Jul 3 2010 /dev/sda6
brw-rw---- 1 root disk 8, 16 Jul 3 2010 /dev/sdb
brw-rw---- 1 root disk 8, 17 Jul 3 2010 /dev/sdb1
brw-rw---- 1 root disk 8, 18 Jul 3 2010 /dev/sdb2
brw-rw---- 1 root disk 8, 19 Jul 3 2010 /dev/sdb3
brw-rw---- 1 root disk 8, 20 Jul 3 2010 /dev/sdb4
brw-rw---- 1 root disk 8, 21 Jul 3 2010 /dev/sdb5
brw-rw---- 1 root disk 8, 22 Jul 3 2010 /dev/sdb6
brw-rw---- 1 root disk 8, 32 Jul 3 2010 /dev/sdc
brw-rw---- 1 root disk 8, 33 Jul 3 2010 /dev/sdc1
brw-rw---- 1 root disk 8, 34 Jul 3 2010 /dev/sdc2
brw-rw---- 1 root disk 8, 35 Jul 3 2010 /dev/sdc3
brw-rw---- 1 root disk 8, 36 Jul 3 2010 /dev/sdc4
brw-rw---- 1 root disk 8, 37 Jul 3 2010 /dev/sdc5
brw-rw---- 1 root disk 8, 38 Jul 3 2010 /dev/sdc6
brw-rw---- 1 root disk 8, 48 Jul 3 2010 /dev/sdd
brw-rw---- 1 root disk 8, 49 Jul 3 2010 /dev/sdd1
mark(a)c2stable ~ $

I then did two warm boots and got the same problem, so I shut down
completely and did a cold boot, which worked:

mark(a)c2stable ~ $ ls -al /dev/sd*
brw-rw---- 1 root disk 8, 0 Jul 3 2010 /dev/sda
brw-rw---- 1 root disk 8, 1 Jul 3 2010 /dev/sda1
brw-rw---- 1 root disk 8, 2 Jul 3 2010 /dev/sda2
brw-rw---- 1 root disk 8, 3 Jul 3 2010 /dev/sda3
brw-rw---- 1 root disk 8, 4 Jul 3 2010 /dev/sda4
brw-rw---- 1 root disk 8, 5 Jul 3 2010 /dev/sda5
brw-rw---- 1 root disk 8, 6 Jul 3 2010 /dev/sda6
brw-rw---- 1 root disk 8, 16 Jul 3 2010 /dev/sdb
brw-rw---- 1 root disk 8, 17 Jul 3 2010 /dev/sdb1
brw-rw---- 1 root disk 8, 18 Jul 3 2010 /dev/sdb2
brw-rw---- 1 root disk 8, 19 Jul 3 2010 /dev/sdb3
brw-rw---- 1 root disk 8, 20 Jul 3 2010 /dev/sdb4
brw-rw---- 1 root disk 8, 21 Jul 3 2010 /dev/sdb5
brw-rw---- 1 root disk 8, 22 Jul 3 2010 /dev/sdb6
brw-rw---- 1 root disk 8, 32 Jul 3 2010 /dev/sdc
brw-rw---- 1 root disk 8, 33 Jul 3 2010 /dev/sdc1
brw-rw---- 1 root disk 8, 34 Jul 3 2010 /dev/sdc2
brw-rw---- 1 root disk 8, 35 Jul 3 2010 /dev/sdc3
brw-rw---- 1 root disk 8, 36 Jul 3 2010 /dev/sdc4
brw-rw---- 1 root disk 8, 37 Jul 3 2010 /dev/sdc5
brw-rw---- 1 root disk 8, 38 Jul 3 2010 /dev/sdc6
brw-rw---- 1 root disk 8, 48 Jul 3 2010 /dev/sdd
brw-rw---- 1 root disk 8, 49 Jul 3 2010 /dev/sdd1
brw-rw---- 1 root disk 8, 64 Jul 3 2010 /dev/sde
brw-rw---- 1 root disk 8, 65 Jul 3 2010 /dev/sde1
mark(a)c2stable ~ $
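
Comparing the two listings by eye works, but a small check run right after each boot can flag a missing disk automatically. This minimal sketch looks under /sys/block rather than /dev, assuming the kernel registers each detected whole disk there. The expected names sda..sde come from the good boot above; the function name is hypothetical.

```python
import os


def missing_disks(expected, sys_block="/sys/block"):
    """Return the expected whole-disk names (e.g. 'sde') that the kernel
    has not registered, judged by their entries under sys_block."""
    present = set(os.listdir(sys_block)) if os.path.isdir(sys_block) else set()
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    # Expected disks on this machine, taken from the good (cold) boot listing.
    expected = ["sda", "sdb", "sdc", "sdd", "sde"]
    gone = missing_disks(expected)
    print("all disks present" if not gone else "missing: " + ", ".join(gone))
```

The DVD drive shows up as an sr device rather than sd, so it is deliberately left out of the expected list.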

Let me know what else you might need.

Thanks!

Cheers,
Mark
From: Stan Hoeppner on
Mark Knecht put forth on 7/3/2010 11:06 AM:

>>> I have a newish machine - maybe 3 months old - which unreliably
>>> finds its disk drives at each boot. On probably 40% of boots, one
>>> or more drives will be missing.

Please provide the make/model of the PC. If it's a whitebox or DIY build, please
provide the make/model of the PSU, mobo and CPU. How many USB peripherals are
powered by the PC? Are you powering a water cooling loop pump from the PC's
power supply? Is this PC in a temperature-controlled environment (A/C)?

--
Stan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Mark Knecht on
On Sat, Jul 3, 2010 at 11:56 AM, Stan Hoeppner <stan(a)hardwarefreak.com> wrote:
> Mark Knecht put forth on 7/3/2010 11:06 AM:
>
>>>>    I have a newish machine - maybe 3 months old - which unreliably
>>>> finds its disk drives at each boot. On probably 40% of boots, one
>>>> or more drives will be missing.
>
> Please provide the make/model of the PC.  If it's a whitebox or DIY build, please
> provide the make/model of the PSU, mobo and CPU.  How many USB peripherals are
> powered by the PC?  Are you powering a water cooling loop pump from the PC's
> power supply?  Is this PC in a temperature-controlled environment (A/C)?
>
> --
> Stan
>

Built it myself.

Asus Rampage II Extreme motherboard
12GB Crucial DRAM currently installed (Holds 24GB)
Intel Core i7-980X CPU @ 3.33GHz
Palit nVidia 9500GT-based graphics card
Sony NEC Optiarc AD-7241S-0B 24X Dual Layer DVD+/-RW SATA Drive
(5x) WD5002ABYS RE3 Enterprise Class 500GB hard drives

No external devices other than the monitor, mouse, keyboard and the UPS's
USB interface are attached. No other USB, 1394 or eSATA devices are
connected at this time.

It's all powered by:

Corsair CMPSU-750TX 750-Watt TX Series 80 Plus Certified Power Supply

It's air-cooled with the stock Intel fan that came with the processor,
and it sits in a home office environment.

The machine draws about 250-275W at steady state according to both the
UPS it's hooked to and my trusty Kill-a-Watt. I wouldn't hazard a guess
at the transient draw at power-on while the drives are spinning up, but
it does seem to be well below the rating of the supply. The PSU actually
has something like 8 or 10 SATA power connections, not that that means
anything. I'm using 6 (1 CDRW, 5 drives).
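
The figures above can be turned into a rough, deliberately pessimistic margin check. This sketch adds an assumed spin-up surge for all five drives at once to the measured steady-state draw. The ~2 A per drive on the 12 V rail is an assumption for a typical 3.5" 7200 rpm drive, not a number from this thread; the WD5002ABYS datasheet would give the real figure.

```python
STEADY_STATE_W = 275   # upper end of the UPS / Kill-a-Watt reading
PSU_RATING_W = 750     # Corsair CMPSU-750TX nameplate rating
DRIVES = 5
# ASSUMPTION, not from the thread: ~2 A per drive on 12 V during spin-up.
SPINUP_AMPS_12V = 2.0


def worst_case_startup_watts(steady_w, drives, spinup_amps, volts=12.0):
    """Steady-state draw plus every drive's spin-up surge at the same moment.

    This over-counts: the steady-state figure already includes the drives
    spinning, and wall readings include PSU losses, so the real DC load is lower.
    """
    return steady_w + drives * spinup_amps * volts


peak = worst_case_startup_watts(STEADY_STATE_W, DRIVES, SPINUP_AMPS_12V)
print(f"estimated peak {peak:.0f} W, {PSU_RATING_W - peak:.0f} W headroom")
```

With these inputs the estimate is 395 W against a 750 W rating, so even a generous spin-up assumption leaves a wide margin on total wattage. It says nothing about a faulty unit or per-rail limits, though.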

Note two things:

1) All the drives are always reported by BIOS at boot time. Now, that
doesn't guarantee that the drives spin up. It may only mean they can
be read by BIOS, but they are there as far as I can tell. They show up
in the boot screens and in BIOS itself if I drop in to play with
settings.

2) Whatever state the machine comes up in - drives recognized or not -
it will run indefinitely in that state under some pretty heavy loads,
so it isn't as if the PSU can't do the job. It could possibly be
marginal, though.

QUESTION: There are some settings in the BIOS for delaying the drives
(or something like that; I'm using the machine right now, so I can't
check). The values ranged from 0 to 35 seconds, if I remember correctly.
Should I try setting each drive to a different value to stagger
power-up?

If you need more info or have other ideas please let me know.

Thanks,
Mark
From: Stan Hoeppner on
Mark Knecht put forth on 7/3/2010 2:21 PM:

> Note two things:
>
> 1) All the drives are always reported by BIOS at boot time. Now, that
> doesn't guarantee that the drives spin up. It may only mean they can
> be read by BIOS, but they are there as far as I can tell. They show up
> in the boot screens and in BIOS itself if I drop in to play with
> settings.

I missed that. I thought I read it was both. My bad.

> QUESTION: There are some settings in BIOS for delaying the drive. (Or
> something. I'm using the machine and not in BIOS) There were settings
> from 0 to 35 seconds if I remember correctly. Possibly I should try
> setting each drive to a different value to stagger
> power up?

If that PSU meets its published specs, you shouldn't need delayed spin-up
with those 5 drives.

> If you need more info or have other ideas please let me know.

Your answers here should have pretty much eliminated hardware issues as the
cause, unless that particular mobo has BIOS or other issues I'm unaware of.

I've found it's always best to ask about hardware with this kind of report
just to eliminate possibilities. All that gear is good quality stuff. If the
problem is due to hardware, it's because one of your components is defective,
but we don't see evidence of that at this point.

Also, TTBOMK, if a SATA drive motor doesn't spin up, the drive firmware won't
report the drive as ready upstream, so the BIOS won't list the drive.

--
Stan


From: Mark Knecht on
On Sat, Jul 3, 2010 at 12:42 PM, Stan Hoeppner <stan(a)hardwarefreak.com> wrote:
> Mark Knecht put forth on 7/3/2010 2:21 PM:
>
>> Note two things:
>>
>> 1) All the drives are always reported by BIOS at boot time. Now, that
>> doesn't guarantee that the drives spin up. It may only mean they can
>> be read by BIOS, but they are there as far as I can tell. They show up
>> in the boot screens and in BIOS itself if I drop in to play with
>> settings.
>
> I missed that.  I thought I read it was both.  My bad.
>

Not a problem. It's good to be as clear as possible for all involved.

>> QUESTION: There are some settings in BIOS for delaying the drive. (Or
>> something. I'm using the machine and not in BIOS) There were settings
>> from 0 to 35 seconds if I remember correctly. Possibly I should try
>> setting each drive to a different value to stagger
>> power up?
>
> If that PSU meets its published specs, you shouldn't need delayed spin-up
> with those 5 drives.
>

I haven't dropped into the BIOS yet because the machine is in use, but
from the Asus manual it appears the delay is not set on a drive-by-drive
basis, so I don't think I can do much there.

>> If you need more info or have other ideas please let me know.
>
> Your answers here should have pretty much eliminated hardware issues as the
> cause, unless that particular mobo has BIOS or other issues I'm unaware of.
>
> I've found it's always best to ask about hardware with this kind of report
> just to eliminate possibilities.  All that gear is good quality stuff.  If the
> problem is due to hardware, it's because one of your components is defective,
> but we don't see evidence of that at this point.
>
> Also, TTBOMK, if a SATA drive motor doesn't spin up, the drive firmware won't
> report the drive as ready upstream, so the BIOS won't list the drive.

An off-list response suggested setting the jumpers on the non-boot
drives so they power up in standby. Apparently the kernel will then
spin those drives up later? If I can't stagger the drives in the BIOS,
I will likely try that. Technically, I guess I only need /boot on sda to
get the kernel booted. The mdadm RAID1 on sda/sdb/sdc could start
slightly later, and the RAID0 on sdd/sde could start very late, since
there are only VMware images on that array.
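
Starting the RAID0 "very late" amounts to waiting until its member devices exist before assembling it. Below is a minimal sketch of that wait with a timeout. The member paths in the usage note are assumptions (the thread only shows that sdd1 and sde1 exist), and the function name is hypothetical.

```python
import os
import time


def wait_for_devices(paths, timeout_s=30.0, poll_s=0.5):
    """Poll until every path exists or the timeout expires.

    Returns the paths that are still missing (an empty list means success).
    """
    deadline = time.monotonic() + timeout_s
    missing = [p for p in paths if not os.path.exists(p)]
    while missing and time.monotonic() < deadline:
        time.sleep(poll_s)
        # Re-check only the paths that were still missing last time round.
        missing = [p for p in missing if not os.path.exists(p)]
    return missing
```

For example, a late-boot script could call `wait_for_devices(["/dev/sdd1", "/dev/sde1"], timeout_s=60)` and run `mdadm --assemble` only if the returned list is empty. Refusing to assemble with a member missing matters more for RAID0 than for RAID1, because RAID0 has no redundancy to fall back on.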

Cheers,
Mark