From: Stephen Tait on 18 Aug 2005 11:00

I'm just in the process of setting up a Sarge server to be used as a sort of backup server. The main PATA discs are used to boot the OS off of software RAID1, with the rest of the disc space used in JBOD for not-so-important backups. However, I'm having problems getting the new disc array up and running.

We've put a SATA controller in the box, a cheap-as-chips PCI Adaptec 1210SA which, according to lspci, uses the Silicon Image SI3112 chipset to provide two SATA channels. Connected to this are two 320GB drives which I want to turn into a RAID1 array. When the system booted first, I used mdadm to create the RAID1 array md2 (mdadm --create /dev/md2 --level=1 --raid-disks=2 /dev/sda1 /dev/sdb1), checked /proc/mdstat to wait for the array to finish syncing, and then formatted it ext3 and mounted it. Everything seemed to work fine until I rebooted, whereupon the mount failed with the report that it wasn't a valid ext[2|3] superblock; fsck confirmed this and on further inspection it seemed that it wasn't a RAID device any more either.

I thought this may have been due to the kernel trying to mount the drives before the needed modules (as far as I can tell, libata, scsi_mod and sata_sil) had been loaded, as I'm using the stock Debian 2.6.8-k7-smp kernel image. So I tried making a custom initrd with the needed modules in it, namely:

pika(a)zaphod2:~$ cat /etc/mkinitrd/modules
# /etc/mkinitrd/modules: Kernel modules to load for initrd.
#
# This file should contain the names of kernel modules and their arguments
# (if any) that are needed to mount the root file system, one per line.
# Comments begin with a `#', and everything on the line after them are ignored.
#
# You must run mkinitrd(8) to effect this change.
#
# Examples:
#
#  ext2
#  wd io=0x300

#First the modules needed to init the discs
ide_core
ide_generic
amd74xx
scsi_mod
libata
sr_mod
sd_mod
dm_mod
sata_sil
md
raid1

#Filesystems
ext2
ext3

#Other stuff I'm not sure if we need
shpchp
pciehp
pci_hotplug

===============================================

...and booted with that instead after editing GRUB's menu.lst. The exact same error occurred, and I'm now at a bit of a loss to explain what's happening. If I try and mount the discs on their own (i.e. mount /dev/sdX /mnt/somedir) then they work just fine, so the hardware works fine - so I'm almost certain it's a problem with initting the RAID arrays at boot. At the moment I'm just rebuilding the array to see what happens when I don't try and mount it at boot, but only after the OS has finished booting, but of course that'll only be a temporary workaround. If it's any help, here are my fstab and mdadm.conf:

pika(a)zaphod2:~$ cat /etc/fstab
# /etc/fstab: static file system information.
#
# <file system> <mount point>    <type>  <options>                  <dump> <pass>
proc            /proc            proc    defaults                   0      0
/dev/md1        /                ext3    defaults,errors=remount-ro 0      1
/dev/md0        /boot            ext2    defaults                   0      2
/dev/hdb9       /home            ext3    defaults                   0      2
/dev/hdb4       /mnt/avj-backup  ext3    defaults                   0      2
/dev/hda7       /mnt/dcj-backup  ext3    defaults                   0      2
/dev/hdb8       /tmp             ext3    defaults                   0      2
/dev/md4        /usr             ext3    defaults                   0      2
/dev/md3        /var             ext3    defaults                   0      2
/dev/hdb7       none             swap    sw                         0      0
/dev/hdc        /media/cdrom0    iso9660 ro,user,noauto             0      0
#/dev/md2       /mnt/dcj-archive ext3    defaults                   0      2

# Dirs from the main server (zaphod) over X-over cable
zaphodxover:/home/share/avj /mnt/zaphod/avj nfs ro,hard,intr,bg,rsize=8192,wsize=8192 0 0
zaphodxover:/home/share/dcj /mnt/zaphod/dcj nfs ro,hard,intr,bg,rsize=8192,wsize=8192 0 0

===============================================

pika(a)zaphod2:~$ cat /etc/mdadm/mdadm.conf
DEVICE partitions
ARRAY /dev/md4 level=raid1 num-devices=2 UUID=b8093124:a6d6f876:a29eecb7:e1b332f3
   devices=/dev/hda6,/dev/hdb6
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=1973b0c3:e38869d2:ffef0cde:92048042
   devices=/dev/hda5,/dev/hdb5
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=78a3be5a:f0838fe2:4d4ce7ed:3a969954
   devices=/dev/sda1,/dev/sdb1
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=51d55d28:3e653dce:631dd682:8dd52a37
   devices=/dev/hda2,/dev/hdb2
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=56e09876:a751356e:b86535d0:95091b5b
   devices=/dev/hda1,/dev/hdb1

As you can see, most of the important directories are mounted in software RAID1 on the two PATA discs with unimportant stuff on JBOD, although of course this shouldn't make any difference. All the usual dmesg etc. stuff doesn't seem to tell me anything I don't already know. If anyone has experienced this before or has any pointers as to how I can troubleshoot it, I'd be much obliged!

Stephen Tait

P.S. Before all you hardware types tell me that the SI3112 sucks: yes, I know, but it was the only SATA controller my company could get hold of, and we already have a 3ware, we just can't afford another one!

--
To UNSUBSCRIBE, email to debian-user-REQUEST(a)lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster(a)lists.debian.org
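For anyone repeating the initrd experiment described above, one way to double-check the modules file is a small helper like the one below. This is only a sketch: the function name `missing_modules` is made up for illustration, and the list of "required" modules is whatever you pass in (the names used in the usage line are simply the ones mentioned in the post, not an authoritative set).

```shell
#!/bin/sh
# Hypothetical helper: print every module named on the command line that
# does NOT appear in the given modules file.
# Usage: missing_modules FILE MODULE...
missing_modules() {
    file=$1
    shift
    for mod in "$@"; do
        # Drop everything after '#', then look for the module name as the
        # first word on any remaining line.
        if ! sed 's/#.*//' "$file" |
            awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }'
        then
            echo "$mod"
        fi
    done
}
```

It could be called as `missing_modules /etc/mkinitrd/modules scsi_mod libata sd_mod sata_sil md raid1`; no output means every listed name is present in the file. It says nothing about whether the modules actually load in time at boot.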
From: michael on 18 Aug 2005 11:50

Quoting Stephen Tait <tait(a)digitallaw.co.uk>:

> I'm just in the process of setting up a Sarge server to be used as a
> sort of backup server. The main PATA discs are used to boot the OS
> off of software RAID1, with the rest of the disc space used in JBOD
> for not-so-important backups. However, I'm having problems getting
> the new disc array up and running.
>
> We've put a SATA controller in the box, a cheap-as-chips PCI Adaptec
> 1210SA which, according to lspci, uses the Silicon Image SI3112
> chipset to provide two SATA channels. Connected to this are two 320GB
> drives which I want to turn into a RAID1 array. When the system
> booted first, I used mdadm to create the RAID1 array md2 (mdadm
> --create /dev/md2 --level=1 --raid-disks=2 /dev/sda1 /dev/sdb1),
> checked /proc/mdstat to wait for the array to finish syncing, and
> then formatted it ext3 and mounted it. Everything seemed to work fine
> until I rebooted, whereupon the mount failed with the report that it
> wasn't a valid ext[2|3] superblock; fsck confirmed this and on
> further inspection it seemed that it wasn't a RAID device any more
> either.
>
> ...and booted with that instead after editing GRUB's menu.lst. The
> exact same error occurred, and I'm now at a bit of a loss to explain
> what's happening. If I try and mount the discs on their own (i.e.
> mount /dev/sdX /mnt/somedir) then they work just fine, so the
> hardware works fine - so I'm almost certain it's a problem with
> initting the RAID arrays at boot. At the moment I'm just rebuilding
> the array to see what happens when I don't try and mount it at boot,
> but only after the OS has finished booting, but of course that'll
> only be a temporary workaround. If it's any help, here are my fstab
> and mdadm.conf:
>
> pika(a)zaphod2:~$ cat /etc/fstab
> # /etc/fstab: static file system information.
> #
> # <file system> <mount point> <type> <options> <dump> <pass>
> proc /proc proc defaults 0 0
> /dev/md1 / ext3 defaults,errors=remount-ro 0 1
> /dev/md0 /boot ext2 defaults 0 2
> /dev/hdb9 /home ext3 defaults 0 2
> /dev/hdb4 /mnt/avj-backup ext3 defaults 0 2
> /dev/hda7 /mnt/dcj-backup ext3 defaults 0 2
> /dev/hdb8 /tmp ext3 defaults 0 2
> /dev/md4 /usr ext3 defaults 0 2
> /dev/md3 /var ext3 defaults 0 2
> /dev/hdb7 none swap sw 0 0
> /dev/hdc /media/cdrom0 iso9660 ro,user,noauto 0 0
> #/dev/md2 /mnt/dcj-archive ext3 defaults 0 2
>
> ===============================================
>
> pika(a)zaphod2:~$ cat /etc/mdadm/mdadm.conf
> DEVICE partitions
> ARRAY /dev/md4 level=raid1 num-devices=2
> UUID=b8093124:a6d6f876:a29eecb7:e1b332f3
> devices=/dev/hda6,/dev/hdb6
> ARRAY /dev/md3 level=raid1 num-devices=2
> UUID=1973b0c3:e38869d2:ffef0cde:92048042
> devices=/dev/hda5,/dev/hdb5
> ARRAY /dev/md2 level=raid1 num-devices=2
> UUID=78a3be5a:f0838fe2:4d4ce7ed:3a969954
> devices=/dev/sda1,/dev/sdb1
> ARRAY /dev/md1 level=raid1 num-devices=2
> UUID=51d55d28:3e653dce:631dd682:8dd52a37
> devices=/dev/hda2,/dev/hdb2
> ARRAY /dev/md0 level=raid1 num-devices=2
> UUID=56e09876:a751356e:b86535d0:95091b5b
> devices=/dev/hda1,/dev/hdb1
>
> As you can see, most of the important directories are mounted in
> software RAID1 on the two PATA discs with unimportant stuff on JBOD,
> although of course this shouldn't make any difference. All the usual
> dmesg etc. stuff doesn't seem to tell me anything I don't already
> know. If anyone has experienced this before or has any pointers as to
> how I can troubleshoot it, I'd be much obliged!

I have had some trouble getting a raid array to initialize on boot in the past. My fix was to remove its entry from the mdadm.conf file, and re-cfdisk the disks with the auto-detect-raid setting. Then create the raid array and reboot; it came up just fine.

Other than that, I'm not sure what else could be wrong. Hopefully someone else on the list has some better ideas.

Cheers,
Mike
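The "auto-detect-raid setting" Mike mentions corresponds to partition type fd (Linux raid autodetect). A rough way to spot member partitions that are not set to that type is to filter a partition-table dump, as sketched below. The function name `non_raid_partitions` is hypothetical, and the parsing assumes `sfdisk -d` style lines with an `Id=` (older versions) or `type=` (newer versions) field; check it against the output of your own sfdisk before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: read `sfdisk -d`-style text on stdin and print each
# non-empty partition whose type is not fd, followed by the type found.
non_raid_partitions() {
    awk '
        /^\/dev\// {
            dev = $1
            size = ""
            id = ""
            n = split($0, f, ",")
            for (i = 1; i <= n; i++) {
                if (f[i] ~ /size=/) {
                    size = f[i]
                    sub(/.*size= */, "", size)
                }
                if (f[i] ~ /(Id|type)=/) {
                    id = f[i]
                    sub(/.*(Id|type)= */, "", id)
                    gsub(/[ \t]+$/, "", id)
                }
            }
            # Skip unused slots (size 0); report anything that is not type fd.
            if (size + 0 > 0 && tolower(id) != "fd")
                print dev, id
        }
    '
}
```

Typical use would be `sfdisk -d /dev/sda | non_raid_partitions`. Empty output only means every used partition is type fd; it does not prove that autodetection is what was failing in this thread.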
From: Stephen Tait on 19 Aug 2005 10:40

At 16:37 18/08/2005, you wrote:
>Quoting Stephen Tait <tait(a)digitallaw.co.uk>:
>
>>I'm just in the process of setting up a Sarge server to be used as a sort
>>of backup server. The main PATA discs are used to boot the OS off of
>>software RAID1, with the rest of the disc space used in JBOD for
>>not-so-important backups. However, I'm having problems getting the new
>>disc array up and running.
>>
>>We've put a SATA controller in the box, a cheap-as-chips PCI Adaptec
>>1210SA which, according to lspci, uses the Silicon Image SI3112 chipset
>>to provide two SATA channels. Connected to this are two 320GB drives
>>which I want to turn into a RAID1 array. When the system booted first, I
>>used mdadm to create the RAID1 array md2 (mdadm --create /dev/md2
>>--level=1 --raid-disks=2 /dev/sda1 /dev/sdb1), checked /proc/mdstat to
>>wait for the array to finish syncing, and then formatted it ext3 and
>>mounted it. Everything seemed to work fine until I rebooted, whereupon
>>the mount failed with the report that it wasn't a valid ext[2|3]
>>superblock; fsck confirmed this and on further inspection it seemed that
>>it wasn't a RAID device any more either.
>>
>>...and booted with that instead after editing GRUB's menu.lst. The exact
>>same error occurred, and I'm now at a bit of a loss to explain what's
>>happening. If I try and mount the discs on their own (i.e. mount /dev/sdX
>>/mnt/somedir) then they work just fine, so the hardware works fine - so
>>I'm almost certain it's a problem with initting the RAID arrays at boot.
>>At the moment I'm just rebuilding the array to see what happens when I
>>don't try and mount it at boot, but only after the OS has finished
>>booting, but of course that'll only be a temporary workaround. If it's
>>any help, here are my fstab and mdadm.conf:
>>
>>pika(a)zaphod2:~$ cat /etc/fstab
>># /etc/fstab: static file system information.
>>#
>># <file system> <mount point> <type> <options> <dump> <pass>
>>proc /proc proc defaults 0 0
>>/dev/md1 / ext3 defaults,errors=remount-ro 0 1
>>/dev/md0 /boot ext2 defaults 0 2
>>/dev/hdb9 /home ext3 defaults 0 2
>>/dev/hdb4 /mnt/avj-backup ext3 defaults 0 2
>>/dev/hda7 /mnt/dcj-backup ext3 defaults 0 2
>>/dev/hdb8 /tmp ext3 defaults 0 2
>>/dev/md4 /usr ext3 defaults 0 2
>>/dev/md3 /var ext3 defaults 0 2
>>/dev/hdb7 none swap sw 0 0
>>/dev/hdc /media/cdrom0 iso9660 ro,user,noauto 0 0
>>#/dev/md2 /mnt/dcj-archive ext3 defaults 0 2
>>
>>===============================================
>>
>>pika(a)zaphod2:~$ cat /etc/mdadm/mdadm.conf
>>DEVICE partitions
>>ARRAY /dev/md4 level=raid1 num-devices=2
>>UUID=b8093124:a6d6f876:a29eecb7:e1b332f3
>> devices=/dev/hda6,/dev/hdb6
>>ARRAY /dev/md3 level=raid1 num-devices=2
>>UUID=1973b0c3:e38869d2:ffef0cde:92048042
>> devices=/dev/hda5,/dev/hdb5
>>ARRAY /dev/md2 level=raid1 num-devices=2
>>UUID=78a3be5a:f0838fe2:4d4ce7ed:3a969954
>> devices=/dev/sda1,/dev/sdb1
>>ARRAY /dev/md1 level=raid1 num-devices=2
>>UUID=51d55d28:3e653dce:631dd682:8dd52a37
>> devices=/dev/hda2,/dev/hdb2
>>ARRAY /dev/md0 level=raid1 num-devices=2
>>UUID=56e09876:a751356e:b86535d0:95091b5b
>> devices=/dev/hda1,/dev/hdb1
>>
>>As you can see, most of the important directories are mounted in software
>>RAID1 on the two PATA discs with unimportant stuff on JBOD, although of
>>course this shouldn't make any difference. All the usual dmesg etc. stuff
>>doesn't seem to tell me anything I don't already know. If anyone has
>>experienced this before or has any pointers as to how I can troubleshoot
>>it, I'd be much obliged!
>
>I have had some trouble getting a raid array to initialize on boot in the past.
>My fix was to remove its entry from the mdadm.conf file, and re-cfdisk
>the disks with the auto-detect-raid setting. Then create the raid array
>and reboot; it came up just fine.
>Other than that, I'm not sure what else could be wrong.
>Hopefully someone else on the list has some better ideas.
>
>Cheers,
>Mike

Thanks for the tip Mike. I have just tried this and a number of other configurations, and the RAID array just "dies" (or doesn't initialise) on every single reboot, meaning I have to rebuild the array, reformat it, etc. etc. every time - obviously not what I want for a backup server without a UPS!

I simply don't get it; AFAICT all the modules I need to init a SATA RAID1 array at boot exist within the initrd, and they all seem to get loaded at the right time (since when modprobe does its thing later on in the boot process I see lots of "loading sata_sil... module already loaded" type messages). I'll post the relevant section of dmesg in case anyone can spot anything I'm not familiar with; other than that, I'm going to try building another custom kernel with everything relevant compiled into the kernel (I already tried one, but I must've missed something as it panicked at boot).

Snipped dmesg follows:

RAMDISK: cramfs filesystem found at block 0
RAMDISK: Loading 4716 blocks [1 disk] into ram disk... done.
VFS: Mounted root (cramfs filesystem) readonly.
Freeing unused kernel memory: 168k freed
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
hda: WDC WD2500JB-00EVA0, ATA DISK drive
hdb: WDC WD2000JB-00GVA0, ATA DISK drive
hdc: Compaq CRD-8484B, ATAPI CD/DVD-ROM drive
Using anticipatory io scheduler
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
AMD7441: IDE controller at PCI slot 0000:00:07.1
AMD7441: chipset revision 4
AMD7441: not 100% native mode: will probe irqs later
AMD7441: 0000:00:07.1 (rev 04) UDMA100 controller
AMD7441: port 0x01f0 already claimed by ide0
AMD7441: port 0x0170 already claimed by ide1
AMD7441: neither IDE port enabled (BIOS)
SCSI subsystem initialized
libata version 1.02 loaded.
device-mapper: 4.1.0-ioctl (2003-12-10) initialised: dm(a)uk.sistina.com
sata_sil version 0.54
ACPI: PCI interrupt 0000:02:05.0[A] -> GSI 17 (level, low) -> IRQ 169
ata1: SATA max UDMA/100 cmd 0xE0823080 ctl 0xE082308A bmdma 0xE0823000 irq 169
ata2: SATA max UDMA/100 cmd 0xE08230C0 ctl 0xE08230CA bmdma 0xE0823008 irq 169
ata1: dev 0 cfg 49:2f00 82:346b 83:7f01 84:4003 85:3469 86:3c01 87:4003 88:203f
ata1: dev 0 ATA, max UDMA/100, 625142448 sectors: lba48
ata1: dev 0 configured for UDMA/100
scsi0 : sata_sil
ata2: dev 0 cfg 49:2f00 82:346b 83:7f01 84:4003 85:3469 86:3c01 87:4003 88:203f
ata2: dev 0 ATA, max UDMA/100, 625142448 sectors: lba48
ata2: dev 0 configured for UDMA/100
scsi1 : sata_sil
Vendor: ATA Model: WDC WD3200JD-00K Rev: 08.0
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sda: 625142448 512-byte hdwr sectors (320073 MB)
SCSI device sda: drive cache: write back
/dev/scsi/host0/bus0/target0/lun0: p1
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Vendor: ATA Model: WDC WD3200JD-00K Rev: 08.0
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sdb: 625142448 512-byte hdwr sectors (320073 MB)
SCSI device sdb: drive cache: write back
/dev/scsi/host1/bus0/target0/lun0: p1
Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: raid1 personality registered as nr 3
cpci_hotplug: CompactPCI Hot Plug Core version: 0.2
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
shpchp: HPC vendor_id 1022 device_id 700d ss_vid 0 ss_did 0
shpchp: shpc_init: cannot reserve MMIO region
shpchp: HPC vendor_id 1022 device_id 7448 ss_vid 0 ss_did 0
shpchp: shpc_init: cannot reserve MMIO region
shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
pciehp: PCI Express Hot Plug Controller Driver version: 0.4
vesafb: probe of vesafb0 failed with error -6
NET: Registered protocol family 1
hda: max request size: 1024KiB
hda: 488397168 sectors (250059 MB) w/8192KiB Cache, CHS=30401/255/63
/dev/ide/host0/bus0/target0/lun0: p1 p2 p3 < p5 p6 p7 >
hdb: max request size: 1024KiB
hdb: 390721968 sectors (200049 MB) w/8192KiB Cache, CHS=24321/255/63
/dev/ide/host0/bus0/target1/lun0: p1 p2 p3 < p5 p6 p7 p8 p9 > p4
md: md1 stopped.
md: bind<hdb2>
md: bind<hda2>
raid1: raid set md1 active with 2 out of 2 mirrors
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Adding 1951856k swap on /dev/hdb7. Priority:-1 extents:1
EXT3 FS on md1, internal journal
hdc: ATAPI 48X CD-ROM drive, 128kB Cache
Uniform CD-ROM driver Revision: 3.20
ieee1394: Initialized config rom entry `ip1394'
sbp2: $Rev: 1219 $ Ben Collins <bcollins(a)debian.org>
ACPI: PCI interrupt 0000:02:06.0[A] -> GSI 18 (level, low) -> IRQ 185
3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
0000:02:06.0: 3Com PCI 3c905C Tornado at 0xa400. Vers LK1.1.19
Capability LSM initialized
md: md4 stopped.
md: bind<hdb6>
md: bind<hda6>
raid1: raid set md4 active with 2 out of 2 mirrors
md: md3 stopped.
md: bind<hdb5>
md: bind<hda5>
raid1: raid set md3 active with 2 out of 2 mirrors
md: md2 stopped.
md: md0 stopped.
md: bind<hdb1>
md: bind<hda1>
raid1: raid set md0 active with 2 out of 2 mirrors

As you can see, the only mention of md2 is the "md: md2 stopped" line, whereas of course I'd be expecting a "raid1: raid set md2 active with 2 out of 2 mirrors" message. Does anyone more au fait with kernel software RAID know why the kernel won't even attempt to start md2? Should I try a newer kernel? Were there problems with SATA and software RAID in 2.6.8? So many questions, and an angry boss!

P.S. I don't know if it's anything remotely significant, but after setting up software RAID on Gentoo I was led to believe that RAID configuration was done with the help of /etc/raidtab, which the Sarge installer didn't put on my machine, so I assumed it wasn't needed and everything was done via mdadm.conf. I doubt it'd help my current situation, but would it do any harm to put one in there? Gentoo, by default, has an empty mdadm.conf, so I'm assuming that the two serve a similar function.

Yours, one very confused Debian user!

Stephen Tait
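The check Stephen does by eye in the dmesg output (every array in mdadm.conf should also come up as active) can be approximated against /proc/mdstat after boot. The sketch below is an illustration only: `inactive_arrays` is a made-up name, and it assumes ARRAY lines of the form shown in the mdadm.conf above and mdstat stanzas that begin with `mdN : active`.

```shell
#!/bin/sh
# Hypothetical helper: print arrays that are declared in an mdadm.conf-style
# file but are not listed as active in a /proc/mdstat-style file.
# Usage: inactive_arrays CONF MDSTAT
inactive_arrays() {
    conf=$1
    mdstat=$2
    # ARRAY lines carry the device as their second word: "ARRAY /dev/md2 ...".
    awk '$1 == "ARRAY" { sub(/^\/dev\//, "", $2); print $2 }' "$conf" |
    while read -r name; do
        # Running arrays appear in mdstat as "md2 : active raid1 ...".
        if ! grep -q "^$name : active" "$mdstat"; then
            echo "$name"
        fi
    done
}
```

On the machine in this thread the call would be `inactive_arrays /etc/mdadm/mdadm.conf /proc/mdstat`, and from the dmesg above one would expect it to print `md2`. It only reports the symptom; it does not explain why the array was not assembled.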
From: michael on 19 Aug 2005 11:00

Quoting Stephen Tait <tait(a)digitallaw.co.uk>:

>
> As you can see, the only mention of md2 is the "md: md2 stopped"
> line, whereas of course I'd be expecting a "raid1: raid set md2
> active with 2 out of 2 mirrors" message. Does anyone more au fait
> with kernel software RAID know why the kernel won't even attempt to
> start md2?
>

Just for fun, have you tried re-partitioning your disks to create a new array? It looks like you were using the entire disks of sda and sdb? Try making a raid 1 from something like sda2 and sdb2. Ignore the first partition. You may have to zero the superblock first.

I think your system is fine, if you can manually create an array, mount it and start using it. If it's not happening on boot, then there's something little that's not making it initialize. But who knows for sure.

Hope you get it working.

Cheers,
Mike
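Mike's suggestion (zero the old superblocks, then build the mirror on the second partitions) could be spelled out roughly as below. To stay safe this sketch only prints the mdadm commands instead of running them; `plan_rebuild` is a hypothetical name, and the create line simply mirrors the form Stephen used earlier in the thread. `--zero-superblock` wipes md metadata from a device, so the printed commands should be reviewed before anyone runs them by hand.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not execute) the mdadm commands for
# wiping old md superblocks and recreating a RAID1 array.
# Usage: plan_rebuild MD_DEVICE PARTITION...
plan_rebuild() {
    md=$1
    shift
    # Clear any stale md superblock on each member partition first.
    for part in "$@"; do
        echo "mdadm --zero-superblock $part"
    done
    # Recreate the mirror with as many members as were given.
    echo "mdadm --create $md --level=1 --raid-disks=$# $*"
    # Afterwards, each member's new superblock can be inspected.
    for part in "$@"; do
        echo "mdadm --examine $part"
    done
}
```

For the layout discussed here the call would be `plan_rebuild /dev/md2 /dev/sda2 /dev/sdb2`. Printing rather than executing is a deliberate choice, since a typo in a device name could destroy a working array.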
From: Stephen Tait on 19 Aug 2005 12:20

At 15:44 19/08/2005, you wrote:
>Quoting Stephen Tait <tait(a)digitallaw.co.uk>:
>
>>
>>As you can see, the only mention of md2 is the "md: md2 stopped" line,
>>whereas of course I'd be expecting a "raid1: raid set md2 active with 2
>>out of 2 mirrors" message. Does anyone more au fait with kernel software
>>RAID know why the kernel won't even attempt to start md2?
>
>Just for fun, have you tried re-partitioning your disks to create a new array?
>It looks like you were using the entire disks of sda and sdb?
>Try making a raid 1 from something like sda2 and sdb2. Ignore the first
>partition. You may have to zero the superblock first.
>I think your system is fine, if you can manually create an array, mount it
>and start using it. If it's not happening on boot, then there's something
>little that's not making it initialize.
>But who knows for sure.
>
>Hope you get it working.
>
>Cheers,
>Mike

Interesting you should mention the superblock, since the reason md2 wasn't created during the install process was some error message I got about the superblock, which wanted me to reboot in order to re-read the partition table, although I can't remember it exactly. And yes, sd[a|b]1 is a primary partition with the whole disc on it. I'll try chopping up the drives a bit more and give it another whirl... ho hum, only another 220 mins to go until it re-syncs... again!

Yeah, I've turned sd[a|b]1 into an as-yet unformatted 1024MB partition, with the RAID array on sd[a|b]2 comprising the rest of the drive. Oh how I wish for another 3ware... :D

I've also just built a customised kernel with all of the relevant ATA and SCSI options compiled in rather than as modules; if either of these two approaches works I'll post details back to the list, if they don't work I'll cry :D

Thanks again for your help Mike!

Stephen Tait