From: Ant on 6 Mar 2010 10:33

On 3/6/2010 2:22 AM PT, Pascal Hambourg typed:
> Hello,
>
> Ant wrote:
>> I was poking around to see why my old Linux/Debian box was rarely and
>> randomly crashing with kernel panics. I read that its errors can be
>> found in /var/log/syslog (dmesg didn't show me anything related to
>> kernel panics that I could find):
>>
>> # cat /var/log/syslog
>> ...
>> Mar 4 23:12:07 foobar smartd[2647]: Device: /dev/hda, SMART Usage
>> Attribute: 194 Temperature_Celsius changed from 30 to 31
>> ...
>> Mar 5 15:11:31 foobar smartd[2610]: Device: /dev/hda, SMART Prefailure
>> Attribute: 1 Raw_Read_Error_Rate changed from 58 to 59
>> Mar 5 15:11:31 foobar smartd[2610]: Device: /dev/hda, SMART Usage
>> Attribute: 195 Hardware_ECC_Recovered changed from 58 to 59
>
> These are not errors but just (useless IMHO) notifications on SMART
> attribute changes.

Ah OK. Thanks. :)

>> foobar:/home/ant/download# smartctl -a /dev/hda
> [...]
>
> From this, hda seems to be perfectly healthy.

Thanks for the confirmation. :)

--
"We ants are runnin' the show! We're the lords of the earth!" --ANTZ
   /\___/\
  / /\ /\ \    Phil./Ant @ http://antfarm.ma.cx (Personal Web Site)
 | |o   o| |   Ant's Quality Foraged Links: http://aqfl.net
    \ _ /      Nuke ANT from e-mail address: philpi(a)earthlink.netANT
     ( )       or ANTant(a)zimage.com
Ant is currently not listening to any songs on his home computer.
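
Since the smartd lines are only attribute-change notifications, it can help to filter them out of the syslog and look at them separately from anything panic-related. The following is a minimal sketch that assumes the smartd message format quoted in this thread; `smartd_changes` is a hypothetical helper name, not part of smartmontools.

```shell
# Summarise smartd attribute-change notifications found in syslog text on stdin.
# Prints one line per change: attribute name, old value, new value.
# Assumes the "SMART <type> Attribute: <id> <name> changed from A to B" format
# shown in the thread; lines that do not match are silently skipped.
smartd_changes() {
    sed -n 's/.*smartd\[[0-9]*\]: Device: [^,]*, SMART [A-Za-z]* Attribute: [0-9]* \([^ ]*\) changed from \([0-9]*\) to \([0-9]*\).*/\1 \2 \3/p'
}
```

It would be run as `smartd_changes < /var/log/syslog`. The numbers it reports are the normalized attribute values that smartd logs, so a change by itself does not indicate a failure.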
From: Ant on 6 Mar 2010 10:34

> It doesn't seem like this is the cause of your kernel panics. These are
> just informational messages, at worst warnings, but nothing more. The
> drive in question seems to be experiencing some communications error
> with your computer. If it's an IDE drive, then suggest looking into
> changing cables. If it's SATA, it's more rare for there to be
> communications errors, but not unthinkable. However, as the attribute
> says, it recovered from that error, so it's not a failure.

OK. That's good then.

> You might want to turn on core dump saves on the machine, if you haven't
> already done so.

How do I enable core dump saves for kernel panics?
From: Rod Speed on 6 Mar 2010 13:15

"Ant" <ant(a)zimage.comANT> wrote in message news:gfSdncBxmoq-ZgzWnZ2dnUVZ_uadnZ2d(a)earthlink.com...
> I was poking around to see why my old Linux/Debian box was rarely and randomly crashing with kernel panics. I read that
> its errors can be found in /var/log/syslog (dmesg didn't show me anything related to kernel panics that I could find):
>
> # cat /var/log/syslog
> ...
> Mar 4 23:12:07 foobar smartd[2647]: Device: /dev/hda, SMART Usage Attribute: 194 Temperature_Celsius changed from 30 to 31
> ...
> Mar 5 15:11:31 foobar smartd[2610]: Device: /dev/hda, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 58 to 59
> Mar 5 15:11:31 foobar smartd[2610]: Device: /dev/hda, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 58 to 59
> Mar 5 15:15:01 foobar /USR/SBIN/CRON[8815]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
> Mar 5 15:17:01 foobar /USR/SBIN/CRON[11199]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
> Mar 5 15:25:01 foobar /USR/SBIN/CRON[20721]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
> Mar 5 15:35:01 foobar /USR/SBIN/CRON[32588]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
> Mar 5 15:45:01 foobar /USR/SBIN/CRON[12129]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
> Mar 5 15:55:01 foobar /USR/SBIN/CRON[23947]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
> < rebooted my crashed PC from its kernel panic >
> Mar 5 21:05:19 foobar syslogd 1.5.0#5: restart.
> ...
>
> I couldn't find any similar from an earlier one like:
> ...
> Mar 5 05:17:01 foobar /USR/SBIN/CRON[26833]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
> Mar 5 05:25:01 foobar /USR/SBIN/CRON[29514]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
> Mar 5 05:35:01 foobar /USR/SBIN/CRON[372]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
> Mar 5 05:45:01 foobar /USR/SBIN/CRON[3772]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
> Mar 5 05:55:01 foobar /USR/SBIN/CRON[7160]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
> Mar 5 06:41:19 foobar syslogd 1.5.0#5: restart.
> ...
>
> # hdparm /dev/hda
>
> /dev/hda:
>  multcount     = 16 (on)
>  IO_support    =  1 (32-bit)
>  unmaskirq     =  1 (on)
>  using_dma     =  1 (on)
>  keepsettings  =  0 (off)
>  readonly      =  0 (off)
>  readahead     = 256 (on)
>  geometry      = 16383/255/63, sectors = 156301488, start = 0
> foobar:/home/ant/download# hdparm /dev/hda^C
> foobar:/home/ant/download# smartctl -a /dev/hda
> smartctl 5.40 2010-02-03 r3060 [i686-pc-linux-gnu] (local build)
> Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF INFORMATION SECTION ===
> Model Family:     Seagate Barracuda 7200.7 and 7200.7 Plus family
> Device Model:     ST380011A
> Serial Number:    4JV5P7LN
> Firmware Version: 8.01
> User Capacity:    80,026,361,856 bytes
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   6
> ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 2
> Local Time is:    Fri Mar 5 22:32:16 2010 PST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> General SMART Values:
> Offline data collection status:  (0x82) Offline data collection activity
>                                         was completed without error.
>                                         Auto Offline Data Collection: Enabled.
> Self-test execution status:      (   0) The previous self-test routine completed
>                                         without error or no self-test has ever
>                                         been run.
> Total time to complete Offline
> data collection:                 ( 430) seconds.
> Offline data collection
> capabilities:                    (0x5b) SMART execute Offline immediate.
>                                         Auto Offline data collection on/off support.
>                                         Suspend Offline collection upon new
>                                         command.
>                                         Offline surface scan supported.
>                                         Self-test supported.
>                                         No Conveyance Self-test supported.
>                                         Selective Self-test supported.
> SMART capabilities:            (0x0003) Saves SMART data before entering
>                                         power-saving mode.
>                                         Supports SMART auto save timer.
> Error logging capability:        (0x01) Error logging supported.
>                                         General Purpose Logging supported.
> Short self-test routine
> recommended polling time:        (   1) minutes.
> Extended self-test routine
> recommended polling time:        (  58) minutes.
>
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   060   056   006    Pre-fail  Always       -       40077017
>   3 Spin_Up_Time            0x0003   098   098   000    Pre-fail  Always       -       0
>   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       0
>   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x000f   085   060   030    Pre-fail  Always       -       339834978
>   9 Power_On_Hours          0x0032   060   060   000    Old_age   Always       -       35554
>  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       289
> 194 Temperature_Celsius     0x0022   030   048   000    Old_age   Always       -       30
> 195 Hardware_ECC_Recovered  0x001a   060   055   000    Old_age   Always       -       40077017
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
> 202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age   Always       -       0
>
> SMART Error Log Version: 1
> No Errors Logged
>
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Extended offline    Completed without error       00%     31886         -
> # 2  Extended offline    Completed without error       00%     22233         -
> # 3  Extended offline    Completed without error       00%     18951         -
> # 4  Extended offline    Completed without error       00%     18674         -
> # 5  Extended offline    Completed without error       00%     15957         -
> # 6  Extended offline    Completed without error       00%     14448         -
>
> SMART Selective self-test log data structure revision number 1
>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>     1        0        0  Not_testing
>     2        0        0  Not_testing
>     3        0        0  Not_testing
>     4        0        0  Not_testing
>     5        0        0  Not_testing
> Selective self-test flags (0x0):
>   After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.
>
> Are those bad? Thank you in advance. :)

Yes, the reallocated sectors are much higher than I would continue to use with new hard drives so cheap.
From: Pascal Hambourg on 6 Mar 2010 14:41

Rod Speed wrote:
>>   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
>> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
[...]
> Yes, the reallocated sectors are much higher than I would continue to
> use with new hard drives so cheap.

Huh? From the above, the drive has no reallocated sectors yet, and no
pending (unreadable) sectors either: the RAW_VALUE column is 0 for both
attributes.

PS: was it useful to quote the whole post just to comment on one line?
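
The point about reallocated and pending sectors can be checked mechanically by reading the RAW_VALUE column (the tenth field) of the attribute table. Below is a minimal sketch, assuming the column layout shown in the smartctl output quoted in this thread; `smart_raw` and `check_sectors` are hypothetical helper names, and the choice of attribute IDs 5, 197 and 198 is an assumption about which counters matter here.

```shell
# Print the RAW_VALUE column for one SMART attribute ID.
# Reads `smartctl -A` style output on stdin; $1 is the numeric attribute ID.
smart_raw() {
    awk -v id="$1" '$1 == id && NF >= 10 { print $10; exit }'
}

# Report whether the sector-health counters are all zero.
# Reads the attribute table on stdin; returns 1 if any counter is non-zero.
check_sectors() {
    report=$(cat)
    status=0
    for id in 5 197 198; do
        raw=$(printf '%s\n' "$report" | smart_raw "$id")
        # Skip attributes the drive does not report.
        [ -n "$raw" ] || continue
        if [ "$raw" -ne 0 ]; then
            echo "attribute $id: raw value $raw (non-zero)"
            status=1
        fi
    done
    if [ "$status" -eq 0 ]; then
        echo "no reallocated or pending sectors"
    fi
    return "$status"
}
```

A typical call would be `smartctl -A /dev/hda | check_sectors`. For the table posted in this thread, all three counters have a raw value of 0.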
From: Yousuf Khan on 6 Mar 2010 14:58

Ant wrote:
>> You might want to turn on core dump saves on the machine, if you haven't
>> already done so.
>
> How do I enable core dump saves for kernel panics?

This might be a little old, or not entirely relevant to your distro.

HOWTO enable core-dumps - LinuxReviews - Mozilla Firefox
chrome://browser/content/browser.xul

Look it up for your own distro; they may have an easier way to do this,
depending on what tools are included with your distro.

Yousuf Khan
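
Kernel crash dumps are a separate mechanism from the per-process core dumps that `ulimit -c` controls. On current Linux systems they are usually captured by kdump, which needs a `crashkernel=` memory reservation on the kernel command line and a crash kernel loaded via kexec. The sketch below only reports whether those two conditions hold; whether kdump is available on an older Debian install like the one in this thread is an assumption, and `has_crashkernel` and `kdump_status` are hypothetical helper names.

```shell
# Return success if a kernel command line reserves memory for a crash kernel.
# $1: the command line text, e.g. the contents of /proc/cmdline
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Summarise kdump readiness from the two values the kernel exposes.
# $1: contents of /proc/cmdline
# $2: contents of /sys/kernel/kexec_crash_loaded ("1" when a crash kernel is loaded)
kdump_status() {
    if ! has_crashkernel "$1"; then
        echo "no crashkernel= reservation on the kernel command line"
    elif [ "$2" = "1" ]; then
        echo "crash kernel loaded; a panic should produce a dump"
    else
        echo "memory reserved, but no crash kernel is loaded"
    fi
}
```

It could be called as `kdump_status "$(cat /proc/cmdline)" "$(cat /sys/kernel/kexec_crash_loaded 2>/dev/null)"`. On Debian, the `kdump-tools` package is the usual way to set this up, but check your own release's documentation first.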