Barracuda ST31000528AS problem [Linux Hardware]

From: Piotr Szymański on 1 Nov 2009 08:50

Hi All,

I have two Seagate Barracuda 7200.12 1 TB (ST31000528AS) drives in a
Linux software RAID-1 configuration. Today I got a notification from
smartd that one of the drives (sda) is failing:

Device: /dev/sda, ATA error count increased from 0 to 6

Some other log messages (like: "ata1.00: cmd ... Emask 0x409 (media
error)", "end_request: I/O error, dev sda, sector 39072000") and the
disk's SMART error log seem to confirm that the disk is dying. My
problem is that I'm seeing SMART warnings about the other drive too:

smartd[5845]: Device: /dev/sdb, SMART Prefailure Attribute: 1
Raw_Read_Error_Rate changed from 108 to 117

Below is the listing of SMART attributes for the good drive (smartctl -A
/dev/sdb):

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   113   099   006    Pre-fail  Always       -       52634145
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       56
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       24
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       35530576
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       3861
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       56
183 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Unknown_Attribute       0x0032   100   099   000    Old_age   Always       -       1
189 High_Fly_Writes         0x003a   099   099   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0022   067   059   045    Old_age   Always       -       33 (Lifetime Min/Max 32/41)
194 Temperature_Celsius     0x0022   033   041   000    Old_age   Always       -       33 (0 19 0 0)
195 Hardware_ECC_Recovered  0x001a   036   015   000    Old_age   Always       -       52634145
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       91955249811373
241 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline      -       1261294398
242 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline      -       1519044357

And here is the listing for the bad drive:

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   109   100   006    Pre-fail  Always       -       23028010
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       59
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       17
  7 Seek_Error_Rate         0x000f   078   060   030    Pre-fail  Always       -       81078197
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       3861
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       59
183 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   094   094   000    Old_age   Always       -       6
188 Unknown_Attribute       0x0032   100   096   000    Old_age   Always       -       26
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   070   062   045    Old_age   Always       -       30 (Lifetime Min/Max 29/38)
194 Temperature_Celsius     0x0022   030   040   000    Old_age   Always       -       30 (0 19 0 0)
195 Hardware_ECC_Recovered  0x001a   041   022   000    Old_age   Always       -       23028010
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       82240033787773
241 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline      -       2371531202
242 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline      -       3348144171

Both have a nonzero Reallocated_Sector_Ct and Seek_Error_Rate.
I cannot run an extended SMART test on the drive: due to some firmware
problem, it doesn't get past 10% completion.

Do you think the other drive is also failing?

Thanks!

--
Peter Szymański <szyman(at)magres.net>

From: philo on 1 Nov 2009 11:11

Piotr Szymański wrote:
> Hi All,
>
> I have two Seagate Barracuda 7200.12 1 TB (ST31000528AS) drives in a
> Linux software RAID-1 configuration. Today I've got a notification from
> smartd that one of the drives (sda) is failing:
>
> Device: /dev/sda, ATA error count increased from 0 to 6
>
> Some other log messages (like: "ata1.00: cmd ... Emask 0x409 (media
> error)", "end_request: I/O error, dev sda, sector 39072000") and the
> disk's SMART error log seem to confirm that the disk is dying. My
> problem is that I'm seeing SMART warnings about the other drive too:
>
> smartd[5845]: Device: /dev/sdb, SMART Prefailure Attribute: 1
> Raw_Read_Error_Rate changed from 108 to 117
>
> [SMART attribute listings for both drives snipped]
>
> Both have a nonzero Reallocated_Sector_Ct and Seek_Error_Rate.
> I cannot run an extended SMART test on the drive as due to some firmware
> problem it doesn't move past 10% completion.
>
> Do you think the other drive is failing also?
>
> Thanks!
>

Replace the drive at once!

Do not fool with it any more.

Just because one drive is bad, it does not necessarily mean the other
one is bad too.
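
One way to compare the two listings is to check each attribute's
normalized VALUE against its THRESH and to look at a few raw counters.
The Python below is only a rough sketch under stated assumptions: it
expects `smartctl -A` rows with one attribute per line, the names
parse_attributes, review and WATCHED_RAW are made up for this example,
and the choice of watched attribute IDs (5, 187, 197, 198) is a common
rule of thumb rather than vendor guidance. Raw values of the rate
attributes (1, 7, 195) are ignored on purpose, because their encoding is
vendor specific. The sample rows are copied from the listings in the
original post.

```python
import re
from typing import NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized VALUE column
    worst: int   # normalized WORST column
    thresh: int  # THRESH column
    raw: int     # leading integer of RAW_VALUE


# One attribute row per line: ID, name, flag, VALUE, WORST, THRESH,
# TYPE, UPDATED, WHEN_FAILED, then the leading integer of RAW_VALUE.
ROW = re.compile(
    r"^[ \t]*(\d+)[ \t]+(\S+)[ \t]+0x[0-9a-fA-F]+"
    r"[ \t]+(\d+)[ \t]+(\d+)[ \t]+(\d+)"
    r"[ \t]+\S+[ \t]+\S+[ \t]+\S+[ \t]+(\d+)",
    re.MULTILINE,
)

# Assumption: attributes whose raw value is treated as a plain count.
WATCHED_RAW = {5, 187, 197, 198}


def parse_attributes(text):
    """Return a SmartAttr for every attribute row found in text."""
    return [
        SmartAttr(int(i), name, int(v), int(w), int(t), int(r))
        for i, name, v, w, t, r in ROW.findall(text)
    ]


def review(attrs):
    """List attributes at or below threshold, or with nonzero watched raw counts."""
    notes = []
    for a in attrs:
        if a.thresh > 0 and a.value <= a.thresh:
            notes.append(f"{a.name}: VALUE {a.value} at or below THRESH {a.thresh}")
        elif a.attr_id in WATCHED_RAW and a.raw > 0:
            notes.append(f"{a.name}: raw count {a.raw}")
    return notes


# Rows taken from the /dev/sdb ("good") listing in the thread.
SDB_SAMPLE = """
  1 Raw_Read_Error_Rate     0x000f   113   099   006    Pre-fail  Always       -       52634145
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       24
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

# Rows taken from the /dev/sda ("bad") listing in the thread.
SDA_SAMPLE = """
  1 Raw_Read_Error_Rate     0x000f   109   100   006    Pre-fail  Always       -       23028010
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       17
187 Reported_Uncorrect      0x0032   094   094   000    Old_age   Always       -       6
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    for label, sample in (("sdb", SDB_SAMPLE), ("sda", SDA_SAMPLE)):
        for note in review(parse_attributes(sample)):
            print(f"{label}: {note}")
```

On these rows the sketch notes reallocated sectors on both drives and
reported uncorrectable errors only on sda. No normalized value is at or
below its threshold on either drive. That fits the advice above but
does not by itself prove the second drive is healthy.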

From: root on 1 Nov 2009 12:24

Piotr Szymański <szyman(a)REMOVETHISmagres.net> wrote:
> Hi All,
>
> I have two Seagate Barracuda 7200.12 1 TB (ST31000528AS) drives in a
> Linux software RAID-1 configuration. Today I've got a notification from
> smartd that one of the drives (sda) is failing:
>

I had two of the 1 TB drives fail within a week of purchase.
Send them back to Seagate for replacement. When you call
Seagate they will warn you that they may reject your drive
if you don't pack it correctly. I simply packed the first
drive in the original box and returned it. They accepted it
and sent the replacement in a big box with lots of foam around
the drive. I returned the second drive in the box they
sent. Since then I have had no problems with the replacement
drives. There was something rotten about the first 1 TB drives.

PS: If you opt to have them send you a replacement before they
receive your drive, you will be hit with a $25 shipping charge.
UPS shipping for one drive is about $9.

From: philo on 1 Nov 2009 13:57

root wrote:
> Piotr Szymański <szyman(a)REMOVETHISmagres.net> wrote:
>> Hi All,
>>
>> I have two Seagate Barracuda 7200.12 1 TB (ST31000528AS) drives in a
>> Linux software RAID-1 configuration. Today I've got a notification from
>> smartd that one of the drives (sda) is failing:
>>
>
> I had two of the 1Tb drives fail within a week of purchase.
> Send them back to Seagate for replacement. When you call
> Seagate they will warn you that they may reject your drive
> if you don't pack it correctly. I simply packed the first
> drive in the original box and returned it. They took it
> and returned the drive in a big box with lots of foam around
> the drive. I returned the second drive in the box they
> sent. Since then I have had no problems with the replacement
> drives. Something rotten about the first 1Tb drives.
>
> PS if you opt for them to send you a drive before they
> get your drive you will get hit with a $25 shipping charge.
> The UPS shipping for one drive is about $9.

It may be hard to get a warranty replacement for a drive that has not
yet failed, unless there's a known manufacturing defect, but it's
worth checking into.

From: Joe on 1 Nov 2009 14:31

On 2009-11-01, philo <philo(a)privacy.invalid> wrote:
> root wrote:
>> Piotr Szymański <szyman(a)REMOVETHISmagres.net> wrote:
>>> Hi All,
>>>
>>> I have two Seagate Barracuda 7200.12 1 TB (ST31000528AS) drives in a
>>> Linux software RAID-1 configuration. Today I've got a notification from
>>> smartd that one of the drives (sda) is failing:
>>>
>>
>> I had two of the 1Tb drives fail within a week of purchase.
>> Send them back to Seagate for replacement. When you call
>> Seagate they will warn you that they may reject your drive
>> if you don't pack it correctly. I simply packed the first
>> drive in the original box and returned it. They took it
>> and returned the drive in a big box with lots of foam around
>> the drive. I returned the second drive in the box they
>> sent. Since then I have had no problems with the replacement
>> drives. Something rotten about the first 1Tb drives.
>>
>> PS if you opt for them to send you a drive before they
>> get your drive you will get hit with a $25 shipping charge.
>> The UPS shipping for one drive is about $9.
>
>
>
> It may be hard to warranty a drive that has not yet failed...
> unless there's a known manufacturing defect...
> but worth checking into

Not at all. Warranty covers SMART failures on every drive I've dealt
with...

--
Joe - Linux User #449481/Ubuntu User #19733
joe at hits - buffalo dot com
"Hate is baggage, life is too short to go around pissed off all the
time..." - Danny, American History X
