From: Franc Zabkar on
I've been reading this document which is an analysis of Google's hard
disc failure rates:

Failure Trends in a Large Disk Drive Population:
http://research.google.com/archive/disk_failures.pdf

It states that "contrary to previously reported results, we found very
little correlation between failure rates and either elevated
temperature or activity levels."

Figure 4 "shows that failures do not increase when the average
temperature increases. In fact, there is a clear trend showing that
lower temperatures are associated with higher failure rates. Only at
very high temperatures is there a slight reversal of this trend."

"Figure 5 looks at the average temperatures for different age groups.
The distributions are in sync with Figure 4 showing a mostly flat
failure rate at mid-range temperatures and a modest increase at the
low end of the temperature distribution. What stands out are the 3 and
4-year old drives, where the trend for higher failures with higher
temperature is much more constant and also more pronounced."

"Overall our experiments can confirm previously reported temperature
effects only for the high end of our temperature range and especially
for older drives. In the lower and middle temperature ranges, higher
temperatures are not associated with higher failure rates."

Figure 5 suggests that Google's optimum temperature for hard drives is
between 35C and 40C.

Elsewhere I found this old IBM article:
http://web.archive.org/web/20000519230551/http://www.storage.ibm.com/hardsoft/diskdrdl/technolo/drivetemp/drivetemp.htm

It states that "figure 2 shows the dramatic effect that temperature
has on the overall reliability of a hard disk drive. Derivations [sic]
from a nominal operating temperature (assumed to be maintained over
the life of a drive) can result in a derivation [sic] from the nominal
failure rate. As the temperature exceeds the recommended level, the
failure rate increases two to three percent for every one degree rise
above it. For example, a hard disk drive running for an extended
period of time at five degrees above the recommended temperature can
experience an increase in failure rate of 10 to 15 percent. Likewise,
operating a drive below the recommended temperature can extend drive
life."

This last statement is a bit ambiguous. If a hard drive is more
reliable at a temperature below that which is recommended, then why
not recommend a lower temperature in the first place? Then again,
maybe the author's intended meaning was "recommended maximum
temperature".
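
To make the IBM figures quoted above concrete, here is a minimal sketch of the arithmetic. It assumes the increase is simply linear in the number of degrees above the recommended temperature, which is how the quoted example (5 degrees giving 10 to 15 percent) reads. The function name and its parameters are made up for illustration; the article itself gives no formula.

```python
def failure_rate_increase(degrees_above, pct_per_degree):
    """Percent increase over the nominal failure rate.

    Assumption: a linear model, i.e. each degree above the recommended
    temperature adds pct_per_degree percent (the IBM text says 2 to 3).
    """
    if degrees_above <= 0:
        # The quoted text gives no figure for running below the
        # recommended temperature, so no change is modelled here.
        return 0.0
    return degrees_above * pct_per_degree


low = failure_rate_increase(5, 2.0)
high = failure_rate_increase(5, 3.0)
print(f"5 degrees over: +{low:.0f}% to +{high:.0f}%")
```

Note that this deliberately returns 0 below the recommended temperature, because the "can extend drive life" remark is not quantified in the article.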

- Franc Zabkar
--
Please remove one 'i' from my address when replying by email.
From: Arno Wagner on
Previously Franc Zabkar <fzabkar(a)iinternode.on.net> wrote:
> I've been reading this document which is an analysis of Google's hard
> disc failure rates:
[...]

If you can keep your HDDs below around 40C or so, then you will
run them under data-center conditions. These conditions are what
the Google study is about. An example from my personal experience
is with Maxtor disks. They had direct outside airflow and stayed
<30C under load and at 22C when idle. No failures in 3 years for
about 50 disks. These were the same Maxtors known to die fast when
run hot (e.g. at 50-60C).

Conditions in a typical PC are different. The HDDs are often
not directly cooled with outside air and can get hot under load.
If you have temperature spikes in the 50C range or higher,
temperature is a major factor in HDD death. How major exactly is
currently unknown or only known to the manufacturers. Most drives
have a 55C stated maximum temperature. The Maxtors I mention above
had a statement in their product manual that up to 60C the drive
failure rate would not increase, despite a 55C maximum temperature.
There is reason to believe that statement was over-optimistic or
a plain lie. So don't expect the HDD manufacturers to tell you
about high-temperature life expectancy.

Bottom line, the Google study shows that if you can get the drives
consistently down to below 40C, temperature does not matter a lot.
So the recommendation would be to have your drives (under load,
on a hot day) below 40C at all times. Note that this also applies
to external enclosures.
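
A rough sketch of how one might check a drive against the 40C figure above, assuming the drive reports its temperature as SMART attribute 194 and that the attribute table comes from `smartctl -A` (smartmontools). Some drives use a different attribute or none at all, so treat this as illustrative. The sample text below is invented to show the table layout and is not a real measurement; `parse_temperature` and `LIMIT_C` are hypothetical names.

```python
def parse_temperature(smartctl_output):
    """Return the raw value of SMART attribute 194 as an int, or None.

    Expects the attribute table layout printed by `smartctl -A`, where
    the raw value is the tenth whitespace-separated column.
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "194":
            return int(fields[9])
    return None


# Invented sample in the usual `smartctl -A` layout (not real data).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4321
194 Temperature_Celsius     0x0022   036   045   000    Old_age   Always       -       43 (Min/Max 20/47)
"""

LIMIT_C = 40  # the threshold suggested in the post above
temp = parse_temperature(SAMPLE)
too_hot = temp is not None and temp > LIMIT_C
print(temp, "too hot" if too_hot else "ok")
```

To catch the "under load, on a hot day" case, the reading would have to be taken while the drive is busy, not when it is idle.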

Arno

From: Franc Zabkar on
On 16 Apr 2008 12:20:06 GMT, Arno Wagner <me(a)privacy.net> put finger
to keyboard and composed:

>Bottom line, the Google study shows that if you can get the drives
>consistently down to below 40C, temperature does not matter a lot.
>So the recommendation would be to have your drives (under load,
>on a hot day) below 40C at all times. Note that this also applies
>to external enclosures.
>
>Arno

AFAICS, the Google study conclusively shows that failure rates also
increase when temperatures drop below 35C. In fact lower temps appear
to be more dangerous than slightly higher temps, except when the drive
is getting old, in which case higher temps start to become
significant.

- Franc Zabkar
From: Arno Wagner on
Previously Franc Zabkar <fzabkar(a)iinternode.on.net> wrote:
> On 16 Apr 2008 12:20:06 GMT, Arno Wagner <me(a)privacy.net> put finger
> to keyboard and composed:

>>Bottom line, the Google study shows that if you can get the drives
>>consistently down to below 40C, temperature does not matter a lot.
>>So the recommendation would be to have your drives (under load,
>>on a hot day) below 40C at all times. Note that this also applies
>>to external enclosures.
>>
>>Arno
>
> AFAICS, the Google study conclusively shows that failure rates also
> increase when temperatures drop below 35C. In fact lower temps appear
> to be more dangerous than slightly higher temps, except when the drive
> is getting old, in which case higher temps start to become
> significant.

Don't read too much into it. AFAIR they did not separate by
manufacturer, model and manufacturing date. It is quite possible that
the drives running at lower temperatures were actually from a batch
that had a shorter life expectancy from the start and stayed at lower
temperatures because of different cooling characteristics, i.e. there
may well be a systematic error in the measurements.
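
The kind of systematic error meant here can be sketched with made-up numbers. In the hypothetical population below, batch A fails at 8% and batch B at 2% regardless of temperature, but batch A mostly runs cool. All counts are invented purely for illustration and say nothing about Google's actual data.

```python
# (batch, temperature bucket, drive count, failures) - invented numbers
POPULATION = [
    ("A", "cool", 900, 72),  # batch A: 8% fail at either temperature
    ("A", "warm", 100, 8),
    ("B", "cool", 100, 2),   # batch B: 2% fail at either temperature
    ("B", "warm", 900, 18),
]


def failure_rate(rows):
    """Failures divided by drives over the given rows."""
    drives = sum(r[2] for r in rows)
    failed = sum(r[3] for r in rows)
    return failed / drives


def pooled_rate(bucket):
    """Failure rate for a temperature bucket, ignoring the batch."""
    return failure_rate([r for r in POPULATION if r[1] == bucket])


def batch_rate(batch, bucket):
    """Failure rate for one batch within one temperature bucket."""
    return failure_rate(
        [r for r in POPULATION if r[0] == batch and r[1] == bucket]
    )


print(f"pooled cool: {pooled_rate('cool'):.1%}")
print(f"pooled warm: {pooled_rate('warm'):.1%}")
print(f"batch A cool vs warm: {batch_rate('A', 'cool'):.1%} vs {batch_rate('A', 'warm'):.1%}")
```

Pooled over both batches, the cool drives look much worse than the warm ones, although within each batch temperature makes no difference at all. Whether anything like this applies to the study depends on how the population was actually mixed.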

Arno
From: Franc Zabkar on
On 16 Apr 2008 22:10:18 GMT, Arno Wagner <me(a)privacy.net> put finger
to keyboard and composed:

>Previously Franc Zabkar <fzabkar(a)iinternode.on.net> wrote:
>> On 16 Apr 2008 12:20:06 GMT, Arno Wagner <me(a)privacy.net> put finger
>> to keyboard and composed:
>
>>>Bottom line, the Google study shows that if you can get the drives
>>>consistently down to below 40C, temperature does not matter a lot.
>>>So the recommendation would be to have your drives (under load,
>>>on a hot day) below 40C at all times. Note that this also applies
>>>to external enclosures.
>>>
>>>Arno
>>
>> AFAICS, the Google study conclusively shows that failure rates also
>> increase when temperatures drop below 35C. In fact lower temps appear
>> to be more dangerous than slightly higher temps, except when the drive
>> is getting old, in which case higher temps start to become
>> significant.
>
>Don't read too much into it. AFAIR they did not separate by
>manufacturer, model and manufacturing date. It is quite possible that
>the drives running at lower temperatures were actually from a batch
>that had a shorter life expectancy from the start and stayed at lower
>temperatures because of different cooling characteristics, i.e. there
>may well be a systematic error in the measurements.
>
>Arno

The way I read it, the reliability-versus-temperature result was found
to be consistent across all models and manufacturers.

==================================================================
Failure rates are known to be highly correlated with drive models,
manufacturers and vintages. Our results do not contradict this fact.
For example, Figure 2 [Annualized failure rates broken down by age
groups] changes significantly when we normalize failure rates per each
drive model. Most age-related results are impacted by drive vintages.
However, in this paper, we do not show a breakdown of drives per
manufacturer, model, or vintage due to the proprietary nature of these
data.

Interestingly, this does not change our conclusions. In contrast to
age-related results, we note that all results shown in the rest of the
paper are not affected significantly by the population mix.

==================================================================
The data in this study are collected from a large number of disk
drives, deployed in several types of systems across all of Google's
services. More than one hundred thousand disk drives were used for all
the results presented here. The disks are a combination of serial and
parallel ATA consumer-grade hard disk drives, ranging in speed from
5400 to 7200 rpm, and in size from 80 to 400 GB. All units in this
study were put into production in or after 2001. The population
contains several models from many of the largest disk drive
manufacturers and from at least nine different models.

==================================================================

- Franc Zabkar