From: r0g on
I built a freeNAS (0.7.x) for someone recently and it seems to have been
running fine since I installed it over a month ago. I just got to the
stage where I'd stopped worrying about it as it had 20 days uptime since
the last reboot when, out of nowhere, it seems to have spontaneously
rebooted itself in the middle of the night.

The machine is several miles away and I couldn't get their router to
allow ssh through so I can't remotely administer it. What I do have
though are the status reports (several per day)...

The only thing that looks weird in the dmesg log is the last entry:

pid 19683 (php), uid 0: exited on signal 11

Of course I have no idea what time that was (any idea how to get dmesg
to log times too?) or if it is related. There certainly shouldn't have
been anything using PHP around the time of the reboot (about 4:30am) as
nobody was present and there are no cron jobs for 4 hours either side of it.

The other thing that stands out from the status reports is the memory
usage and the amount of free vs dirty RAM available. Here's the last few
days memory usage...

Sat, 10 Apr 2010 00:00:02 +0000 Mem: 28M Active, 1486M Inact, 293M
Wired, 42M Cache, 204M Buf, 31M Free
Sat, 10 Apr 2010 11:00:02 +0000 Mem: 27M Active, 1485M Inact, 293M
Wired, 42M Cache, 204M Buf, 32M Free
Sat, 10 Apr 2010 12:00:02 +0000 Mem: 27M Active, 1485M Inact, 293M
Wired, 42M Cache, 204M Buf, 32M Free
Sat, 10 Apr 2010 14:00:02 +0000 Mem: 27M Active, 1486M Inact, 293M
Wired, 42M Cache, 204M Buf, 32M Free
Sat, 10 Apr 2010 16:00:01 +0000 Mem: 27M Active, 1485M Inact, 293M
Wired, 42M Cache, 204M Buf, 32M Free
Sat, 10 Apr 2010 18:00:01 +0000 Mem: 26M Active, 1486M Inact, 293M
Wired, 42M Cache, 204M Buf, 33M Free
Sun, 11 Apr 2010 00:00:01 +0000 Mem: 25M Active, 1486M Inact, 293M
Wired, 42M Cache, 204M Buf, 34M Free
Sun, 11 Apr 2010 09:00:01 +0000 Mem: 24M Active, 1486M Inact, 292M
Wired, 42M Cache, 204M Buf, 35M Free
Sun, 11 Apr 2010 12:00:01 +0000 Mem: 24M Active, 1486M Inact, 292M
Wired, 42M Cache, 204M Buf, 35M Free
Sun, 11 Apr 2010 14:00:01 +0000 Mem: 24M Active, 1486M Inact, 292M
Wired, 42M Cache, 204M Buf, 35M Free
Sun, 11 Apr 2010 16:00:02 +0000 Mem: 24M Active, 1486M Inact, 292M
Wired, 42M Cache, 204M Buf, 35M Free
Sun, 11 Apr 2010 18:00:02 +0000 Mem: 24M Active, 1486M Inact, 292M
Wired, 42M Cache, 204M Buf, 35M Free
Mon, 12 Apr 2010 00:00:02 +0000 Mem: 25M Active, 1486M Inact, 293M
Wired, 42M Cache, 204M Buf, 33M Free
Mon, 12 Apr 2010 09:00:01 +0000 Mem: 18M Active, 19M Inact, 63M Wired,
348K Cache, 54M Buf, 1780M Free
Mon, 12 Apr 2010 11:00:02 +0000 Mem: 19M Active, 29M Inact, 93M Wired,
292K Cache, 93M Buf, 1739M Free
Mon, 12 Apr 2010 12:00:01 +0000 Mem: 19M Active, 33M Inact, 107M Wired,
292K Cache, 110M Buf, 1720M Free
Mon, 12 Apr 2010 14:00:02 +0000 Mem: 19M Active, 33M Inact, 104M Wired,
292K Cache, 110M Buf, 1723M Free
Mon, 12 Apr 2010 16:00:01 +0000 Mem: 20M Active, 33M Inact, 120M Wired,
292K Cache, 125M Buf, 1707M Free
Mon, 12 Apr 2010 18:00:01 +0000 Mem: 16M Active, 78M Inact, 204M Wired,
292K Cache, 204M Buf, 1581M Free
Tue, 13 Apr 2010 00:00:01 +0000 Mem: 17M Active, 79M Inact, 205M Wired,
292K Cache, 204M Buf, 1579M Free
Tue, 13 Apr 2010 09:00:01 +0000 Mem: 16M Active, 81M Inact, 205M Wired,
292K Cache, 204M Buf, 1578M Free
Tue, 13 Apr 2010 11:00:02 +0000 Mem: 25M Active, 1497M Inact, 263M
Wired, 43M Cache, 204M Buf, 52M Free
Tue, 13 Apr 2010 12:00:01 +0000 Mem: 26M Active, 1498M Inact, 263M
Wired, 40M Cache, 204M Buf, 53M Free
Tue, 13 Apr 2010 16:00:01 +0000 Mem: 24M Active, 1507M Inact, 268M
Wired, 30M Cache, 204M Buf, 50M Free


See how after the machine rebooted on Monday morning most of it's 2GB
was counted as Free and there was a small but growing amount of Inact
for about 24 hours until suddenly in the space of 2 hours on a Tuesday
morning it dropped back down to just a couple of dozen MB Free and over
a gig and a half Inact. That is how it normally runs and I had assumed
was normal until now.

Is it normal to have so much Inact and so little Free? and might such a
shortage of Free panic the machine into rebooting if an app asks for
more than I have?

Reading up on the meaning of Inact and Free it seems that Inact is
"dirty" memory and needs to be paged out before it can be classed as
Free and reused. Is that right? If so does that mean you shouldn't ever
run a freeBSD system without swap?

I'm using the latest "Stable" and "embedded" version of freeNAS (0.7)
which the authors recommend is run from a flash drive. Given that is the
recommended configuration I wouldn't have thought it would have any swap
space at all and I can't see where it might be if it does...

Free disk space:
----------------
Filesystem Size Used Avail Capacity Mounted on
/dev/md0 92M 88M 4.5M 95% /
devfs 1.0K 1.0K 0B 100% /dev
procfs 4.0K 4.0K 0B 100% /proc
/dev/da1p1 1.8G 26K 1.7G 0% /mnt/sys
/dev/mirror/deltaraidp1 1.3T 155G 1.1T 12% /mnt/delta
/dev/md1 31M 4.0K 28M 0% /var
/dev/da0a 38M 37M 1.4M 96% /cf

Alternatively could the lack of free space on any of the above
partitions be a cause of seemingly random rebooting?

Thanks for your help and sorry for the painfully long message!

Roger.
From: Dominic Fandrey on
On 13/04/2010 19:49, r0g wrote:
> I built a freeNAS (0.7.x) for someone recently and it seems to have been
> running fine since I installed it over a month ago. I just got to the
> stage where I'd stopped worrying about it as it had 20 days uptime since
> the last reboot when, out of nowhere, it seems to have spontaneously
> rebooted itself in the middle of the night.

Are you really certain that it did reboot? Or are you just gleaming
this from the memory reports?

I cannot really give helpful advice, but maybe there's something that
points you into a different direction.

> ...
>
>
> See how after the machine rebooted on Monday morning most of it's 2GB
> was counted as Free and there was a small but growing amount of Inact
> for about 24 hours until suddenly in the space of 2 hours on a Tuesday
> morning it dropped back down to just a couple of dozen MB Free and over
> a gig and a half Inact. That is how it normally runs and I had assumed
> was normal until now.
>
> Is it normal to have so much Inact and so little Free? and might such a
> shortage of Free panic the machine into rebooting if an app asks for
> more than I have?

Well, free memory is wasted memory. So FreeBSD attempts to use it as
soon as an opportunity arises. File system caching can be used to
use up memory in a way that doesn't hurt (i.e. unchanged files in
memory can just be discarded if the space is needed for something else).

The 24 hours seem significant for me, because that is the time after
which inactive processes (like the gettys on remote controlled machines)
get pushed into swap. Because you said you don't have swap that cannot
occur, but I wonder whether there are more memory freeing mec

> Reading up on the meaning of Inact and Free it seems that Inact is
> "dirty" memory and needs to be paged out before it can be classed as
> Free and reused. Is that right? If so does that mean you shouldn't ever
> run a freeBSD system without swap?

Hmm, I never found satisfying descriptions of what these things mean.
But it always appeared to me that they adopt to the available memory,
so that doesn't look worrying to me.

Regards

--
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
From: Stefan Ollermann on
r0g wrote:

> I built a freeNAS (0.7.x) for someone recently and it seems to have been
> running fine since I installed it over a month ago. I just got to the
> stage where I'd stopped worrying about it as it had 20 days uptime since
> the last reboot when, out of nowhere, it seems to have spontaneously
> rebooted itself in the middle of the night.

Maybe a power or hardware (heat) failure?

> The machine is several miles away and I couldn't get their router to
> allow ssh through so I can't remotely administer it. What I do have
> though are the status reports (several per day)...
>
> The only thing that looks weird in the dmesg log is the last entry:
>
> pid 19683 (php), uid 0: exited on signal 11
>
> Of course I have no idea what time that was (any idea how to get dmesg
> to log times too?) or if it is related. There certainly shouldn't have
> been anything using PHP around the time of the reboot (about 4:30am) as
> nobody was present and there are no cron jobs for 4 hours either side of
> it.

Doesnt freeNAS log to /var/log/messages ?

<snip>

> Thanks for your help and sorry for the painfully long message!
>
> Roger.

Stefan

From: r0g on
Stefan Ollermann wrote:
> r0g wrote:
>
>> I built a freeNAS (0.7.x) for someone recently and it seems to have been
>> running fine since I installed it over a month ago. I just got to the
>> stage where I'd stopped worrying about it as it had 20 days uptime since
>> the last reboot when, out of nowhere, it seems to have spontaneously
>> rebooted itself in the middle of the night.
>
> Maybe a power or hardware (heat) failure?
>

Nah, I don't think so, certainly drive temps have never even hit 30 deg
C, the last load reading was 0.0 0.0 0.0 and the machine is on a
reasonable UPS that should do a clean shutdown if power drops and the
subsequent status report said the filesystem was not cleanly dismounted.

Dominic, I know it must have rebooted or restarted in some sense as my
uptime went from 20 odd days back down to 4 hours between status
reports. Having said that I did notice the "power cycled" parameter in
the SMART logs didn't increment so power to the drives must have been
maintained.

All of this makes me think software error.




>> The machine is several miles away and I couldn't get their router to
>> allow ssh through so I can't remotely administer it. What I do have
>> though are the status reports (several per day)...
>>
>> The only thing that looks weird in the dmesg log is the last entry:
>>
>> pid 19683 (php), uid 0: exited on signal 11
>>
>> Of course I have no idea what time that was (any idea how to get dmesg
>> to log times too?) or if it is related. There certainly shouldn't have
>> been anything using PHP around the time of the reboot (about 4:30am) as
>> nobody was present and there are no cron jobs for 4 hours either side of
>> it.
>
> Doesnt freeNAS log to /var/log/messages ?
>

Err, quite possibly, I was just hoping to avoid biking into central
London to check! Sadly when I set it up I couldn't get their godawful BT
"business" router to let ssh through (on an arbitrary high numbered port
natch) before I ran out of time so I can't log in remotely and browse
/var/anything right now :(

Dominic, the more I read the more I agree with you. At first Iwondered
if one of the components was leaking memory but it seems it's quite
normal for bsd to use most of the available memory so the behaviour I
observed isn't really strange at all. Which of course leaves me with a
mystery still! Hmm....

It seems to filesystems thing is a dead end too, I've mapped each
partition's usage over time and, with the exception of the big 1.4TB
partition, they have remained almost entirely constant since it was
first switched on.

Looks like I might have to pay them a visit after all :/

Cheers,

Roger.