From: Jens Axboe on
On 08/10/2010 12:35 PM, Jeff Layton wrote:
> On Tue, 10 Aug 2010 12:10:05 -0400
> Jens Axboe <axboe(a)kernel.dk> wrote:
>
>> On 08/10/2010 10:27 AM, Jeff Layton wrote:
>>> On Tue, 10 Aug 2010 10:22:41 -0400
>>> Jeff Moyer <jmoyer(a)redhat.com> wrote:
>>>
>>>> Jeff Layton <jlayton(a)redhat.com> writes:
>>>>
>>>>> Saw this oops on my test machine this morning. I rebooted the machine
>>>>> last night and hadn't done anything on it other than log in this
>>>>> morning. The kernel here is based on Steve French's git tree, which is
>>>>> based on Linus' as of Sunday Aug 8th. Last non-cifs commit is:
>>>>
>>>> This looks a lot like this bug:
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=577968
>>>>
>>>> See also:
>>>> http://kerneloops.org/guilty.php?guilty=cfq_free_io_context&version=2.6.34-rc&start=2228224&end=2260991&class=oops
>>>>
>>>> It's been around since 2.6.30.8 according to kerneloops.org. If you
>>>> find that you have a reliable way of reproducing the issue, that would
>>>> be great.
>>>>
>>>
>>> Ok, thanks -- no clear reproducer so far. This morning was the
>>> first time I've seen it and it was on the console of my rawhide
>>> machine. The last thing I did with it was reboot it last night. I
>>> suspect that the gzip process came from a cron job or something.
>>
>> What version did you hit it on?
>>
>
> It was a kernel built out of git, based on Steve French's git tree. The
> last commit from Linus in it was
> 45d7f32c7a43cbb9592886d38190e379e2eb2226. Everything else on top of
> that was patches that only touched cifs code. cifs.ko hadn't been
> plugged in since it was rebooted.

OK. That bug is pretty elusive; so far I haven't been able to figure
out what the heck is going on here, and my attempts at reproducing it
have all failed. The reports so far seem to have the cron component
in common. Does Fedora ionice some cron jobs or anything like that?
Or use CLONE_IO?

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jeff Layton on
On Tue, 10 Aug 2010 19:58:41 -0400
Jens Axboe <axboe(a)kernel.dk> wrote:

> On 08/10/2010 12:35 PM, Jeff Layton wrote:
> > On Tue, 10 Aug 2010 12:10:05 -0400
> > Jens Axboe <axboe(a)kernel.dk> wrote:
> >
> >> On 08/10/2010 10:27 AM, Jeff Layton wrote:
> >>> On Tue, 10 Aug 2010 10:22:41 -0400
> >>> Jeff Moyer <jmoyer(a)redhat.com> wrote:
> >>>
> >>>> Jeff Layton <jlayton(a)redhat.com> writes:
> >>>>
> >>>>> Saw this oops on my test machine this morning. I rebooted the machine
> >>>>> last night and hadn't done anything on it other than log in this
> >>>>> morning. The kernel here is based on Steve French's git tree, which is
> >>>>> based on Linus' as of Sunday Aug 8th. Last non-cifs commit is:
> >>>>
> >>>> This looks a lot like this bug:
> >>>> https://bugzilla.redhat.com/show_bug.cgi?id=577968
> >>>>
> >>>> See also:
> >>>> http://kerneloops.org/guilty.php?guilty=cfq_free_io_context&version=2.6.34-rc&start=2228224&end=2260991&class=oops
> >>>>
> >>>> It's been around since 2.6.30.8 according to kerneloops.org. If you
> >>>> find that you have a reliable way of reproducing the issue, that would
> >>>> be great.
> >>>>
> >>>
> >>> Ok, thanks -- no clear reproducer so far. This morning was the
> >>> first time I've seen it and it was on the console of my rawhide
> >>> machine. The last thing I did with it was reboot it last night. I
> >>> suspect that the gzip process came from a cron job or something.
> >>
> >> What version did you hit it on?
> >>
> >
> > It was a kernel built out of git, based on Steve French's git tree. The
> > last commit from Linus in it was
> > 45d7f32c7a43cbb9592886d38190e379e2eb2226. Everything else on top of
> > that was patches that only touched cifs code. cifs.ko hadn't been
> > plugged in since it was rebooted.
>
> OK. That bug is pretty elusive, so far I haven't been able to figure
> out what the heck is going on here and my attempts at reproducing
> have all failed. The reports so far seem to have the cron component
> in common. Does fedora ionice some cron jobs or anything like that?
> Or use CLONE_IO?
>

Yes. I sort of doubt anything there would use CLONE_IO, but ionice is
definitely used. Fedora uses anacron. I don't see any explicit calls to
gzip in there, but it's possible something else is calling it:

# grep ionice /etc/cron.*/*
/etc/cron.daily/mlocate.cron:ionice -c2 -n7 -p $$ >/dev/null 2>&1
/etc/cron.daily/readahead.cron:ionice -c3 -p $$ >/dev/null 2>&1
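
For reference, the two ionice invocations in the grep output above select an I/O scheduling class (-c) and, for the best-effort class, a level (-n, where 0 is the highest priority and 7 the lowest). The sketch below shows how such a class/level pair is packed into a single ioprio value, assuming the layout of the kernel's IOPRIO_PRIO_VALUE() macro (class shifted left by 13 bits, level in the low bits). The function names are hypothetical and only illustrate the encoding; they are not part of ionice or any library.

```python
# Sketch of how an I/O scheduling class and level are packed into one
# ioprio value, mirroring the kernel's IOPRIO_PRIO_VALUE() macro.
# Function names are illustrative only, not a real API.

IOPRIO_CLASS_SHIFT = 13

IOPRIO_CLASSES = {0: "none", 1: "realtime", 2: "best-effort", 3: "idle"}


def ioprio_value(io_class, level=0):
    """Pack class and level the way IOPRIO_PRIO_VALUE(class, data) does."""
    if io_class not in IOPRIO_CLASSES:
        raise ValueError("unknown I/O class: %r" % (io_class,))
    if not 0 <= level <= 7:
        raise ValueError("level must be in 0..7")
    return (io_class << IOPRIO_CLASS_SHIFT) | level


def ioprio_decode(value):
    """Split a packed ioprio value back into (class name, level)."""
    io_class = value >> IOPRIO_CLASS_SHIFT
    level = value & ((1 << IOPRIO_CLASS_SHIFT) - 1)
    return IOPRIO_CLASSES[io_class], level


if __name__ == "__main__":
    # mlocate.cron: ionice -c2 -n7 -> best-effort class, lowest level
    print(ioprio_decode(ioprio_value(2, 7)))
    # readahead.cron: ionice -c3 -> idle class; the level used here (0)
    # is an assumption, since the idle class has no levels
    print(ioprio_decode(ioprio_value(3)))
```

So mlocate.cron drops itself to the bottom of the best-effort class, while readahead.cron moves itself into the idle class.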

# cat /etc/anacrontab
# /etc/anacrontab: configuration file for anacron

# See anacron(8) and anacrontab(5) for details.

SHELL=/bin/sh
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
# the maximal random delay added to the base delay of the jobs
RANDOM_DELAY=45
# the jobs will be started during the following hours only
START_HOURS_RANGE=3-22

#period in days   delay in minutes   job-identifier   command
1                 5                  cron.daily       nice run-parts /etc/cron.daily
7                 25                 cron.weekly      nice run-parts /etc/cron.weekly
@monthly          45                 cron.monthly     nice run-parts /etc/cron.monthly
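
To make the timing of these jobs easier to reason about, here is a minimal sketch that parses the anacrontab shown above into its variables and job entries and computes the window in which each job may start. It assumes, per the comment in the file, that RANDOM_DELAY is the maximum number of minutes added to a job's base delay. The helper names (parse_anacrontab, delay_window) are hypothetical, and this is not how anacron itself is implemented.

```python
# Sketch: parse an anacrontab into variables and job entries.
# Helper names are illustrative; this is not anacron's own parser.

SAMPLE = """\
SHELL=/bin/sh
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
# the maximal random delay added to the base delay of the jobs
RANDOM_DELAY=45
# the jobs will be started during the following hours only
START_HOURS_RANGE=3-22

#period in days delay in minutes job-identifier command
1 5 cron.daily nice run-parts /etc/cron.daily
7 25 cron.weekly nice run-parts /etc/cron.weekly
@monthly 45 cron.monthly nice run-parts /etc/cron.monthly
"""


def parse_anacrontab(text):
    """Return (variables, jobs) from anacrontab-style text."""
    env, jobs = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=" in line.split()[0]:
            # NAME=value assignment
            name, _, value = line.partition("=")
            env[name.strip()] = value.strip()
            continue
        fields = line.split(None, 3)
        if len(fields) == 4:
            period, delay, job_id, command = fields
            jobs.append({"period": period, "delay": int(delay),
                         "job_id": job_id, "command": command})
    return env, jobs


def delay_window(job, env):
    """Earliest and latest start delay in minutes, assuming RANDOM_DELAY
    adds between 0 and its value to the job's base delay."""
    extra = int(env.get("RANDOM_DELAY", 0))
    return job["delay"], job["delay"] + extra


if __name__ == "__main__":
    env, jobs = parse_anacrontab(SAMPLE)
    for job in jobs:
        print(job["job_id"], delay_window(job, env), job["command"])
```

Under that assumption, cron.daily would start somewhere between 5 and 50 minutes after anacron picks it up, and every job is launched through nice, independently of any ionice call inside the individual scripts.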

--
Jeff Layton <jlayton(a)redhat.com>
From: Jens Axboe on
On 08/10/2010 09:23 PM, Jeff Layton wrote:
> On Tue, 10 Aug 2010 19:58:41 -0400
> Jens Axboe <axboe(a)kernel.dk> wrote:
>
>> On 08/10/2010 12:35 PM, Jeff Layton wrote:
>>> On Tue, 10 Aug 2010 12:10:05 -0400
>>> Jens Axboe <axboe(a)kernel.dk> wrote:
>>>
>>>> On 08/10/2010 10:27 AM, Jeff Layton wrote:
>>>>> On Tue, 10 Aug 2010 10:22:41 -0400
>>>>> Jeff Moyer <jmoyer(a)redhat.com> wrote:
>>>>>
>>>>>> Jeff Layton <jlayton(a)redhat.com> writes:
>>>>>>
>>>>>>> Saw this oops on my test machine this morning. I rebooted the machine
>>>>>>> last night and hadn't done anything on it other than log in this
>>>>>>> morning. The kernel here is based on Steve French's git tree, which is
>>>>>>> based on Linus' as of Sunday Aug 8th. Last non-cifs commit is:
>>>>>>
>>>>>> This looks a lot like this bug:
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=577968
>>>>>>
>>>>>> See also:
>>>>>> http://kerneloops.org/guilty.php?guilty=cfq_free_io_context&version=2.6.34-rc&start=2228224&end=2260991&class=oops
>>>>>>
>>>>>> It's been around since 2.6.30.8 according to kerneloops.org. If you
>>>>>> find that you have a reliable way of reproducing the issue, that would
>>>>>> be great.
>>>>>>
>>>>>
>>>>> Ok, thanks -- no clear reproducer so far. This morning was the
>>>>> first time I've seen it and it was on the console of my rawhide
>>>>> machine. The last thing I did with it was reboot it last night. I
>>>>> suspect that the gzip process came from a cron job or something.
>>>>
>>>> What version did you hit it on?
>>>>
>>>
>>> It was a kernel built out of git, based on Steve French's git tree. The
>>> last commit from Linus in it was
>>> 45d7f32c7a43cbb9592886d38190e379e2eb2226. Everything else on top of
>>> that was patches that only touched cifs code. cifs.ko hadn't been
>>> plugged in since it was rebooted.
>>
>> OK. That bug is pretty elusive, so far I haven't been able to figure
>> out what the heck is going on here and my attempts at reproducing
>> have all failed. The reports so far seem to have the cron component
>> in common. Does fedora ionice some cron jobs or anything like that?
>> Or use CLONE_IO?
>>
>
> Yes. I sort of doubt anything there would use CLONE_IO, but ionice is
> definitely used. Fedora uses anacron. I don't see any explicit calls to
> gzip in there, but it's possible something else is calling it:
>
> # grep ionice /etc/cron.*/*
> /etc/cron.daily/mlocate.cron:ionice -c2 -n7 -p $$ >/dev/null 2>&1
> /etc/cron.daily/readahead.cron:ionice -c3 -p $$ >/dev/null 2>&1
>
> # cat /etc/anacrontab
> # /etc/anacrontab: configuration file for anacron
>
> # See anacron(8) and anacrontab(5) for details.
>
> SHELL=/bin/sh
> PATH=/sbin:/bin:/usr/sbin:/usr/bin
> MAILTO=root
> # the maximal random delay added to the base delay of the jobs
> RANDOM_DELAY=45
> # the jobs will be started during the following hours only
> START_HOURS_RANGE=3-22
>
> #period in days   delay in minutes   job-identifier   command
> 1                 5                  cron.daily       nice run-parts /etc/cron.daily
> 7                 25                 cron.weekly      nice run-parts /etc/cron.weekly
> @monthly          45                 cron.monthly     nice run-parts /etc/cron.monthly

ionice must be a deciding factor in this, perhaps coupled with something
else. Otherwise we would be seeing a lot more of these.

--
Jens Axboe

From: Jeff Moyer on
Jens Axboe <axboe(a)kernel.dk> writes:

>> #period in days   delay in minutes   job-identifier   command
>> 1                 5                  cron.daily       nice run-parts /etc/cron.daily
>> 7                 25                 cron.weekly      nice run-parts /etc/cron.weekly
>> @monthly          45                 cron.monthly     nice run-parts /etc/cron.monthly
>
> ionice must be a deciding factor in this, perhaps coupled with something
> else. Otherwise we would be seeing a lot more of these.

Well, what's really strange is that this is only affecting f14. I'm
installing a system and I'll see if I can't reproduce it.

Cheers,
Jeff