From: Vivek Goyal on
On Mon, Jul 19, 2010 at 12:08:23PM -0400, Jeff Moyer wrote:
> Vivek Goyal <vgoyal(a)redhat.com> writes:
>
> > On Tue, Jul 13, 2010 at 04:30:23PM -0400, Jeff Moyer wrote:
> >> Vivek Goyal <vgoyal(a)redhat.com> writes:
>
> > I don't mind looking at traces. Do let me know where can I access those.
>
> Forwarded privately.
>
> >> Now, to answer your question, the jbd2 thread runs and issues a barrier,
> >> which causes a forced dispatch of requests. After that a new queue is
> >> selected, and since the fs_mark thread is blocked on the journal commit,
> >> it's always the fio process that gets to run.
> >
> > Ok, that explains it. So somehow after the barrier, fio always wins
> > as issues next read request before the fs_mark is able to issue the
> > next set of writes.
> >
> >>
> >> This, of course, raises the question of why the blk_yield patches didn't
> >> run into the same problem. Looking back at some saved traces, I don't
> >> see WBS (write barrier sync) requests, so I wonder if barriers weren't
> >> supported by my last storage system.
> >
> > I think that blk_yield patches will also run into the same issue if
> > barriers are enabled.
>
> Agreed.
>
> Here are the results again with barriers disabled for Corrado's patch:
>
> fs_mark: 348.2 files/sec
> fio: 53324.6 KB/s
>
> Remember that deadline was seeing 450 files/sec and 78 MB/s. So, in
> this case, the buffered reader appears to be starved. Looking into this
> further, I found that the journal thread is running with I/O priority 0,
> while the fio and fs_mark processes are running at the default (4).
> Because the jbd thread has a higher I/O priority, its requests are
> always closer to the front of the sort list, and thus the sync-noidle
> workload is chosen more often than the sync workload. This essentially
> results in an elevated I/O priority for the fs_mark process as well.
> While troubling, that problem is not directly related to the problem
> we're looking at.
>
> So, I'm still in favor of Corrado's approach. Are there any remaining
> dissenting opinions on this?

Nope. I am fine with moving all WRITE_SYNC with RQ_NOIDLE to sync-noidle
tree and also marking jbd writes as WRITE_SYNC. By bringing dependent
threads on single service tree, we don't have to worry about slice
yielding.

Acked-by: Vivek Goyal <vgoyal(a)redhat.com>

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jeff Moyer on
Vivek Goyal <vgoyal(a)redhat.com> writes:

> On Mon, Jul 19, 2010 at 12:08:23PM -0400, Jeff Moyer wrote:
>> So, I'm still in favor of Corrado's approach. Are there any remaining
>> dissenting opinions on this?
>
> Nope. I am fine with moving all WRITE_SYNC with RQ_NOIDLE to sync-noidle
> tree and also marking jbd writes as WRITE_SYNC. By bringing dependent
> threads on single service tree, we don't have to worry about slice
> yielding.
>
> Acked-by: Vivek Goyal <vgoyal(a)redhat.com>

Corrado, would you mind reposting the patches and adding:

Reviewed-by: Jeff Moyer <jmoyer(a)redhat.com>
Tested-by: Jeff Moyer <jmoyer(a)redhat.com>

Thanks!
Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Christoph Hellwig on
Didn't you guys have a previous iteration of the fixes that gets
rid of REQ_NODILE by improving the heuristics inside cfq? That
would be much, much preffered from the filesystem point of view.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Vivek Goyal on
On Tue, Jul 20, 2010 at 10:11:03AM -0400, Christoph Hellwig wrote:
> Didn't you guys have a previous iteration of the fixes that gets
> rid of REQ_NODILE by improving the heuristics inside cfq? That
> would be much, much preffered from the filesystem point of view.

Actually in this patch, I was thinking we can probably get rid of
RQ_NOIDLE flag and just check for WRITE_SYNC. Any WRITE_SYNC queue
gets served on sync-noidle tree. I am wondering will we not face jbd
thread issues with direct writes also? If yes, then not special casing
direct IO writes and treat them same as O_SYNC writes will make sense.

I really wished that we had some blktrace of some standard workloads
stored somewhere which we could simply replay using "btreplay" and come
to some kind of conclusion whenever we are faced with taking such
decisions.

Thanks
Vivek

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Corrado Zoccolo on
On Tue, Jul 20, 2010 at 4:26 PM, Vivek Goyal <vgoyal(a)redhat.com> wrote:
> On Tue, Jul 20, 2010 at 10:11:03AM -0400, Christoph Hellwig wrote:
>> Didn't you guys have a previous iteration of the fixes that gets
>> rid of REQ_NODILE by improving the heuristics inside cfq?  That
>> would be much, much preffered from the filesystem point of view.
I think the previous iteration required more complex heuristics, while
this one uses existing ones to handle one more class of problems.
I understand that you still see the complexity from the fs side, but
Vivek's proposal may help also there. It only needs to be tested thoroughly.

>
> Actually in this patch, I was thinking we can probably get rid of
> RQ_NOIDLE flag and just check for WRITE_SYNC. Any WRITE_SYNC queue
> gets served on sync-noidle tree. I am wondering will we not face jbd
> thread issues with direct writes also? If yes, then not special casing
> direct IO writes and treat them same as O_SYNC writes will make sense.

Probably it is better to submit this first, since it is already
tested, and then have a different patch that can finish the work
This will help when bisecting for possible regressions, since I'm not
sure why the other writes are not already marked with RQ_NOIDLE (maybe
it was introduced for some good reason to distinguish the two sets,
and we won't know unless we find the workload where it helped).
I'll resend the current patch with Jeff's reviewed and tested tags.

Corrado

>
> I really wished that we had some blktrace of some standard workloads
> stored somewhere which we could simply replay using "btreplay" and come
> to some kind of conclusion whenever we are faced with taking such
> decisions.
>
> Thanks
> Vivek
>
>



--
__________________________________________________________________________

dott. Corrado Zoccolo                          mailto:czoccolo(a)gmail.com
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------
The self-confidence of a warrior is not the self-confidence of the average
man. The average man seeks certainty in the eyes of the onlooker and calls
that self-confidence. The warrior seeks impeccability in his own eyes and
calls that humbleness.
                               Tales of Power - C. Castaneda
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/