From: Vivek Goyal on
On Tue, Jul 13, 2010 at 03:38:11PM -0400, Jeff Moyer wrote:
> Corrado Zoccolo <czoccolo(a)gmail.com> writes:
>
> > Can you test the attached patch, where I also added your changes to
> > make jbd(2) to perform sync writes?
>
> I got new storage, so I have new numbers. I only re-ran deadline and
> vanilla cfq for the fs_mark only test. The average of 10 runs comes out
> like so:
>
> deadline: 571.98 files/sec
> vanilla cfq: 107.42 files/sec
> patched cfq: 460.9 files/sec
>
> Mixed workload results with your suggested patch:
>
> fs_mark: 15.65 files/sec
> fio: 132.5 MB/s
>
> So, again, not looking great for the mixed workload, but the patch
> does improve the fs_mark only case. Looking at the blktrace data shows
> that the jbd2 thread preempts the fs_mark thread at all the right
> times. The only thing holding throughput back is the whole notion that
> we need to only dispatch from one queue (even though the storage is
> capable of serving both the reads and writes simultaneously).
>
> I added in the patch that allows the simultaneous dispatch of both reads
> and writes, and here are the results from that run:
>
> fs_mark: 15.975 files/sec
> fio: 132.4 MB/s
>
> So, it looks like that didn't help. The reason this patch doesn't come
> close to the yield patch in the mixed workload is because the yield
> patch set allows the fs_mark process to continue to issue I/O. With
> your patch, the fs_mark process does 64KB of I/O, the jbd2 thread does
> the journal commit, and then the fio process runs again. Given that the
> fs_mark process typically only uses a small fraction of its time slice,
> you end up with an unfair balance.

Hi Jeff,

This is a little strange. Given that both the fs_mark and jbd threads are
now on the sync-noidle tree, we should have idled on the sync-noidle tree
to provide fairness, and that should have ensured that fs_mark/jbd get to
do more IO and the slice is not lost to the fio thread.

I'm not sure what is happening in practice, though. Only you can look at
the traces more closely and see whether the timer is being armed or not.
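
Something along these lines should show whether the idle timer is being
armed and firing (this assumes the traces were taken with blktrace, that
blkparse can find them under a basename like "sdb", and that the cfq_log
strings in your kernel are still "arm_idle" and "idle timer fired"):

    # look for cfq's idle-timer messages in the parsed trace
    blkparse -i sdb | grep -E 'arm_idle|idle timer' | less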

Thanks
Vivek
From: Jeff Moyer on
Vivek Goyal <vgoyal(a)redhat.com> writes:

> On Tue, Jul 13, 2010 at 03:38:11PM -0400, Jeff Moyer wrote:
>> Corrado Zoccolo <czoccolo(a)gmail.com> writes:
>>
>> > Can you test the attached patch, where I also added your changes to
>> > make jbd(2) to perform sync writes?
>>
>> I got new storage, so I have new numbers. I only re-ran deadline and
>> vanilla cfq for the fs_mark only test. The average of 10 runs comes out
>> like so:
>>
>> deadline: 571.98 files/sec
>> vanilla cfq: 107.42 files/sec
>> patched cfq: 460.9 files/sec
>>
>> Mixed workload results with your suggested patch:
>>
>> fs_mark: 15.65 files/sec
>> fio: 132.5 MB/s
>>
>> So, again, not looking great for the mixed workload, but the patch
>> does improve the fs_mark only case. Looking at the blktrace data shows
>> that the jbd2 thread preempts the fs_mark thread at all the right
>> times. The only thing holding throughput back is the whole notion that
>> we need to only dispatch from one queue (even though the storage is
>> capable of serving both the reads and writes simultaneously).
>>
>> I added in the patch that allows the simultaneous dispatch of both reads
>> and writes, and here are the results from that run:
>>
>> fs_mark: 15.975 files/sec
>> fio: 132.4 MB/s
>>
>> So, it looks like that didn't help. The reason this patch doesn't come
>> close to the yield patch in the mixed workload is because the yield
>> patch set allows the fs_mark process to continue to issue I/O. With
>> your patch, the fs_mark process does 64KB of I/O, the jbd2 thread does
>> the journal commit, and then the fio process runs again. Given that the
>> fs_mark process typically only uses a small fraction of its time slice,
>> you end up with an unfair balance.
>
> Hi Jeff,
>
> This is a little strange. Given that both the fs_mark and jbd threads are
> now on the sync-noidle tree, we should have idled on the sync-noidle tree
> to provide fairness, and that should have ensured that fs_mark/jbd get to
> do more IO and the slice is not lost to the fio thread.
>
> I'm not sure what is happening in practice, though. Only you can look at
> the traces more closely and see whether the timer is being armed or not.

Vivek, if you want to look at traces, just ask. I'd be happy to show
them to you, upload them, whatever. I'm not sure why you think
otherwise (though I wouldn't blame you for not wanting to look at
them!).
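
For reference, the traces were gathered with plain blktrace, along the
lines of the following (device name and run length are just placeholders
here):

    # trace the test device for the duration of the mixed run
    blktrace -d /dev/sdb -o mixed -w 300

and then run through blkparse for the analysis below.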

Now, to answer your question, the jbd2 thread runs and issues a barrier,
which causes a forced dispatch of requests. After that a new queue is
selected, and since the fs_mark thread is blocked on the journal commit,
it's always the fio process that gets to run.

This, of course, raises the question of why the blk_yield patches didn't
run into the same problem. Looking back at some saved traces, I don't
see WBS (write barrier sync) requests, so I wonder if barriers weren't
supported by my last storage system.
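
That check amounts to looking for the barrier flag in the rwbs field of
the parsed output, e.g. (device basename made up here):

    # count sync barrier writes ("WBS") in the saved trace
    blkparse -i sdb | grep -c 'WBS'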

Cheers,
Jeff
From: Vivek Goyal on
On Tue, Jul 13, 2010 at 04:30:23PM -0400, Jeff Moyer wrote:
> Vivek Goyal <vgoyal(a)redhat.com> writes:
>
> > On Tue, Jul 13, 2010 at 03:38:11PM -0400, Jeff Moyer wrote:
> >> Corrado Zoccolo <czoccolo(a)gmail.com> writes:
> >>
> >> > Can you test the attached patch, where I also added your changes to
> >> > make jbd(2) to perform sync writes?
> >>
> >> I got new storage, so I have new numbers. I only re-ran deadline and
> >> vanilla cfq for the fs_mark only test. The average of 10 runs comes out
> >> like so:
> >>
> >> deadline: 571.98 files/sec
> >> vanilla cfq: 107.42 files/sec
> >> patched cfq: 460.9 files/sec
> >>
> >> Mixed workload results with your suggested patch:
> >>
> >> fs_mark: 15.65 files/sec
> >> fio: 132.5 MB/s
> >>
> >> So, again, not looking great for the mixed workload, but the patch
> >> does improve the fs_mark only case. Looking at the blktrace data shows
> >> that the jbd2 thread preempts the fs_mark thread at all the right
> >> times. The only thing holding throughput back is the whole notion that
> >> we need to only dispatch from one queue (even though the storage is
> >> capable of serving both the reads and writes simultaneously).
> >>
> >> I added in the patch that allows the simultaneous dispatch of both reads
> >> and writes, and here are the results from that run:
> >>
> >> fs_mark: 15.975 files/sec
> >> fio: 132.4 MB/s
> >>
> >> So, it looks like that didn't help. The reason this patch doesn't come
> >> close to the yield patch in the mixed workload is because the yield
> >> patch set allows the fs_mark process to continue to issue I/O. With
> >> your patch, the fs_mark process does 64KB of I/O, the jbd2 thread does
> >> the journal commit, and then the fio process runs again. Given that the
> >> fs_mark process typically only uses a small fraction of its time slice,
> >> you end up with an unfair balance.
> >
> > Hi Jeff,
> >
> > This is a little strange. Given that both the fs_mark and jbd threads are
> > now on the sync-noidle tree, we should have idled on the sync-noidle tree
> > to provide fairness, and that should have ensured that fs_mark/jbd get to
> > do more IO and the slice is not lost to the fio thread.
> >
> > I'm not sure what is happening in practice, though. Only you can look at
> > the traces more closely and see whether the timer is being armed or not.
>
> Vivek, if you want to look at traces, just ask. I'd be happy to show
> them to you, upload them, whatever. I'm not sure why you think
> otherwise (though I wouldn't blame you for not wanting to look at
> them!).

I don't mind looking at traces. Do let me know where can I access those.

>
> Now, to answer your question, the jbd2 thread runs and issues a barrier,
> which causes a forced dispatch of requests. After that a new queue is
> selected, and since the fs_mark thread is blocked on the journal commit,
> it's always the fio process that gets to run.

Ok, that explains it. So somehow, after the barrier, fio always wins
because it issues its next read request before fs_mark is able to issue
its next set of writes.

>
> This, of course, raises the question of why the blk_yield patches didn't
> run into the same problem. Looking back at some saved traces, I don't
> see WBS (write barrier sync) requests, so I wonder if barriers weren't
> supported by my last storage system.

I think the blk_yield patches will also run into the same issue if
barriers are enabled.
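
A quick way to confirm that would be to repeat the run with barriers
turned off, e.g. (mount point made up here; assumes ext3/ext4, where
nobarrier is accepted):

    # remount the test filesystem without barriers for the retest
    mount -o remount,nobarrier /mnt/test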

Thanks
Vivek
From: Jeff Moyer on
Jeff Moyer <jmoyer(a)redhat.com> writes:

> This, of course, raises the question of why the blk_yield patches didn't
> run into the same problem. Looking back at some saved traces, I don't
> see WBS (write barrier sync) requests, so I wonder if barriers weren't
> supported by my last storage system.

So, I tested Corrado's approach with -o nobarrier, and here are the
results:

fs_mark: 363.291 files/sec
fio: 38.5 MB/s

I don't have time to analyze the data right now, and it's 600MB worth of
binary output. If you want, I can upload a representative sample
somewhere; just let me know.
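
If a trimmed sample is enough, I can also cut a window out of the parsed
output rather than uploading the whole thing, e.g. (device basename made
up, and assuming I'm remembering blkparse's stopwatch option correctly):

    # dump roughly ten seconds from the middle of the run in text form
    blkparse -i sdb -w 60:70 > sample.txt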

Anyway, I'll post an analysis tomorrow.

Cheers,
Jeff
From: Jeff Moyer on
Vivek Goyal <vgoyal(a)redhat.com> writes:

> On Tue, Jul 13, 2010 at 04:30:23PM -0400, Jeff Moyer wrote:
>> Vivek Goyal <vgoyal(a)redhat.com> writes:

> I don't mind looking at traces. Do let me know where can I access those.

Forwarded privately.

>> Now, to answer your question, the jbd2 thread runs and issues a barrier,
>> which causes a forced dispatch of requests. After that a new queue is
>> selected, and since the fs_mark thread is blocked on the journal commit,
>> it's always the fio process that gets to run.
>
> Ok, that explains it. So somehow, after the barrier, fio always wins
> because it issues its next read request before fs_mark is able to issue
> its next set of writes.
>
>>
>> This, of course, raises the question of why the blk_yield patches didn't
>> run into the same problem. Looking back at some saved traces, I don't
>> see WBS (write barrier sync) requests, so I wonder if barriers weren't
>> supported by my last storage system.
>
> I think the blk_yield patches will also run into the same issue if
> barriers are enabled.

Agreed.

Here are the results again with barriers disabled for Corrado's patch:

fs_mark: 348.2 files/sec
fio: 53324.6 KB/s

Remember that deadline was seeing 450 files/sec and 78 MB/s. So, in
this case, the buffered reader appears to be starved. Looking into this
further, I found that the journal thread is running with I/O priority 0,
while the fio and fs_mark processes are running at the default (4).
Because the jbd thread has a higher I/O priority, its requests are
always closer to the front of the sort list, and thus the sync-noidle
workload is chosen more often than the sync workload. This essentially
results in an elevated I/O priority for the fs_mark process as well.
While troubling, that problem is not directly related to the problem
we're looking at.
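
If we want to rule that effect out in a future run, the journal thread
can be dropped back to the default best-effort level beforehand, e.g.
(the pgrep pattern is just an example):

    # put the jbd2 thread at the same best-effort level as the
    # benchmark processes (class 2, level 4)
    ionice -c 2 -n 4 -p $(pgrep jbd2)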

So, I'm still in favor of Corrado's approach. Are there any remaining
dissenting opinions on this?

Cheers,
Jeff