From: Corrado Zoccolo on
On Sun, Jun 27, 2010 at 5:44 PM, Jeff Moyer <jmoyer(a)redhat.com> wrote:
> Vivek Goyal <vgoyal(a)redhat.com> writes:
>
>> On Fri, Jun 25, 2010 at 01:03:20PM +0200, Christoph Hellwig wrote:
>>> On Wed, Jun 23, 2010 at 09:44:20PM -0400, Vivek Goyal wrote:
>>> I see the point of this logic for reads where various workloads have
>>> dependent reads that might be close to each other, but I don't really
>>> see any point for writes.
>>>
>>> > So looks like fsync path will do bunch of IO and then will wait for jbd thread
>>> > to finish the work. In this case idling is waste of time.
>>>
>>> Given that ->writepage already does WRITE_SYNC_PLUG I/O which includes
>>> REQ_NOIDLE I'm still confused why we still have that issue.
>>
>> In its current form, cfq honors REQ_NOIDLE conditionally, and that's why we
>> still have the issue. If you look at cfq_completed_request(), we continue
>> to idle in the following two cases.
>>
>> - If we classified the queue as SYNC_WORKLOAD.
>> - If there is another random read/write happening on the sync-noidle
>>   service tree.
>>
>> SYNC_WORKLOAD means that cfq thinks this particular queue is doing sequential
>> IO. For random IO queues, we don't idle on each individual queue but on a
>> group of queues.
>>
>> In Jeff's testing, the fsync thread/queue is sometimes viewed as a sequential
>> workload and goes on the SYNC_WORKLOAD tree. In that case, even if the
>> request is REQ_NOIDLE, we will continue to idle, hence the fsync issue.
>
> I'm now testing OCFS2, and I'm seeing performance that is not great
> (even with the blk_yield patches applied).  What happens is that we
> successfully yield the queue to the journal thread, but then idle on the
> journal thread (even though RQ_NOIDLE was set).
>
> So, can we just get rid of idling when RQ_NOIDLE is set?
Hi Jeff,
I think I spotted a problem with the initial implementation of the
tree-wide idle when RQ_NOIDLE is set: I assumed that a queue would
either send possibly-idling requests or no-idle requests, but it seems
that RQ_NOIDLE is being used to mark the end of a stream of
possibly-idling requests (in my initial implementation, this will then
cause an unintended idle). The attached patch should fix it, and I
think the logic is simpler than Vivek's. Can you give it a spin?
Otherwise, I think that reverting the "noidle_tree_requires_idle"
behaviour completely may be better than adding complexity, since it is
really trying to solve corner cases (that maybe happen only on
synthetic workloads) while negatively affecting more common cases.

About what it is trying to solve, since I think it was not clear:
- we have a workload of 2 queues, both issuing requests that are being
put in the no-idle tree (e.g. they are random) + 1 queue issuing
idling requests (e.g. sequential).
- if one of the 2 "random" queues marks its requests as RQ_NOIDLE,
then the timeslice for the no-idle tree is not preserved as soon as
an RQ_NOIDLE request is serviced and the tree is empty, causing
unfairness.

Thanks
Corrado

>
> Vivek sent me this patch to test, and it got rid of the performance
> issue for the fsync workload.  Can we discuss its merits?
>
> Thanks,
> Jeff
>
> Index: linux-2.6/block/cfq-iosched.c
> ===================================================================
> --- linux-2.6.orig/block/cfq-iosched.c  2010-06-25 15:57:33.832125786 -0400
> +++ linux-2.6/block/cfq-iosched.c       2010-06-25 15:59:19.788876361 -0400
> @@ -318,6 +318,7 @@
>        CFQ_CFQQ_FLAG_split_coop,       /* shared cfqq will be splitted */
>        CFQ_CFQQ_FLAG_deep,             /* sync cfqq experienced large depth */
>        CFQ_CFQQ_FLAG_wait_busy,        /* Waiting for next request */
> +       CFQ_CFQQ_FLAG_group_idle,       /* This queue is doing group idle */
>  };
>
>  #define CFQ_CFQQ_FNS(name)                                             \
> @@ -347,6 +348,7 @@
>  CFQ_CFQQ_FNS(split_coop);
>  CFQ_CFQQ_FNS(deep);
>  CFQ_CFQQ_FNS(wait_busy);
> +CFQ_CFQQ_FNS(group_idle);
>  #undef CFQ_CFQQ_FNS
>
>  #ifdef CONFIG_CFQ_GROUP_IOSCHED
> @@ -1613,6 +1615,7 @@
>
>        cfq_clear_cfqq_wait_request(cfqq);
>        cfq_clear_cfqq_wait_busy(cfqq);
> +       cfq_clear_cfqq_group_idle(cfqq);
>
>        /*
>         * If this cfqq is shared between multiple processes, check to
> @@ -3176,6 +3179,13 @@
>        if (cfq_class_rt(new_cfqq) && !cfq_class_rt(cfqq))
>                return true;
>
> +       /*
> +        * If we're doing group_idle and we get a new request in the same group,
> +        * preempt the queue
> +        */
> +       if (cfq_cfqq_group_idle(cfqq))
> +               return true;
> +
>        if (!cfqd->active_cic || !cfq_cfqq_wait_request(cfqq))
>                return false;
>
> @@ -3271,6 +3281,7 @@
>        struct cfq_queue *cfqq = RQ_CFQQ(rq);
>
>        cfq_log_cfqq(cfqd, cfqq, "insert_request");
> +       cfq_clear_cfqq_group_idle(cfqq);
>        cfq_init_prio_data(cfqq, RQ_CIC(rq)->ioc);
>
>        rq_set_fifo_time(rq, jiffies + cfqd->cfq_fifo_expire[rq_is_sync(rq)]);
> @@ -3416,10 +3427,12 @@
>                         * SYNC_NOIDLE_WORKLOAD idles at the end of the tree
>                         * only if we processed at least one !rq_noidle request
>                         */
> -                       if (cfqd->serving_type == SYNC_WORKLOAD
> -                           || cfqd->noidle_tree_requires_idle
> -                           || cfqq->cfqg->nr_cfqq == 1)
> +                       if (cfqd->noidle_tree_requires_idle)
> +                               cfq_arm_slice_timer(cfqd);
> +                       else if (cfqq->cfqg->nr_cfqq == 1) {
> +                               cfq_mark_cfqq_group_idle(cfqq);
>                                cfq_arm_slice_timer(cfqd);
> +                       }
>                }
>        }
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo(a)vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>



--
__________________________________________________________________________

dott. Corrado Zoccolo mailto:czoccolo(a)gmail.com
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------
The self-confidence of a warrior is not the self-confidence of the average
man. The average man seeks certainty in the eyes of the onlooker and calls
that self-confidence. The warrior seeks impeccability in his own eyes and
calls that humbleness.
Tales of Power - C. Castaneda



From: Vivek Goyal on
On Tue, Jun 29, 2010 at 11:06:19AM +0200, Corrado Zoccolo wrote:

[..]
> > I'm now testing OCFS2, and I'm seeing performance that is not great
> > (even with the blk_yield patches applied). �What happens is that we
> > successfully yield the queue to the journal thread, but then idle on the
> > journal thread (even though RQ_NOIDLE was set).
> >
> > So, can we just get rid of idling when RQ_NOIDLE is set?
> Hi Jeff,
> I think I spotted a problem with the initial implementation of the
> tree-wide idle when RQ_NOIDLE is set: I assumed that a queue would
> either send possibly-idling requests or no-idle requests, but it seems
> that RQ_NOIDLE is being used to mark the end of a stream of
> possibly-idling requests (in my initial implementation, this will then
> cause an unintended idle). The attached patch should fix it, and I
> think the logic is simpler than Vivek's. Can you give it a spin?
> Otherwise, I think that reverting the "noidle_tree_requires_idle"
> behaviour completely may be better than adding complexity, since it is
> really trying to solve corner cases (that maybe happen only on
> synthetic workloads) while negatively affecting more common cases.
>

Hi Corrado,

I think you forgot to attach the patch? Can't find it.


> About what it is trying to solve, since I think it was not clear:
> - we have a workload of 2 queues, both issuing requests that are being
> put in the no-idle tree (e.g. they are random) + 1 queue issuing
> idling requests (e.g. sequential).
> - if one of the 2 "random" queues marks its requests as RQ_NOIDLE,
> then the timeslice for the no-idle tree is not preserved as soon as
> an RQ_NOIDLE request is serviced and the tree is empty, causing
> unfairness.

I think Jeff's primary regressions were coming from the fact that we
will continue to idle on SYNC_WORKLOAD even if RQ_NOIDLE() was set.

Regarding giving up idling on the sync-noidle workload, I think it still
makes some sense to keep track of whether some other random queue is doing
IO on that tree, and if so, continue to idle. It's a separate matter that
the current logic is more coarse and could be made a bit more fine-grained.

Because I don't have a practical workload example at this point of time, I
also don't mind reverting your old patch and restoring the policy of not
idling if RQ_NOIDLE() was set.

But it still does not answer the question of why the O_DIRECT and O_SYNC
paths should be different when it comes to RQ_NOIDLE.

Thanks
Vivek
From: Corrado Zoccolo on
On Tue, Jun 29, 2010 at 2:30 PM, Vivek Goyal <vgoyal(a)redhat.com> wrote:
> On Tue, Jun 29, 2010 at 11:06:19AM +0200, Corrado Zoccolo wrote:
>
> [..]
>> > I'm now testing OCFS2, and I'm seeing performance that is not great
>> > (even with the blk_yield patches applied).  What happens is that we
>> > successfully yield the queue to the journal thread, but then idle on the
>> > journal thread (even though RQ_NOIDLE was set).
>> >
>> > So, can we just get rid of idling when RQ_NOIDLE is set?
>> Hi Jeff,
>> I think I spotted a problem with the initial implementation of the
>> tree-wide idle when RQ_NOIDLE is set: I assumed that a queue would
>> either send possibly-idling requests or no-idle requests, but it seems
>> that RQ_NOIDLE is being used to mark the end of a stream of
>> possibly-idling requests (in my initial implementation, this will then
>> cause an unintended idle). The attached patch should fix it, and I
>> think the logic is simpler than Vivek's. Can you give it a spin?
>> Otherwise, I think that reverting the "noidle_tree_requires_idle"
>> behaviour completely may be better than adding complexity, since it is
>> really trying to solve corner cases (that maybe happen only on
>> synthetic workloads) while negatively affecting more common cases.
>>
>
> Hi Corrado,
>
> I think you forgot to attach the patch? Can't find it.
No, it's there:
http://linux.derkeiler.com/Mailing-Lists/Kernel/2010-06/msg10854.html
>
>
>> About what it is trying to solve, since I think it was not clear:
>> - we have a workload of 2 queues, both issuing requests that are being
>> put in the no-idle tree (e.g. they are random) + 1 queue issuing
>> idling requests (e.g. sequential).
>> - if one of the 2 "random" queues marks its requests as RQ_NOIDLE,
>> then the timeslice for the no-idle tree is not preserved as soon as
>> an RQ_NOIDLE request is serviced and the tree is empty, causing
>> unfairness.
>
> I think Jeff's primary regressions were coming from the fact that we
> will continue to idle on SYNC_WORKLOAD even if RQ_NOIDLE() was set.

I see. There are probably two ways of handling this:
- make sure the queues sending RQ_NOIDLE requests are marked as
SYNC_NOIDLE_WORKLOAD. Something like this should work, even if it will
increase switching between trees:
@@ -3046,6 +3046,7 @@ cfq_update_idle_window(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 		cfq_mark_cfqq_deep(cfqq);

 	if (!atomic_read(&cic->ioc->nr_tasks) || !cfqd->cfq_slice_idle ||
+	    (cfqq->next_rq && rq_noidle(cfqq->next_rq)) ||
 	    (!cfq_cfqq_deep(cfqq) && CFQQ_SEEKY(cfqq)))
 		enable_idle = 0;
 	else if (sample_valid(cic->ttime_samples)) {

- or we could explicitly check rq_noidle() in cfq_completed_request()
for SYNC_WORKLOAD:
@@ -3360,8 +3360,10 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
 			 * SYNC_NOIDLE_WORKLOAD idles at the end of the tree
 			 * only if we processed at least one !rq_noidle request
 			 */
-			if (cfqd->serving_type == SYNC_WORKLOAD
-			    || cfqd->noidle_tree_requires_idle
+			if ((cfqd->serving_type == SYNC_WORKLOAD
+			     && !rq_noidle(rq))
+			    || (cfqd->serving_type == SYNC_NOIDLE_WORKLOAD
+			        && cfqd->noidle_tree_requires_idle)
 			    || cfqq->cfqg->nr_cfqq == 1)
 				cfq_arm_slice_timer(cfqd);

> Regarding giving up idling on the sync-noidle workload, I think it still
> makes some sense to keep track of whether some other random queue is doing
> IO on that tree, and if so, continue to idle. It's a separate matter that
> the current logic is more coarse and could be made a bit more fine-grained.

The patch in the previous message should fix this.

>
> Because I don't have a practical workload example at this point of time, I
> also don't mind reverting your old patch and restoring the policy of not
> idling if RQ_NOIDLE() was set.
>
> But it still does not answer the question of why the O_DIRECT and O_SYNC
> paths should be different when it comes to RQ_NOIDLE.
This is outside my knowledge.

Thanks
Corrado
>
> Thanks
> Vivek
>


