[RFC] CFQ: Make prio_trees per cfq group basis to improve IO performance [Kernel]


From: Jeff Moyer on 16 Jul 2010 10:30

Vivek Goyal <vgoyal(a)redhat.com> writes:

> On Fri, Jul 16, 2010 at 05:21:00PM +0800, Gui Jianfeng wrote:
>> Currently, prio_trees is global, and we rely on cfqq_close() to search
>> for a cooperator. If the returned cfqq and the active cfqq don't belong to
>> the same group, the cooperator search fails. Actually, it need not fail.
>> Even if cfqq_close() returns a cfqq which belongs to another cfq group,
>> it's still likely that a cooperator (same cfqg) resides in prio_trees.
>> This patch introduces per cfq group prio_trees that should solve the above
>> issue.
>>
>
> Hi Gui,
>
> I am not sure I understand the issue here. So are you saying that once
> we find a cfqq which is close but belongs to a different group, we reject
> it, even though there could be another cfqq in the same group which is not
> as close but still close enough?
>
> For example, assume there are two queues q1 and q2 in group A and a third
> queue q3 in group B. Assume q1 is active queue and we are searching for
> cooperator. If cooperator code finds q3 as closest then we will not pick
> this queue as it belongs to a different group. But it could happen that
> q2 is also close enough and we never considered that possibility.
>
> If yes, then it's a good theoretical concern, but I am worried about how
> often it happens in practice. Do you have any workload which suffers because
> of this?

That was my reading. It also means that, in the case that we have
cgroups in use, each rb tree will be smaller.
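
To make the q1/q2/q3 scenario concrete, here is a rough userspace sketch. It is
not the CFQ code: the toy_* names, the array scan standing in for the rb-tree
lookup, and the threshold value are all assumptions made for illustration. It
only shows why "find the closest, then reject on group mismatch" can miss a
same-group queue that a per-group search would find.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a cfq queue; not the kernel struct. */
struct toy_queue {
	int group;	/* id of the owning group */
	long sector;	/* position of the queue's next request */
};

/* Assumed "close enough" distance in sectors, chosen for illustration. */
#define TOY_CLOSE_THR 8192L

/*
 * Return the queue nearest to `sector`, skipping `active`. If `group` is
 * non-negative, only queues of that group are considered. A linear scan
 * stands in for the rb-tree lookup.
 */
static const struct toy_queue *
toy_closest(const struct toy_queue *queues, size_t n,
	    const struct toy_queue *active, long sector, int group)
{
	const struct toy_queue *best = NULL;
	size_t i;

	assert(queues != NULL);
	for (i = 0; i < n; i++) {
		const struct toy_queue *q = &queues[i];

		if (q == active)
			continue;
		if (group >= 0 && q->group != group)
			continue;
		if (!best ||
		    labs(q->sector - sector) < labs(best->sector - sector))
			best = q;
	}
	if (best && labs(best->sector - sector) > TOY_CLOSE_THR)
		return NULL;
	return best;
}

/* Global model: search all queues, then reject a cross-group result. */
static const struct toy_queue *
toy_cooperator_global(const struct toy_queue *queues, size_t n,
		      const struct toy_queue *active, long sector)
{
	const struct toy_queue *q = toy_closest(queues, n, active, sector, -1);

	if (q && q->group != active->group)
		return NULL;
	return q;
}

/* Per-group model: only queues of the active queue's group are searched. */
static const struct toy_queue *
toy_cooperator_per_group(const struct toy_queue *queues, size_t n,
			 const struct toy_queue *active, long sector)
{
	return toy_closest(queues, n, active, sector, active->group);
}
```

With q1 and q2 in one group, q3 in another, and q3 nearest to q1's position,
the global variant returns NULL while the per-group variant returns q2.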

> I am not too inclined to push more complexity in CFQ until and unless we
> have a good use case.

I don't think this adds complexity, does it? It simply moves the
priority trees up a level, which is arguably where they belong.

>> +static struct cfq_queue *
>> +cfq_prio_tree_lookup(struct cfq_group *cfqg, struct rb_root *root,
>> + sector_t sector, struct rb_node **ret_parent,
>> + struct rb_node ***rb_link)
>> +{

You can get rid of the cfqg argument. I know you're just keeping with
the prior model (where cfqd was passed in and not used), but let's kill
it.

Cheers,
Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Jeff Moyer on 16 Jul 2010 11:10

Vivek Goyal <vgoyal(a)redhat.com> writes:

> On Fri, Jul 16, 2010 at 10:21:46AM -0400, Jeff Moyer wrote:
>> Vivek Goyal <vgoyal(a)redhat.com> writes:
>>
>> > On Fri, Jul 16, 2010 at 05:21:00PM +0800, Gui Jianfeng wrote:
>> >> Currently, prio_trees is global, and we rely on cfqq_close() to search
>> >> for a cooperator. If the returned cfqq and the active cfqq don't belong to
>> >> the same group, the cooperator search fails. Actually, it need not fail.
>> >> Even if cfqq_close() returns a cfqq which belongs to another cfq group,
>> >> it's still likely that a cooperator (same cfqg) resides in prio_trees.
>> >> This patch introduces per cfq group prio_trees that should solve the above
>> >> issue.
>> >>
>> >
>> > Hi Gui,
>> >
>> > I am not sure I understand the issue here. So are you saying that once
>> > we find a cfqq which is close but belongs to a different group, we reject
>> > it, even though there could be another cfqq in the same group which is not
>> > as close but still close enough?
>> >
>> > For example, assume there are two queues q1 and q2 in group A and a third
>> > queue q3 in group B. Assume q1 is active queue and we are searching for
>> > cooperator. If cooperator code finds q3 as closest then we will not pick
>> > this queue as it belongs to a different group. But it could happen that
>> > q2 is also close enough and we never considered that possibility.
>> >
>> > If yes, then it's a good theoretical concern, but I am worried about how
>> > often it happens in practice. Do you have any workload which suffers because
>> > of this?
>>
>> That was my reading. It also means that, in the case that we have
>> cgroups in use, each rb tree will be smaller.
>>
>> > I am not too inclined to push more complexity in CFQ until and unless we
>> > have a good use case.
>>
>> I don't think this adds complexity, does it? It simply moves the
>> priority trees up a level, which is arguably where they belong.
>
> What happens when a cfqq moves to a different group (group_isolation=0)? Then
> we also need to add code to change the prio tree of the cfqq. Currently the
> prio trees are global, so we don't have to worry about it. I don't think this
> patch takes care of that issue.

Yeah, that had occurred to me.
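
For reference, the extra bookkeeping being discussed looks roughly like the
sketch below. This is a userspace toy, not kernel code: the mv_* names are
made up, and a small array stands in for a group's prio tree. The point is
only that once the index is per group, moving a queue means removing it from
the old group's index and adding it to the new one, a step a global index
never needs.

```c
#include <assert.h>
#include <stddef.h>

#define MV_MAX_QUEUES 8	/* arbitrary capacity for the toy index */

struct mv_group;

/* Hypothetical stand-in for a cfq queue. */
struct mv_queue {
	long sector;
	struct mv_group *group;	/* group whose index currently holds us */
};

/* Hypothetical stand-in for a cfq group with its own per-group index. */
struct mv_group {
	struct mv_queue *members[MV_MAX_QUEUES];
	size_t count;
};

/* Add `q` to the index of `g` and record the new owner. */
static void mv_group_add(struct mv_group *g, struct mv_queue *q)
{
	assert(g->count < MV_MAX_QUEUES);
	g->members[g->count++] = q;
	q->group = g;
}

/* Remove `q` from the index of `g`, if present. Order is not preserved. */
static void mv_group_remove(struct mv_group *g, struct mv_queue *q)
{
	size_t i;

	for (i = 0; i < g->count; i++) {
		if (g->members[i] == q) {
			g->members[i] = g->members[--g->count];
			q->group = NULL;
			return;
		}
	}
}

/* Moving a queue between groups must also move its index entry. */
static void mv_queue_move(struct mv_queue *q, struct mv_group *to)
{
	if (q->group == to)
		return;
	if (q->group)
		mv_group_remove(q->group, q);
	mv_group_add(to, q);
}
```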

> That's a different thing: I am beginning to dislike group_isolation=0,
> because the additional possibility that cfqqs can move dynamically across
> groups is making life hard while adding more code to CFQ. So if nobody
> is using it, I was thinking of getting rid of the group_isolation tunable.
>
> It does bring the issue of a severe performance penalty for sync-noidle
> workloads across groups. I think that should be solved by a different
> tunable, along the lines of "don't worry about fairness if a group is not
> driving a minimum queue depth", adjustable so that the system admin
> can decide the right balance between fairness/isolation and throughput.

I'm not sure what you concluded here. ;-)

The way I see it, Gui's patch makes sense. It sounds like you agree,
but you didn't like it because you have to write extra code to deal with
the case of group_isolation=0. I simply don't agree with that line of
reasoning.

Now, there is the question of whether Gui's patch introduces any *real*
benefit. I'd honestly be surprised if it did. Gui, can you give us
some benchmark results that show the benefit? If there is no benefit,
then I'm happy to leave the code the way it is.

Cheers,
Jeff
