From: KAMEZAWA Hiroyuki on
On Sat, 10 Jul 2010 09:24:17 -0400
Vivek Goyal <vgoyal(a)redhat.com> wrote:

> On Fri, Jul 09, 2010 at 05:55:23PM -0700, Nauman Rafique wrote:
>
> [..]
> > > Well, right.  I agree.
> > > But I think we can work parallel.  I will try to struggle on both.
> >
> > IMHO, we have a classic chicken and egg problem here. We should try to
> > merge pieces as they become available. If we get to agree on patches
> > that do async IO tracking for IO controller, we should go ahead with
> > them instead of trying to wait for per cgroup dirty ratios.
> >
> > In terms of getting numbers, we have been using patches that add per
> > cpuset dirty ratios on top of NUMA_EMU, and we get good
> > differentiation between buffered writes as well as buffered writes vs.
> > reads.
> >
> > It is really obvious that as long as flusher threads, etc. are not
> > cgroup aware, differentiation for buffered writes would not be perfect
> > in all cases, but this is a step in the right direction and we should
> > go for it.
>
> Working parallel on two separate pieces is fine. But pushing second piece
> in first does not make much sense to me because second piece does not work
> if first piece is not in. There is no way to test it. What's the point of
> pushing a code in kernel which only compiles but does not achieve intended
> purposes because some other pieces are missing.
>
> Per cgroup dirty ratio is a little hard problem and few attempts have
> already been made at it. IMHO, we need to first work on that piece and
> get it inside the kernel and then work on IO tracking patches. Lets
> fix the hard problem first that is necessary to make second set of patches
> work.
>

I've been waiting for the dirty-ratio patches because I know someone is working
on them. But, hmm, I'll consider starting the work myself.

(Off-topic)
BTW, why is io-cgroup's hierarchy level limited to 2?
Because of that limitation, libvirt can't work well...

Thanks,
-Kame

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Vivek Goyal on
On Mon, Jul 12, 2010 at 09:20:04AM +0900, KAMEZAWA Hiroyuki wrote:
> On Sat, 10 Jul 2010 09:24:17 -0400
> Vivek Goyal <vgoyal(a)redhat.com> wrote:
>
> > On Fri, Jul 09, 2010 at 05:55:23PM -0700, Nauman Rafique wrote:
> >
> > [..]
> > > > Well, right.  I agree.
> > > > But I think we can work parallel.  I will try to struggle on both.
> > >
> > > IMHO, we have a classic chicken and egg problem here. We should try to
> > > merge pieces as they become available. If we get to agree on patches
> > > that do async IO tracking for IO controller, we should go ahead with
> > > them instead of trying to wait for per cgroup dirty ratios.
> > >
> > > In terms of getting numbers, we have been using patches that add per
> > > cpuset dirty ratios on top of NUMA_EMU, and we get good
> > > differentiation between buffered writes as well as buffered writes vs.
> > > reads.
> > >
> > > It is really obvious that as long as flusher threads, etc. are not
> > > cgroup aware, differentiation for buffered writes would not be perfect
> > > in all cases, but this is a step in the right direction and we should
> > > go for it.
> >
> > Working parallel on two separate pieces is fine. But pushing second piece
> > in first does not make much sense to me because second piece does not work
> > if first piece is not in. There is no way to test it. What's the point of
> > pushing a code in kernel which only compiles but does not achieve intended
> > purposes because some other pieces are missing.
> >
> > Per cgroup dirty ratio is a little hard problem and few attempts have
> > already been made at it. IMHO, we need to first work on that piece and
> > get it inside the kernel and then work on IO tracking patches. Lets
> > fix the hard problem first that is necessary to make second set of patches
> > work.
> >
>
> I've just waited for dirty-ratio patches because I know someone is working on.
> But, hmm, I'll consider to start work by myself.
>

If you can spare time to get it going, it would be great.

> (Off-topic)
> BTW, why io-cgroup's hierarchy level is limited to 2 ?
> Because of that limitation, libvirt can't work well...

Because the current CFQ code is not written to support hierarchy. So it was
better not to allow creation of groups inside groups, to avoid surprises.
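
For illustration, the restriction can be modelled in user space. This is a
simplified sketch, not the kernel code: the `Group` class and the
`create_group` helper are hypothetical names, and the only behaviour taken
from this thread is that creating a group under a non-root group is refused
with -EINVAL.

```python
import errno


class Group:
    """Toy stand-in for a blkio cgroup (hypothetical, not a kernel structure)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


ROOT = Group("root")


def create_group(name, parent):
    """Return a new Group, or a negative errno when nesting is refused.

    Only direct children of the root are allowed, so the tree never grows
    beyond two levels (the root plus one level of groups).
    """
    if parent is not ROOT:
        return -errno.EINVAL
    return Group(name, parent)


vm1 = create_group("vm1", ROOT)       # allowed: child of the root
nested = create_group("disk0", vm1)   # refused: parent is not the root
print(vm1.name, nested == -errno.EINVAL)
```

A tool that wants to place its groups below a directory of its own would hit
the second case in this model.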

We need to figure out something for libvirt. One of the options would be
that libvirt allows blkio group creation in /root. Or one shall have to
look into hierarchical support in CFQ.

Things get a little complicated in CFQ once we want to support hierarchy.
And to begin with, I am not expecting many people to really create groups
inside groups. That's why I am currently focusing on making sure that the
current infrastructure works well instead of just adding more features to
it.

A few things I am looking into:

- CFQ performance is not good on high-end storage, so group control also
suffers from the same issue. I am trying to introduce a group_idle tunable
to solve some of the problems.

- Even with group_idle, overall throughput suffers if groups don't have
enough traffic to keep the array busy. I am trying to create a mode where a
user can choose to let fairness go if groups don't have enough traffic
to keep the array busy.

- Request descriptors are still per queue and not per group. I noticed that
the moment we create more groups, we start running out of request
descriptors, which introduces serialization among groups. We need
per-group request descriptor infrastructure.
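
The request descriptor point can be illustrated with a toy model. This is
only a sketch under assumed numbers: the pool size of 4 and the helper names
`allocate_shared` and `allocate_per_group` are invented here, and real
request allocation in the block layer is considerably more involved.

```python
def allocate_shared(requests, pool_size):
    """Serve requests first-come, first-served from one pool shared by all groups.

    `requests` is a list of group names in arrival order. Returns a dict of
    how many descriptors each group obtained before the pool ran dry.
    """
    granted = {}
    for group in requests:
        granted.setdefault(group, 0)
        if sum(granted.values()) < pool_size:
            granted[group] += 1
    return granted


def allocate_per_group(requests, per_group_size):
    """Same arrival order, but every group draws from its own pool."""
    granted = {}
    for group in requests:
        granted.setdefault(group, 0)
        if granted[group] < per_group_size:
            granted[group] += 1
    return granted


# Group A submits a burst of six requests before group B submits two.
arrivals = ["A"] * 6 + ["B"] * 2

print(allocate_shared(arrivals, pool_size=4))          # {'A': 4, 'B': 0}
print(allocate_per_group(arrivals, per_group_size=4))  # {'A': 4, 'B': 2}
```

In the shared case group B gets nothing until some of A's requests complete,
which is the kind of serialization among groups meant here.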

First I am planning to sort out the above issues and then look into other
enhancements.

Thanks
Vivek
From: KAMEZAWA Hiroyuki on
On Mon, 12 Jul 2010 09:18:05 -0400
Vivek Goyal <vgoyal(a)redhat.com> wrote:

> > I've just waited for dirty-ratio patches because I know someone is working on.
> > But, hmm, I'll consider to start work by myself.
> >
>
> If you can spare time to get it going, it would be great.
>
> > (Off-topic)
> > BTW, why io-cgroup's hierarchy level is limited to 2 ?
> > Because of that limitation, libvirt can't work well...
>
> Because current CFQ code is not written to support hierarchy. So it was
> better to not allow creation of groups inside of groups to avoid surprises.
>
> We need to figure out something for libvirt. One of the options would be
> that libvirt allows blkio group creation in /root. Or one shall have to
> look into hierarchical support in CFQ.
>

Hmm, can't we start from a hierarchy which doesn't support inheritance?
IOW, the blkio cgroup has child directories but all cgroups are treated as
flat. In the future, true hierarchy support may be added and you may be able
to use it via a mount option...
For example, memory cgroup's hierarchy support is optional, because it's slow.

The cgroup feature of mounting several subsystems at one mount point at once
is very useful in many cases.

Thanks
-Kame

From: Vivek Goyal on
On Tue, Jul 13, 2010 at 01:36:36PM +0900, KAMEZAWA Hiroyuki wrote:
> On Mon, 12 Jul 2010 09:18:05 -0400
> Vivek Goyal <vgoyal(a)redhat.com> wrote:
>
> > > I've just waited for dirty-ratio patches because I know someone is working on.
> > > But, hmm, I'll consider to start work by myself.
> > >
> >
> > If you can spare time to get it going, it would be great.
> >
> > > (Off-topic)
> > > BTW, why io-cgroup's hierarchy level is limited to 2 ?
> > > Because of that limitation, libvirt can't work well...
> >
> > Because current CFQ code is not written to support hierarchy. So it was
> > better to not allow creation of groups inside of groups to avoid surprises.
> >
> > We need to figure out something for libvirt. One of the options would be
> > that libvirt allows blkio group creation in /root. Or one shall have to
> > look into hierarchical support in CFQ.
> >
>
> Hmm, can't we start from a hierarchy which doesn't support inheritance ?
> IOW, blkio cgroup has children directories but all cgroups are treated as
> flat. In future, true hierarchy support may be added and you may able to
> use it via mount option....
> For example, memory cgroup's hierarchy support is optional..because it's slow.

I think doing that is even more confusing to the user, because the cgroup dir
structure shows a hierarchy of groups but in practice that's not the case. It
is easier to deny creating child groups within groups right away and
let user space mount blkio at a different mount point and plan the
resource usage accordingly.

>
> Cgroup's feature as mounting several subsystems at a mount point at once
> is very useful in many case.

I agree that it is useful, but if some controllers do not support
hierarchy, it just adds more confusion. And later, when hierarchy
support comes in, there will be the additional issue of keeping a
"use_hierarchy" file like the memory controller's.

So at this point in time, I am not too inclined towards allowing hierarchical
cgroup creation but treating the groups as flat in CFQ. I think it adds to the
confusion, and user space should handle this situation.

Thanks
Vivek
From: KAMEZAWA Hiroyuki on
On Wed, 14 Jul 2010 10:29:19 -0400
Vivek Goyal <vgoyal(a)redhat.com> wrote:
> >
> > Cgroup's feature as mounting several subsystems at a mount point at once
> > is very useful in many case.
>
> I agree that it is useful but if some controllers are not supporting
> hierarchy, it just adds to more confusion. And later when hierarchy
> support comes in, there will be additional issue of keeping this file
> "use_hierarchy" like memory controller.
>
> So at this point of time , I am not too inclined towards allowing hierarchical
> cgroup creation but treating them as flat in CFQ. I think it adds to the
> confusion and user space should handle this situation.
>

Hmm.

Could you fix the error code returned when creating a blkio cgroup? It returns
-EINVAL now. IIUC, mkdir(2) doesn't return -EINVAL as an error code (from the
man page), so it's very confusing. I think -EPERM or -ENOMEM would be much better.
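
For illustration, a small user-space probe can show which errno a failing
mkdir(2) reports. This is only a sketch: `try_mkdir` is a hypothetical helper,
and the demonstration runs against an ordinary temporary directory, not a
blkio mount. Pointing it at a nested path under a blkio cgroup mount (the
mount point depends on the system) would be expected to report EINVAL with the
behaviour described above.

```python
import errno
import os
import tempfile


def try_mkdir(path):
    """Try to create `path`; return None on success or the errno name on failure."""
    try:
        os.mkdir(path)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


# Demonstration on an ordinary temporary directory, so it runs anywhere.
with tempfile.TemporaryDirectory() as base:
    first = try_mkdir(os.path.join(base, "grp"))            # succeeds
    again = try_mkdir(os.path.join(base, "grp"))            # already exists
    orphan = try_mkdir(os.path.join(base, "missing", "x"))  # parent absent
    print(first, again, orphan)
```

Errors such as EEXIST or ENOENT tell the caller what went wrong, which is why
an undocumented EINVAL from mkdir is hard to interpret.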

Anyway, I need to look at the source code of blk-cgroup.c to know why libvirt
fails to create the cgroup. Where is the user-visible information (in RHEL or
Fedora) saying "you can't use blkio-cgroup via libvirt or libcgroup"?

Thanks,
-Kame
