From: Vivek Goyal on
On Thu, Jul 15, 2010 at 09:00:48AM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 14 Jul 2010 10:29:19 -0400
> Vivek Goyal <vgoyal(a)redhat.com> wrote:
> > >
> > > Cgroup's feature of mounting several subsystems at one mount point at once
> > > is very useful in many cases.
> >
> > I agree that it is useful but if some controllers are not supporting
> > hierarchy, it just adds to more confusion. And later when hierarchy
> > support comes in, there will be additional issue of keeping this file
> > "use_hierarchy" like memory controller.
> >
> > So at this point in time, I am not too inclined towards allowing hierarchical
> > cgroup creation but treating them as flat in CFQ. I think it adds to the
> > confusion and user space should handle this situation.
> >
>
> Hmm.
>
> Could you fix error code in create blkio cgroup ? It returns -EINVAL now.
> IIUC, mkdir(2) doesn't return -EINVAL as error code (from man.)
> Then, it's very confusing. I think -EPERM or -ENOMEM will be much better.

Hmm, -EPERM is probably the closest to what we are doing. The file system
does support creation of directories, but not beyond a certain level.

I will trace more instances of mkdir error values.
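
To check which error a given mkdir actually returns, user space can report
the errno name directly. This is a minimal sketch, not taken from any tool
in this thread: the helper name try_mkdir is hypothetical, and the demo runs
against an ordinary temporary directory rather than a blkio mount.

```python
import errno
import os
import tempfile


def try_mkdir(path):
    """Attempt mkdir(2) on path; return None on success or the errno name on failure."""
    try:
        os.mkdir(path)
    except OSError as exc:
        # errno.errorcode maps numeric codes to names such as "EPERM" or "EINVAL".
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


if __name__ == "__main__":
    # Demonstration on a plain temporary directory, not a cgroup mount.
    with tempfile.TemporaryDirectory() as top:
        print(try_mkdir(os.path.join(top, "a")))       # first creation
        print(try_mkdir(os.path.join(top, "a")))       # directory already exists
        print(try_mkdir(os.path.join(top, "x", "y")))  # parent directory is missing
```

Pointing the same helper at a second-level directory under a blkio mount
would show whatever code the kernel in use returns for nested creation.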

>
> Anyway, I need to see source code of blk-cgroup.c to know why libvirt fails
> to create cgroup.

[CCing daniel berrange]

AFAIK, libvirt does not have support for blkio controller yet. Are you
trying to introduce that?

libvirt creates a directory tree, I think /cgroup/libvirt/qemu/kvm-dirs.
So the actual virtual machine directories are 2-3 levels below, and that
would explain why using the blkio controller with libvirt fails: it is
not able to create directories at that level.

I think libvirt needs to special-case blkio here and create directories at
the top level. It is odd, but there really are no easy answers. Otherwise,
will we not support a controller in libvirt until it supports hierarchy?

> Where is the user-visible information (in RHEL or Fedora)
> about "you can't use blkio-cgroup via libvirt or libcgroup" ?

[CCing balbir]

I think with libcgroup you can use blkio controller. I know somebody
who was using cgexec command to launch some jobs in blkio cgroups. AFAIK,
libcgroup does not have too much controller specific state and should
not require any modifications for blkio controller.

Balbir can tell us more.

libvirt will require modification to support blkio controller. I also
noticed that libvirt by default puts every virtual machine into its
own cgroup. I think it might not be a very good strategy for blkio
controller because putting every virtual machine in its own cgroup
will kill overall throughput if each virtual machine is not driving
enough IO.

I am also trying to come up with some additional logic for letting go of
fairness if a group is not doing sufficient IO.

Daniel, do you know where the documentation is that says which controllers
are currently supported by libvirt?

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Daniel P. Berrange on
On Fri, Jul 16, 2010 at 09:43:53AM -0400, Vivek Goyal wrote:
> On Thu, Jul 15, 2010 at 09:00:48AM +0900, KAMEZAWA Hiroyuki wrote:
> > On Wed, 14 Jul 2010 10:29:19 -0400
> > Vivek Goyal <vgoyal(a)redhat.com> wrote:
> > > >
> > > > Cgroup's feature of mounting several subsystems at one mount point at once
> > > > is very useful in many cases.
> > >
> > > I agree that it is useful but if some controllers are not supporting
> > > hierarchy, it just adds to more confusion. And later when hierarchy
> > > support comes in, there will be additional issue of keeping this file
> > > "use_hierarchy" like memory controller.
> > >
> > > So at this point in time, I am not too inclined towards allowing hierarchical
> > > cgroup creation but treating them as flat in CFQ. I think it adds to the
> > > confusion and user space should handle this situation.
> > >
> >
> > Hmm.
> >
> > Could you fix error code in create blkio cgroup ? It returns -EINVAL now.
> > IIUC, mkdir(2) doesn't return -EINVAL as error code (from man.)
> > Then, it's very confusing. I think -EPERM or -ENOMEM will be much better.
>
> Hmm, -EPERM is probably the closest to what we are doing. The file system
> does support creation of directories, but not beyond a certain level.
>
> I will trace more instances of mkdir error values.
>
> >
> > Anyway, I need to see source code of blk-cgroup.c to know why libvirt fails
> > to create cgroup.
>
> [CCing daniel berrange]
>
> AFAIK, libvirt does not have support for blkio controller yet. Are you
> trying to introduce that?
>
> libvirt creates a directory tree, I think /cgroup/libvirt/qemu/kvm-dirs.
> So the actual virtual machine directories are 2-3 levels below, and that
> would explain why using the blkio controller with libvirt fails: it is
> not able to create directories at that level.

Yes, we use a hierarchy to deal with namespace uniqueness. The
first step is to determine where libvirtd process is placed. This
may be the root cgroup, but it may already be one or more levels
down due to the init system (sysv-init, upstart, systemd etc)
startup policy. Once that's determined we create a 'libvirt'
cgroup which acts as container for everything run by libvirtd.
At the next level is the driver name (qemu, lxc, uml). This allows
confinement of all guests for a particular driver and gives us
a unique namespace for the next level where we have a directory
per guest. This last level is where libvirt actually sets tunables
normally. The higher levels are for administrator use.

$ROOT (where libvirtd process is, not the root mount point)
|
+- libvirt
|
+- qemu
| |
| +- guest1
| +- guest2
| +- guest3
| ...
|
+- lxc
+- guest1
+- guest2
+- guest3
...
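
The layout above can be expressed as a small path helper. This is only a
sketch of the naming scheme as described here, not libvirt's actual code;
the function name and its parameters are hypothetical.

```python
import posixpath


def guest_cgroup_dir(mount_point, libvirtd_cgroup, driver, guest):
    """Build the per-guest cgroup directory for the layout described above.

    mount_point     -- where the controller is mounted, e.g. "/cgroup"
    libvirtd_cgroup -- cgroup of the libvirtd process relative to the mount ("/" for the root)
    driver          -- driver name such as "qemu" or "lxc"
    guest           -- guest name
    """
    # $ROOT is wherever libvirtd was placed, which may be below the mount point.
    root = posixpath.join(mount_point, libvirtd_cgroup.lstrip("/"))
    return posixpath.normpath(posixpath.join(root, "libvirt", driver, guest))


if __name__ == "__main__":
    print(guest_cgroup_dir("/cgroup", "/", "qemu", "guest1"))
    print(guest_cgroup_dir("/cgroup", "/daemons", "lxc", "guest2"))
```

With libvirtd in the root cgroup the guest directory sits three levels below
the mount point, and deeper still if the init system placed libvirtd lower.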


> I think libvirt needs to special-case blkio here and create directories at
> the top level. It is odd, but there really are no easy answers. Otherwise,
> will we not support a controller in libvirt until it supports hierarchy?

We explicitly avoided creating anything at the top level. We always
detect where the libvirtd process has been placed & only ever create
stuff below that point. This ensures the host admin can set overall
limits for virt on a host, and not have libvirt side-step these limits
by jumping back up to the root cgroup.
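
Detecting where a process has been placed can be done by reading
/proc/<pid>/cgroup, where each line has the form
"hierarchy-ID:controller-list:cgroup-path". The following is a minimal
sketch of that parsing, not libvirt's implementation; the function names
are hypothetical.

```python
def parse_proc_cgroup(text):
    """Map each controller name to the cgroup path listed in /proc/<pid>/cgroup text."""
    placement = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at most twice so that a path containing ":" stays intact.
        _hier_id, controllers, path = line.split(":", 2)
        for name in controllers.split(","):
            if name:
                placement[name] = path
    return placement


def own_placement():
    """Read the placement of the current process (Linux only)."""
    with open("/proc/self/cgroup") as handle:
        return parse_proc_cgroup(handle.read())


if __name__ == "__main__":
    sample = "3:cpu,cpuacct:/daemons\n2:memory:/\n"
    print(parse_proc_cgroup(sample))
```

Anything created for guests would then go below the returned path for the
relevant controller, never above it.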

> > Where is the user-visible information (in RHEL or Fedora)
> > about "you can't use blkio-cgroup via libvirt or libcgroup" ?
>
> [CCing balbir]
>
> I think with libcgroup you can use blkio controller. I know somebody
> who was using cgexec command to launch some jobs in blkio cgroups. AFAIK,
> libcgroup does not have too much controller specific state and should
> not require any modifications for blkio controller.
>
> Balbir can tell us more.
>
> libvirt will require modification to support blkio controller. I also
> noticed that libvirt by default puts every virtual machine into its
> own cgroup. I think it might not be a very good strategy for blkio
> controller because putting every virtual machine in its own cgroup
> will kill overall throughput if each virtual machine is not driving
> enough IO.

A requirement to do everything at the top level and not use a hierarchy
for blkio makes this a pretty unfriendly controller to use. It seriously
limits the flexibility of what libvirt and host administrators can do and
means we can't effectively split policy between them. It also means
that if the blkio controller were ever mounted at the same point as another
controller, you'd lose the hierarchy support for that other controller.
IMHO use of the cgroups hierarchy is key to making cgroups manageable for
applications. We can't have many different applications on a system
all having to create many directories at the top level.

> I am also trying to come up with some additional logic for letting go of
> fairness if a group is not doing sufficient IO.
>
> Daniel, do you know where the documentation is that says which controllers
> are currently supported by libvirt?

We use cpu, cpuacct, cpuset, memory, devices & freezer currently.

Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
From: Vivek Goyal on
On Fri, Jul 16, 2010 at 03:15:49PM +0100, Daniel P. Berrange wrote:

[..]
> > libvirt will require modification to support blkio controller. I also
> > noticed that libvirt by default puts every virtual machine into its
> > own cgroup. I think it might not be a very good strategy for blkio
> > controller because putting every virtual machine in its own cgroup
> > will kill overall throughput if each virtual machine is not driving
> > enough IO.
>
> A requirement to do everything at the top level and not use a hierarchy
> for blkio makes this a pretty unfriendly controller to use. It seriously
> limits the flexibility of what libvirt and host administrators can do and
> means we can't effectively split policy between them. It also means
> that if the blkio controller were ever mounted at the same point as another
> controller, you'd lose the hierarchy support for that other controller.
> IMHO use of the cgroups hierarchy is key to making cgroups manageable for
> applications. We can't have many different applications on a system
> all having to create many directories at the top level.
>

I understand that not having hierarchical support is a huge limitation,
and in the future I would like to get there. It is just that at the moment
providing that support is hard, as I am struggling with more basic issues
which are more important.

Secondly, just because a controller allows creation of a hierarchy does
not mean that the hierarchy is being enforced. Take the memory controller:
IIUC, one needs to explicitly set "use_hierarchy" to enforce hierarchy;
otherwise it is effectively flat. So if libvirt is creating groups and
putting machines in child groups on the assumption that this does not
interfere with the admin's policy, that is not entirely correct.
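
Whether hierarchy is enforced for a given memory cgroup can be checked from
user space by reading its memory.use_hierarchy file. A minimal sketch,
assuming a cgroup v1 memory controller; the helper name is hypothetical and
the demo fakes the control file in a temporary directory.

```python
import os
import tempfile


def hierarchy_enforced(cgroup_dir):
    """Return True if memory.use_hierarchy in cgroup_dir reads as 1."""
    with open(os.path.join(cgroup_dir, "memory.use_hierarchy")) as handle:
        return handle.read().strip() == "1"


if __name__ == "__main__":
    # Stand-in for a real cgroup directory: write the control file by hand.
    with tempfile.TemporaryDirectory() as fake_group:
        control = os.path.join(fake_group, "memory.use_hierarchy")
        with open(control, "w") as handle:
            handle.write("0\n")
        print(hierarchy_enforced(fake_group))
```

On a real system the argument would be a directory under the memory
controller's mount point, for example the parent of the guest groups.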

So how do we make progress here? I really want to see the blkio controller
integrated with libvirt.

About the issue of hierarchy, I can probably go down the path of allowing
creation of a hierarchy while CFQ treats it as flat. I don't like it,
because it will force me to introduce variables like "use_hierarchy" once
real hierarchical support comes in, but I guess I can live with that.
(Anyway, the memory controller is already doing it.)

There is another issue, though: by default every virtual machine goes
into a group of its own. As of today, that can carry severe performance
penalties (depending on workload) if a group is not driving enough IO
(especially with group_isolation=1).

I was thinking of a model where an admin moves the bad virtual machines
out into a separate group and limits their IO.

Vivek
From: Daniel P. Berrange on
On Fri, Jul 16, 2010 at 10:35:36AM -0400, Vivek Goyal wrote:
> On Fri, Jul 16, 2010 at 03:15:49PM +0100, Daniel P. Berrange wrote:
> Secondly, just because a controller allows creation of a hierarchy does
> not mean that the hierarchy is being enforced. Take the memory controller:
> IIUC, one needs to explicitly set "use_hierarchy" to enforce hierarchy;
> otherwise it is effectively flat. So if libvirt is creating groups and
> putting machines in child groups on the assumption that this does not
> interfere with the admin's policy, that is not entirely correct.

That is true, but 'use_hierarchy' at least provides admins with
the mechanism required to implement the necessary policy.

> So how do we make progress here? I really want to see the blkio controller
> integrated with libvirt.
>
> About the issue of hierarchy, I can probably go down the path of allowing
> creation of a hierarchy while CFQ treats it as flat. I don't like it,
> because it will force me to introduce variables like "use_hierarchy" once
> real hierarchical support comes in, but I guess I can live with that.
> (Anyway, the memory controller is already doing it.)
>
> There is another issue, though: by default every virtual machine goes
> into a group of its own. As of today, that can carry severe performance
> penalties (depending on workload) if a group is not driving enough IO
> (especially with group_isolation=1).
>
> I was thinking of a model where an admin moves the bad virtual machines
> out into a separate group and limits their IO.

In the simple / normal case I imagine all guest VMs will be running
unrestricted I/O initially. Thus instead of creating the cgroup at the time
of VM startup, we could create the cgroup only when the admin actually
sets an I/O limit. IIUC, this should maintain the one cgroup per guest
model, while avoiding the performance penalty in normal use. The caveat
of course is that this would require the blkio controller to have a dedicated
mount point, not shared with other controllers. I think we might also
want this kind of model for net I/O, since we probably don't want to be
creating TC classes + net_cls groups for every VM the moment it starts
unless the admin has actually set a net I/O limit.
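
The create-on-demand idea can be sketched as follows. This is an
illustration only, not libvirt code: the class name is hypothetical, the
"limit" is assumed to be a proportional weight written to blkio.weight, and
the demo uses a temporary directory in place of a dedicated blkio mount.

```python
import os
import tempfile


class LazyGuestGroup:
    """Create a guest's blkio cgroup only when a weight is first set."""

    def __init__(self, blkio_parent, guest):
        self.path = os.path.join(blkio_parent, guest)

    def exists(self):
        return os.path.isdir(self.path)

    def set_weight(self, weight):
        # The directory is created on first use, not at guest startup.
        if not self.exists():
            os.mkdir(self.path)
        with open(os.path.join(self.path, "blkio.weight"), "w") as handle:
            handle.write(str(int(weight)))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as fake_mount:
        group = LazyGuestGroup(fake_mount, "guest1")
        print(group.exists())   # no cgroup yet: the guest runs unrestricted
        group.set_weight(200)   # admin sets a limit, so the cgroup appears
        print(group.exists())
```

Until set_weight is called the guest stays in whatever group libvirtd
itself is in, so no per-guest group exists in the common unrestricted case.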

Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
From: Vivek Goyal on
On Fri, Jul 16, 2010 at 03:53:09PM +0100, Daniel P. Berrange wrote:
> On Fri, Jul 16, 2010 at 10:35:36AM -0400, Vivek Goyal wrote:
> > On Fri, Jul 16, 2010 at 03:15:49PM +0100, Daniel P. Berrange wrote:
> > Secondly, just because a controller allows creation of a hierarchy does
> > not mean that the hierarchy is being enforced. Take the memory controller:
> > IIUC, one needs to explicitly set "use_hierarchy" to enforce hierarchy;
> > otherwise it is effectively flat. So if libvirt is creating groups and
> > putting machines in child groups on the assumption that this does not
> > interfere with the admin's policy, that is not entirely correct.
>
> That is true, but 'use_hierarchy' at least provides admins with
> the mechanism required to implement the necessary policy.
>
> > So how do we make progress here? I really want to see the blkio controller
> > integrated with libvirt.
> >
> > About the issue of hierarchy, I can probably go down the path of allowing
> > creation of a hierarchy while CFQ treats it as flat. I don't like it,
> > because it will force me to introduce variables like "use_hierarchy" once
> > real hierarchical support comes in, but I guess I can live with that.
> > (Anyway, the memory controller is already doing it.)
> >
> > There is another issue, though: by default every virtual machine goes
> > into a group of its own. As of today, that can carry severe performance
> > penalties (depending on workload) if a group is not driving enough IO
> > (especially with group_isolation=1).
> >
> > I was thinking of a model where an admin moves the bad virtual machines
> > out into a separate group and limits their IO.
>
> In the simple / normal case I imagine all guest VMs will be running
> unrestricted I/O initially. Thus instead of creating the cgroup at the time
> of VM startup, we could create the cgroup only when the admin actually
> sets an I/O limit.

That makes sense. Run all the virtual machines in the root group by
default, and move a virtual machine out to a separate group of either low
weight (if the virtual machine is a bad one and driving a lot of IO) or
higher weight (if we want to give more IO bandwidth to this machine).
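
Moving a guest into such a group amounts to two writes in the cgroup
directory: the weight into blkio.weight and the task ID into tasks. A
minimal sketch under those assumptions (cgroup v1 file names); the function
name is hypothetical and the demo uses a temporary directory, so nothing is
actually moved.

```python
import os
import tempfile


def move_guest(cgroup_dir, pid, weight=None):
    """Attach pid to an existing cgroup and optionally set the group's blkio weight."""
    if weight is not None:
        with open(os.path.join(cgroup_dir, "blkio.weight"), "w") as handle:
            handle.write(str(int(weight)))
    # In cgroup v1, writing a task ID to the "tasks" file moves that task into the group.
    with open(os.path.join(cgroup_dir, "tasks"), "w") as handle:
        handle.write(str(int(pid)))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as fake_group:
        move_guest(fake_group, 1234, weight=100)
        print(sorted(os.listdir(fake_group)))
```

Note that the tasks file takes one task ID per write, so a multi-threaded
guest process would need each of its thread IDs written in turn.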

> IIUC, this should maintain the one cgroup per guest
> model, while avoiding the performance penalty in normal use. The caveat
> of course is that this would require the blkio controller to have a dedicated
> mount point, not shared with other controllers.

Yes, because for other controllers we seem to be putting virtual machines
in separate cgroups by default at startup time. So it seems we will
require a separate mount point here for the blkio controller.

> I think we might also
> want this kind of model for net I/O, since we probably don't want to be
> creating TC classes + net_cls groups for every VM the moment it starts
> unless the admin has actually set a net I/O limit.

Looks like it. Good, then the network controller and the blkio controller
can share this new mount point.

Thanks
Vivek