blkiocg async support [Kernel]

From: Greg Thelen on 22 Jul 2010 15:30

On Sun, Jul 11, 2010 at 5:20 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu(a)jp.fujitsu.com> wrote:
> On Sat, 10 Jul 2010 09:24:17 -0400
> Vivek Goyal <vgoyal(a)redhat.com> wrote:
>
>> On Fri, Jul 09, 2010 at 05:55:23PM -0700, Nauman Rafique wrote:
>>
>> [..]
>> > > Well, right. I agree.
>> > > But I think we can work in parallel. I will try to struggle on both.
>> >
>> > IMHO, we have a classic chicken and egg problem here. We should try to
>> > merge pieces as they become available. If we get to agree on patches
>> > that do async IO tracking for IO controller, we should go ahead with
>> > them instead of trying to wait for per cgroup dirty ratios.
>> >
>> > In terms of getting numbers, we have been using patches that add per
>> > cpuset dirty ratios on top of NUMA_EMU, and we get good
>> > differentiation between buffered writes as well as buffered writes vs.
>> > reads.
>> >
>> > It is really obvious that as long as flusher threads, etc. are not
>> > cgroup aware, differentiation for buffered writes would not be perfect
>> > in all cases, but this is a step in the right direction and we should
>> > go for it.
>>
>> Working in parallel on two separate pieces is fine. But pushing the second
>> piece in first does not make much sense to me, because the second piece does
>> not work if the first piece is not in. There is no way to test it. What's the
>> point of pushing code into the kernel which only compiles but does not achieve
>> its intended purpose because some other pieces are missing?
>>
>> Per-cgroup dirty ratio is a somewhat hard problem and a few attempts have
>> already been made at it. IMHO, we need to first work on that piece and
>> get it inside the kernel and then work on the IO tracking patches. Let's
>> fix the hard problem first, since it is necessary to make the second set of
>> patches work.
>>
>
> I've just been waiting for the dirty-ratio patches because I know someone is
> working on them. But, hmm, I'll consider starting the work myself.

I have some patches that address the dirty ratios. I will post them.

These dirty-ratio patches do not do anything intelligent wrt
per-cgroup writeback. When a cgroup dirty ratio is exceeded, a
per-bdi writeback is triggered.
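
The behavior described above (exceeding a cgroup's dirty ratio kicks a per-bdi writeback) can be sketched roughly in user-space C. This is only an illustration under stated assumptions: `struct cg_dirty_state`, `cg_account_dirty()`, and `start_bdi_writeback()` are hypothetical names, not code from the patches or the kernel, and the "writeback" is just a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cgroup accounting; not a kernel structure. */
struct cg_dirty_state {
    unsigned long nr_dirty;      /* dirty pages charged to this cgroup */
    unsigned long nr_dirtyable;  /* pages this cgroup is allowed to dirty from */
    unsigned int  dirty_ratio;   /* limit as a percentage of nr_dirtyable */
};

/* Stand-in for kicking writeback on a backing device: it only counts calls. */
static int bdi_writeback_requests;

static void start_bdi_writeback(void)
{
    bdi_writeback_requests++;
}

/* True when the cgroup holds more dirty pages than its ratio allows. */
static bool cg_over_dirty_limit(const struct cg_dirty_state *cg)
{
    unsigned long long limit;

    assert(cg != NULL);
    limit = (unsigned long long)cg->nr_dirtyable * cg->dirty_ratio / 100;
    return cg->nr_dirty > limit;
}

/*
 * Charge newly dirtied pages to the cgroup. If the limit is exceeded,
 * request writeback for the whole device (nothing cgroup-specific),
 * and report that to the caller.
 */
static bool cg_account_dirty(struct cg_dirty_state *cg, unsigned long pages)
{
    assert(cg != NULL);
    cg->nr_dirty += pages;
    if (cg_over_dirty_limit(cg)) {
        start_bdi_writeback();
        return true;
    }
    return false;
}
```

Note that the writeback request in this sketch carries no cgroup information, which mirrors the "nothing intelligent" caveat: the device is flushed as a whole, not only the pages of the offending cgroup.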

> (Off-topic)
> BTW, why is io-cgroup's hierarchy level limited to 2?
> Because of that limitation, libvirt can't work well...
>
> Thanks,
> -Kame
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: KAMEZAWA Hiroyuki on 22 Jul 2010 20:10

On Thu, 22 Jul 2010 12:28:50 -0700
Greg Thelen <gthelen(a)google.com> wrote:

> > I've just been waiting for the dirty-ratio patches because I know someone is
> > working on them. But, hmm, I'll consider starting the work myself.
>
> I have some patches that address the dirty ratios. I will post them.
>

Please wait for my proposal to implement a lightweight lockless update_stat().
I'll handle FILE_MAPPED in it and add a generic interface for updating statistics
in mem_cgroup via page_cgroup.
(I'll post it today if I can, in other words, if I'm lucky.)

If not, we will have to go through the same painful discussion all over again.

Thanks,
-Kame

From: Balbir Singh on 26 Jul 2010 02:50

* Munihiro Ikeda <m-ikeda(a)ds.jp.nec.com> [2010-07-08 22:57:13]:

> These RFC patches are a trial to add async (cached) write support to the
> blkio controller.
>
> The only testing done so far is that the kernel compiles and boots, and that
> write bandwidth seems to be prioritized when pages dirtied by two different
> processes in different cgroups are written back to a device simultaneously.
> I know this is the minimum (or less) of testing, but I posted this as an RFC
> because I would like to hear your opinions about the design direction at an
> early stage.
>
> Patches are for 2.6.35-rc4.
>
> This patch series consists of two chunks.
>
> (1) iotrack (patch 01/11 -- 06/11)
>
> This is functionality to track who dirtied a page, specifically which cgroup
> the process that dirtied the page belongs to. The blkio controller will read
> the info later and prioritize when the page is actually written to a block
> device. This work originated from Ryo Tsuruta and Hirokazu Takahashi and
> includes Andrea Righi's idea. It was posted as part of dm-ioband, which was
> one of the proposals for an IO controller.
>
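
The tracking idea in (1) can be pictured with a small user-space sketch in C: remember the dirtier's cgroup at the time a page is dirtied, and read it back when the page is written out. Everything here is an assumption made for illustration (the fixed-size `page_owner` table, the numeric cgroup IDs, and the function names); it does not show how the iotrack patches actually store the information.

```c
#include <assert.h>
#include <stddef.h>

#define NR_PAGES        8   /* toy number of pages */
#define ROOT_CGROUP_ID  0   /* pages nobody has tracked fall back to root */

/* Hypothetical per-page tracking record. */
struct page_track {
    unsigned short dirtier_cgroup_id;
};

/* Zero-initialized, so every page starts out attributed to the root group. */
static struct page_track page_owner[NR_PAGES];

/* Called when a process in cgroup `cgroup_id` dirties page `pfn`. */
static void track_page_dirtied(size_t pfn, unsigned short cgroup_id)
{
    assert(pfn < NR_PAGES);
    page_owner[pfn].dirtier_cgroup_id = cgroup_id;
}

/*
 * Called later at writeback time, typically from a different thread than
 * the one that dirtied the page, to decide which group the IO is charged to.
 */
static unsigned short writeback_cgroup_for(size_t pfn)
{
    assert(pfn < NR_PAGES);
    return page_owner[pfn].dirtier_cgroup_id;
}
```

The point of the sketch is only that the lookup at writeback time does not depend on which thread issues the IO, which is why the flusher thread's own cgroup membership stops mattering.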

Does this reuse the memcg infrastructure? If so, could you please add a
summary of the changes here?

>
> (2) blkio controller modification (07/11 -- 11/11)
>
> The main part of blkio controller async write support.
> Currently async queues are device-wide and async write IOs are always treated
> as belonging to the root group.
> These patches make async queues per cfq_group per device so that they can be
> controlled. Async writes are handled by the flusher kernel thread. Because
> queue pointers are stored in cfq_io_context, the io_context of that thread has
> to have multiple cfq_io_contexts per device. So these patches make
> cfq_io_context per io_context per cfq_group, which means per io_context per
> cgroup per device.
>
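
To make the keying in (2) concrete, here is a rough user-space sketch in C of a lookup keyed by (io_context, cgroup, device), so that one flusher thread can hold a separate async queue for each group on the same device. The table, the integer IDs, and `cic_lookup_or_create()` are hypothetical and chosen for illustration; the real cfq_io_context handling in the patches is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CIC 16  /* toy capacity of the lookup table */

/* Hypothetical key: one entry per io_context per cgroup per device. */
struct cic_key {
    int ioc_id;    /* which io_context (e.g. the flusher thread's) */
    int group_id;  /* which cgroup / cfq_group */
    int dev_id;    /* which block device */
};

struct cic_entry {
    struct cic_key key;
    int in_use;
    int async_queue_id;  /* stand-in for the async queue pointer */
};

static struct cic_entry cic_table[MAX_CIC];
static int next_queue_id = 1;

/* Return the entry for the key, creating it on first use; NULL if full. */
static struct cic_entry *cic_lookup_or_create(int ioc_id, int group_id, int dev_id)
{
    struct cic_entry *free_slot = NULL;

    for (size_t i = 0; i < MAX_CIC; i++) {
        struct cic_entry *e = &cic_table[i];

        if (!e->in_use) {
            if (free_slot == NULL)
                free_slot = e;  /* remember the first free slot */
            continue;
        }
        if (e->key.ioc_id == ioc_id && e->key.group_id == group_id &&
            e->key.dev_id == dev_id)
            return e;
    }
    if (free_slot == NULL)
        return NULL;

    free_slot->key.ioc_id = ioc_id;
    free_slot->key.group_id = group_id;
    free_slot->key.dev_id = dev_id;
    free_slot->in_use = 1;
    free_slot->async_queue_id = next_queue_id++;
    return free_slot;
}
```

With a device-wide key (io_context and device only), the two groups in this sketch would share one entry and therefore one async queue, which is the situation the patches set out to change.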
>
> This might be one piece of the puzzle for complete async write support in the
> blkio controller. Another piece I have in mind is page dirtying ratio control.
> I believe Andrea Righi was working on it... what is the current status?
>

Greg posted the last set of patches; we have yet to see another
iteration.

> Also, I think async write support is required by the bandwidth capping policy
> of the blkio controller. Bandwidth capping can be done in a layer above the
> elevator. However, I think it should also be done in the elevator layer.
> The elevator buffers and sorts requests. If there is another buffering
> layer above it, the result is double buffering, which can be harmful to the
> elevator's prediction.
>

--
Three Cheers,
Balbir

From: KAMEZAWA Hiroyuki on 27 Jul 2010 02:50

On Mon, 26 Jul 2010 23:40:07 -0700
Greg Thelen <gthelen(a)google.com> wrote:

> On Sun, Jul 25, 2010 at 11:41 PM, Balbir Singh
> <balbir(a)linux.vnet.ibm.com> wrote:
> > [..]
> >
> > Greg posted the last set of patches; we have yet to see another
> > iteration.
>
> I am waiting to post the next iteration of memcg dirty limits and ratios until
> Kame-san posts light-weight lockless update_stat(). I can post the dirty ratio
> patches before the lockless updates are available, but I imagine there will be
> a significant merge. So I prefer to wait, assuming that these changes will be
> coming in the near future.
>
I will post an RFC version today.

It may still need some sorting out, though. I'll CC you.

Thanks,
-Kame

From: Greg Thelen on 27 Jul 2010 02:50

On Sun, Jul 25, 2010 at 11:41 PM, Balbir Singh
<balbir(a)linux.vnet.ibm.com> wrote:
> [..]
>
> Greg posted the last set of patches; we have yet to see another
> iteration.

I am waiting to post the next iteration of memcg dirty limits and ratios until
Kame-san posts light-weight lockless update_stat(). I can post the dirty ratio
patches before the lockless updates are available, but I imagine there will be
a significant merge. So I prefer to wait, assuming that these changes will be
coming in the near future.