From: Nitin Gupta on
On 06/03/2010 10:23 AM, Andreas Dilger wrote:
> On 2010-06-02, at 20:46, Nitin Gupta wrote:
>> On 06/03/2010 04:32 AM, Dan Magenheimer wrote:
>>>> From: Minchan Kim [mailto:minchan.kim(a)gmail.com]
>>>
>>>>> I am also eagerly awaiting Nitin Gupta's cleancache backend
>>>>> and implementation to do in-kernel page cache compression.
>>>>
>>>> Did Nitin say he will write a cleancache backend for
>>>> page cache compression?
>>>>
>>>> It would be a good feature.
>>>> I am interested in it, too. :)
>>>
>>> That was Nitin's plan for his GSOC project when we last discussed
>>> this. Nitin is on the cc list and can comment if this has
>>> changed.
>>
>> Yes, I have just started work on in-kernel page cache compression
>> backend for cleancache :)
>
> Is there a design doc for this implementation?

It's all on physical paper :)
Anyway, the design is quite simple: the backend just has to act on the
cleancache callbacks.
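
Roughly, each callback maps to a compress-and-store, lookup-and-decompress,
or free operation on an in-memory store. Just to show the shape of it, here
is a sketch of the put path (the function names are made up, the op
prototype is written from memory of the cleancache patchset, and the
radix-tree store is only a placeholder):

/* Hypothetical put path for a compressing cleancache backend. */
#include <linux/cleancache.h>
#include <linux/lzo.h>
#include <linux/highmem.h>
#include <linux/slab.h>

static void *ccomp_wrkmem;		/* LZO scratch; per-cpu in real code */

/* Store a clean page, compressed, keyed by (pool id, inode, index). */
static void ccomp_put_page(int pool, ino_t ino, pgoff_t index,
			   struct page *page)
{
	size_t clen = lzo1x_worst_compress(PAGE_SIZE);
	unsigned char *buf;
	void *src;

	buf = kmalloc(clen, GFP_ATOMIC);
	if (!buf)
		return;			/* puts are allowed to fail silently */

	src = kmap_atomic(page, KM_USER0);
	lzo1x_1_compress(src, PAGE_SIZE, buf, &clen, ccomp_wrkmem);
	kunmap_atomic(src, KM_USER0);

	if (clen > PAGE_SIZE * 3 / 4) {	/* poor ratio: not worth keeping */
		kfree(buf);
		return;
	}
	/* ... insert (pool, ino, index) -> (buf, clen) into a radix tree ... */
}

The get path is just the reverse, and the flush callbacks drop entries.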

> I was thinking it would be quite clever to do compression in, say, 64kB or 128kB chunks in a mapping (to get decent compression) and then write these compressed chunks directly from the page cache to disk in btrfs and/or a revived compressed ext4.
>

Batching of pages to get good compression ratio seems doable.

However, writing this compressed data (with or without batching) to disk
seems quite difficult. Pages handed to cleancache are no longer part of
the page cache, and the disk might also contain an uncompressed version
of the same data. There is also the problem of an efficient on-disk
structure for storing variable-sized compressed chunks. I'm not sure how
we can deal with all these issues.

Thanks,
Nitin
From: Dan Magenheimer on
> On 06/03/2010 10:23 AM, Andreas Dilger wrote:
> > On 2010-06-02, at 20:46, Nitin Gupta wrote:
>
> > I was thinking it would be quite clever to do compression in, say,
> > 64kB or 128kB chunks in a mapping (to get decent compression) and
> > then write these compressed chunks directly from the page cache
> > to disk in btrfs and/or a revived compressed ext4.
>
> Batching of pages to get good compression ratio seems doable.

Is there evidence that batching a set of random individual 4K
pages will have a significantly better compression ratio than
compressing the pages separately? I certainly understand that
if the pages are from the same file, compression is likely to
be better, but pages evicted from the page cache (which is
the source for all cleancache_puts) are likely to be quite a
bit more random than that, aren't they?
From: Nitin Gupta on
On 06/03/2010 09:13 PM, Dan Magenheimer wrote:
>> On 06/03/2010 10:23 AM, Andreas Dilger wrote:
>>> On 2010-06-02, at 20:46, Nitin Gupta wrote:
>>
>>> I was thinking it would be quite clever to do compression in, say,
>>> 64kB or 128kB chunks in a mapping (to get decent compression) and
>>> then write these compressed chunks directly from the page cache
>>> to disk in btrfs and/or a revived compressed ext4.
>>
>> Batching of pages to get good compression ratio seems doable.
>
> Is there evidence that batching a set of random individual 4K
> pages will have a significantly better compression ratio than
> compressing the pages separately? I certainly understand that
> if the pages are from the same file, compression is likely to
> be better, but pages evicted from the page cache (which is
> the source for all cleancache_puts) are likely to be quite a
> bit more random than that, aren't they?
>


Batching pages from random files may not be very effective, but it
would be interesting to collect some data on this. Still, per-inode
batching of pages seems doable, and that should help us get around
this problem.
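
Collecting that data should be easy even in userspace. The sketch below
(zlib only because it is handy; the real backend would more likely use
LZO) compresses its input one 4K page at a time and again in 64K batches
and prints the totals. Feeding it a single file approximates the per-inode
case; feeding it a concatenation of pages sampled from many files
approximates the random-mix case Dan is asking about:

/* Rough ratio experiment: per-page vs 64K-batch compression (zlib). */
#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

#define PAGE_SZ		4096
#define BATCH_SZ	(16 * PAGE_SZ)	/* 64K */

static unsigned long deflate_len(const unsigned char *src, unsigned long len)
{
	uLongf dlen = compressBound(len);
	unsigned char *dst = malloc(dlen);

	if (!dst || compress2(dst, &dlen, src, len, 6) != Z_OK)
		dlen = len;	/* count incompressible data at full size */
	free(dst);
	return dlen;
}

int main(int argc, char **argv)
{
	unsigned char buf[BATCH_SZ];
	unsigned long in = 0, per_page = 0, batched = 0;
	size_t n, i;
	FILE *f;

	if (argc < 2 || !(f = fopen(argv[1], "rb")))
		return 1;
	while ((n = fread(buf, 1, BATCH_SZ, f)) > 0) {
		in += n;
		batched += deflate_len(buf, n);
		for (i = 0; i < n; i += PAGE_SZ)
			per_page += deflate_len(buf + i,
					n - i < PAGE_SZ ? n - i : PAGE_SZ);
	}
	fclose(f);
	printf("input %lu, per-page %lu, batched %lu\n", in, per_page, batched);
	return 0;
}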

Thanks,
Nitin
From: Minchan Kim on
Hi, Nitin.

I am happy to hear you started this work.

On Fri, Jun 04, 2010 at 03:06:49PM +0530, Nitin Gupta wrote:
> On 06/03/2010 09:13 PM, Dan Magenheimer wrote:
> >> On 06/03/2010 10:23 AM, Andreas Dilger wrote:
> >>> On 2010-06-02, at 20:46, Nitin Gupta wrote:
> >>
> >>> I was thinking it would be quite clever to do compression in, say,
> >>> 64kB or 128kB chunks in a mapping (to get decent compression) and
> >>> then write these compressed chunks directly from the page cache
> >>> to disk in btrfs and/or a revived compressed ext4.
> >>
> >> Batching of pages to get good compression ratio seems doable.
> >
> > Is there evidence that batching a set of random individual 4K
> > pages will have a significantly better compression ratio than
> > compressing the pages separately? I certainly understand that
> > if the pages are from the same file, compression is likely to
> > be better, but pages evicted from the page cache (which is
> > the source for all cleancache_puts) are likely to be quite a
> > bit more random than that, aren't they?
> >
>
>
> Batching pages from random files may not be very effective, but it
> would be interesting to collect some data on this. Still, per-inode
> batching of pages seems doable, and that should help us get around
> this problem.

1)
Please consider the case of system memory pressure.
In that case, the backend has to release its compressed cache pages
(see the shrinker sketch below). It would also be better to discard
pages that do not compress well at the time you compress them.

2)
This work is closely tied to page reclaim.
The point of page reclaim is to make free memory, but with this backend
reclaim might free less memory than before, since "reclaimed" pages are
kept around in compressed form. I admit the concept is good in terms of
I/O cost, but we might end up discarding more clean pages than before
if you hold pages back for batching to get good compression.

3)
Test cases.

As I mentioned, this could be a win in terms of I/O cost, but it could
also change the system's behavior because of the memory the backend
itself consumes: much more page scanning and reclaiming could happen,
which means hot pages could end up being discarded with this patch.
That is just a guess, though, so we need numbers from test cases that
measure both I/O and system responsiveness.
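
Regarding 1), the obvious mechanism for releasing compressed pages under
pressure would be a shrinker that evicts the oldest chunks. A rough sketch
(the shrink callback prototype differs between kernel versions, and the
eviction helper is hypothetical):

#include <linux/mm.h>

static atomic_t ccomp_nr_chunks;	/* compressed chunks currently held */

static bool ccomp_evict_one(void);	/* hypothetical: frees one chunk */

static int ccomp_shrink(int nr_to_scan, gfp_t gfp_mask)
{
	while (nr_to_scan-- > 0)
		if (!ccomp_evict_one())	/* nothing left to free */
			break;
	return atomic_read(&ccomp_nr_chunks);
}

static struct shrinker ccomp_shrinker = {
	.shrink	= ccomp_shrink,
	.seeks	= DEFAULT_SEEKS,
};

/* at backend init: register_shrinker(&ccomp_shrinker); */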

>
> Thanks,
> Nitin

--
Kind regards,
Minchan Kim