From: Dan Magenheimer on
> From: Christoph Hellwig [mailto:hch(a)infradead.org]
> Subject: Re: [PATCH V2 0/7] Cleancache (was Transcendent Memory):
> overview

Hi Christoph --

Thanks for your feedback!

> > fs/btrfs/super.c | 2
> > fs/buffer.c | 5 +
> > fs/ext3/super.c | 2
> > fs/ext4/super.c | 2
> > fs/mpage.c | 7 +
> > fs/ocfs2/super.c | 3
> > fs/super.c | 8 +
>
> This is missing out a whole lot of filesystems. Even more so why the
> hell do you need hooks into the filesystem?

Let me rephrase/regroup your question. Let me know if
I missed anything...

1) Why is the VFS layer involved at all?

VFS hooks are necessary to avoid a disk read when a page
is already in cleancache and to maintain coherency (via
cleancache_flush operations) between cleancache, the
page cache, and disk. This very small, very clean set
of hooks (placed by Chris Mason) compiles into
nothingness if cleancache is config'ed off, and reduces
to "if (*p == NULL)" if config'ed on but no "backend"
has claimed cleancache_ops or the fs hasn't opted in
(see below).
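To illustrate the shape (a sketch only, not the exact patch text;
the placement and surrounding conditions of the real read-path hook
may differ), the fs/mpage.c hook looks roughly like:

        /* sketch: try cleancache before issuing a disk read */
        if (!PageUptodate(page) && cleancache_get_page(page) == 0) {
                /* hit: the backend filled the page, no I/O needed */
                SetPageUptodate(page);
        }
        /* on a miss, fall through to the normal disk read */

With CONFIG_CLEANCACHE off, cleancache_get_page() is a static inline
that always returns failure, so the compiler discards the branch
entirely.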

2) Why do the individual filesystems need to be modified?

Some filesystems are built entirely on top of VFS, and
the hooks in VFS are sufficient, so they don't require an
fs-specific "cleancache_init" hook; the initial implementation
of cleancache didn't provide this hook. But for some
fs's (such as btrfs) the VFS hooks are incomplete, and
one or more hooks in fs-specific code are required.
For some other fs's (such as tmpfs), cleancache may even
be counterproductive.

So it seemed prudent to require each fs to "opt in" to
use cleancache, which means at least one hook in every
participating fs.
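For example, the opt-in is a one-line call in the fs's mount path.
Here is a sketch using a made-up "myfs" (the real one-liners live in
the four fs's listed below; see the patch for the exact names):

        static int myfs_fill_super(struct super_block *sb, void *data,
                                   int silent)
        {
                /* ... normal superblock setup ... */

                /* opt in: create a cleancache pool for this fs;
                 * without this call the VFS hooks stay inert for
                 * myfs */
                cleancache_init_fs(sb);
                return 0;
        }

A cluster fs such as ocfs2 calls the shared-pool variant instead, so
that all nodes mounting the same fs end up sharing one pool.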

3) Why are filesystems missing?

Only because they haven't been tested. The existence
proof of four fs's (ext3/ext4/ocfs2/btrfs) should be
sufficient to validate the concept; the opt-in approach
means that untested filesystems are unaffected; and the
hooks in those four fs's should serve as examples showing
that it should be very easy to add more fs's in the
future.

> Please give your patches some semi-reasonable subject line.

Not sure what you mean... are the subject lines too short?
Or should I leave off the back-reference to Transcendent Memory?
Or can you suggest something you think is more reasonable?

Thanks,
Dan
From: Minchan Kim on
Hi, Dan.

On Wed, Jun 02, 2010 at 08:27:48AM -0700, Dan Magenheimer wrote:
> Hi Minchan --
>
> > I think cleancache approach is cool. :)
> > I have some suggestions and questions.
>
> Thanks for your interest!
>
> > > If a get_page is successful on a non-shared pool, the page is flushed
> > > (thus making cleancache an "exclusive" cache). On a shared pool, the page
> >
> > Do you have any reason to force "exclusive" on a non-shared pool?
> > To free memory in pseudo-RAM?
> > I want to make it "inclusive" for a reason, but unfortunately I can't
> > say why I want it yet.
>
> The main reason is to free up memory in pseudo-RAM and to
> avoid unnecessary cleancache_flush calls. If you want
> inclusive, the page can be put immediately following
> the get. If put-after-get for inclusive becomes common,
> the interface could easily be extended to add a "get_no_flush"
> call.

Sounds good to me.
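(For reference, the backend interface under discussion has roughly
this shape. This is a paraphrase of the V2 patch, so exact types
such as the file key may differ; a "get_no_flush" would simply be
one more entry in this table:)

        struct cleancache_ops {
                int (*init_fs)(size_t pagesize);
                int (*init_shared_fs)(char *uuid, size_t pagesize);
                /* on a non-shared pool, a successful get_page also
                 * drops the backend's copy ("exclusive" semantics) */
                int (*get_page)(int pool_id, ino_t inode, pgoff_t index,
                                struct page *page);
                int (*put_page)(int pool_id, ino_t inode, pgoff_t index,
                                struct page *page);
                void (*flush_page)(int pool_id, ino_t inode,
                                   pgoff_t index);
                void (*flush_inode)(int pool_id, ino_t inode);
                void (*flush_fs)(int pool_id);
        };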

>
> > While you mentioned it's "exclusive", cleancache_get_page doesn't
> > flush the page in the code below.
> > Is that the role of whoever implements cleancache_ops->get_page?
>
> Yes, the flush is done by the cleancache implementation.
>
> > If the backing device is RAM (e.g. brd), could we _move_ pages from
> > the page cache to cleancache?
> > I mean, I don't want to copy pages on get/put operations; we could
> > just move the page when the backing device is RAM. Is that possible?
>
> By "move", do you mean changing the virtual mappings? Yes,
> this could be done as long as the source and destination are
> both directly addressable (that is, true physical RAM), but
> requires TLB manipulation and has some complicated corner
> cases. The copy semantics simplifies the implementation on
> both the "frontend" and the "backend" and also allows the
> backend to do fancy things on-the-fly like page compression
> and page deduplication.

Agreed, but that's not what I meant.
If I use brd as the backend, I want to do the following:

put_page:

    remove_from_page_cache(page);  /* detach the page from the page cache */
    brd_insert_page(page);         /* hand the very same page to brd */

get_page:

    brd_lookup_page(page);         /* find the page inside brd */
    add_to_page_cache(page);       /* reattach it to the page cache */

Of course, I know this is impossible without new metadata and changes
to the page cache handling, and that it would break the clean layered
design between the frontend and backend.

What I want is to remove the copy overhead when the backend is RAM
that is also part of main memory (i.e., we have a page descriptor
for it).

Do you have an idea?

>
> > You send the patches which is core of cleancache but I don't see any
> > use case.
> > Could you send use case patches with this series?
> > It could help understand cleancache's benefit.
>
> Do you mean the Xen Transcendent Memory ("tmem") implementation?
> If so, this is four files in the Xen source tree (common/tmem.c,
> common/tmem_xen.c, include/xen/tmem.h, include/xen/tmem_xen.h).
> There is also an html document in the Xen source tree, which can
> be viewed here:
> http://oss.oracle.com/projects/tmem/dist/documentation/internals/xen4-internals-v01.html
>
> Or did you mean a cleancache_ops "backend"? For tmem, there
> is one file linux/drivers/xen/tmem.c and it interfaces between
> the cleancache_ops calls and Xen hypercalls. It should be in
> a Xenlinux pv_ops tree soon, or I can email it sooner.

I mean "backend". :)

>
> I am also eagerly awaiting Nitin Gupta's cleancache backend
> and implementation to do in-kernel page cache compression.

Did Nitin say he will make a cleancache backend for
page cache compression?

It would be a good feature.
I'm interested, too. :)

Thanks, Dan.

>
> Thanks,
> Dan

--
Kind regards,
Minchan Kim
From: Dan Magenheimer on
> From: Minchan Kim [mailto:minchan.kim(a)gmail.com]

> > I am also eagerly awaiting Nitin Gupta's cleancache backend
> > and implementation to do in-kernel page cache compression.
>
> Do Nitin say he will make backend of cleancache for
> page cache compression?
>
> It would be good feature.
> I have a interest, too. :)

That was Nitin's plan for his GSOC project when we last discussed
this. Nitin is on the cc list and can comment if this has
changed.

> > By "move", do you mean changing the virtual mappings? Yes,
> > this could be done as long as the source and destination are
> > both directly addressable (that is, true physical RAM), but
> > requires TLB manipulation and has some complicated corner
> > cases. The copy semantics simplifies the implementation on
> > both the "frontend" and the "backend" and also allows the
> > backend to do fancy things on-the-fly like page compression
> > and page deduplication.
>
> Agreed, but that's not what I meant.
> If I use brd as the backend, I want to do the following:
>
> <snip>
>
> Of course, I know this is impossible without new metadata and
> changes to the page cache handling, and that it would break the
> clean layered design between the frontend and backend.
>
> What I want is to remove the copy overhead when the backend is RAM
> that is also part of main memory (i.e., we have a page descriptor
> for it).
>
> Do you have an idea?

Copy overhead on modern processors is very low now due to
very wide memory buses. The additional metadata and code
to handle coherency and concurrency, plus the existing
overhead for batching and asynchronous access to brd, are
likely much higher than the copying cost you would save.

But if you did implement this without copying, I think
you might need a different set of hooks in various places.
I don't know.

> > Or did you mean a cleancache_ops "backend"? For tmem, there
> > is one file linux/drivers/xen/tmem.c and it interfaces between
> > the cleancache_ops calls and Xen hypercalls. It should be in
> > a Xenlinux pv_ops tree soon, or I can email it sooner.
>
> I mean "backend". :)

I dropped the code used for a RHEL6beta Xen tmem driver here:
http://oss.oracle.com/projects/tmem/dist/files/RHEL6beta/tmem-backend.patch

Thanks,
Dan
From: Nitin Gupta on
On 06/03/2010 04:32 AM, Dan Magenheimer wrote:
>> From: Minchan Kim [mailto:minchan.kim(a)gmail.com]
>
>>> I am also eagerly awaiting Nitin Gupta's cleancache backend
>>> and implementation to do in-kernel page cache compression.
>>
>> Did Nitin say he will make a cleancache backend for
>> page cache compression?
>>
>> It would be a good feature.
>> I'm interested, too. :)
>
> That was Nitin's plan for his GSOC project when we last discussed
> this. Nitin is on the cc list and can comment if this has
> changed.
>


Yes, I have just started work on an in-kernel page cache compression
backend for cleancache :)
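
The put side might look something like the sketch below. This is
just an illustration of the idea using the in-kernel LZO library,
not the actual backend; cczip_put_page, zwork, and zstore_insert are
made-up names, and the structure behind zstore_insert (which maps
(pool, inode, index) to a compressed blob) is omitted:

        #include <linux/lzo.h>
        #include <linux/highmem.h>
        #include <linux/slab.h>

        /* scratch space of LZO1X_MEM_COMPRESS bytes, allocated once
         * at init time (allocation not shown) */
        static void *zwork;

        static int cczip_put_page(int pool, ino_t ino, pgoff_t index,
                                  struct page *page)
        {
                size_t clen = lzo1x_worst_compress(PAGE_SIZE);
                unsigned char *dst = kmalloc(clen, GFP_ATOMIC);
                void *src;
                int ret;

                if (!dst)
                        return -1;
                src = kmap_atomic(page, KM_USER0);
                ret = lzo1x_1_compress(src, PAGE_SIZE, dst, &clen, zwork);
                kunmap_atomic(src, KM_USER0);
                if (ret != LZO_E_OK || clen >= PAGE_SIZE) {
                        kfree(dst);     /* incompressible: don't cache */
                        return -1;
                }
                /* remember the blob under (pool, ino, index) for gets */
                return zstore_insert(pool, ino, index, dst, clen);
        }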

Thanks,
Nitin
From: Andreas Dilger on
On 2010-06-02, at 20:46, Nitin Gupta wrote:
> On 06/03/2010 04:32 AM, Dan Magenheimer wrote:
>>> From: Minchan Kim [mailto:minchan.kim(a)gmail.com]
>>
>>>> I am also eagerly awaiting Nitin Gupta's cleancache backend
>>>> and implementation to do in-kernel page cache compression.
>>>
>>> Did Nitin say he will make a cleancache backend for
>>> page cache compression?
>>>
>>> It would be a good feature.
>>> I'm interested, too. :)
>>
>> That was Nitin's plan for his GSOC project when we last discussed
>> this. Nitin is on the cc list and can comment if this has
>> changed.
>
> Yes, I have just started work on an in-kernel page cache compression
> backend for cleancache :)

Is there a design doc for this implementation? I was thinking it would be quite clever to do compression in, say, 64kB or 128kB chunks in a mapping (to get decent compression) and then write these compressed chunks directly from the page cache to disk in btrfs and/or a revived compressed ext4.

That would mean that the on-disk compression algorithm needs to match the in-memory algorithm, which implies that the in-memory compression algorithm should be selectable on a per-mapping basis.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
