From: Minchan Kim
Hello.

I think the cleancache approach is cool. :)
I have some suggestions and questions.

On Sat, May 29, 2010 at 2:35 AM, Dan Magenheimer
<dan.magenheimer(a)oracle.com> wrote:
> [PATCH V2 0/7] Cleancache (was Transcendent Memory): overview
>
> Changes since V1:
> - Rebased to 2.6.34 (no functional changes)
> - Convert to sane types (Al Viro)
> - Define some raw constants (Konrad Wilk)
> - Add ack from Andreas Dilger
>
> In previous patch postings, cleancache was part of the Transcendent
> Memory ("tmem") patchset.  This patchset refocuses not on the underlying
> technology (tmem) but instead on the useful functionality provided for Linux,
> and provides a clean API so that cleancache can provide this very useful
> functionality either via a Xen tmem driver OR completely independent of tmem.
> For example: Nitin Gupta (of compcache and ramzswap fame) is implementing
> an in-kernel compression "backend" for cleancache; some believe
> cleancache will be a very nice interface for building RAM-like functionality
> for pseudo-RAM devices such as SSD or phase-change memory; and a Pune
> University team is looking at a backend for virtio (see OLS'2010).
>
> A more complete description of cleancache can be found in the introductory
> comment in mm/cleancache.c (in PATCH 2/7) which is included below
> for convenience.
>
> Note that an earlier version of this patch is now shipping in OpenSuSE 11.2
> and will soon ship in a release of Oracle Enterprise Linux.  Underlying
> tmem technology is now shipping in Oracle VM 2.2 and was just released
> in Xen 4.0 on April 15, 2010.  (Search news.google.com for Transcendent
> Memory)
>
> Signed-off-by: Dan Magenheimer <dan.magenheimer(a)oracle.com>
> Reviewed-by: Jeremy Fitzhardinge <jeremy(a)goop.org>
>
>  fs/btrfs/extent_io.c       |    9 +
>  fs/btrfs/super.c           |    2
>  fs/buffer.c                |    5 +
>  fs/ext3/super.c            |    2
>  fs/ext4/super.c            |    2
>  fs/mpage.c                 |    7 +
>  fs/ocfs2/super.c           |    3
>  fs/super.c                 |    8 +
>  include/linux/cleancache.h |   90 +++++++++++++++++++
>  include/linux/fs.h         |    5 +
>  mm/Kconfig                 |   22 ++++
>  mm/Makefile                |    1
>  mm/cleancache.c            |  203 +++++++++++++++++++++++++++++++++++++++++++++
>  mm/filemap.c               |   11 ++
>  mm/truncate.c              |   10 ++
>  15 files changed, 380 insertions(+)
>
> Cleancache can be thought of as a page-granularity victim cache for clean
> pages that the kernel's pageframe replacement algorithm (PFRA) would like
> to keep around, but can't since there isn't enough memory.  So when the
> PFRA "evicts" a page, it first attempts to put it into a synchronous
> concurrency-safe page-oriented pseudo-RAM device (such as Xen's Transcendent
> Memory, aka "tmem", or in-kernel compressed memory, aka "zmem", or other
> RAM-like devices) which is not directly accessible or addressable by the
> kernel and is of unknown and possibly time-varying size.  And when a
> cleancache-enabled filesystem wishes to access a page in a file on disk,
> it first checks cleancache to see if it already contains it; if it does,
> the page is copied into the kernel and a disk access is avoided.
> This pseudo-RAM device links itself to cleancache by setting the
> cleancache_ops pointer appropriately and the functions it provides must
> conform to certain semantics as follows:
>
> Most important, cleancache is "ephemeral".  Pages which are copied into
> cleancache have an indefinite lifetime which is completely unknowable
> by the kernel and so may or may not still be in cleancache at any later time.
> Thus, as its name implies, cleancache is not suitable for dirty pages.  The
> pseudo-RAM has complete discretion over what pages to preserve and what
> pages to discard and when.
>
> A filesystem calls "init_fs" to obtain a pool id which, if positive, must be
> saved in the filesystem's superblock; a negative return value indicates
> failure.  A "put_page" will copy a (presumably about-to-be-evicted) page into
> pseudo-RAM and associate it with the pool id, the file inode, and a page
> index into the file.  (The combination of a pool id, an inode, and an index
> is called a "handle".)  A "get_page" will copy the page, if found, from
> pseudo-RAM into kernel memory.  A "flush_page" will ensure the page no longer
> is present in pseudo-RAM; a "flush_inode" will flush all pages associated
> with the specified inode; and a "flush_fs" will flush all pages in all
> inodes specified by the given pool id.
>
> An "init_shared_fs", like init_fs, obtains a pool id but tells the pseudo-RAM
> to treat the pool as shared using a 128-bit UUID as a key.  On systems
> that may run multiple kernels (such as hard partitioned or virtualized
> systems) that may share a clustered filesystem, and where the pseudo-RAM
> may be shared among those kernels, calls to init_shared_fs that specify the
> same UUID will receive the same pool id, thus allowing the pages to
> be shared.  Note that any security requirements must be imposed outside
> of the kernel (e.g. by "tools" that control the pseudo-RAM).  Or a
> pseudo-RAM implementation can simply disable shared_init by always
> returning a negative value.
>
> If a get_page is successful on a non-shared pool, the page is flushed (thus
> making cleancache an "exclusive" cache).  On a shared pool, the page

Is there a reason for forcing "exclusive" behavior on a non-shared pool?
Is it to free memory in the pseudo-RAM?
I want to make it "inclusive" for a reason of my own, but unfortunately
I can't say why just yet.

Although you mention it's "exclusive", cleancache_get_page doesn't
flush the page in the code below.
Is flushing the role of whoever implements cleancache_ops->get_page?

+int __cleancache_get_page(struct page *page)
+{
+	int ret = 0;
+	int pool_id = page->mapping->host->i_sb->cleancache_poolid;
+
+	if (pool_id >= 0) {
+		ret = (*cleancache_ops->get_page)(pool_id,
+					page->mapping->host->i_ino,
+					page->index,
+					page);
+		if (ret == CLEANCACHE_GET_PAGE_SUCCESS)
+			succ_gets++;
+		else
+			failed_gets++;
+	}
+	return ret;
+}
+EXPORT_SYMBOL(__cleancache_get_page);
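
By the way, to check my understanding of the interface: is the ops table
a backend registers roughly like the below? (This is only my
reconstruction from the description and the get_page call above, not
copied from your patch, so the exact signatures may well be off.)

/* My guess at the ops table, reconstructed from the overview text;
 * exact types and return values may differ from the real patch. */
struct cleancache_ops {
	int (*init_fs)(size_t pagesize);	/* pool id; negative on failure */
	int (*init_shared_fs)(char *uuid, size_t pagesize);
	int (*get_page)(int pool_id, ino_t inode, pgoff_t index,
			struct page *page);
	void (*put_page)(int pool_id, ino_t inode, pgoff_t index,
			struct page *page);
	void (*flush_page)(int pool_id, ino_t inode, pgoff_t index);
	void (*flush_inode)(int pool_id, ino_t inode);
	void (*flush_fs)(int pool_id);
};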

If the backing device is RAM, could we _move_ pages from the page
cache into cleancache?
I mean, I'd rather not copy the page on every get/put operation; we
could simply move the page when the backing device is RAM. Is that possible?

You've sent the patches that form the core of cleancache, but I don't
see any use case.
Could you send use-case patches along with this series?
That would help in understanding cleancache's benefit.

--
Kind regards,
Minchan Kim
From: Jamie Lokier
Dan Magenheimer wrote:
> Most important, cleancache is "ephemeral". Pages which are copied into
> cleancache have an indefinite lifetime which is completely unknowable
> by the kernel and so may or may not still be in cleancache at any later time.
> Thus, as its name implies, cleancache is not suitable for dirty pages. The
> pseudo-RAM has complete discretion over what pages to preserve and what
> pages to discard and when.

Fwiw, the feature sounds useful to userspace too, for things with
memory-hungry caches like web browsers. Any plans to make it
available to userspace?

Thanks,
-- Jamie
From: Christoph Hellwig
Please give your patches a semi-reasonable subject line.

> fs/btrfs/super.c | 2
> fs/buffer.c | 5 +
> fs/ext3/super.c | 2
> fs/ext4/super.c | 2
> fs/mpage.c | 7 +
> fs/ocfs2/super.c | 3
> fs/super.c | 8 +

This is missing a whole lot of filesystems. And more to the point, why
the hell do you need hooks into the filesystem?

From: Dan Magenheimer
Hi Minchan --

> I think cleancache approach is cool. :)
> I have some suggestions and questions.

Thanks for your interest!

> > If a get_page is successful on a non-shared pool, the page is flushed
> > (thus making cleancache an "exclusive" cache).  On a shared pool, the page
>
> Is there a reason for forcing "exclusive" behavior on a non-shared pool?
> Is it to free memory in the pseudo-RAM?
> I want to make it "inclusive" for a reason of my own, but unfortunately
> I can't say why just yet.

The main reason is to free up memory in pseudo-RAM and to
avoid unnecessary cleancache_flush calls. If you want
inclusive, the page can be put immediately following
the get. If put-after-get for inclusive becomes common,
the interface could easily be extended to add a "get_no_flush"
call.
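
For example, something like the following (an untested sketch built on
the hooks in this patchset) would give you inclusive behavior today:

/* Untested sketch: emulate "inclusive" semantics by re-putting the
 * page immediately after a successful (exclusive) get. */
static int cleancache_get_page_inclusive(struct page *page)
{
	int ret = 0;
	int pool_id = page->mapping->host->i_sb->cleancache_poolid;

	if (pool_id >= 0) {
		ret = (*cleancache_ops->get_page)(pool_id,
					page->mapping->host->i_ino,
					page->index,
					page);
		/* the exclusive get dropped the page, so put it back */
		if (ret == CLEANCACHE_GET_PAGE_SUCCESS)
			(*cleancache_ops->put_page)(pool_id,
					page->mapping->host->i_ino,
					page->index,
					page);
	}
	return ret;
}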

> Although you mention it's "exclusive", cleancache_get_page doesn't
> flush the page in the code below.
> Is flushing the role of whoever implements cleancache_ops->get_page?

Yes, the flush is done by the cleancache implementation.
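
That is, a backend's get_page on a non-shared pool does roughly the
following (a hypothetical sketch: my_store_lookup/my_store_delete stand
in for whatever lookup structure a real backend uses, and I assume a
directly-addressable page for brevity):

/* Hypothetical sketch of an exclusive backend get_page: copy the
 * page out, then drop the backend's own copy, so the caller never
 * needs an explicit flush_page after a successful get. */
static int my_backend_get_page(int pool_id, ino_t ino, pgoff_t index,
			       struct page *page)
{
	void *data = my_store_lookup(pool_id, ino, index);

	if (data == NULL)
		return -1;			/* miss */
	copy_page(page_address(page), data);
	my_store_delete(pool_id, ino, index);	/* exclusive: discard */
	return CLEANCACHE_GET_PAGE_SUCCESS;
}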

> If the backing device is RAM, could we _move_ pages from the page
> cache into cleancache?
> I mean, I'd rather not copy the page on every get/put operation; we
> could simply move the page when the backing device is RAM. Is that possible?

By "move", do you mean changing the virtual mappings? Yes,
this could be done as long as the source and destination are
both directly addressable (that is, true physical RAM), but
requires TLB manipulation and has some complicated corner
cases. The copy semantics simplifies the implementation on
both the "frontend" and the "backend" and also allows the
backend to do fancy things on-the-fly like page compression
and page deduplication.
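
For example, with copy semantics a backend's put_page is free to
compress the page as it copies it in, which a pure page move could not
do (again a hypothetical sketch; my_compress/my_store_insert are
stand-ins):

/* Hypothetical sketch: a put_page that compresses on-the-fly.
 * With copy semantics the source page is untouched, so the backend
 * may store any transformed representation it likes. */
static void my_backend_put_page(int pool_id, ino_t ino, pgoff_t index,
				struct page *page)
{
	size_t clen;
	void *cdata = my_compress(page_address(page), PAGE_SIZE, &clen);

	if (cdata != NULL)
		my_store_insert(pool_id, ino, index, cdata, clen);
	/* on compression failure, simply decline to cache the page */
}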

> You've sent the patches that form the core of cleancache, but I don't
> see any use case.
> Could you send use-case patches along with this series?
> That would help in understanding cleancache's benefit.

Do you mean the Xen Transcendent Memory ("tmem") implementation?
If so, this is four files in the Xen source tree (common/tmem.c,
common/tmem_xen.c, include/xen/tmem.h, include/xen/tmem_xen.h).
There is also an html document in the Xen source tree, which can
be viewed here:
http://oss.oracle.com/projects/tmem/dist/documentation/internals/xen4-internals-v01.html

Or did you mean a cleancache_ops "backend"? For tmem, there
is one file linux/drivers/xen/tmem.c and it interfaces between
the cleancache_ops calls and Xen hypercalls. It should be in
a Xenlinux pv_ops tree soon, or I can email it sooner.

I am also eagerly awaiting Nitin Gupta's cleancache backend
implementation that does in-kernel page cache compression.

Thanks,
Dan
From: Dan Magenheimer
> From: Jamie Lokier [mailto:jamie(a)shareable.org]
> Subject: Re: [PATCH V2 0/7] Cleancache (was Transcendent Memory):
> overview
>
> Dan Magenheimer wrote:
> > Most important, cleancache is "ephemeral". Pages which are copied into
> > cleancache have an indefinite lifetime which is completely unknowable
> > by the kernel and so may or may not still be in cleancache at any later time.
> > Thus, as its name implies, cleancache is not suitable for dirty pages. The
> > pseudo-RAM has complete discretion over what pages to preserve and what
> > pages to discard and when.
>
> Fwiw, the feature sounds useful to userspace too, for things with
> memory-hungry caches like web browsers. Any plans to make it
> available to userspace?

No plans yet, though we agree it sounds useful, at least for
apps that bypass the page cache (e.g. O_DIRECT). If you have
time and interest to investigate this further, I'd be happy
to help. Send email offlist.

Thanks,
Dan