From: Christoph Hellwig on
What all this fails to explain is what this is actually useful for.

Your series adds lots of crappy code and entirely stupid interactions with a
handful of filesystems, but no actual users.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Konrad Rzeszutek Wilk on
On Mon, Jun 21, 2010 at 04:18:09PM -0700, Dan Magenheimer wrote:
> [PATCH V3 0/8] Cleancache: overview

Dan,

Two comments:
- Mention where one can get the implementor of the cleancache API.
Either a link to where the patches reside or a git branch.
If you need pointers on branch names:
http://lkml.org/lkml/2010/6/7/269

- Point out the presentation you did on this. It has an excellent
overview of how this API works and, most importantly, a) images
and b) performance numbers.

Otherwise, please consider all of these patches to have
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>

tag.
From: Nitin Gupta on
On 06/22/2010 04:48 AM, Dan Magenheimer wrote:
> [PATCH V3 0/8] Cleancache: overview
>
<snip>
>
> Documentation/ABI/testing/sysfs-kernel-mm-cleancache | 11 +
> Documentation/vm/cleancache.txt | 194 +++++++++++++++++++
> fs/btrfs/extent_io.c | 9
> fs/btrfs/super.c | 2
> fs/buffer.c | 5
> fs/ext3/super.c | 2
> fs/ext4/super.c | 2
> fs/mpage.c | 7
> fs/ocfs2/super.c | 3
> fs/super.c | 7
> include/linux/cleancache.h | 88 ++++++++
> include/linux/fs.h | 5
> mm/Kconfig | 22 ++
> mm/Makefile | 1
> mm/cleancache.c | 169 ++++++++++++++++
> mm/filemap.c | 11 +
> mm/truncate.c | 10
> 17 files changed, 548 insertions(+)
>
> (following is a copy of Documentation/vm/cleancache.txt)
>
> MOTIVATION
>
> Cleancache can be thought of as a page-granularity victim cache for clean
> pages that the kernel's pageframe replacement algorithm (PFRA) would like
> to keep around, but can't since there isn't enough memory. So when the
> PFRA "evicts" a page, it first attempts to put it into a synchronous
> concurrency-safe page-oriented "pseudo-RAM" device (such as Xen's Transcendent
> Memory, aka "tmem", or in-kernel compressed memory, aka "zmem", or other
> RAM-like devices) which is not directly accessible or addressable by the
> kernel and is of unknown and possibly time-varying size. And when a
> cleancache-enabled filesystem wishes to access a page in a file on disk,
> it first checks cleancache to see if it already contains it; if it does,
> the page is copied into the kernel and a disk access is avoided.
>


Since zcache is now one of its use cases, I think the major objection that
remains against cleancache is its intrusiveness: in particular, the need to
change individual filesystems (even though the changes are one-liners). The
changes below should help avoid these per-fs changes and make cleancache more
self-contained. I haven't tested these changes myself, so there might be missed
cases or other mysterious problems:

1. Cleancache requires filesystem-specific changes primarily to make a call to
cleancache init and to store the (per-fs instance) pool_id. I think we can get
rid of these by directly passing the 'struct super_block' pointer, which is also
sufficient to identify the FS instance a page belongs to. This should then be
used as a 'handle' by the cleancache_ops provider to find the corresponding
memory pool, or to create a new pool when a new handle is encountered.

This leaves out the case of ocfs2, for which cleancache needs a 'uuid' to decide
if a shared pool should be created. IMHO, this case (and
cleancache.init_shared_fs) should be removed from cleancache_ops since it is
applicable only to Xen's cleancache_ops provider.
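A minimal userspace sketch of the handle-based lookup proposed in point 1, assuming the provider keeps a small table keyed by the superblock address and creates a pool the first time it sees a new handle. `pool_for_sb()`, `struct pool_map` and `MAX_POOLS` are hypothetical names, not part of the posted cleancache API, and `struct super_block` here is only a stub whose address serves as the handle.

```c
#include <assert.h>
#include <stddef.h>

/* Stub for the kernel's struct super_block; only its address matters here. */
struct super_block { int dummy; };

#define MAX_POOLS 16	/* hypothetical provider limit */

/* Hypothetical provider-side table mapping a superblock handle to a pool id. */
struct pool_map {
	const struct super_block *sb[MAX_POOLS];
	int nr;
};

/*
 * Return the pool id for @sb, creating a new pool the first time this
 * handle is seen. Returns -1 if the provider has no pools left.
 */
static int pool_for_sb(struct pool_map *map, const struct super_block *sb)
{
	int i;

	assert(sb != NULL);
	for (i = 0; i < map->nr; i++)
		if (map->sb[i] == sb)
			return i;	/* existing pool for this fs instance */
	if (map->nr == MAX_POOLS)
		return -1;		/* out of pools: caller skips caching */
	map->sb[map->nr] = sb;
	return map->nr++;		/* new handle: "create" a pool */
}
```

With this shape no filesystem has to call an init hook or store a pool_id; the provider derives everything from the superblock pointer it is handed.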

2. I think the change in btrfs can be avoided by moving cleancache_get_page()
from do_mpage_readpage() to filemap_fault(), and this should work for all
filesystems. See:

handle_pte_fault() -> do_(non)linear_fault() -> __do_fault()
                                             -> vma->vm_ops->fault()

which is defined as filemap_fault() for all filesystems. If some future
filesystem uses its own custom function (why?), then it will have to arrange
for a call to cleancache_get_page() if it wants this feature.

With the above changes, cleancache will be fairly self-contained:
- cleancache_put_page() when a page is removed from the page cache
- cleancache_get_page() when a page fault occurs (after the page cache is searched)
- cleancache_flush_*() on truncate_*()
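The put/get/flush lifecycle listed above can be modelled in userspace as follows. This is a sketch under simplifying assumptions: a fixed-size array stands in for the pseudo-RAM backend, pages are keyed by a hypothetical (inode number, page index) pair, and the `cc_*` names are illustrative, not the cleancache functions themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_SZ  16	/* tiny "pages" keep the model small */
#define NR_SLOTS 4	/* the backend may hold far fewer pages than RAM */

struct cc_slot {
	bool used;
	unsigned long ino;	/* hypothetical key: inode number ... */
	unsigned long index;	/* ... plus page offset within the file */
	char data[PAGE_SZ];
};

static struct cc_slot cache[NR_SLOTS];

/* Return the slot holding (ino, index), or -1 if it is not cached. */
static int cc_find(unsigned long ino, unsigned long index)
{
	for (int i = 0; i < NR_SLOTS; i++)
		if (cache[i].used && cache[i].ino == ino && cache[i].index == index)
			return i;
	return -1;
}

/* Clean page leaving the page cache: store it if there is room. */
static void cc_put_page(unsigned long ino, unsigned long index, const char *page)
{
	int i = cc_find(ino, index);

	assert(page != NULL);
	for (int j = 0; i < 0 && j < NR_SLOTS; j++)
		if (!cache[j].used)
			i = j;		/* first free slot */
	if (i < 0)
		return;			/* backend full: page is simply dropped */
	cache[i].used = true;
	cache[i].ino = ino;
	cache[i].index = index;
	memcpy(cache[i].data, page, PAGE_SZ);
}

/* Page-cache miss, before going to disk. Returns true on a hit. */
static bool cc_get_page(unsigned long ino, unsigned long index, char *page)
{
	int i = cc_find(ino, index);

	if (i < 0)
		return false;
	memcpy(page, cache[i].data, PAGE_SZ);
	return true;
}

/* Truncate path: invalidate so stale data can never be returned. */
static void cc_flush_page(unsigned long ino, unsigned long index)
{
	int i = cc_find(ino, index);

	if (i >= 0)
		cache[i].used = false;
}
```

Because a put may silently fail, a caller can never assume a page it stored will come back; that is what makes the backend usable when its size is unknown or changes over time.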

Thanks,
Nitin
From: Minchan Kim on
On Fri, Jul 23, 2010 at 4:36 PM, Nitin Gupta <ngupta(a)vflare.org> wrote:
>
> 2. I think change in btrfs can be avoided by moving cleancache_get_page()
> from do_mpage_readpage() to filemap_fault() and this should work for all
> filesystems. See:
>
> handle_pte_fault() -> do_(non)linear_fault() -> __do_fault()
>                                              -> vma->vm_ops->fault()
>
> which is defined as filemap_fault() for all filesystems. If some future
> filesystem uses its own custom function (why?) then it will have to arrange for
> call to cleancache_get_page(), if it wants this feature.


filemap_fault() works only for file-backed pages that are mapped; it doesn't
cover page-cache pages that are not mapped. So we could miss cache pages read
through the read() system call if we move the hook into filemap_fault().
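A toy model (not kernel code) of this objection: a page-cache miss can arrive either from read(2) or from a page fault on a mapping, and a lookup placed only on the fault side is never consulted for read(2). All names (`backend_lookup()`, `hook_in_fault_only`, `read_path()`, `fault_path()`) are hypothetical, and the backend is assumed to hold only page 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Counters: how often the backend was consulted, and how often we hit disk. */
static int backend_lookups, disk_reads;

/* When true, model the proposal: only the fault path consults the backend. */
static bool hook_in_fault_only;

/* Hypothetical backend lookup; in this model only page 0 is cached. */
static bool backend_lookup(unsigned long index)
{
	backend_lookups++;
	return index == 0;
}

/* Common page-cache miss handling for both entry points. */
static void handle_miss(unsigned long index, bool from_fault)
{
	bool consult = !hook_in_fault_only || from_fault;

	if (consult && backend_lookup(index))
		return;		/* served from the backend, no disk I/O */
	disk_reads++;		/* fall back to reading from disk */
}

/* read(2) on a file that is not mmap()ed: ->fault() is never called. */
static void read_path(unsigned long index)
{
	handle_miss(index, false);
}

/* Page fault on an mmap()ed file, i.e. the filemap_fault() route. */
static void fault_path(unsigned long index)
{
	handle_miss(index, true);
}
```

With `hook_in_fault_only` set, a read(2) of page 0 goes to disk even though the backend holds it; placing the lookup on a path shared by both entry points covers both.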


--
Kind regards,
Minchan Kim
From: Christoph Hellwig on
On Fri, Jul 23, 2010 at 06:58:03AM -0700, Dan Magenheimer wrote:
> CHRISTOPH AND ANDREW, if you disagree and your concerns have
> not been resolved, please speak up.

Anything that needs modification of a normal non-shared fs is utterly
broken and you'll get a clear NAK, so the earlier proposal is a good
one. There are a couple more issues, like the still-weird prototypes:
e.g. i_ino might not be enough to uniquely identify an inode
on several filesystems that use 64-bit inode numbers on 32-bit
systems. Also, making the ops vector global is just a bad idea.
There is nothing making this sort of caching inherently global.
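The i_ino concern can be demonstrated outside the kernel. The sketch below assumes a 32-bit build, where `i_ino` (an `unsigned long`) is 32 bits wide, and uses `uint32_t` to stand in for it; two distinct 64-bit inode numbers then collapse to the same key. `key_from_i_ino()` and `struct cc_key64` are hypothetical illustrations, not a proposed interface.

```c
#include <assert.h>
#include <stdint.h>

/* Models a 32-bit kernel, where unsigned long (the type of i_ino) is 32 bits. */
typedef uint32_t ulong32;

/* What a key built from i_ino keeps of a filesystem's 64-bit inode number. */
static ulong32 key_from_i_ino(uint64_t fs_ino)
{
	return (ulong32)fs_ino;		/* upper 32 bits are silently dropped */
}

/* A hypothetical wider key that keeps the full 64-bit inode number. */
struct cc_key64 {
	uint64_t ino;
};

static struct cc_key64 key_from_fs_ino(uint64_t fs_ino)
{
	struct cc_key64 k = { .ino = fs_ino };

	return k;
}
```

With the narrow key, cached pages of two different files could be returned for each other; any fix has to carry the full inode identity (or a filesystem-supplied key) through the interface.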
