Cleancache: overview [Kernel]

Prev: [BUGFIX] kvm, Fix a race condition for usage of is_hwpoison_address
Next: [PATCH v2.1 1/4] netfilter: xt_ipvs (netfilter matcher for IPVS)

From: Dan Magenheimer on 23 Jul 2010 10:10

> Since zcache is now one of its use cases, I think the major
> objection that remains against cleancache is its intrusiveness
> -- in particular, need to change individual filesystems (even
> though one liners). Changes below should help avoid these
> per-fs changes and make it more self contained.

Hi Nitin --

I think my reply at http://lkml.org/lkml/2010/6/22/202 adequately
refutes the claim of intrusiveness (43 lines!). And FAQ #2 near
the end of the original posting at http://lkml.org/lkml/2010/6/21/411
explains why the per-fs "opt-in" approach is sensible and necessary.

CHRISTOPH AND ANDREW, if you disagree and your concerns have
not been resolved, please speak up.

Further, the maintainers of the changed filesystems have acked
the very minor cleancache patches; and maintainers of other
filesystems are not affected unless they choose to opt-in,
whereas these other filesystems MAY be affected with your
suggested changes to the patches.

So I think it's just a matter of waiting for the Linux wheels
to turn for a patch that (however lightly) touches a number of
maintainers' code, though I would very much welcome any
input on anything I can do to make those wheels turn faster.

Thanks,
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Dan Magenheimer on 23 Jul 2010 10:50

> From: Christoph Hellwig [mailto:hch(a)infradead.org]
> Subject: Re: [PATCH V3 0/8] Cleancache: overview
>
> On Fri, Jul 23, 2010 at 06:58:03AM -0700, Dan Magenheimer wrote:
> > CHRISTOPH AND ANDREW, if you disagree and your concerns have
> > not been resolved, please speak up.

Hi Christoph --

Thanks very much for the quick (instantaneous?) reply!

> Anything that need modification of a normal non-shared fs is utterly
> broken and you'll get a clear NAK, so the propsal before is a good
> one.

Unless/until all filesystems are 100% built on top of VFS,
I have to disagree. Abstractions (e.g. VFS) are never perfect.
And the relevant filesystem maintainers have acked, so I'm
wondering who you are NAK'ing for?

Nitin's proposal attempts to move the VFS hooks around to fix
usage for one fs (btrfs) that, for whatever reason, has
chosen to not layer itself completely on top of VFS; this
sounds to me like a recipe for disaster.
I think Minchan's reply quickly pointed out one issue...
what other filesystems that haven't been changed might
encounter a rare data corruption issue because cleancache is
transparently enabled for its page cache pages?

It also drops requires support to be dropped entirely for
another fs (ocfs2) which one user (zcache) can't use, but
the other (tmem) makes very good use of.

No, the per-fs opt-in is very sensible; and its design is
very minimal.

Could you please explain your objection further?

> There's a couple more issues like the still weird prototypes,
> e.g. and i_ino might not be enoug to uniquely identify an inode
> on serveral filesystems that use 64-bit inode inode numbers on 32-bit
> systems.

This reinforces my per-fs opt-in point. Such filesystems
should not enable cleancache (or enable them only on the
appropriate systems).

> Also making the ops vector global is just a bad idea.
> There is nothing making this sort of caching inherently global.

I'm not sure I understand your point, but two very different
users of cleancache have been provided, and more will be
discussed at the MM summit next month.

Do you have a suggestion on how to avoid a global ops
vector while still serving the needs of both existing
users?

Thanks,
Dan

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Nitin Gupta on 23 Jul 2010 11:00

On 07/23/2010 01:46 PM, Minchan Kim wrote:
> On Fri, Jul 23, 2010 at 4:36 PM, Nitin Gupta <ngupta(a)vflare.org> wrote:
>>
>> 2. I think change in btrfs can be avoided by moving cleancache_get_page()
>> from do_mpage_reapage() to filemap_fault() and this should work for all
>> filesystems. See:
>>
>> handle_pte_fault() -> do_(non)linear_fault() -> __do_fault()
>> -> vma->vm_ops->fault()
>>
>> which is defined as filemap_fault() for all filesystems. If some future
>> filesystem uses its own custom function (why?) then it will have to arrange for
>> call to cleancache_get_page(), if it wants this feature.
>
>
> filemap fault works only in case of file-backed page which is mapped
> but don't work not-mapped cache page. So we could miss cache page by
> read system call if we move it into filemap_fault.
>
>

Oh, yes. Then we need cleancache_get_page() call in do_generic_file_read() too.
So, if I am missing anything now, we should now be able to get rid of per-fs
changes.

Thanks,
Nitin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Nitin Gupta on 23 Jul 2010 11:10

On 07/23/2010 08:14 PM, Dan Magenheimer wrote:
>> From: Christoph Hellwig [mailto:hch(a)infradead.org]

>> Also making the ops vector global is just a bad idea.
>> There is nothing making this sort of caching inherently global.
>
> I'm not sure I understand your point, but two very different
> users of cleancache have been provided, and more will be
> discussed at the MM summit next month.
>
> Do you have a suggestion on how to avoid a global ops
> vector while still serving the needs of both existing
> users?

Maybe introduce cleancache_register(struct cleancache_ops *ops)?
This will allow making cleancache_ops non-global. No value add
but maybe that's cleaner?

Thanks,
Nitin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Dan Magenheimer on 23 Jul 2010 13:40

> From: Dan Magenheimer
> Subject: RE: [PATCH V3 0/8] Cleancache: overview
>
> > From: Christoph Hellwig [mailto:hch(a)infradead.org]
> > Subject: Re: [PATCH V3 0/8] Cleancache: overview
> >
> > On Fri, Jul 23, 2010 at 06:58:03AM -0700, Dan Magenheimer wrote:
> > > CHRISTOPH AND ANDREW, if you disagree and your concerns have
> > > not been resolved, please speak up.
>
> Hi Christoph --
>
> Thanks very much for the quick (instantaneous?) reply!
>
> > Anything that need modification of a normal non-shared fs is utterly
> > broken and you'll get a clear NAK, so the propsal before is a good
> > one.
>
> Unless/until all filesystems are 100% built on top of VFS,
> I have to disagree. Abstractions (e.g. VFS) are never perfect.

After thinking about this some more, I can see a way
to enforce "opt-in" in the cleancache backend without
any changes to non-generic fs code. I think it's a horrible
hack and we can try it, but I expect fs maintainers
would prefer the explicit one-line-patch opt-in.

1) Cleancache backend maintains a list of "known working"
filesystems (those that have been tested).

2) Nitin's proposed changes pass the *sb as a parameter.
The string name of the filesystem type is available via
sb->s_type->name. This can be compared against
the "known working" list.

Using the sb pointer as a "handle" requires an extra
table search on every cleancache get/put/flush,
and fs/super.c changes are required for fs unmount
notification anyway (e.g. to call cleancache_flush_fs)
so I'd prefer to keep the cleancache_poolid addition
to the sb. I'll assume this is OK since this is in generic
fs code.

Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

First | Prev | Next | Last
Pages: 1 2 3 4
Prev: [BUGFIX] kvm, Fix a race condition for usage of is_hwpoison_address
Next: [PATCH v2.1 1/4] netfilter: xt_ipvs (netfilter matcher for IPVS)