From: Dan Magenheimer on
> From: Nitin Gupta [mailto:ngupta(a)vflare.org]
> Sent: Friday, July 23, 2010 9:05 AM
> To: Dan Magenheimer
> Cc: Christoph Hellwig; akpm(a)linux-foundation.org; Chris Mason;
> viro(a)zeniv.linux.org.uk; adilger(a)sun.com; tytso(a)mit.edu;
> mfasheh(a)suse.com; Joel Becker; matthew(a)wil.cx; linux-
> btrfs(a)vger.kernel.org; linux-kernel(a)vger.kernel.org; linux-
> fsdevel(a)vger.kernel.org; linux-ext4(a)vger.kernel.org; ocfs2-
> devel(a)oss.oracle.com; linux-mm(a)kvack.org; jeremy(a)goop.org;
> JBeulich(a)novell.com; Kurt Hackel; npiggin(a)suse.de; Dave Mccracken;
> riel(a)redhat.com; avi(a)redhat.com; Konrad Wilk
> Subject: Re: [PATCH V3 0/8] Cleancache: overview
>
> On 07/23/2010 08:14 PM, Dan Magenheimer wrote:
> >> From: Christoph Hellwig [mailto:hch(a)infradead.org]
>
>
> >> Also making the ops vector global is just a bad idea.
> >> There is nothing making this sort of caching inherently global.
> >
> > I'm not sure I understand your point, but two very different
> > users of cleancache have been provided, and more will be
> > discussed at the MM summit next month.
> >
> > Do you have a suggestion on how to avoid a global ops
> > vector while still serving the needs of both existing
> > users?
>
> Maybe introduce cleancache_register(struct cleancache_ops *ops)?
> This would allow making cleancache_ops non-global. It adds no
> functional value, but maybe it's cleaner?

Oh, OK, that seems reasonable.
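
Something like this minimal sketch is what I'd picture; to be clear,
the ops names and signatures below are just my guesses, not from any
posted patch:

/*
 * Hedged sketch only: illustrative names/signatures, not from a
 * posted patch.  A backend (e.g. a hypervisor tmem driver) fills in
 * an ops vector and registers it; cleancache_ops stays file-local.
 */
struct cleancache_ops {
	int  (*init_fs)(size_t pagesize);
	int  (*get_page)(int pool_id, ino_t ino, pgoff_t index,
			 struct page *page);
	void (*put_page)(int pool_id, ino_t ino, pgoff_t index,
			 struct page *page);
	void (*flush_page)(int pool_id, ino_t ino, pgoff_t index);
	void (*flush_fs)(int pool_id);
};

static struct cleancache_ops *cleancache_ops;	/* file-local, not global */

int cleancache_register(struct cleancache_ops *ops)
{
	if (cleancache_ops)
		return -EBUSY;	/* only one backend at a time */
	cleancache_ops = ops;
	return 0;
}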

Dan
From: Nitin Gupta on
On 07/23/2010 11:07 PM, Dan Magenheimer wrote:
>> From: Dan Magenheimer
>> Subject: RE: [PATCH V3 0/8] Cleancache: overview
>>
>>> From: Christoph Hellwig [mailto:hch(a)infradead.org]
>>> Subject: Re: [PATCH V3 0/8] Cleancache: overview
>>>
>>> On Fri, Jul 23, 2010 at 06:58:03AM -0700, Dan Magenheimer wrote:
>>>> CHRISTOPH AND ANDREW, if you disagree and your concerns have
>>>> not been resolved, please speak up.
>>
>> Hi Christoph --
>>
>> Thanks very much for the quick (instantaneous?) reply!
>>
>> Anything that needs modification of a normal non-shared fs is utterly
>> broken and you'll get a clear NAK, so the proposal before is a good
>> one.
>>
>> Unless/until all filesystems are 100% built on top of VFS,
>> I have to disagree. Abstractions (e.g. VFS) are never perfect.
>
> After thinking about this some more, I can see a way
> to enforce "opt-in" in the cleancache backend without
> any changes to non-generic fs code. I think it's a horrible
> hack and we can try it, but I expect fs maintainers
> would prefer the explicit one-line-patch opt-in.
>
> 1) Cleancache backend maintains a list of "known working"
> filesystems (those that have been tested).

Checking against a "known working" list does indeed look horrible.
Isn't there some way to identify pagecache -> disk I/O boundaries
that every filesystem obeys? I'm not yet sure, but if that is
doable, then we won't need such hacks.

>
> 2) Nitin's proposed changes pass the *sb as a parameter.
> The string name of the filesystem type is available via
> sb->s_type->name. This can be compared against
> the "known working" list.
>

sb->s_magic could also be used, or better still, we could somehow
get rid of these checks :)
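
Just to make concrete what we would be avoiding, the hack would look
roughly like this (purely illustrative; the fs names are only assumed
examples):

/* Purely illustrative -- the "known working" hack we both dislike. */
static const char * const cleancache_known_fs[] = {
	"ext3", "ext4", "ocfs2",		/* assumed examples */
};

static bool cleancache_fs_is_known(struct super_block *sb)
{
	int i;

	for (i = 0; i < ARRAY_SIZE(cleancache_known_fs); i++)
		if (!strcmp(sb->s_type->name, cleancache_known_fs[i]))
			return true;
	return false;
}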

> Using the sb pointer as a "handle" requires an extra
> table search on every cleancache get/put/flush,
> and fs/super.c changes are required for fs unmount
> notification anyway (e.g. to call cleancache_flush_fs)
> so I'd prefer to keep the cleancache_poolid addition
> to the sb. I'll assume this is OK since this is in generic
> fs code.
>
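
Agreed that caching the pool id in the sb keeps the hooks cheap.
Roughly, as I picture it (cleancache_poolid as in your description;
the rest is just my sketch):

/* Sketch with assumed names: an int cleancache_poolid added to
 * struct super_block (set at mount time, -1 when disabled) lets
 * each hook avoid any sb -> pool table search: */
static inline int cleancache_get_page(struct page *page)
{
	struct inode *inode = page->mapping->host;
	int pool_id = inode->i_sb->cleancache_poolid;

	if (pool_id < 0)
		return -1;		/* fs did not opt in */
	return cleancache_ops->get_page(pool_id, inode->i_ino,
					page->index, page);
}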


I will also try making changes to cleancache so that it does not
touch any fs-specific code. IMHO, one-liners to fs code should
really be acceptable, but unfortunately that doesn't seem to be
the case. Maybe a generic cleancache will have a better chance.

Thanks,
Nitin
From: Dan Magenheimer on
> > On Fri, Jul 23, 2010 at 06:58:03AM -0700, Dan Magenheimer wrote:
> > > CHRISTOPH AND ANDREW, if you disagree and your concerns have
> > > not been resolved, please speak up.
>
> Hi Christoph --
>
> Thanks very much for the quick (instantaneous?) reply!
>
> > Anything that needs modification of a normal non-shared fs is utterly
> > broken and you'll get a clear NAK, so the proposal before is a good
> > one.
>
> No, the per-fs opt-in is very sensible, and its design is
> very minimal.

Not to belabor the point, but maybe the right way to think about
this is:

Cleancache is a new optional feature provided by the VFS layer
that potentially dramatically increases page cache effectiveness
for many workloads in many environments at a negligible cost.

Filesystems that are well-behaved and conform to certain restrictions
can utilize cleancache simply by making a call to cleancache_init_fs
at mount time. Unusual, misbehaving, or poorly layered filesystems
must either add additional hooks and/or undergo extensive additional
testing... or should just not enable the optional cleancache.
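
To be concrete, the entire opt-in for a well-behaved fs is one line
at mount time. A sketch (cleancache_init_fs is from this patch
series; example_fill_super and its placement are just illustrative):

/* Sketch: the whole per-fs opt-in, one line in the fs's fill_super.
 * cleancache_init_fs returns a pool id (negative if no backend). */
static int example_fill_super(struct super_block *sb, void *data,
			      int silent)
{
	/* ... normal superblock setup ... */

	sb->cleancache_poolid = cleancache_init_fs(PAGE_SIZE);

	return 0;
}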
From: Boaz Harrosh on
On 07/24/2010 12:17 AM, Dan Magenheimer wrote:
>>> On Fri, Jul 23, 2010 at 06:58:03AM -0700, Dan Magenheimer wrote:
>>>> CHRISTOPH AND ANDREW, if you disagree and your concerns have
>>>> not been resolved, please speak up.
>>
>> Hi Christoph --
>>
>> Thanks very much for the quick (instantaneous?) reply!
>>
>>> Anything that needs modification of a normal non-shared fs is utterly
>>> broken and you'll get a clear NAK, so the proposal before is a good
>>> one.
>>
>> No, the per-fs opt-in is very sensible, and its design is
>> very minimal.
>
> Not to belabor the point, but maybe the right way to think about
> this is:
>
> Cleancache is a new optional feature provided by the VFS layer
> that potentially dramatically increases page cache effectiveness
> for many workloads in many environments at a negligible cost.
>
> Filesystems that are well-behaved and conform to certain restrictions
> can utilize cleancache simply by making a call to cleancache_init_fs
> at mount time. Unusual, misbehaving, or poorly layered filesystems
> must either add additional hooks and/or undergo extensive additional
> testing... or should just not enable the optional cleancache.

OK, so I maintain a filesystem in the kernel. How do I know that my
FS is not "unusual, misbehaving, or poorly layered"?

Thanks
Boaz
From: Dan Magenheimer on
> From: Boaz Harrosh [mailto:bharrosh(a)panasas.com]
> Sent: Tuesday, August 03, 2010 10:23 AM
> To: Dan Magenheimer
> Subject: Re: [PATCH V3 0/8] Cleancache: overview
>
> On 07/24/2010 12:17 AM, Dan Magenheimer wrote:
> >>> On Fri, Jul 23, 2010 at 06:58:03AM -0700, Dan Magenheimer wrote:
> >>>> CHRISTOPH AND ANDREW, if you disagree and your concerns have
> >>>> not been resolved, please speak up.
> >>
> >> Hi Christoph --
> >>
> >> Thanks very much for the quick (instantaneous?) reply!
> >>
> >>> Anything that needs modification of a normal non-shared fs is utterly
> >>> broken and you'll get a clear NAK, so the proposal before is a good
> >>> one.
> >>
> >> No, the per-fs opt-in is very sensible, and its design is
> >> very minimal.
> >
> > Not to belabor the point, but maybe the right way to think about
> > this is:
> >
> > Cleancache is a new optional feature provided by the VFS layer
> > that potentially dramatically increases page cache effectiveness
> > for many workloads in many environments at a negligible cost.
> >
> > Filesystems that are well-behaved and conform to certain restrictions
> > can utilize cleancache simply by making a call to cleancache_init_fs
> > at mount time. Unusual, misbehaving, or poorly layered filesystems
> > must either add additional hooks and/or undergo extensive additional
> > testing... or should just not enable the optional cleancache.
>
> OK, so I maintain a filesystem in the kernel. How do I know that my
> FS is not "unusual, misbehaving, or poorly layered"?

A reasonable question. I'm not a FS expert so this may not be
a complete answer, but please consider it a start:

- The FS should be block-device-based (e.g. a ram-based FS
such as tmpfs should not enable cleancache)

- To ensure coherency/correctness, the FS must ensure that all
file removal or truncation operations either go through the VFS
or add hooks to do the equivalent "flush" operations (see the
sketch after this list). E.g. I started looking at FS-Cache-based
network FS's and was concerned there might be problems, though
I don't know for sure.

- To ensure coherency/correctness, inode numbers must be unique
(e.g. no emulating 64-bit inode space on 32-bit inode numbers)

- The FS must call the VFS superblock alloc and deactivate routines
or add hooks to do the equivalent cleancache calls done there.

- To maximize performance, all pages fetched from the FS should
go through the do_mpage_readpage routine or the FS should add
hooks to do the equivalent (e.g. btrfs requires a hook for this)

- Currently, the FS blocksize must be the same as PAGE_SIZE. This
is not an architectural restriction, but no backends currently
support anything different (e.g. hugetlbfs presumably should not
enable cleancache)

- A clustered FS should invoke the "shared_init_fs" cleancache
hook to get best performance for some backends.
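
For the coherency point above, here is a sketch of where a flush
hook would sit in generic VFS truncation (hook name as in the patch
series; the surrounding code is elided and only illustrative):

/* Sketch only: without this hook, a later cleancache "get" could
 * return stale data for a truncated file. */
static void truncate_complete_page(struct address_space *mapping,
				   struct page *page)
{
	cleancache_flush_page(mapping, page);	/* drop cleancache copy */
	/* ... existing page cache truncation work ... */
}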

Does that help?

Thanks,
Dan