From: Stephen Rothwell on
Hi Greg,

After merging the driver-core tree, today's linux-next build (x86_64
allmodconfig) failed like this:

fs/ceph/msgpool.c: In function 'ceph_msgpool_put':
fs/ceph/msgpool.c:173: error: implicit declaration of function 'kref_set'

Caused by commit 10c5d9fdc9ba89606b34f01cbe6ea287abba7395 ("kref: remove
kref_set") from the driver-core tree interacting with commit
c2e552e76e2c6907ca50cd9a4b747a2e2e8c615e ("ceph: use kref for ceph_msg")
from the ceph tree.

I applied the following patch for today (which may not be correct):

[Sage, if this patch is correct, it should be applied to the ceph tree.]

From: Stephen Rothwell <sfr(a)canb.auug.org.au>
Date: Wed, 17 Mar 2010 15:35:22 +1100
Subject: [PATCH] ceph: update for removal of kref_set

Signed-off-by: Stephen Rothwell <sfr(a)canb.auug.org.au>
---
fs/ceph/msgpool.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/ceph/msgpool.c b/fs/ceph/msgpool.c
index ca3b44a..030297f 100644
--- a/fs/ceph/msgpool.c
+++ b/fs/ceph/msgpool.c
@@ -170,7 +170,7 @@ void ceph_msgpool_put(struct ceph_msgpool *pool, struct ceph_msg *msg)
msg->front.iov_len = pool->front_len;
msg->hdr.front_len = cpu_to_le32(pool->front_len);

- kref_set(&msg->kref, 1); /* retake a single ref */
+ kref_init(&msg->kref); /* retake a single ref */
list_add(&msg->list_head, &pool->msgs);
pool->num++;
dout("msgpool_put %p reclaim %p, now %d/%d\n", pool, msg,
--
1.7.0

--
Cheers,
Stephen Rothwell sfr(a)canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Neil Brown on
On Wed, 17 Mar 2010 15:41:45 +1100
Stephen Rothwell <sfr(a)canb.auug.org.au> wrote:

> Hi Greg,
>
> After merging the driver-core tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
>
> fs/ceph/msgpool.c: In function 'ceph_msgpool_put':
> fs/ceph/msgpool.c:173: error: implicit declaration of function 'kref_set'
>
> Caused by commit 10c5d9fdc9ba89606b34f01cbe6ea287abba7395 ("kref: remove
> kref_set") from the driver-core tree interacting with commit
> c2e552e76e2c6907ca50cd9a4b747a2e2e8c615e ("ceph: use kref for ceph_msg")
> from the ceph tree.
>
> I applied the following patch for today (which may not be correct):

I would say this is correct.

The msg has seen it's last_put and is being placed in the pool of
free messages, so it needs to be in the same state that ceph_msg_new
(called in __fill_msgpool) leaves it in.
ceph_msg_new used kref_init, so ceph_msgpool_put should use kref_init too to
match.

It is a pity that this code cannot use mempool_t....
What if mempool_t were changed to only re-alloc the vector of pointers when
it grew, or when it shrank to less than 1/2 it's current size. Would that
reduce the frequency of allocations enough for you to be comfortable with it?
i.e. always make the vector a power-of-2 size (which is what is probably
allocated anyway) while the pool size might be less.
??

NeilBrown

>
> [Sage, if this patch is correct, it should be applied to the ceph tree.]
>
> From: Stephen Rothwell <sfr(a)canb.auug.org.au>
> Date: Wed, 17 Mar 2010 15:35:22 +1100
> Subject: [PATCH] ceph: update for removal of kref_set
>
> Signed-off-by: Stephen Rothwell <sfr(a)canb.auug.org.au>
> ---
> fs/ceph/msgpool.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/ceph/msgpool.c b/fs/ceph/msgpool.c
> index ca3b44a..030297f 100644
> --- a/fs/ceph/msgpool.c
> +++ b/fs/ceph/msgpool.c
> @@ -170,7 +170,7 @@ void ceph_msgpool_put(struct ceph_msgpool *pool, struct ceph_msg *msg)
> msg->front.iov_len = pool->front_len;
> msg->hdr.front_len = cpu_to_le32(pool->front_len);
>
> - kref_set(&msg->kref, 1); /* retake a single ref */
> + kref_init(&msg->kref); /* retake a single ref */
> list_add(&msg->list_head, &pool->msgs);
> pool->num++;
> dout("msgpool_put %p reclaim %p, now %d/%d\n", pool, msg,

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Sage Weil on
On Wed, 17 Mar 2010, Neil Brown wrote:

> On Wed, 17 Mar 2010 15:41:45 +1100
> Stephen Rothwell <sfr(a)canb.auug.org.au> wrote:
>
> > Hi Greg,
> >
> > After merging the driver-core tree, today's linux-next build (x86_64
> > allmodconfig) failed like this:
> >
> > fs/ceph/msgpool.c: In function 'ceph_msgpool_put':
> > fs/ceph/msgpool.c:173: error: implicit declaration of function 'kref_set'
> >
> > Caused by commit 10c5d9fdc9ba89606b34f01cbe6ea287abba7395 ("kref: remove
> > kref_set") from the driver-core tree interacting with commit
> > c2e552e76e2c6907ca50cd9a4b747a2e2e8c615e ("ceph: use kref for ceph_msg")
> > from the ceph tree.
> >
> > I applied the following patch for today (which may not be correct):
>
> I would say this is correct.

Yeah, the fix is good, thanks Stephen! I'll add it to my tree shortly.

> It is a pity that this code cannot use mempool_t....
> What if mempool_t were changed to only re-alloc the vector of pointers when
> it grew, or when it shrank to less than 1/2 it's current size. Would that
> reduce the frequency of allocations enough for you to be comfortable with it?
> i.e. always make the vector a power-of-2 size (which is what is probably
> allocated anyway) while the pool size might be less.
> ??

That would improve the situation, but still mean potentially large
allocations (the pools can grow pretty big) that aren't strictly
necessary. I can imagine a more modular mempool_t with an ops vector for
adding/removing from the pool to cope with situations like this, but I'm
not sure it's worth the effort?

sage
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Stephen Rothwell on
Hi guys,

On Wed, 17 Mar 2010 08:51:49 -0700 (PDT) Sage Weil <sage(a)newdream.net> wrote:
>
> On Wed, 17 Mar 2010, Neil Brown wrote:
>
> > I would say this is correct.
>
> Yeah, the fix is good, thanks Stephen! I'll add it to my tree shortly.

Thanks for the confirmation.
--
Cheers,
Stephen Rothwell sfr(a)canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
From: Neil Brown on
On Wed, 17 Mar 2010 08:51:49 -0700 (PDT)
Sage Weil <sage(a)newdream.net> wrote:


> > It is a pity that this code cannot use mempool_t....
> > What if mempool_t were changed to only re-alloc the vector of pointers when
> > it grew, or when it shrank to less than 1/2 it's current size. Would that
> > reduce the frequency of allocations enough for you to be comfortable with it?
> > i.e. always make the vector a power-of-2 size (which is what is probably
> > allocated anyway) while the pool size might be less.
> > ??
>
> That would improve the situation, but still mean potentially large
> allocations (the pools can grow pretty big) that aren't strictly
> necessary. I can imagine a more modular mempool_t with an ops vector for
> adding/removing from the pool to cope with situations like this, but I'm
> not sure it's worth the effort?

How big?
mempools (and equivalents) should just be large enough to get you through a
tight spot. The working assumption is that they will not normally be used.
So 2 or 3 should normally be plenty.

(looks at code)

The only time you resize a ceph_mempool is in ceph_monc_do_statfs
where you increment it, perform a synchronous statfs call on the
network, then decrement the size of the mempool.
How many concurrent statfs calls does it even make sense to make.
I'm probably missing something obvious, but wouldn't it make sense to
put that all under a mutex so there was only ever one outstanding statfs (per
filesystem) - or maybe under a counting semaphore to allow some small number,
and make sure to prime the mempool to cover that number.
Then you would never resize a mempool at all.

I notice that all other mempools that ceph uses are sensibly quite small and
stay that way.

NeilBrown
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/