From: Dave Chinner on
On Thu, Jun 24, 2010 at 01:03:02PM +1000, npiggin(a)suse.de wrote:
> Allow the shrinker to do per-zone shrinking. This means it is called for
> each zone scanned. The shrinker is now completely responsible for calculating
> and batching (given helpers), which provides better flexibility.
>
> Finding the ratio of objects to scan requires scaling the ratio of pagecache
> objects scanned. By passing down both the per-zone and the global reclaimable
> pages, per-zone caches and global caches can be calculated correctly.
>
> Finally, add some fixed-point scaling to the ratio, which helps calculations.
>
> Signed-off-by: Nick Piggin <npiggin(a)suse.de>
> ---
> fs/dcache.c | 2
> fs/drop_caches.c | 2
> fs/inode.c | 2
> fs/mbcache.c | 4 -
> fs/nfs/dir.c | 2
> fs/nfs/internal.h | 2
> fs/quota/dquot.c | 2
> include/linux/mm.h | 6 +-
> mm/vmscan.c | 131 ++++++++++++++---------------------------------------
> 9 files changed, 47 insertions(+), 106 deletions(-)

The diffstat doesn't match the patch ;)


> Index: linux-2.6/include/linux/mm.h
> ===================================================================
> --- linux-2.6.orig/include/linux/mm.h
> +++ linux-2.6/include/linux/mm.h
> @@ -999,16 +999,19 @@ static inline void sync_mm_rss(struct ta
> * querying the cache size, so a fastpath for that case is appropriate.
> */
> struct shrinker {
> - int (*shrink)(int nr_to_scan, gfp_t gfp_mask);
> - int seeks; /* seeks to recreate an obj */
> -
> + int (*shrink)(struct zone *zone, unsigned long scanned, unsigned long total,
> + unsigned long global, gfp_t gfp_mask);

Can we add the shrinker structure to that callback, too, so that we
can get away from needing global context for the shrinker?
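i.e. something along these lines (just a sketch, the exact argument
order doesn't matter):

	struct shrinker {
		int (*shrink)(struct shrinker *shrinker, struct zone *zone,
			      unsigned long scanned, unsigned long total,
			      unsigned long global, gfp_t gfp_mask);
		int seeks;	/* seeks to recreate an obj */
		/* ... */
	};

Then a subsystem can embed the struct shrinker in its own per-mount or
per-cache structure and get back to its context via container_of()
instead of relying on globals or function-local statics.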


> +unsigned long shrinker_do_scan(unsigned long *dst, unsigned long batch)
> +{
> + unsigned long nr = ACCESS_ONCE(*dst);

What's the point of ACCESS_ONCE() here?

/me gets most of the way into the patch

Oh, it's because you are using static variables for nr_to_scan and
hence when concurrent shrinkers are running they are all
incrementing and decrementing the same variable. That doesn't sound
like a good idea to me - concurrent shrinkers are much more likely
with per-zone shrinker callouts. It seems to me that a reclaim
thread could be kept in a shrinker long after it has used up its
scan count if new shrinker calls from a different reclaim context
occur before the first has finished....

As a further question - why do some shrinkers get converted to a
single global nr_to_scan, and others get converted to a private
nr_to_scan? Shouldn't they all use the same method? The static
variable method looks to me to be full of races - concurrent callers
to shrinker_add_scan() do not look at all thread safe to me.
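e.g. if shrinker_add_scan() is a plain read-modify-write on the shared
counter (which is how I read the patch), two concurrent callers can
interleave like this and one delta simply vanishes:

	CPU 0				CPU 1
	old = *dst;	(reads N)
					old = *dst;	(reads N)
	*dst = old + delta0;
					*dst = old + delta1;	<- delta0 lost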

> + if (nr < batch)
> + return 0;

Why wouldn't we return nr here to drain the remaining objects?
Doesn't this mean we can't shrink caches that have a scan count of
less than SHRINK_BATCH?

> + *dst = nr - batch;

Similarly, that is not a threadsafe update.

> + return batch;
> +}
> +EXPORT_SYMBOL(shrinker_do_scan);
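FWIW, if a shared counter is what we end up with, a cmpxchg loop would
at least make the batch grab safe against concurrent callers - an
untested sketch, keeping the "don't bother below a batch" behaviour:

	unsigned long shrinker_do_scan(unsigned long *dst, unsigned long batch)
	{
		unsigned long nr, old;

		do {
			nr = *dst;
			if (nr < batch)
				return 0;
			/* only claim the batch if nobody raced with us */
			old = cmpxchg(dst, nr, nr - batch);
		} while (old != nr);

		return batch;
	}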
> +
> /*
> * Call the shrink functions to age shrinkable caches
> *
> @@ -198,8 +228,8 @@ EXPORT_SYMBOL(unregister_shrinker);
> *
> * Returns the number of slab objects which we shrunk.
> */
> -unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
> - unsigned long lru_pages)
> +static unsigned long shrink_slab(struct zone *zone, unsigned long scanned, unsigned long total,
> + unsigned long global, gfp_t gfp_mask)
> {
> struct shrinker *shrinker;
> unsigned long ret = 0;
> @@ -211,55 +241,25 @@ unsigned long shrink_slab(unsigned long
> return 1; /* Assume we'll be able to shrink next time */
>
> list_for_each_entry(shrinker, &shrinker_list, list) {
> - unsigned long long delta;
> - unsigned long total_scan;
> - unsigned long max_pass = (*shrinker->shrink)(0, gfp_mask);
> -
> - delta = (4 * scanned) / shrinker->seeks;
> - delta *= max_pass;
> - do_div(delta, lru_pages + 1);
> - shrinker->nr += delta;
> - if (shrinker->nr < 0) {
> - printk(KERN_ERR "shrink_slab: %pF negative objects to "
> - "delete nr=%ld\n",
> - shrinker->shrink, shrinker->nr);
> - shrinker->nr = max_pass;
> - }
> -
> - /*
> - * Avoid risking looping forever due to too large nr value:
> - * never try to free more than twice the estimate number of
> - * freeable entries.
> - */
> - if (shrinker->nr > max_pass * 2)
> - shrinker->nr = max_pass * 2;
> -
> - total_scan = shrinker->nr;
> - shrinker->nr = 0;
> -
> - while (total_scan >= SHRINK_BATCH) {
> - long this_scan = SHRINK_BATCH;
> - int shrink_ret;
> - int nr_before;
> -
> - nr_before = (*shrinker->shrink)(0, gfp_mask);
> - shrink_ret = (*shrinker->shrink)(this_scan, gfp_mask);
> - if (shrink_ret == -1)
> - break;
> - if (shrink_ret < nr_before)
> - ret += nr_before - shrink_ret;
> - count_vm_events(SLABS_SCANNED, this_scan);
> - total_scan -= this_scan;
> -
> - cond_resched();

Removing this means we need cond_resched() in all shrinker loops now
to maintain the same latencies as we currently have. I note that
you've done this for most of the shrinkers, but the documentation
needs to be updated to mention this...


> - }
> -
> - shrinker->nr += total_scan;

And dropping this means we do not carry over the remainder of the
previous scan into the next scan. This means we could be scanning a
lot less with this new code.

> + (*shrinker->shrink)(zone, scanned, total, global, gfp_mask);
> }
> up_read(&shrinker_rwsem);
> return ret;
> }
>
> +void shrink_all_slab(void)
> +{
> + struct zone *zone;
> + unsigned long nr;
> +
> +again:
> + nr = 0;
> + for_each_zone(zone)
> + nr += shrink_slab(zone, 1, 1, 1, GFP_KERNEL);
> + if (nr >= 10)
> + goto again;

do {
nr = 0;
for_each_zone(zone)
nr += shrink_slab(zone, 1, 1, 1, GFP_KERNEL);
} while (nr >= 10);

> @@ -1705,6 +1708,23 @@ static void shrink_zone(int priority, st
> if (inactive_anon_is_low(zone, sc) && nr_swap_pages > 0)
> shrink_active_list(SWAP_CLUSTER_MAX, zone, sc, priority, 0);
>
> + /*
> + * Don't shrink slabs when reclaiming memory from
> + * over limit cgroups
> + */
> + if (scanning_global_lru(sc)) {
> + struct reclaim_state *reclaim_state = current->reclaim_state;
> +
> + shrink_slab(zone, sc->nr_scanned - nr_scanned,
> + lru_pages, global_lru_pages, sc->gfp_mask);
> + if (reclaim_state) {
> + nr_reclaimed += reclaim_state->reclaimed_slab;
> + reclaim_state->reclaimed_slab = 0;
> + }
> + }

So effectively we are going to be calling shrink_slab() once per
zone instead of once per priority loop, right? That means we are
going to be doing a lot more concurrent shrink_slab() calls than the
current code. Combine that with the removal of residual aggregation,
I think this will alter the reclaim balance somewhat. Have you tried
to quantify this?

> Index: linux-2.6/fs/dcache.c
> ===================================================================
> --- linux-2.6.orig/fs/dcache.c
> +++ linux-2.6/fs/dcache.c
> @@ -748,20 +748,26 @@ again2:
> *
> * This function may fail to free any resources if all the dentries are in use.
> */
> -static void prune_dcache(int count)
> +static void prune_dcache(struct zone *zone, unsigned long scanned,
> + unsigned long total, gfp_t gfp_mask)
> +
> {
> + unsigned long nr_to_scan;
> struct super_block *sb, *n;
> int w_count;
> - int unused = dentry_stat.nr_unused;
> int prune_ratio;
> - int pruned;
> + int count, pruned;
>
> - if (unused == 0 || count == 0)
> + shrinker_add_scan(&nr_to_scan, scanned, total, dentry_stat.nr_unused,
> + DEFAULT_SEEKS * sysctl_vfs_cache_pressure / 100);
> +done:
> + count = shrinker_do_scan(&nr_to_scan, SHRINK_BATCH);
> + if (dentry_stat.nr_unused == 0 || count == 0)
> return;
> - if (count >= unused)
> + if (count >= dentry_stat.nr_unused)
> prune_ratio = 1;
> else
> - prune_ratio = unused / count;
> + prune_ratio = dentry_stat.nr_unused / count;
> spin_lock(&sb_lock);
> list_for_each_entry_safe(sb, n, &super_blocks, s_list) {
> if (list_empty(&sb->s_instances))
> @@ -810,6 +816,10 @@ static void prune_dcache(int count)
> break;
> }
> spin_unlock(&sb_lock);
> + if (count <= 0) {
> + cond_resched();
> + goto done;
> + }
> }
>
> /**
> @@ -1176,19 +1186,15 @@ EXPORT_SYMBOL(shrink_dcache_parent);
> *
> * In this case we return -1 to tell the caller that we baled.
> */
> -static int shrink_dcache_memory(int nr, gfp_t gfp_mask)
> +static int shrink_dcache_memory(struct zone *zone, unsigned long scanned,
> + unsigned long total, unsigned long global, gfp_t gfp_mask)
> {
> - if (nr) {
> - if (!(gfp_mask & __GFP_FS))
> - return -1;
> - prune_dcache(nr);
> - }
> - return (dentry_stat.nr_unused / 100) * sysctl_vfs_cache_pressure;
> + prune_dcache(zone, scanned, global, gfp_mask);
> + return 0;
> }

I would have thought that putting the shrinker_add_scan/
shrinker_do_scan loop in shrink_dcache_memory() and leaving
prune_dcache untouched would have been a better separation.
I note that this is what you did with prune_icache(), so consistency
between the two would be good ;)

Also, this patch drops the __GFP_FS check from the dcache shrinker -
not intentional, right?
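i.e. I'd have expected something along these lines (an untested sketch
that keeps prune_dcache() with its old "count" interface and restores
the __GFP_FS check):

	static int shrink_dcache_memory(struct zone *zone, unsigned long scanned,
			unsigned long total, unsigned long global, gfp_t gfp_mask)
	{
		static unsigned long nr_to_scan;
		unsigned long nr;

		shrinker_add_scan(&nr_to_scan, scanned, global,
				dentry_stat.nr_unused,
				DEFAULT_SEEKS * sysctl_vfs_cache_pressure / 100);
		if (!(gfp_mask & __GFP_FS))
			return 0;

		while ((nr = shrinker_do_scan(&nr_to_scan, SHRINK_BATCH)) != 0) {
			prune_dcache(nr);	/* unchanged prune_dcache(int count) */
			cond_resched();
		}
		return 0;
	}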

> @@ -211,28 +215,38 @@ mb_cache_shrink_fn(int nr_to_scan, gfp_t
> atomic_read(&cache->c_entry_count));
> count += atomic_read(&cache->c_entry_count);
> }
> + shrinker_add_scan(&nr_to_scan, scanned, global, count,
> + DEFAULT_SEEKS * sysctl_vfs_cache_pressure / 100);
> mb_debug("trying to free %d entries", nr_to_scan);
> - if (nr_to_scan == 0) {
> +
> +again:
> + nr = shrinker_do_scan(&nr_to_scan, SHRINK_BATCH);
> + if (!nr) {
> spin_unlock(&mb_cache_spinlock);
> - goto out;
> + return 0;
> }
> - while (nr_to_scan-- && !list_empty(&mb_cache_lru_list)) {
> + while (!list_empty(&mb_cache_lru_list)) {
> struct mb_cache_entry *ce =
> list_entry(mb_cache_lru_list.next,
> struct mb_cache_entry, e_lru_list);
> list_move_tail(&ce->e_lru_list, &free_list);
> __mb_cache_entry_unhash(ce);
> + cond_resched_lock(&mb_cache_spinlock);
> + if (!--nr)
> + break;
> }
> spin_unlock(&mb_cache_spinlock);
> list_for_each_safe(l, ltmp, &free_list) {
> __mb_cache_entry_forget(list_entry(l, struct mb_cache_entry,
> e_lru_list), gfp_mask);
> }
> -out:
> - return (count / 100) * sysctl_vfs_cache_pressure;
> + if (!nr) {
> + spin_lock(&mb_cache_spinlock);
> + goto again;
> + }

Another candidate for a do-while loop.

> + return 0;
> }
>
> -
> /*
> * mb_cache_create() create a new cache
> *
> Index: linux-2.6/fs/nfs/dir.c
> ===================================================================
> --- linux-2.6.orig/fs/nfs/dir.c
> +++ linux-2.6/fs/nfs/dir.c
> @@ -1709,21 +1709,31 @@ static void nfs_access_free_list(struct
> }
> }
>
> -int nfs_access_cache_shrinker(int nr_to_scan, gfp_t gfp_mask)
> +int nfs_access_cache_shrinker(struct zone *zone, unsigned long scanned,
> + unsigned long total, unsigned long global, gfp_t gfp_mask)
> {
> + static unsigned long nr_to_scan;
> LIST_HEAD(head);
> - struct nfs_inode *nfsi;
> struct nfs_access_entry *cache;
> -
> - if ((gfp_mask & GFP_KERNEL) != GFP_KERNEL)
> - return (nr_to_scan == 0) ? 0 : -1;
> + unsigned long nr;
>
> spin_lock(&nfs_access_lru_lock);
> - list_for_each_entry(nfsi, &nfs_access_lru_list, access_cache_inode_lru) {
> + shrinker_add_scan(&nr_to_scan, scanned, global,
> + atomic_long_read(&nfs_access_nr_entries),
> + DEFAULT_SEEKS * sysctl_vfs_cache_pressure / 100);
> + if (!(gfp_mask & __GFP_FS) || nr_to_scan < SHRINK_BATCH) {
> + spin_unlock(&nfs_access_lru_lock);
> + return 0;
> + }
> + nr = ACCESS_ONCE(nr_to_scan);
> + nr_to_scan = 0;

That's not safe for concurrent callers. Both could get nr =
nr_to_scan rather than nr(1) = nr_to_scan and nr(2) = 0 which I
think is the intent....

> Index: linux-2.6/arch/x86/kvm/mmu.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kvm/mmu.c
> +++ linux-2.6/arch/x86/kvm/mmu.c
> @@ -2924,14 +2924,29 @@ static int kvm_mmu_remove_some_alloc_mmu
> return kvm_mmu_zap_page(kvm, page) + 1;
> }
>
> -static int mmu_shrink(int nr_to_scan, gfp_t gfp_mask)
> +static int mmu_shrink(struct zone *zone, unsigned long scanned,
> + unsigned long total, unsigned long global, gfp_t gfp_mask)
> {
> + static unsigned long nr_to_scan;
> struct kvm *kvm;
> struct kvm *kvm_freed = NULL;
> - int cache_count = 0;
> + unsigned long cache_count = 0;
>
> spin_lock(&kvm_lock);
> + list_for_each_entry(kvm, &vm_list, vm_list) {
> + cache_count += kvm->arch.n_alloc_mmu_pages -
> + kvm->arch.n_free_mmu_pages;
> + }
>
> + shrinker_add_scan(&nr_to_scan, scanned, global, cache_count,
> + DEFAULT_SEEKS*10);
> +
> +done:
> + cache_count = shrinker_do_scan(&nr_to_scan, SHRINK_BATCH);
> + if (!cache_count) {
> + spin_unlock(&kvm_lock);
> + return 0;
> + }

I note that this use of a static scan count is thread safe because
all the calculations are done under the kvm_lock. That's three
different ways the shrinkers implement the same functionality
now....

> Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.c
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.c
> @@ -838,43 +838,52 @@ static struct rw_semaphore xfs_mount_lis
>
> static int
> xfs_reclaim_inode_shrink(
> - int nr_to_scan,
> + struct zone *zone,
> + unsigned long scanned,
> + unsigned long total,
> + unsigned long global,
> gfp_t gfp_mask)
> {
> + static unsigned long nr_to_scan;
> + int nr;
> struct xfs_mount *mp;
> struct xfs_perag *pag;
> xfs_agnumber_t ag;
> - int reclaimable = 0;
> -
> - if (nr_to_scan) {
> - if (!(gfp_mask & __GFP_FS))
> - return -1;
> -
> - down_read(&xfs_mount_list_lock);
> - list_for_each_entry(mp, &xfs_mount_list, m_mplist) {
> - xfs_inode_ag_iterator(mp, xfs_reclaim_inode, 0,
> - XFS_ICI_RECLAIM_TAG, 1, &nr_to_scan);
> - if (nr_to_scan <= 0)
> - break;
> - }
> - up_read(&xfs_mount_list_lock);
> - }
> + unsigned long nr_reclaimable = 0;
>
> down_read(&xfs_mount_list_lock);
> list_for_each_entry(mp, &xfs_mount_list, m_mplist) {
> for (ag = 0; ag < mp->m_sb.sb_agcount; ag++) {
> pag = xfs_perag_get(mp, ag);
> - reclaimable += pag->pag_ici_reclaimable;
> + nr_reclaimable += pag->pag_ici_reclaimable;
> xfs_perag_put(pag);
> }
> }
> + shrinker_add_scan(&nr_to_scan, scanned, global, nr_reclaimable,
> + DEFAULT_SEEKS);

That's not thread safe - it's under a read lock. This code really
needs a shrinker context....
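With the shrinker passed into the callback (as I asked for at the top),
the scan count could live alongside the shrinker instead of in a
function-local static - a sketch only, the names are invented:

	struct xfs_shrink_ctx {
		struct shrinker	shrinker;
		unsigned long	nr_to_scan;	/* still wants a lock or atomic */
	};

	static int
	xfs_reclaim_inode_shrink(
		struct shrinker		*shrink,
		struct zone		*zone,
		unsigned long		scanned,
		unsigned long		total,
		unsigned long		global,
		gfp_t			gfp_mask)
	{
		struct xfs_shrink_ctx	*ctx = container_of(shrink,
						struct xfs_shrink_ctx, shrinker);

		/* ... count reclaimable inodes as before ... */
		shrinker_add_scan(&ctx->nr_to_scan, scanned, global,
				nr_reclaimable, DEFAULT_SEEKS);
		/* ... */
	}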

> + if (!(gfp_mask & __GFP_FS)) {
> + up_read(&xfs_mount_list_lock);
> + return 0;
> + }
> +
> +done:
> + nr = shrinker_do_scan(&nr_to_scan, SHRINK_BATCH);
> + if (!nr) {
> + up_read(&xfs_mount_list_lock);
> + return 0;
> + }
> + list_for_each_entry(mp, &xfs_mount_list, m_mplist) {
> + xfs_inode_ag_iterator(mp, xfs_reclaim_inode, 0,
> + XFS_ICI_RECLAIM_TAG, 1, &nr);
> + if (nr <= 0)
> + goto done;
> + }

That's missing conditional reschedules....
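i.e. the mount walk wants something like:

	list_for_each_entry(mp, &xfs_mount_list, m_mplist) {
		xfs_inode_ag_iterator(mp, xfs_reclaim_inode, 0,
				XFS_ICI_RECLAIM_TAG, 1, &nr);
		cond_resched();
		if (nr <= 0)
			goto done;
	}

(xfs_mount_list_lock is only held for read, so rescheduling there
should be fine.)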

Cheers,

Dave.
--
Dave Chinner
david(a)fromorbit.com
From: Nick Piggin on
Wow, some reviewing! Thanks Dave.

On Wed, Jun 30, 2010 at 04:28:58PM +1000, Dave Chinner wrote:
> On Thu, Jun 24, 2010 at 01:03:02PM +1000, npiggin(a)suse.de wrote:
> > 9 files changed, 47 insertions(+), 106 deletions(-)
>
> The diffstat doesn't match the patch ;)

Bah, sorry.


> > Index: linux-2.6/include/linux/mm.h
> > ===================================================================
> > --- linux-2.6.orig/include/linux/mm.h
> > +++ linux-2.6/include/linux/mm.h
> > @@ -999,16 +999,19 @@ static inline void sync_mm_rss(struct ta
> > * querying the cache size, so a fastpath for that case is appropriate.
> > */
> > struct shrinker {
> > - int (*shrink)(int nr_to_scan, gfp_t gfp_mask);
> > - int seeks; /* seeks to recreate an obj */
> > -
> > + int (*shrink)(struct zone *zone, unsigned long scanned, unsigned long total,
> > + unsigned long global, gfp_t gfp_mask);
>
> Can we add the shrinker structure to that callback, too, so that we
> can get away from needing global context for the shrinker?

I was planning to merge this on top of your shrinker change (I like
how the locking / refcounting worked out there). So I was just going to
leave that part for you :)


> > +unsigned long shrinker_do_scan(unsigned long *dst, unsigned long batch)
> > +{
> > + unsigned long nr = ACCESS_ONCE(*dst);
>
> What's the point of ACCESS_ONCE() here?
>
> /me gets most of the way into the patch
>
> Oh, it's because you are using static variables for nr_to_scan and
> hence when concurrent shrinkers are running they are all
> incrementing and decrementing the same variable. That doesn't sound
> like a good idea to me - concurrent shrinkers are much more likely
> with per-zone shrinker callouts. It seems to me that a reclaim
> thread could be kept in a shrinker long after it has used up its
> scan count if new shrinker calls from a different reclaim context
> occur before the first has finished....

I don't think parallelism will be much changed. The existing shrinker
didn't provide any serialisation. It likewise did not serialise any
updates to the shrinker->nr accumulator (reclaim is a crappy heuristic
anyway so it apparently doesn't matter too much that it is racy). So
a lot of your criticism of racy access to the accumulators isn't really
inherent to this patch.

(where it's easy, I did put them under locks, but I didn't go out of my
way -- a subsequent patch could do that if we really wanted)


> As a further question - why do some shrinkers get converted to a
> single global nr_to_scan, and others get converted to a private
> nr_to_scan? Shouldn't they all use the same method? The static
> variable method looks to me to be full of races - concurrent callers
> to shrinker_add_scan() do not look at all thread safe to me.

Hmm, they should all have their own nr_to_scan.


> > + if (nr < batch)
> > + return 0;
>
> Why wouldn't we return nr here to drain the remaining objects?

I was thinking, because it's not worth taking locks for a small
number of objects.

> Doesn't this mean we can't shrink caches that have a scan count of
> less than SHRINK_BATCH?

No, they just accumulate slowly until hitting the batch size.
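E.g. with SHRINK_BATCH at 128, a cache whose delta works out to ~20
objects per call just carries the count forward and only does a real
scan every seventh call or so.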


> > - count_vm_events(SLABS_SCANNED, this_scan);
> > - total_scan -= this_scan;
> > -
> > - cond_resched();
>
> Removing this means we need cond_resched() in all shrinker loops now
> to maintain the same latencies as we currently have. I note that
> you've done this for most of the shrinkers, but the documentation
> needs to be updated to mention this...

That's true, yes.


> > - }
> > -
> > - shrinker->nr += total_scan;
>
> And dropping this means we do not carry over the remainder of the
> previous scan into the next scan. This means we could be scanning a
> lot less with this new code.

We do because they accumulate to static variables. It's effectively
the same as accumulating to shrinker->nr, but it allows the per-zone
patches to change to accumulate to per-zone counters.


> > +again:
> > + nr = 0;
> > + for_each_zone(zone)
> > + nr += shrink_slab(zone, 1, 1, 1, GFP_KERNEL);
> > + if (nr >= 10)
> > + goto again;
>
> do {
> nr = 0;
> for_each_zone(zone)
> nr += shrink_slab(zone, 1, 1, 1, GFP_KERNEL);
> } while (nr >= 10);

OK.


> > @@ -1705,6 +1708,23 @@ static void shrink_zone(int priority, st
> > if (inactive_anon_is_low(zone, sc) && nr_swap_pages > 0)
> > shrink_active_list(SWAP_CLUSTER_MAX, zone, sc, priority, 0);
> >
> > + /*
> > + * Don't shrink slabs when reclaiming memory from
> > + * over limit cgroups
> > + */
> > + if (scanning_global_lru(sc)) {
> > + struct reclaim_state *reclaim_state = current->reclaim_state;
> > +
> > + shrink_slab(zone, sc->nr_scanned - nr_scanned,
> > + lru_pages, global_lru_pages, sc->gfp_mask);
> > + if (reclaim_state) {
> > + nr_reclaimed += reclaim_state->reclaimed_slab;
> > + reclaim_state->reclaimed_slab = 0;
> > + }
> > + }
>
> So effectively we are going to be calling shrink_slab() once per
> zone instead of once per priority loop, right? That means we are

Yes.


> going to be doing a lot more concurrent shrink_slab() calls than the
> current code. Combine that with the removal of residual aggregation,
> I think this will alter the reclaim balance somewhat. Have you tried
> to quantify this?

It will alter reclaim a bit. I don't think it will change the
concurrency too much (a per-priority call which gets chopped into
batch-sized calls into the shrinker, versus a per-zone call which the
shrinker chops up itself).

Basically, the number of items to scan should be about the same,
and chopped into the same number of batches. It just depends on
exactly when it gets done.


> > -static int shrink_dcache_memory(int nr, gfp_t gfp_mask)
> > +static int shrink_dcache_memory(struct zone *zone, unsigned long scanned,
> > + unsigned long total, unsigned long global, gfp_t gfp_mask)
> > {
> > - if (nr) {
> > - if (!(gfp_mask & __GFP_FS))
> > - return -1;
> > - prune_dcache(nr);
> > - }
> > - return (dentry_stat.nr_unused / 100) * sysctl_vfs_cache_pressure;
> > + prune_dcache(zone, scanned, global, gfp_mask);
> > + return 0;
> > }
>
> I would have thought that putting the shrinker_add_scan/
> shrinker_do_scan loop in shrink_dcache_memory() and leaving
> prune_dcache untouched would have been a better separation.
> I note that this is what you did with prune_icache(), so consistency
> between the two would be good ;)

You're probably right, I'll go back and take a look.


> Also, this patch drops the __GFP_FS check from the dcache shrinker -
> not intentional, right?

Right, thanks.


> > +again:
> > + nr = shrinker_do_scan(&nr_to_scan, SHRINK_BATCH);
> > + if (!nr) {
> > spin_unlock(&mb_cache_spinlock);
> > - goto out;
> > + return 0;
> > }
> > - while (nr_to_scan-- && !list_empty(&mb_cache_lru_list)) {
> > + while (!list_empty(&mb_cache_lru_list)) {
> > struct mb_cache_entry *ce =
> > list_entry(mb_cache_lru_list.next,
> > struct mb_cache_entry, e_lru_list);
> > list_move_tail(&ce->e_lru_list, &free_list);
> > __mb_cache_entry_unhash(ce);
> > + cond_resched_lock(&mb_cache_spinlock);
> > + if (!--nr)
> > + break;
> > }
> > spin_unlock(&mb_cache_spinlock);
> > list_for_each_safe(l, ltmp, &free_list) {
> > __mb_cache_entry_forget(list_entry(l, struct mb_cache_entry,
> > e_lru_list), gfp_mask);
> > }
> > -out:
> > - return (count / 100) * sysctl_vfs_cache_pressure;
> > + if (!nr) {
> > + spin_lock(&mb_cache_spinlock);
> > + goto again;
> > + }
>
> Another candidate for a do-while loop.

Maybe. I prefer not to indent so much (then one would argue to put the
body in a separate function :), but meh)


> > + nr = ACCESS_ONCE(nr_to_scan);
> > + nr_to_scan = 0;
>
> That's not safe for concurrent callers. Both could get nr =
> nr_to_scan rather than nr(1) = nr_to_scan and nr(2) = 0 which I
> think is the intent....

....

> I note that this use of a static scan count is thread safe because
> all the calculations are done under the kvm_lock. That's three
> different ways the shrinkers implement the same functionality
> now....

....

> That's not thread safe - it's under a read lock. This code really
> needs a shrinker context....

So as I said above, lost updates are not cared about.


> > + if (!(gfp_mask & __GFP_FS)) {
> > + up_read(&xfs_mount_list_lock);
> > + return 0;
> > + }
> > +
> > +done:
> > + nr = shrinker_do_scan(&nr_to_scan, SHRINK_BATCH);
> > + if (!nr) {
> > + up_read(&xfs_mount_list_lock);
> > + return 0;
> > + }
> > + list_for_each_entry(mp, &xfs_mount_list, m_mplist) {
> > + xfs_inode_ag_iterator(mp, xfs_reclaim_inode, 0,
> > + XFS_ICI_RECLAIM_TAG, 1, &nr);
> > + if (nr <= 0)
> > + goto done;
> > + }
>
> That's missing conditional reschedules....

Thanks
