From: Wu Fengguang on
On Tue, Apr 06, 2010 at 09:25:36AM +0800, Li, Shaohua wrote:
> On Sun, Apr 04, 2010 at 10:19:06PM +0800, KOSAKI Motohiro wrote:
> > > On Fri, Apr 02, 2010 at 05:14:38PM +0800, KOSAKI Motohiro wrote:
> > > > > > > This patch makes a lot more sense than the previous one. However, I think a
> > > > > > > <1% anon ratio shouldn't happen anyway, because the file lru doesn't have
> > > > > > > reclaimable pages; <1% doesn't seem like a good reclaim rate.
> > > > > >
> > > > > > Oops, my statement above is wrong, sorry. Even 1 page is still too many,
> > > > > > because under a streaming IO workload the number of anon pages scanned should
> > > > > > be zero. This is a very strong requirement; if it is not met, a backup
> > > > > > operation will cause a lot of swapping out.
> > > > > It sounds like the patch has no big impact on the workload you mentioned;
> > > > > please see the description below.
> > > > > I updated the patch description as Fengguang suggested.
> > > >
> > > > Umm.. sorry, no.
> > > >
> > > > "One fix that introduces another bug" is not a good deal. Instead,
> > > > I'll revert the guilty commit first, as akpm mentioned.
> > > Even if we revert the commit, the patch still has its benefit, as it increases
> > > the calculation precision, right?
> >
> > No, you shouldn't ignore the regression case.

> I don't think this is serious. In my calculation, only 1 page is swapped out
> for 6GB of anonymous memory; 1 page shouldn't have any performance impact.

1 anon page scanned for every N file pages scanned?

Is N a _huge_ enough ratio that the anon list will only be very lightly scanned?

Rik: here is a little background.

Under streaming IO, the current get_scan_ratio() computes a percent[0]
that is (much) less than 1, so it underflows to 0.

This has the bad effect of completely disabling the scan of the anon
list, which leads to OOM in Shaohua's test case. OTOH, it also has the
good side effect of keeping anon pages in memory and totally preventing
swap IO.

Shaohua's patch improves the computation precision by computing nr[]
directly in get_scan_ratio(). This is good in general, however it will
enable a light scan of the anon list under streaming IO.
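
For anyone following along without mm/vmscan.c at hand, here is a minimal
userspace sketch of the difference. It is an illustration only, not the
kernel code: the ap/fp weights stand in for the anon/file reclaim pressure
that get_scan_ratio() derives internally, and the "+ 1" divisor and the
priority shift are modelled on get_scan_ratio()/shrink_zone(); everything
else is simplified.

#include <stdio.h>

/* Simplified model of the two scan-count schemes discussed above.
 * lru_size is in pages, priority is the reclaim priority (12 is the
 * default, lower means reclaim is having trouble). */
static unsigned long percent_based_nr(unsigned long lru_size,
                                      unsigned long ap, unsigned long fp,
                                      int priority)
{
        /* current code: squeeze the ratio into an integer percentage... */
        unsigned long percent = 100 * ap / (ap + fp + 1);

        /* ...so any ratio below 1% underflows to percent == 0 and, in
         * this model, the list is never scanned at all. */
        return (lru_size >> priority) * percent / 100;
}

static unsigned long direct_nr(unsigned long lru_size,
                               unsigned long ap, unsigned long fp,
                               int priority)
{
        /* Shaohua's approach: apply the ratio to the scan window
         * directly, so precision is lost only in the final division. */
        return (lru_size >> priority) * ap / (ap + fp + 1);
}

int main(void)
{
        unsigned long anon = 262144;    /* 1GB of anon pages (4KB each) */

        /* file:anon pressure 1000:1 at the default priority: both 0 */
        printf("prio 12: percent-based %lu, direct %lu\n",
               percent_based_nr(anon, 1, 1000, 12),
               direct_nr(anon, 1, 1000, 12));

        /* same ratio with reclaim in trouble (priority 1): the
         * percent-based value stays 0, the direct one becomes >= 1 */
        printf("prio  1: percent-based %lu, direct %lu\n",
               percent_based_nr(anon, 1, 1000, 1),
               direct_nr(anon, 1, 1000, 1));
        return 0;
}

With a 1GB anon list the two schemes agree (both 0) at the default
priority; they only start to diverge once the priority drops or the anon
list grows, which is the behaviour discussed later in this thread.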

Thanks,
Fengguang
From: KOSAKI Motohiro on
> On Tue, Apr 06, 2010 at 09:25:36AM +0800, Li, Shaohua wrote:
> > On Sun, Apr 04, 2010 at 10:19:06PM +0800, KOSAKI Motohiro wrote:
> > > > On Fri, Apr 02, 2010 at 05:14:38PM +0800, KOSAKI Motohiro wrote:
> > > > > > > > This patch makes a lot more sense than the previous one. However, I think a
> > > > > > > > <1% anon ratio shouldn't happen anyway, because the file lru doesn't have
> > > > > > > > reclaimable pages; <1% doesn't seem like a good reclaim rate.
> > > > > > >
> > > > > > > Oops, my statement above is wrong, sorry. Even 1 page is still too many,
> > > > > > > because under a streaming IO workload the number of anon pages scanned should
> > > > > > > be zero. This is a very strong requirement; if it is not met, a backup
> > > > > > > operation will cause a lot of swapping out.
> > > > > > It sounds like the patch has no big impact on the workload you mentioned;
> > > > > > please see the description below.
> > > > > > I updated the patch description as Fengguang suggested.
> > > > >
> > > > > Umm.. sorry, no.
> > > > >
> > > > > "One fix that introduces another bug" is not a good deal. Instead,
> > > > > I'll revert the guilty commit first, as akpm mentioned.
> > > > Even if we revert the commit, the patch still has its benefit, as it increases
> > > > the calculation precision, right?
> > >
> > > No, you shouldn't ignore the regression case.
>
> > I don't think this is serious. In my calculation, only 1 page is swapped out
> > for 6GB of anonymous memory; 1 page shouldn't have any performance impact.
>
> 1 anon page scanned for every N file pages scanned?
>
> Is N a _huge_ enough ratio that the anon list will only be very lightly scanned?
>
> Rik: here is a little background.

The problem is that the VM is continuously discarding no-longer-used file
cache. If we scan even 1 extra anon page, we will observe tons of swap
usage after a few days.

Please don't think only about benchmarks.


> Under streaming IO, the current get_scan_ratio() computes a percent[0]
> that is (much) less than 1, so it underflows to 0.
>
> This has the bad effect of completely disabling the scan of the anon
> list, which leads to OOM in Shaohua's test case. OTOH, it also has the
> good side effect of keeping anon pages in memory and totally preventing
> swap IO.
>
> Shaohua's patch improves the computation precision by computing nr[]
> directly in get_scan_ratio(). This is good in general, however it will
> enable a light scan of the anon list under streaming IO.

In such a case, percent[0] should be big; I think underflow is not the point.
His test case is merely a streaming IO copy, so why can't we drop the tmpfs
cached pages? His /proc/meminfo shows his machine didn't have droppable file
cache, so a big percent[1] value seems to make no sense, no?

I'm not sure whether we need either of the detections below. I need to
investigate more.
1) detect that there is no discardable file cache
2) detect streaming IO on tmpfs (as a regular file)




From: Wu Fengguang on
On Tue, Apr 06, 2010 at 10:06:19AM +0800, KOSAKI Motohiro wrote:
> > On Tue, Apr 06, 2010 at 09:25:36AM +0800, Li, Shaohua wrote:
> > > On Sun, Apr 04, 2010 at 10:19:06PM +0800, KOSAKI Motohiro wrote:
> > > > > On Fri, Apr 02, 2010 at 05:14:38PM +0800, KOSAKI Motohiro wrote:
> > > > > > > > > This patch makes a lot more sense than the previous one. However, I think a
> > > > > > > > > <1% anon ratio shouldn't happen anyway, because the file lru doesn't have
> > > > > > > > > reclaimable pages; <1% doesn't seem like a good reclaim rate.
> > > > > > > >
> > > > > > > > Oops, my statement above is wrong, sorry. Even 1 page is still too many,
> > > > > > > > because under a streaming IO workload the number of anon pages scanned should
> > > > > > > > be zero. This is a very strong requirement; if it is not met, a backup
> > > > > > > > operation will cause a lot of swapping out.
> > > > > > > It sounds like the patch has no big impact on the workload you mentioned;
> > > > > > > please see the description below.
> > > > > > > I updated the patch description as Fengguang suggested.
> > > > > >
> > > > > > Umm.. sorry, no.
> > > > > >
> > > > > > "One fix that introduces another bug" is not a good deal. Instead,
> > > > > > I'll revert the guilty commit first, as akpm mentioned.
> > > > > Even if we revert the commit, the patch still has its benefit, as it increases
> > > > > the calculation precision, right?
> > > >
> > > > No, you shouldn't ignore the regression case.
> >
> > > I don't think this is serious. In my calculation, only 1 page is swapped out
> > > for 6GB of anonymous memory; 1 page shouldn't have any performance impact.
> >
> > 1 anon page scanned for every N file pages scanned?
> >
> > Is N a _huge_ enough ratio that the anon list will only be very lightly scanned?
> >
> > Rik: here is a little background.
>
> The problem is that the VM is continuously discarding no-longer-used file
> cache. If we scan even 1 extra anon page, we will observe tons of swap
> usage after a few days.
>
> Please don't think only about benchmarks.

OK, the days of streaming IO typically happen on file servers. Suppose
a file server with 16GB of memory, 1GB of which is consumed by anonymous
pages and the rest by page cache.

Assume that the exact file:anon ratio computed by the get_scan_ratio()
algorithm is 1000:1. In that case percent[0]=0.1, which is rounded down
to 0, keeping the anon pages in memory for those few days.

Now with Shaohua's patch, nr[0] = (262144/4096)/1000 = 0.06 will also
be rounded down to 0. It only becomes >=1 when
- reclaim runs into trouble and priority goes low
- the anon list grows huge

So I guess Shaohua's patch still has a reasonable "underflow" threshold :)
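
To make the arithmetic explicit (assuming 4KB pages and the simplified
nr[0] ~= (anon_pages >> priority) / ratio used above):

    anon lru                = 1GB = 262144 pages
    scan window at prio 12  = 262144 >> 12 = 64 pages
    direct nr[0]            = 64 / 1000 = 0    (rounded down)

nr[0] first reaches 1 when 262144 >> priority >= 1000, i.e. at
priority <= 8, which reclaim only falls back to after the higher
priorities fail to free enough pages.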

Thanks,
Fengguang

>
> > Under streaming IO, the current get_scan_ratio() computes a percent[0]
> > that is (much) less than 1, so it underflows to 0.
> >
> > This has the bad effect of completely disabling the scan of the anon
> > list, which leads to OOM in Shaohua's test case. OTOH, it also has the
> > good side effect of keeping anon pages in memory and totally preventing
> > swap IO.
> >
> > Shaohua's patch improves the computation precision by computing nr[]
> > directly in get_scan_ratio(). This is good in general, however it will
> > enable a light scan of the anon list under streaming IO.
>
> In such a case, percent[0] should be big; I think underflow is not the point.
> His test case is merely a streaming IO copy, so why can't we drop the tmpfs
> cached pages? His /proc/meminfo shows his machine didn't have droppable file
> cache, so a big percent[1] value seems to make no sense, no?
>
> I'm not sure whether we need either of the detections below. I need to
> investigate more.
> 1) detect that there is no discardable file cache
> 2) detect streaming IO on tmpfs (as a regular file)
>
>
>
From: KOSAKI Motohiro on
> On Tue, Apr 06, 2010 at 10:06:19AM +0800, KOSAKI Motohiro wrote:
> > > On Tue, Apr 06, 2010 at 09:25:36AM +0800, Li, Shaohua wrote:
> > > > On Sun, Apr 04, 2010 at 10:19:06PM +0800, KOSAKI Motohiro wrote:
> > > > > > On Fri, Apr 02, 2010 at 05:14:38PM +0800, KOSAKI Motohiro wrote:
> > > > > > > > > > This patch makes a lot more sense than the previous one. However, I think a
> > > > > > > > > > <1% anon ratio shouldn't happen anyway, because the file lru doesn't have
> > > > > > > > > > reclaimable pages; <1% doesn't seem like a good reclaim rate.
> > > > > > > > >
> > > > > > > > > Oops, my statement above is wrong, sorry. Even 1 page is still too many,
> > > > > > > > > because under a streaming IO workload the number of anon pages scanned should
> > > > > > > > > be zero. This is a very strong requirement; if it is not met, a backup
> > > > > > > > > operation will cause a lot of swapping out.
> > > > > > > > It sounds like the patch has no big impact on the workload you mentioned;
> > > > > > > > please see the description below.
> > > > > > > > I updated the patch description as Fengguang suggested.
> > > > > > >
> > > > > > > Umm.. sorry, no.
> > > > > > >
> > > > > > > "One fix that introduces another bug" is not a good deal. Instead,
> > > > > > > I'll revert the guilty commit first, as akpm mentioned.
> > > > > > Even if we revert the commit, the patch still has its benefit, as it increases
> > > > > > the calculation precision, right?
> > > > >
> > > > > No, you shouldn't ignore the regression case.
> > >
> > > > I don't think this is serious. In my calculation, only 1 page is swapped out
> > > > for 6GB of anonymous memory; 1 page shouldn't have any performance impact.
> > >
> > > 1 anon page scanned for every N file pages scanned?
> > >
> > > Is N a _huge_ enough ratio that the anon list will only be very lightly scanned?
> > >
> > > Rik: here is a little background.
> >
> > The problem is that the VM is continuously discarding no-longer-used file
> > cache. If we scan even 1 extra anon page, we will observe tons of swap
> > usage after a few days.
> >
> > Please don't think only about benchmarks.
>
> OK, the days of streaming IO typically happen on file servers. Suppose
> a file server with 16GB of memory, 1GB of which is consumed by anonymous
> pages and the rest by page cache.
>
> Assume that the exact file:anon ratio computed by the get_scan_ratio()
> algorithm is 1000:1. In that case percent[0]=0.1, which is rounded down
> to 0, keeping the anon pages in memory for those few days.
>
> Now with Shaohua's patch, nr[0] = (262144/4096)/1000 = 0.06 will also
> be rounded down to 0. It only becomes >=1 when
> - reclaim runs into trouble and priority goes low
> - the anon list grows huge
>
> So I guess Shaohua's patch still has a reasonable "underflow" threshold :)

Again, I didn't say his patch has no worth. I only said we shouldn't
ignore the downside.

From: Wu Fengguang on
On Tue, Apr 06, 2010 at 10:58:43AM +0800, KOSAKI Motohiro wrote:
> > On Tue, Apr 06, 2010 at 10:06:19AM +0800, KOSAKI Motohiro wrote:
> > > > On Tue, Apr 06, 2010 at 09:25:36AM +0800, Li, Shaohua wrote:
> > > > > On Sun, Apr 04, 2010 at 10:19:06PM +0800, KOSAKI Motohiro wrote:
> > > > > > > On Fri, Apr 02, 2010 at 05:14:38PM +0800, KOSAKI Motohiro wrote:
> > > > > > > > > > > This patch makes a lot more sense than the previous one. However, I think a
> > > > > > > > > > > <1% anon ratio shouldn't happen anyway, because the file lru doesn't have
> > > > > > > > > > > reclaimable pages; <1% doesn't seem like a good reclaim rate.
> > > > > > > > > >
> > > > > > > > > > Oops, my statement above is wrong, sorry. Even 1 page is still too many,
> > > > > > > > > > because under a streaming IO workload the number of anon pages scanned should
> > > > > > > > > > be zero. This is a very strong requirement; if it is not met, a backup
> > > > > > > > > > operation will cause a lot of swapping out.
> > > > > > > > > It sounds like the patch has no big impact on the workload you mentioned;
> > > > > > > > > please see the description below.
> > > > > > > > > I updated the patch description as Fengguang suggested.
> > > > > > > >
> > > > > > > > Umm.. sorry, no.
> > > > > > > >
> > > > > > > > "One fix that introduces another bug" is not a good deal. Instead,
> > > > > > > > I'll revert the guilty commit first, as akpm mentioned.
> > > > > > > Even if we revert the commit, the patch still has its benefit, as it increases
> > > > > > > the calculation precision, right?
> > > > > >
> > > > > > No, you shouldn't ignore the regression case.
> > > >
> > > > > I don't think this is serious. In my calculation, only 1 page is swapped out
> > > > > for 6GB of anonymous memory; 1 page shouldn't have any performance impact.
> > > >
> > > > 1 anon page scanned for every N file pages scanned?
> > > >
> > > > Is N a _huge_ enough ratio that the anon list will only be very lightly scanned?
> > > >
> > > > Rik: here is a little background.
> > >
> > > The problem is that the VM is continuously discarding no-longer-used file
> > > cache. If we scan even 1 extra anon page, we will observe tons of swap
> > > usage after a few days.
> > >
> > > Please don't think only about benchmarks.
> >
> > OK, the days of streaming IO typically happen on file servers. Suppose
> > a file server with 16GB of memory, 1GB of which is consumed by anonymous
> > pages and the rest by page cache.
> >
> > Assume that the exact file:anon ratio computed by the get_scan_ratio()
> > algorithm is 1000:1. In that case percent[0]=0.1, which is rounded down
> > to 0, keeping the anon pages in memory for those few days.
> >
> > Now with Shaohua's patch, nr[0] = (262144/4096)/1000 = 0.06 will also
> > be rounded down to 0. It only becomes >=1 when
> > - reclaim runs into trouble and priority goes low
> > - the anon list grows huge
> >
> > So I guess Shaohua's patch still has a reasonable "underflow" threshold :)
>
> Again, I didn't say his patch has no worth. I only said we shouldn't
> ignore the downside.

Right, we should document both the upside and the downside.

The main difference happens when the file:anon scan ratio is > 100:1.

With the current percent[]-based computation, percent[0]=0 and hence nr[0]=0,
which disables the anon list scan unconditionally, for good or for bad.

With the direct nr[] computation,
- nr[0] will be 0 for typical file servers, because with priority=12
and an anon lru size < 1.6GB, nr[0] = (anon_size/4096)/100 < 1 (checked below)
- nr[0] will be non-zero when priority=1 and anon_size > 100 pages;
this stops the OOM in Shaohua's test case, however it may not be enough
to guarantee safety (your previous reverting patch can provide that
guarantee).
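
For completeness, the 1.6GB figure can be checked the same way (assuming
4KB pages, priority=12 and a 100:1 file:anon ratio, with the simplified
nr[0] = (anon_size >> priority) / ratio formula above):

    nr[0] >= 1  requires  (anon_pages >> 12) / 100 >= 1
                i.e.      anon_pages >= 100 << 12 = 409600 pages
                          409600 pages * 4KB     ~= 1.56GB

Below that size the direct nr[] computation also rounds down to 0, i.e.
it behaves just like the percent[]-based code on a typical file server.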

I liked Shaohua's patch a lot -- it adapts well to both the
file-server case and the mostly-anon-pages case :)

Thanks,
Fengguang