From: chaboud on
Thanks for taking a look at this. Since Raymond Chen wrote about this technique
a few years back, I imagine we're not the only ones using it.

"Pavel Lebedinsky [MSFT]" wrote:
> I will also try your program here. How much RAM does your system
> have?
>

I've reproduced this behavior on a number of systems. The ones I personally
use for development are:

- A dual Opteron 2218 running XP x64 with 8GB of RAM.
- A dual Xeon 5330 running Vista x64 with 4GB of RAM.
- A Pentium-M 1.6 (Dothan) laptop running Vista 32 with 1.5GB of RAM.

The two desktops use nVidia graphics cards, and the laptop uses an old Intel
855GME chipset with integrated graphics. I've also seen this on ATI-based
configurations, which pretty much rules out the graphics card.

Thanks again,
Matt

From: Dr Pizza on
"Pavel Lebedinsky [MSFT]" wrote:

> There have been some changes in Longhorn Server and Vista SP1
> to reduce flushing of modified pages in cases such as the one you
> describe. If you have access to LH beta builds you can try the
> latest (February 2007) CTP available from http://connect.microsoft.com
> to see if it makes a difference.
Still broken.

> I will also try your program here. How much RAM does your system
> have?
I can't speak for Chaboud, but I have 4 GiB in my Vista 64 system.

Same issue.
From: chaboud on
I tried this with the LH Feb '07 CTP, and that did not correct the problem.
If this is roughly the state the kernel will be in for Vista SP1, we'll have
to find a workaround, impose a feature limitation on Vista, or, at the very
least, warn our users about the issue.

In the context in which we use this technique (an HD video editor), we see a
significant performance loss: playback drops from 20fps on XP 32 to 5fps on
Vista 32 on the same hardware. Because of virtual address space size limits
and fragmentation, we can't avoid unmapping without severely limiting the
amount of memory usable for these buffers. Limiting the cache size would
impose a performance/usability limitation, so any such change would have to
be restricted to Vista.

We'd very much like to find a clean resolution to this if at all possible.

Thanks again for your help,
Matt

"Pavel Lebedinsky [MSFT]" wrote:

> There have been some changes in Longhorn Server and Vista SP1
> to reduce flushing of modified pages in cases such as the one you
> describe. If you have access to LH beta builds you can try the
> latest (February 2007) CTP available from http://connect.microsoft.com
> to see if it makes a difference.
>
> I will also try your program here. How much RAM does your system
> have?
>
> --
> This posting is provided "AS IS" with no warranties, and confers no
> rights.
>
> "chaboud" wrote:
>
> > I've been working on a performance problem that has shown up in Vista and
> > XP x64.
> >
> > When using CreateFileMapping() and MapViewOfFile() with
> > INVALID_HANDLE_VALUE to get a pagefile-backed mapping, XP 32 does not write
> > heavily to the disk to sync the modified pages to the backing store
> > (pagefile.sys) when UnmapViewOfFile() is called.
> >
> > If the backing file is an *actual* file (i.e. not the pagefile), the data
> > is written back to the file asynchronously, and a disk can get hit pretty
> > hard (desired behavior).
> >
> > In XP x64 and Vista, the system nails the pagefile disk in the
> > INVALID_HANDLE_VALUE case, which can be a serious performance issue when
> > moving roughly 256MB of data through file mappings per second. This will
> > lead to a sluggish system for quite some time after unmaps have ceased.
> >
> > The following code demonstrates the problem. Run on XP 32, this code will
> > merely peg a core with near-nothing work. Run on XP x64 or Vista (32 or
> > x64), it will peg a core and a disk.
From: chaboud on
It turns out that this is a larger problem than I previously thought.

CreateFile() with the FILE_ATTRIBUTE_TEMPORARY and FILE_FLAG_DELETE_ON_CLOSE
flags should make what Larry Osterman called a "temporary" temporary file, a
file that is never written to disk.

http://blogs.msdn.com/larryosterman/archive/2004/04/19/116084.aspx

You can find a new version of the earlier code that makes use of a file made
this way here:
http://matthew.chaboud.com/junkdump/Temporary%20Temporary%20Hit.cpp

To accentuate the problem, crank the buffer count up to something like 30.
It still shows up for me with a buffer count of 5 (total file size of 50MB
because of lazy doubling) on a machine with 4GB of RAM and only Visual
Studio, Firefox, and Outlook running (i.e. very little memory pressure).

Thanks again,
Matt

From: Pavel Lebedinsky [MSFT] on
As far as I know, no further changes are planned in this area
for SP1.

The behavior you're seeing is considered to be by design.
When you unmap a view, pages from the view are removed
from your working set and placed onto the modified page list
(assuming they are not part of any other working set). At this
point the memory manager may begin flushing them to disk.
There is no mechanism by which an application can tell the
system that some of the modified pages shouldn't be written
out because they will soon be referenced back into the
working set.

So even on XP you will get some paging IO in this case
(you can see it by monitoring the Memory\Pages Output/sec
counter in perfmon). When the number of modified pages
is low you may not see any paging IOs. As it gets higher,
XP will try to write out about 100 pages every second.
Finally, when available pages (free+zeroed+standby) drop
below about 1000, the modified page writer on XP will
start writing as fast as it can.
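Putting the figures from this thread side by side, assuming 4 KB pages (the x86/x64 page size): the ~256MB/s the original post describes moving through mappings is 65,536 pages per second, while a trickle of ~100 pages per second is about 400 KB/s, roughly 1/650th of that. The ~1000-page available threshold is only about 4 MB. BytesToPages and PagesToBytes are illustrative names, not an API.

```cpp
// Back-of-the-envelope figures, assuming 4 KB pages.
constexpr unsigned long long kPageBytes = 4096;

constexpr unsigned long long BytesToPages(unsigned long long bytes) { return bytes / kPageBytes; }
constexpr unsigned long long PagesToBytes(unsigned long long pages) { return pages * kPageBytes; }

// ~256MB/s pushed through mappings, expressed in pages per second.
static_assert(BytesToPages(256ull << 20) == 65536, "256MB/s is 65,536 pages/s");
// XP's trickle of about 100 pages/s, expressed in bytes per second (400 KB/s).
static_assert(PagesToBytes(100) == 409600, "100 pages/s is 409,600 bytes/s");
// The ~1000 available pages threshold, expressed in bytes (about 4 MB).
static_assert(PagesToBytes(1000) == 4096000, "1000 pages is 4,096,000 bytes");
```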

Vista has several changes in this area. When the number of
modified pagefile pages is low it should actually write less
often than XP (this helps battery life on laptops). However,
Vista also switches to the "continuous write" mode sooner
than XP, to prevent problems when misbehaving applications
fill entire memory with dirty pages. In LH/Vista SP1 the
thresholds have been somewhat relaxed compared to
Vista RTM but they are unlikely to go back to XP levels.

I can't think of a good solution to this. On 64-bit the address
space is large enough that you can keep everything in
the working set as long as there is enough physical memory
to back it. On 32-bit, however, you can have more RAM than
virtual address space, so unmapping becomes necessary. You could
use multiple processes to hold portions of the entire region
in their working sets, but that will likely require significant
design changes and may not be a feasible solution.

If you are not actually sharing the memory among multiple
processes (in effect, using a pagefile backed section as a
cache for a single process) then you might be able to use
AWE memory instead.

Finally, you could also write a driver to lock your shared
pages in memory so they are never paged out even if you
unmap them. This is rather ugly but might help if you have
no other options.
