From: Paul A. Clayton on
On Aug 4, 7:25 pm, MitchAlsup <MitchAl...(a)aol.com> wrote:
[snip]
> And the PTEs do have individual valid bits, but the problem was how
> not to store invalid PTEs in the TLB. In the multi-PTE/store
> microarchitectures, you cannot avoid storing these. C-A-N-N-O-T

Umm, the original poster was referring to potentially stale invalid
PTE entries (which might generate a page fault repeat). If an unset
valid bit is interpreted as not-present (or one has an explicit
present bit), this problem is avoided. I.e., there is no need to
invalidate a TLB entry.
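
A minimal C-style sketch of what I mean (the types and helper names are
made up for illustration, not any particular implementation): a cached
PTE with the present bit clear is never taken as the final word; the
in-memory tables are reconsulted first, so a not-present -> present
transition by the OS needs no TLB invalidate to avoid a repeated fault.

  #include <stdint.h>

  #define PTE_PRESENT 0x1ull

  /* Stand-ins for the hardware walker and the fault-report path. */
  extern uint64_t page_table_walk(uint64_t vpn);
  extern void     raise_page_fault(uint64_t vpn);

  /* One sectored TLB entry: a group of PTEs fetched together, so
   * some of the cached PTEs may be (or may have gone) invalid. */
  struct tlb_sector { uint64_t base_vpn; uint64_t pte[4]; };

  static uint64_t translate(struct tlb_sector *sec, uint64_t vpn)
  {
      uint64_t pte = sec->pte[vpn & 3];     /* cached copy, may be stale */
      if (!(pte & PTE_PRESENT)) {           /* clear bit == not present  */
          pte = page_table_walk(vpn);       /* recheck memory first      */
          sec->pte[vpn & 3] = pte;          /* refresh the cached copy   */
          if (!(pte & PTE_PRESENT))
              raise_page_fault(vpn);        /* really absent: fault      */
      }
      return pte;                           /* present: use the mapping  */
  }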


Paul A. Clayton
just a technophile
From: Terje Mathisen "terje.mathisen at on
Nick Maclaren wrote:
> In article<i3c4ni$m0r$1(a)news.eternal-september.org>,
> Stephen Fuld<SFuld(a)Alumni.cmu.edu.invalid> wrote:
>>
>>>> This is a reason not to have the hardware or microcode rewalk the page
>>>> tables when reporting a fault. Otherwise, you might end up having
>>>> walked the page tables 3 times:
>>>>
>>>> First, the speculative TLB miss page walk by hardware.
>>>>
>>>> Second, the non-speculative TLB miss page walk by hardware (or
>>>> microcode) when reporting the fault.
>>>>
>>>> Third, the page walk inside the OS page fault handler.
>>>
>>> It's an even better reason to abolish page faulting altogether!
>>> As posted before, it would be trivial to do at the hardware level,
>>> fairly easy to do at the software level, and seriously compromise
>>> only a very few, very perverse usages.
>>>
>>> But it still rocks the boat too much to be considered nowadays :-(
>>
>> How about a compromise where we just increase the page size? I know of
>> one system that uses 16KB pages. This should reduce the number of page
>> faults, yet still require no application level changes and allow for
>> those few programs that really need large sparse address spaces.
>
> Not really, unfortunately, for two reasons. Firstly, most of the
> benefit comes from abolishing the need for transparent fixup of
> page faults. Secondly, increasing the page size often just increases
> the memory requirements for sparse address spaces.
>
> It's trivial to do the calculation for random address distributions,
> for many common ones, and the numbers are ugly - especially for the
> simple case of UUID values.

If the amount of memory needed per present node in a sparse setup is
significantly smaller than the page size, then paging virtual memory is
_not_ a solution to said programming problem!

In the case of present/not-present UUIDs, hash tables are a much better
approach: UUIDs are (semi-)random numbers/addresses, often 128 bits or
more in size.
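
A rough sketch of the kind of thing I mean (my own toy code, open
addressing with linear probing; the all-zero UUID serves as the empty
marker, so it cannot itself be stored):

  #include <stdint.h>
  #include <stddef.h>

  typedef struct { uint64_t hi, lo; } uuid128;        /* 128-bit key */

  typedef struct {
      uuid128 *slot;      /* open-addressed slots; {0,0} means empty */
      size_t   nslots;    /* power of two                            */
  } uuid_set;

  static size_t uuid_hash(uuid128 u) {
      /* UUIDs are already (semi-)random, so a cheap mix is plenty. */
      return (size_t)(u.hi ^ (u.lo * 0x9E3779B97F4A7C15ull));
  }

  static int uuid_present(const uuid_set *s, uuid128 u) {
      size_t i = uuid_hash(u) & (s->nslots - 1);
      while (s->slot[i].hi | s->slot[i].lo) {          /* stop at empty */
          if (s->slot[i].hi == u.hi && s->slot[i].lo == u.lo)
              return 1;
          i = (i + 1) & (s->nslots - 1);               /* linear probe  */
      }
      return 0;
  }

Each present key costs 16 bytes plus load-factor slack, versus burning
at least one whole page of real memory per touched page if you let the
sparse virtual address space do the job.
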
>
> Perhaps the best argument against it is that it has been tried, many
> times, and has failed every time (as a solution to this problem).
> The systems that use large pages to tackle it usually use very large
> ones (e.g. 4 MB) or variable ones, and use THEM to make certain
> segments effectively immune from page faults.

Yes.
>
>> But are page faults really a performance issue with today's larger memories?
>
> Yes. They often make it worse. The sole issue for many applications
> is the proportion of their memory that can be mapped at any one time.
> Consider any matrix method that has no known blocking form, and
> necessarily uses accesses 'both ways round' closely together. As
> soon as the matrix exceeds the size mapped by the TLB, there is a
> BIG performance problem.

One big no-no is to have a TLB structure which cannot map even the L2
cache. This has been tried and shown to fail horribly on at least one
Intel cpu (Pentium II?) with 256 KB worth of TLB reach (64 4K-page
entries) and 512 KB of L2.

As soon as your working set passed the 256 KB point, performance fell
off significantly.
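
The effect is easy to reproduce with a toy loop (numbers below assume
that old 64-entry, 4KB-page case; adjust for your part): touch one byte
per page, and as soon as the page count passes 64, essentially every
access takes a TLB miss even though the buffer is no bigger than the L2.

  #include <stdio.h>
  #include <stdlib.h>

  #define PAGE 4096

  /* One access per page, repeated; with npages greater than the number
   * of TLB entries, every access needs a translation that has already
   * been evicted from the TLB. */
  static long touch_pages(volatile char *buf, size_t npages, long reps) {
      long sum = 0;
      for (long r = 0; r < reps; r++)
          for (size_t p = 0; p < npages; p++)
              sum += buf[p * PAGE];
      return sum;
  }

  int main(void) {
      size_t npages = 128;              /* 512 KB: fits the L2, but blows
                                           past a 64-entry TLB's reach   */
      char *buf = calloc(npages, PAGE);
      if (!buf) return 1;
      printf("%ld\n", touch_pages(buf, npages, 100000));
      free(buf);
      return 0;
  }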

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
From: Nick Maclaren on
In article <i3cnbf$fgd$2(a)usenet01.boi.hp.com>,
Rick Jones <rick.jones2(a)hp.com> wrote:
>
>> Not really, unfortunately, for two reasons. Firstly, most of the
>> benefit comes from abolishing the need for transparent fixup of page
>> faults. Secondly, increasing the page size often just increases the
>> memory requirements for sparse address spaces.
>
>Variable page size support perhaps? A platform near and dear to my
>paycheck can go from 4KB up through 4X multiples all the way to GB.

Yup. And they can help. They still don't solve all of the problems,
but they can reduce them to a tolerable level.
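
The arithmetic is easy to see (a toy calculation, assuming a 64-entry
TLB; real parts vary):

  #include <stdio.h>

  int main(void) {
      const unsigned long long KB = 1024ULL, entries = 64;
      const unsigned long long page[] = { 4*KB, 16*KB, 64*KB,
                                          4*1024*KB, 1024*1024*KB };
      for (int i = 0; i < 5; i++)
          printf("page %10llu KB -> TLB reach %10llu KB\n",
                 page[i] / KB, entries * page[i] / KB);
      return 0;
  }

Going from 4 KB to 4 MB pages takes the same 64 entries from 256 KB of
reach to 256 MB, which is why large or variable page sizes help even
though they don't make the faults themselves go away.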


Regards,
Nick Maclaren.
From: Nick Maclaren on
In article <DcmdnTlSne9be8TRnZ2dnUVZ_hGdnZ2d(a)giganews.com>,
Andy Glew <"newsgroup at comp-arch.net"> wrote:
>
>I'm sure that somebody has beaten me to this, but, let me point out that
>this is NOT a performance problem caused by page faults.
>
>It is a performance problem caused by TLB misses.
>
>Page faults should be a much smaller performance problem. To a first
>order, paging from disk almost never happens, except as part of program
>startup or cold misses to a DLL. Probably the more common form of page
>fault occurs with OS mechanisms such as COW, Copy-On-Write.

I have said this before, but I use the term in its traditional sense,
and TLB misses ARE page faults - just not ones that need data reading
from disk! They are handled by the same interrupt mechanism, for a
start. I agree with the facts of what you say, of course.

But there is another, and more fundamental, reason not to separate
them. It leads people to think incorrectly about the designs where
there are multiple levels of such things. And, if I read the
tea-leaves correctly, they may be coming back.

Let's ignore the issue that there were once systems with page tables
and no TLBs - and the original TLBs were nothing but caches for the
page tables. The horse and cart has now been replaced by a motorised
horse-box, of course.

But there were a fair number of systems like this: a page could be
in the TLB; or it could be in main memory with a page table entry,
so all that was needed was the TLB reloading; or it could be in
secondary memory, which needed a copy, the page table updating, and
the TLB reloading; or it could be on disk, which needed a call to
the swapper/pager.
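
In outline, the handler had to dispatch on something like this (the
names are mine, just to make the four-way split concrete):

  /* Placeholder actions standing in for the machine-dependent paths. */
  static void reload_tlb(void)   {}
  static void copy_to_main(void) {}
  static void update_pte(void)   {}
  static void call_pager(void)   {}

  enum where {
      IN_TLB,          /* translation already cached                 */
      IN_MAIN_MEMORY,  /* PTE valid: just reload the TLB             */
      IN_SECONDARY,    /* copy in, fix the page table, reload TLB    */
      ON_DISK          /* hand the whole thing to the swapper/pager  */
  };

  static void fix_translation(enum where w) {
      switch (w) {
      case IN_TLB:         /* nothing to do */                         break;
      case IN_MAIN_MEMORY: reload_tlb();                               break;
      case IN_SECONDARY:   copy_to_main(); update_pte(); reload_tlb(); break;
      case ON_DISK:        call_pager();  /* may block the process */  break;
      }
  }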

The reason that I say that those might be returning is that there
are some signs of renewed interest in virtual shared memory, which
implies a very similar division. I hope that it flops, because
I regard that as a damn-fool technology, but a LOT of damn-fool
technologies have won out :-(

Related to this, technically though not logically, are the devices
and I/O designs where pages are transferred between the main process
and the I/O controller (whether that be another process or a device).
When a program gets a page fault, the page may be in main memory but
inaccessible, and the fixup is to change the ownership.
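
The fixup for that case reduces to something like the following sketch
(the structure and helpers are hypothetical, only there to show the
shape of it):

  struct shared_page {
      int   owner;          /* CPU_SIDE or IO_SIDE                     */
      void *frame;          /* the page stays in main memory all along */
  };

  enum { CPU_SIDE, IO_SIDE };

  static void wait_for_io(struct shared_page *pg)       { (void)pg; /* stub */ }
  static void remap_accessible(struct shared_page *pg)  { (void)pg; /* stub */ }

  /* Page is resident but marked inaccessible; the "fault" is fixed by
   * flipping ownership and re-enabling access, not by any disk I/O.   */
  static void fixup_ownership_fault(struct shared_page *pg) {
      if (pg->owner == IO_SIDE) {
          wait_for_io(pg);          /* let the controller finish        */
          pg->owner = CPU_SIDE;     /* change the ownership             */
          remap_accessible(pg);     /* restore the PTE's access rights  */
      }
  }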


Regards,
Nick Maclaren.
From: Andy Glew "newsgroup at comp-arch.net" on
On 8/4/2010 7:20 PM, Paul A. Clayton wrote:
> On Aug 4, 7:25 pm, MitchAlsup<MitchAl...(a)aol.com> wrote:
> [snip]
>> And the PTEs do have individual valid bits, but the problem was how
>> not to store invalid PTEs in the TLB. In the multi-PTE/store
>> microarchitectures, you cannot avoid storing these. C-A-N-N-O-T
>
> Umm, the original poster was referring to potentially stale invalid
> PTE entries (which might generate a page fault repeat). If an unset
> valid bit is interpreted as not-present (or one has an explicit
> present bit), this problem is avoided. I.e., there is no need to
> invalidate a TLB entry.

Which I think amounts to rewalking the page tables before reporting a
page fault.

Small optimization: don't speculatively treat such a hit to an invalid
PTE (in a multi-PTE sectored TLB) as a TLB miss. You might use it as a
clue that a page fault is likely, but check it at retirement, or
someplace closer to retirement.
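
Roughly this (all structure names invented here, just to pin the idea
down): the speculative lookup only records a hint, and the one
authoritative walk and the fault report are deferred to retirement.

  #include <stdint.h>

  #define PTE_PRESENT 0x1ull

  /* Stand-ins for the non-speculative walker and the fault machinery. */
  extern uint64_t page_table_walk(uint64_t vpn);
  extern void     raise_page_fault(uint64_t vaddr);

  struct uop { uint64_t vaddr; int likely_fault; };

  /* Speculative path: a hit on an invalid cached PTE is NOT turned into
   * a TLB miss / rewalk on the spot; it just flags the uop.            */
  static void speculative_lookup(struct uop *u, uint64_t cached_pte) {
      if (!(cached_pte & PTE_PRESENT))
          u->likely_fault = 1;              /* clue only; no walk yet   */
  }

  /* At (or near) retirement: do the authoritative walk, and report a
   * fault only if the in-memory PTE really is not present.            */
  static void check_at_retirement(struct uop *u) {
      if (!u->likely_fault)
          return;
      uint64_t pte = page_table_walk(u->vaddr >> 12);
      if (!(pte & PTE_PRESENT))
          raise_page_fault(u->vaddr);       /* confirmed page fault     */
      /* else: the cached PTE was stale; refill the sector and replay.  */
  }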