From: James Harris on
On 2 Aug, 16:27, MitchAlsup <MitchAl...(a)aol.com> wrote:

....

> Andy covered most of the cases: I will cover another (not pertinent to
> x86s that I know of):

Thanks to you and Andy for all the info, but I was just looking for
some *examples* of architectures where the TLB caches not-present
PTEs.

The relevance to a software engineer is whether a potentially
expensive TLB invalidation is needed when dealing with a page-not-
present fault. An unneeded invalidation should be avoided due to its
local and ongoing costs.

It's clearly not needed on later Intel and AMD CPUs but what of
earlier ones? What of Vax, Sparc, Mips, Alpha, Arm etc? (I should say
I'm not asking about all of these. A couple of examples which cache
not-present PTEs would be great.) I suspect Mips but was struggling to
understand the Mips docs I have.

Any ideas?

James
From: Paul A. Clayton on
On Aug 2, 4:37 pm, James Harris <james.harri...(a)googlemail.com> wrote:
[snip]
> The relevance to a software engineer is whether a potentially
> expensive TLB invalidation is needed when dealing with a page-not-
> present fault. An unneeded invalidation should be avoided due to its
> local and ongoing costs.
>
> It's clearly not needed on later Intel and AMD CPUs but what of
> earlier ones? What of Vax, Sparc, Mips, Alpha, Arm etc? (I should say
> I'm not asking about all of these. A couple of examples which cache
> not-present PTEs would be great.) I suspect Mips but was struggling to
> understand the Mips docs I have.
>
> Any ideas?

MIPS and Alpha both have software-controlled TLBs, though in
the case of Alpha, Privileged Architecture Library code is
executed and not a supervisor-level exception handler.

From page B3-21 of ARM Architecture Reference Manual ARMv7-A
and ARMv7-R edition (section B3.3.4, page 1295 of the pdf):
"Translation table entries that create Translation faults are not held
in the TLB, see Translation fault on page B3-43. Therefore TLB and
branch predictor invalidation is not required for the synchronization
of a change from a translation table entry that causes a Translation
fault to one that does not."

(I.e., ARM does not load invalid PTEs into the TLB)
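
A toy model may make the consequence of that rule concrete. The sketch
below is plain Python, not ARM code: `page_table`, `tlb`, `translate`,
`touch_then_map` and `TranslationFault` are all hypothetical names, and
the only behaviour modelled is that a walk ending in a fault fills
nothing into the TLB.

```python
class TranslationFault(Exception):
    """Raised when the walk finds no valid entry (hypothetical name)."""


page_table = {}   # virtual page number -> frame; a missing key means not present
tlb = {}          # cached translations; only valid entries are ever stored


def translate(vpn):
    """Return the frame for vpn, filling the TLB only on a successful walk."""
    if vpn in tlb:
        return tlb[vpn]
    if vpn not in page_table:
        # Faulting entries are not cached, so nothing stale is left behind.
        raise TranslationFault(vpn)
    tlb[vpn] = page_table[vpn]
    return tlb[vpn]


def touch_then_map(vpn, frame):
    """Fault on vpn, map it, then retry without any TLB invalidation."""
    try:
        translate(vpn)
        faulted = False
    except TranslationFault:
        faulted = True
    page_table[vpn] = frame      # the OS makes the page present
    return faulted, translate(vpn)
```

Under this policy the retry in `touch_then_map` sees the new mapping
even though the model never invalidates anything.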

For the UltraSPARC IIIi it seems that software TLB fill is used
(UltraSPARC IIIi Processor User’s Manual, page 192 [pdf page 238]):
"When a non-faulting load encounters a TLB miss, the operating system
should attempt to translate the page. If the translation results in an
error, then zero is returned and the load completes silently."

comp.arch.embedded might have some answers for this question.



Paul A. Clayton
just a technophile

From: Andy Glew on
On 8/2/2010 1:37 PM, James Harris wrote:
> On 2 Aug, 16:27, MitchAlsup<MitchAl...(a)aol.com> wrote:
>
> ...
>
>> Andy covered most of the cases: I will cover another (not pertinent to
>> x86s that I know of):
>
> Thanks to you and Andy for all the info, but I was just looking for
> some *examples* of architectures where the TLB caches not-present
> PTEs.
>
> The relevance to a software engineer is whether a potentially
> expensive TLB invalidation is needed when dealing with a page-not-
> present fault. An unneeded invalidation should be avoided due to its
> local and ongoing costs.
>
> It's clearly not needed on later Intel and AMD CPUs but what of
> earlier ones? What of Vax, Sparc, Mips, Alpha, Arm etc? (I should say
> I'm not asking about all of these. A couple of examples which cache
> not-present PTEs would be great.) I suspect Mips but was struggling to
> understand the Mips docs I have.
>
> Any ideas?
>
> James

If you ever see flaky results, on x86 or elsewhere, I would strongly
suggest that you have your invalid page exception handler rewalk the
page tables to see if the page is, indeed, invalid.

--

Or maybe if you are just paranoid.


===

Heck: if you yourself can rewalk the page tables, on all machines you
can avoid the "expensive TLB invalidation".
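
As a rough illustration of that idea, here is a toy Python model of a
TLB that does cache not-present entries, paired with a fault handler
that rewalks the page table. Everything in it (`ToyMMU`, `NOT_PRESENT`,
the method names) is made up for the sketch. How a stale entry really
gets replaced is hardware-specific; the model simply overwrites it.

```python
NOT_PRESENT = None  # marker for a cached not-present entry (assumption of this model)


class ToyMMU:
    def __init__(self):
        self.page_table = {}      # vpn -> frame; a missing key means not present
        self.tlb = {}             # vpn -> frame or NOT_PRESENT
        self.spurious_faults = 0  # faults caused only by a stale TLB entry

    def lookup(self, vpn):
        """Hardware side: use the TLB, walking the table only on a miss."""
        if vpn not in self.tlb:
            self.tlb[vpn] = self.page_table.get(vpn, NOT_PRESENT)
        return self.tlb[vpn]

    def map_page(self, vpn, frame):
        """OS side: make a page present without invalidating anything."""
        self.page_table[vpn] = frame

    def access(self, vpn):
        """Access vpn; on a fault, the handler rewalks the page table."""
        frame = self.lookup(vpn)
        if frame is NOT_PRESENT:
            frame = self.page_table.get(vpn, NOT_PRESENT)  # the rewalk
            if frame is NOT_PRESENT:
                raise LookupError(f"page {vpn} really is not present")
            # Stale cached entry: count it, refresh this one entry, carry on.
            self.spurious_faults += 1
            self.tlb[vpn] = frame
        return frame
```

In this model `map_page` never touches the TLB. The price is one
spurious fault on the next access to a page whose not-present entry was
still cached, after which accesses proceed normally.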


From: Piotr Wyderski on
Andy Glew wrote:

> Heck: if you yourself can rewalk the page tables, on all machines you
> can avoid the "expensive TLB invalidation".

On the other hand, why is the TLB invalidation expensive?
There are two ways to do it, the first is via invlpg and the
other is to write to cr3. But both of them should be relatively
cheap, i.e. wait until the LSU pipe is empty and then pulse
a global edge/level reset line of the TLB subsystem. Why
isn't the reality as simple as that?

Best regards
Piotr Wyderski

From: Terje Mathisen on
Piotr Wyderski wrote:
> Andy Glew wrote:
>
>> Heck: if you yourself can rewalk the page tables, on all machines you
>> can avoid the "expensive TLB invalidation".
>
> On the other hand, why is the TLB invalidation expensive?
> There are two ways to do it, the first is via invlpg and the
> other is to write to cr3. But both of them should be relatively cheap,
> i.e. wait until the LSU pipe is empty and then pulse
> a global edge/level reset line of the TLB subsystem. Why
> isn't the reality as simple as that?

Ouch.

Writing to CR3 to invalidate the entire TLB subsystem is _very_
expensive: not because the operation itself takes so long, but because
you have to reload the 90+% of data which is still needed.
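
A back-of-the-envelope way to see the refill cost, as a Python sketch.
The function name and parameters are hypothetical, the full flush is
modelled as dropping every entry (ignoring things like global pages),
and any numbers fed to it are illustrative rather than measured.

```python
def refill_misses(working_set, tlb_entries, changed_pages, full_flush):
    """Count the TLB misses needed to get the working set cached again.

    working_set and changed_pages are sets of virtual page numbers;
    tlb_entries is the set of pages cached before the invalidation.
    """
    if full_flush:
        survivors = set()                        # modelled as dropping everything
    else:
        survivors = tlb_entries - changed_pages  # per-page invalidation
    return len(working_set - survivors)
```

With a 64-page working set that is fully cached and one changed page,
the model gives 64 refill misses after a full flush against 1 after a
per-page invalidation.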

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"