From: Paul A. Clayton on
Ken Hagan wrote:
> I'd be disappointed to lose the benefits of position-independent code
> just because some folks abuse strcpy().

What are the advantages of PIC, especially in embedded systems?
What are considered acceptable methods for supporting PIC?

(E.g., I was thinking of having PC-relative jumps/branches be
encoded with all but the sign bit as result bits. [The sign bit and
the MSb of the jump 'offset' {i.e. the last fully summed bit of the
result} and the corresponding bit of the starting PC--these would be
used to determine if there was a carry or a borrow.] This has the
advantage that the LSbs of the inset can be sent to the ICache without
need for an intermediate addition [trivial power conservation and
slight reduction in branch target latency]. Aside from making PIC
somewhat more complex [even more than MIPS], this has the disadvantage
of slightly reducing the range of jumps. It has the advantage over
MIPS' method of J and JAL [but not conditional branches] of being
able to cross segments [MIPS jumps cannot cross 256MiB 'segment'
boundaries].)
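
A minimal sketch of one possible reading of the encoding described
above, assuming the instruction carries a sign bit plus the low n bits
of the *target* (the "inset") rather than an offset. The function names
and the field width are hypothetical. Under this reading the reach is
limited to less than 2^(n-1) in either direction, so that the sign bit,
the MSb of the inset and the matching PC bit are enough to tell whether
the upper PC bits need a carry or a borrow.

```python
def encode(pc, target, n):
    """Return (sign, inset): sign is 1 for a backward branch,
    inset is the low n bits of the target address."""
    delta = target - pc
    limit = 1 << (n - 1)
    if not -limit < delta < limit:
        raise ValueError("target out of range for this field width")
    return (1 if delta < 0 else 0, target & ((1 << n) - 1))


def decode(pc, sign, inset, n):
    """Rebuild the target. The low n bits are taken directly from the
    inset; only the upper PC bits may need +1 (carry) or -1 (borrow)."""
    upper = pc >> n
    pc_msb = (pc >> (n - 1)) & 1
    inset_msb = (inset >> (n - 1)) & 1
    if sign == 0 and pc_msb == 1 and inset_msb == 0:
        upper += 1  # forward branch crossed the 2^n boundary: carry
    elif sign == 1 and pc_msb == 0 and inset_msb == 1:
        upper -= 1  # backward branch crossed the 2^n boundary: borrow
    return (upper << n) | inset
```

The low bits of the target are available without any addition, which is
the point of the scheme. The upper bits are still relative to the
current PC, so the code stays position independent as long as the load
address is a multiple of 2^n.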


Paul A. Clayton

From: Ken Hagan on
Paul A. Clayton wrote:
> Ken Hagan wrote:
>> I'd be disappointed to lose the benefits of position-independent code
>> just because some folks abuse strcpy().
>
> What are the advantages of PIC, especially in embedded systems?

Well I was thinking of the avoidance of zillions of fix-ups whenever
shared libraries are loaded into an address space, and the consequent
reduction in paging, but actually I can't build a very convincing
argument for the platform I was thinking about, which was Win32 rather
than an embedded system.

On Win32, the standard executable format is non-PIC (*2) and so each
DLL has a preferred base address. Since nobody (*1) bothers to specify
a non-default preference when linking their DLLs, it is vanishingly
unlikely that a DLL will be loaded at its preferred address. Therefore,
every DLL gets relocated to a different place in each process.

*1: However, "nobody" is an exaggeration since MS are fairly diligent
about arranging system libraries within the large (384MB) range of
addresses that they've reserved for themselves. (I say large, because
they've only reserved 128MB for all other vendors to fight over.) Since
MS do, in practice, produce nearly all the executable code on the
average Windows box, they avoid paging stress by "force of management"
rather than PIC.

*2: In which case, perhaps MS were justified in ignoring PIC for Win32.
I note that the Linux community might have trouble exercising the same
managerial oversight and that Linux's ELF uses PIC.

From: Eric P. on
Ken Hagan wrote:
>
> Paul A. Clayton wrote:
> > Ken Hagan wrote:
> >> I'd be disappointed to lose the benefits of position-independent code
> >> just because some folks abuse strcpy().
> >
> > What are the advantages of PIC, especially in embedded systems?

Some embedded systems have plug in ROM modules so many of
the issues I discuss below about DLL's could apply.

> Well I was thinking of the avoidance of zillions of fix-ups whenever
> shared libraries are loaded into an address space, and the consequent
> reduction in paging, but actually I can't build a very convincing
> argument for the platform I was thinking about, which was Win32 rather
> than an embedded system.
>
> On Win32, the standard executable format is non-PIC (*2) and so each
> DLL has a preferred base address. Since nobody (*1) bothers to specify
> a non-default preference when linking their DLLs, it is vanishingly
> unlikely that a DLL will be loaded at its preferred address. Therefore,
> every DLL gets relocated to a different place in each process.
>
> *1: However, "nobody" is an exaggeration since MS are fairly diligent
> about arranging system libraries within the large (384MB) range of
> addresses that they've reserved for themselves. (I say large, because
> they've only reserved 128MB for all other vendors to fight over.) Since
> MS do, in practice, produce nearly all the executable code on the
> average Windows box, they avoid paging stress by "force of management"
> rather than PIC.
>
> *2: In which case, perhaps MS were justified in ignoring PIC for Win32.
> I note that the Linux community might have trouble exercising the same
> managerial oversight and that Linux's ELF uses PIC.

The Win32 image file format allows for a number of different
relocation mechanisms depending on your platform. However since
there is really only the x86 you are limited to what it can do.

The x86 has relative branch and call instructions so code addresses
do not need to be patched on relocation. If a called routine is
located in a dll, the linker targets the call instruction to a jump
table in the current linkage unit that trampolines to the dll routine.
The jump table is patched on dll/exe load but this limits the patching
to just a few pages. Such patched pages must be handled as CopyOnWrite,
and are potentially different for each process as each process
could relocate a section differently, but that is not a big deal.
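
A rough sketch of why the jump table limits the damage, assuming 4KB
pages and a made-up image layout (the call-site spacing, table address
and counts below are hypothetical, chosen only for illustration). It
counts how many distinct pages the loader would have to write to, and
therefore make private, with and without the table.

```python
PAGE = 4096  # assumed page size in bytes


def pages_touched(addresses):
    """Distinct pages the loader must write to (each becomes a private
    copy-on-write page in that process)."""
    return {addr // PAGE for addr in addresses}


# Hypothetical image: 1000 calls to imported routines, 400 bytes apart.
call_sites = [0x1000 + i * 400 for i in range(1000)]
# Hypothetical jump table: one 8-byte slot for each of 50 imports.
jump_table = [0x80000 + i * 8 for i in range(50)]

direct = pages_touched(call_sites)     # patch every call instruction
via_table = pages_touched(jump_table)  # patch only the table slots
```

With these numbers, patching the call sites directly dirties 98 pages,
while patching the jump table dirties one.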

The x86 has no program counter relative address mode (x64 does),
nor any cheap and easy way to get a copy of the current program
counter to use as a base address [1]. That makes it pretty much
impossible to have PIC data references, which forces the loader
to patch all data addresses which causes all the patched pages
to be CopyOnWrite'ed on program start up, including for dll's
that might have otherwise been shared between processes.
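
As an illustration of the patching step, here is a minimal sketch of a
loader applying base fix-ups, assuming the image is modelled as a list
of words and the fix-up table as a list of indices into it. The names
and values are hypothetical; this is not the actual Win32 relocation
format.

```python
def rebase(image, fixups, preferred_base, actual_base):
    """Return a copy of image with every absolute address listed in
    fixups adjusted by the difference between the two bases."""
    delta = actual_base - preferred_base
    patched = list(image)
    for index in fixups:
        patched[index] += delta
    return patched


PREFERRED = 0x10000000
# Hypothetical image: two absolute data addresses and a plain constant.
image = [PREFERRED + 0x2000, 1234, PREFERRED + 0x2010]
fixups = [0, 2]  # indices of the words that hold absolute addresses
loaded = rebase(image, fixups, PREFERRED, 0x10350000)
```

Every word named in the fix-up table is written, so every page holding
such a word ends up private to the process.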

PC Relative data address mode would handle most data references.
However there would still be load time patches required for
any references to data values declared in external dll's.
Unfortunately this can be very frequent for C code and its runtime
library: values like errno are referenced a lot, which can mean that
almost every code page still requires a patch and is again copied.

Ideally I'd want an indirect address mode to act as a trampoline style
mechanism. The instruction format must be such that flipping a single
bit switches between register address (register or PC + offset is the
address of the data) and register indirect (register or PC + offset
is the address of the address of the data). A compiler could blindly
generate a PC relative address. Later the linker finds the data value
is located in a separate dll so it points the instruction at a patch
table and flips the instruction's indirect bit. The loader patches
just one address in that patch table and Bob's your uncle.
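
A toy model of the address mode described above, assuming memory is a
dictionary from address to value and the base is either the PC or a
register value. All addresses and names are hypothetical.

```python
def effective_address(memory, base, offset, indirect):
    """With the indirect bit clear, base+offset is the data address.
    With it set, base+offset holds the address of the data."""
    addr = base + offset
    return memory[addr] if indirect else addr


def load(memory, base, offset, indirect):
    """Fetch the value the instruction refers to."""
    return memory[effective_address(memory, base, offset, indirect)]


memory = {
    0x2000: 42,      # datum local to this linkage unit
    0x3000: 0x9000,  # patch-table slot, filled in by the loader
    0x9000: 7,       # datum living in another dll
}
pc = 0x1000
local = load(memory, pc, 0x1000, indirect=False)   # reads 0x2000
remote = load(memory, pc, 0x2000, indirect=True)   # 0x3000 -> 0x9000
```

If the other dll lands somewhere else, only the slot at 0x3000 changes.
The instruction itself, and the page it sits on, stay untouched.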

As I understand it, indirect address modes cause pipelined processors
to have kittens, but it seems to me that an OoO machine should handle
it as the indirection sequencing should be isolatable to the load/store
function unit. There is only one level of indirection so there is
only one extra TLB translate for such instruction, and store->load
forwarding with replay traps would handle all the aliasing issues.

Eric

[1] The x86 can copy its program counter into a general register
by doing a call +0 and popping the stack. However this is not
generally suitable because it buggers up return stack prediction,
and it would have to be used for every data memory reference and
thereby dramatically affect performance.

From: Stephen Sprunk on
"Eric P." <eric_pattison(a)sympaticoREMOVE.ca> wrote in message
news:45784cc2$0$1346$834e42db(a)reader.greatnowhere.com...
> Ken Hagan wrote:
>> On Win32, the standard executable format is non-PIC (*2) and so each
>> DLL has a preferred base address. Since nobody (*1) bothers to specify
>> a non-default preference when linking their DLLs, it is vanishingly
>> unlikely that a DLL will be loaded at its preferred address. Therefore,
>> every DLL gets relocated to a different place in each process.
>>
>> *1: However, "nobody" is an exaggeration since MS are fairly diligent
>> about arranging system libraries within the large (384MB) range of
>> addresses that they've reserved for themselves. (I say large, because
>> they've only reserved 128MB for all other vendors to fight over.) Since
>> MS do, in practice, produce nearly all the executable code on the
>> average Windows box, they avoid paging stress by "force of management"
>> rather than PIC.

Well, they at least produce the majority of DLLs running on the system.
A typical Win32 app may need several dozen OS DLLs just to function, and
maybe a couple of its own for maintenance reasons or plug-ins.

>> *2: In which case, perhaps MS were justified in ignoring PIC for Win32.
>> I note that the Linux community might have trouble exercising the same
>> managerial oversight and that Linux's ELF uses PIC.

Note that PIC was one of the reasons Linux switched from its old a.out
format to ELF a few years ago. That was a nightmare during the
transition, but it was eventually worth it.

> The Win32 image file format allows for a number of different
> relocation mechanisms depending on your platform. However since
> there is really only the x86 you are limited to what it can do.
>
> The x86 has relative branch and call instructions so code addresses
> do not need to be patched on relocation. If a called routine is
> located in a dll, the linker targets the call instruction to a jump
> table in the current linkage unit that trampolines to the dll routine.
> The jump table is patched on dll/exe load but this limits the patching
> to just a few pages. Such patched pages must be handled as
> CopyOnWrite, and are potentially different for each process as each
> process could relocate a section differently, but that is not a big
> deal.

Worse, the Win32 dynamic linker creates lazy trampolines. When
initialized, all the trampolines do is call into the linker code to find
the real address of the function. That code then modifies the
trampoline code to call the real function directly and then jumps to it,
so that subsequent calls to the same function are more efficient (though
that hurts, rather than helps, if you only call a function once, which
is somewhat common). Self-modifying code is evil, but considering how
much lazy linking saves on start-up time, it's probably a net win.

> The x86 has no program counter relative address mode (x64 does),
> nor any cheap and easy way to get a copy of the current program
> counter to use as a base address [1]. That makes it pretty much
> impossible to have PIC data references, which forces the loader
> to patch all data addresses which causes all the patched pages
> to be CopyOnWrite'ed on program start up, including for dll's
> that might have otherwise been shared between processes.

PIC data references are definitely possible, but they're a hack. The
Linux x86 ABI uses EBX to find the GOT, and all PIC data loads are
doubly-indirected through the GOT. Yuck.

That also means that PIC burns another one of the scarce x86 GPRs, in
addition to the stack and frame pointers. This results in a lot of
spilling EBX (and sometimes EBP) to and from the stack, which further
hurts performance.

x64 has RIP-relative addressing, which solves most PIC data load
problems, and a relative wealth of GPRs, so PIC is the default for that
ABI, even for statically-linked code.

> As I understand it, indirect address modes cause pipelined processors
> to have kittens, but it seems to me that an OoO machine should handle
> it as the indirection sequencing should be isolatable to the load/store
> function unit. There is only one level of indirection so there is
> only one extra TLB translate for such instruction, and store->load
> forwarding with replay traps would handle all the aliasing issues.

Indirect addressing isn't so bad, though it does have problems. At
least you can predict those reasonably well and modern processors are
starting to optimize cases where aliasing may happen. They're not so
good at doubly-indirected accesses, though. Pointer chasing is murder
on even the best CPUs, and the only solution anyone's found is to use
more, less-pipelined cores and/or SMT to hide the problem.

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking



From: John Dallman on
In article <el8r2e$mlk$1$8300dec7(a)news.demon.co.uk>,
K.Hagan(a)thermoteknix.co.uk (Ken Hagan) wrote:

> On Win32, the standard executable format is non-PIC (*2) and so each
> DLL has a preferred base address. Since nobody (*1) bothers to specify
> a non-default preference when linking their DLLs, it is vanishingly
> unlikely that a DLL will be loaded at its preferred address.

My employers' main product is a DLL (or .so on UNIX). We used to specify
a non-default preferred DLL address on Windows. We originally did this
because of a bug in an early version of Alpha NT that happened when a
large executable reached past the default DLL load address. Yes, the
really elementary error. It stayed because the customer that noticed
felt paranoid about having it changed. But then we hit a problem.

If every DLL has the default load address, then they all get relocated,
the first one at the default load address, and the others at steadily
increasing addresses on 4KB boundaries. So at least memory is not
fragmented. If you set a non-default load address significantly above
the default, and below the MS reserved area, then you've put down a
barrier in the middle of the address space that would be used for the
heap. This makes the customers whose code uses huge chunks of memory,
and can't use lots of small ones instead, very unhappy. So we changed
back to the default. The load-time impact is trivial, even for a 25MB
DLL.
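
To put rough numbers on the barrier effect, here is a small sketch that
computes the largest contiguous free range left in an address space
after DLLs have been placed. The range bounds and the rebased address
are assumptions for illustration; only the 25MB size comes from the
text above.

```python
def largest_free_gap(space_start, space_end, regions):
    """Largest contiguous unused range, given (base, size) regions
    placed inside [space_start, space_end)."""
    gap, cursor = 0, space_start
    for base, size in sorted(regions):
        gap = max(gap, base - cursor)
        cursor = max(cursor, base + size)
    return max(gap, space_end - cursor)


MB = 1 << 20
START, END = 0x10000000, 0x70000000  # assumed usable range
DLL_SIZE = 25 * MB

packed = largest_free_gap(START, END, [(0x10000000, DLL_SIZE)])
barrier = largest_free_gap(START, END, [(0x40000000, DLL_SIZE)])
```

In this model, leaving the DLL at the bottom of the range keeps 1511MB
contiguous, while placing it in the middle cuts the largest block to
768MB, even though the total free space is the same.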

---
John Dallman jgd(a)cix.co.uk
"Any sufficiently advanced technology is indistinguishable from a
well-rigged demo"