From: Paul A. Clayton on 5 Dec 2006 12:12

Ken Hagan wrote:
> I'd be disappointed to lose the benefits of position-independent code
> just because some folks abuse strcpy().

What are the advantages of PIC, especially in embedded systems?

What are considered acceptable methods for supporting PIC?

(E.g., I was thinking of having PC-relative jumps/branches be encoded
with all but the sign bit as result bits. [The sign bit and the MSb of
the jump 'offset' {i.e. the last fully summed bit of the result} and
the corresponding bit of the starting PC--these would be used to
determine if there was a carry or a borrow.] This has the advantage
that the LSbs of the inset can be sent to the ICache without need for
an intermediate addition [trivial power conservation and slight
reduction in branch target latency]. Aside from making PIC somewhat
more complex [even more than MIPS], this has the disadvantage of
slightly reducing the range of jumps. It has the advantage over MIPS'
method of JMP and JAL [but not conditional branches] of being able to
cross segments [MIPS jumps cannot cross 256MiB 'segment' boundaries].)

Paul A. Clayton
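One possible reading of the encoding described above, as a minimal C sketch rather than a definitive design. The 16-bit field width, the `branch_field` struct, and the function names are all assumptions made for illustration. The sketch also assumes the displacement is smaller than half the field's span, so that comparing the field's MSb with the matching PC bit is enough to detect a carry or borrow.

```c
#include <stdint.h>

/* Hypothetical field width: 16 result bits plus a separate sign bit. */
#define FIELD_BITS 16u
#define FIELD_MASK ((1u << FIELD_BITS) - 1u)
#define FIELD_MSB  (1u << (FIELD_BITS - 1u))

typedef struct {
    uint32_t low;      /* low bits of the TARGET itself, not an offset */
    unsigned backward; /* sign bit: 1 if the target lies below the PC  */
} branch_field;

/* Encode: store the target's low bits directly, plus the direction. */
static branch_field encode_branch(uint32_t pc, uint32_t target)
{
    branch_field f;
    f.low = target & FIELD_MASK;
    f.backward = target < pc;
    return f;
}

/* Decode: the low bits are usable immediately (no adder on that path).
 * Only the upper bits need adjusting, by +1 or -1 unit of the field span,
 * decided by the sign bit, the field's MSb, and the same bit of the PC.
 * Assumes |target - pc| < 2^(FIELD_BITS - 1). */
static uint32_t decode_branch(uint32_t pc, branch_field f)
{
    uint32_t upper = pc & ~FIELD_MASK;
    unsigned pc_msb = (pc & FIELD_MSB) != 0;
    unsigned t_msb  = (f.low & FIELD_MSB) != 0;

    if (!f.backward && pc_msb && !t_msb)
        upper += FIELD_MASK + 1u;   /* forward jump wrapped: carry  */
    if (f.backward && !pc_msb && t_msb)
        upper -= FIELD_MASK + 1u;   /* backward jump wrapped: borrow */

    return upper | f.low;
}
```

Under these assumptions, decoding a forward branch from 0x0001FFF0 to 0x00020010 takes the carry path, and the reverse branch takes the borrow path.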
From: Ken Hagan on 7 Dec 2006 05:40

Paul A. Clayton wrote:
> Ken Hagan wrote:
>> I'd be disappointed to lose the benefits of position-independent code
>> just because some folks abuse strcpy().
>
> What are the advantages of PIC, especially in embedded systems?

Well I was thinking of the avoidance of zillions of fix-ups whenever
shared libraries are loaded into an address space, and the consequent
reduction in paging, but actually I can't build a very convincing
argument for the platform I was thinking about, which was Win32 rather
than an embedded system.

On Win32, the standard executable format is non-PIC (*2) and so each
DLL has a preferred base address. Since nobody (*1) bothers to specify
a non-default preference when linking their DLLs, it is vanishingly
unlikely that a DLL will be loaded at its preferred address. Therefore,
every DLL gets relocated to a different place in each process.

*1: However, "nobody" is an exaggeration since MS are fairly diligent
about arranging system libraries within the large (384MB) range of
addresses that they've reserved for themselves. (I say large, because
they've only reserved 128MB for all other vendors to fight over.) Since
MS do, in practice, produce nearly all the executable code on the
average Windows box, they avoid paging stress by "force of management"
rather than PIC.

*2: In which case, perhaps MS were justified in ignoring PIC for Win32.
I note that the Linux community might have trouble exercising the same
managerial oversight and that Linux's ELF uses PIC.
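The cost of those fix-ups can be sketched in C. This is a simplified model under stated assumptions, not the Win32 loader's actual code: the function name, the flat sorted list of fix-up offsets, and the 4 KB page size are illustrative choices. It rebases each 32-bit absolute address by the difference between the actual and preferred base, and counts how many pages get written (and so can no longer be shared between processes).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IMAGE_PAGE_SIZE 4096u  /* assumed page size */

/* Add (actual_base - preferred_base) to every 32-bit absolute address
 * listed in fixups[] (byte offsets into the image, sorted ascending).
 * Returns the number of distinct pages written. Fix-ups that straddle
 * a page boundary are counted against their first page only. */
static size_t apply_fixups(uint8_t *image, const uint32_t *fixups, size_t n,
                           uint32_t preferred_base, uint32_t actual_base)
{
    uint32_t delta = actual_base - preferred_base;
    uint32_t last_page = UINT32_MAX;
    size_t dirty_pages = 0;

    if (delta == 0)
        return 0;  /* loaded at the preferred address: nothing to patch */

    for (size_t i = 0; i < n; i++) {
        uint32_t value;
        memcpy(&value, image + fixups[i], sizeof value);
        value += delta;
        memcpy(image + fixups[i], &value, sizeof value);

        uint32_t page = fixups[i] / IMAGE_PAGE_SIZE;
        if (page != last_page) {
            dirty_pages++;
            last_page = page;
        }
    }
    return dirty_pages;
}
```

When the image loads at its preferred base, the function returns 0 and every page stays shareable; otherwise each page containing at least one fix-up becomes a private copy.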
From: Eric P. on 7 Dec 2006 11:07

Ken Hagan wrote:
>
> Paul A. Clayton wrote:
> > Ken Hagan wrote:
> >> I'd be disappointed to lose the benefits of position-independent code
> >> just because some folks abuse strcpy().
> >
> > What are the advantages of PIC, especially in embedded systems?

Some embedded systems have plug in ROM modules so many of the issues
I discuss below about DLL's could apply.

> Well I was thinking of the avoidance of zillions of fix-ups whenever
> shared libraries are loaded into an address space, and the consequent
> reduction in paging, but actually I can't build a very convincing
> argument for the platform I was thinking about, which was Win32 rather
> than an embedded system.
>
> On Win32, the standard executable format is non-PIC (*2) and so each
> DLL has a preferred base address. Since nobody (*1) bothers to specify
> a non-default preference when linking their DLLs, it is vanishingly
> unlikely that a DLL will be loaded at its preferred address. Therefore,
> every DLL gets relocated to a different place in each process.
>
> *1: However, "nobody" is an exaggeration since MS are fairly diligent
> about arranging system libraries within the large (384MB) range of
> addresses that they've reserved for themselves. (I say large, because
> they've only reserved 128MB for all other vendors to fight over.) Since
> MS do, in practice, produce nearly all the executable code on the
> average Windows box, they avoid paging stress by "force of management"
> rather than PIC.
>
> *2: In which case, perhaps MS were justified in ignoring PIC for Win32.
> I note that the Linux community might have trouble exercising the same
> managerial oversight and that Linux's ELF uses PIC.

The Win32 image file format allows for a number of different
relocation mechanisms depending on your platform. However since
there is really only the x86 you are limited to what it can do.

The x86 has relative branch and call instructions so code addresses
do not need to be patched on relocation. If a called routine is
located in a dll, the linker targets the call instruction to a jump
table in the current linkage unit that trampolines to the dll routine.
The jump table is patched on dll/exe load but this limits the patching
to just a few pages. Such patched pages must be handled as
CopyOnWrite, and are potentially different for each process as each
process could relocate a section differently, but that is not a big
deal.

The x86 has no program counter relative address mode (x64 does),
nor any cheap and easy way to get a copy of the current program
counter to use as a base address [1]. That makes it pretty much
impossible to have PIC data references, which forces the loader
to patch all data addresses, which causes all the patched pages
to be CopyOnWrite'ed on program start up, including for dll's
that might have otherwise been shared between processes.

PC Relative data address mode would handle most data references.
However there would still be load time patches required for any
references to data values declared in external dll's. Unfortunately
this can be very frequent for C code and its runtime library:
values like errno are referenced a lot, which can mean that almost
every code page still requires a patch and is again copied.

Ideally I'd want an indirect address mode to act as a trampoline
style mechanism. The instruction format must be such that flipping
a single bit switches between register address (register or PC +
offset is the address of the data) and register indirect (register
or PC + offset is the address of the address of the data).

A compiler could blindly generate a PC relative address. Later the
linker finds the data value is located in a separate dll so it points
the instruction at a patch table and flips the instruction's indirect
bit. The loader patches just one address in that patch table and
Bob's your uncle.
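
The proposed indirect bit can be modelled with a toy C simulation. This is only a sketch of the idea as described above; the `load_insn` layout, the word-indexed memory array, and the function names are hypothetical and do not correspond to any real instruction set.

```c
#include <stdint.h>

/* Hypothetical load instruction: PC-relative, with a one-bit mode switch. */
typedef struct {
    uint32_t pc;        /* word index of the instruction itself           */
    int32_t  offset;    /* signed PC-relative displacement, in words      */
    unsigned indirect;  /* 0: pc+offset is the data;                       */
                        /* 1: pc+offset holds the address of the data     */
} load_insn;

/* Execute the load against a toy word-addressed memory. */
static uint32_t execute_load(const uint32_t *mem, load_insn in)
{
    uint32_t ea = in.pc + (uint32_t)in.offset;  /* PC-relative address */
    if (in.indirect)
        ea = mem[ea];   /* exactly one extra level of indirection */
    return mem[ea];
}

/* Link step: the data turned out to live in another module, so point
 * the instruction at a patch-table slot and flip its indirect bit.
 * The loader later writes the real address into mem[slot]. */
static void link_to_external(load_insn *in, uint32_t slot)
{
    in->offset = (int32_t)(slot - in->pc);
    in->indirect = 1;
}
```

In this model the code word is rewritten once at link time, and at load time only the patch-table slot changes, so the page holding the instruction stays shareable.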

As I understand it, indirect address modes cause pipelined processors
to have kittens, but it seems to me that an OoO machine should handle
it as the indirection sequencing should be isolatable to the load/store
function unit. There is only one level of indirection so there is
only one extra TLB translate for such an instruction, and store->load
forwarding with replay traps would handle all the aliasing issues.

Eric

[1] The x86 can copy its program counter into a general register
by doing a call +0 and popping the stack. However this is not
generally suitable because it buggers up return stack prediction,
and it would have to be used for every data memory reference and
thereby dramatically affect performance.
From: Stephen Sprunk on 7 Dec 2006 14:01

"Eric P." <eric_pattison(a)sympaticoREMOVE.ca> wrote in message
news:45784cc2$0$1346$834e42db(a)reader.greatnowhere.com...
> Ken Hagan wrote:
>> On Win32, the standard executable format is non-PIC (*2) and so each
>> DLL has a preferred base address. Since nobody (*1) bothers to
>> specify a non-default preference when linking their DLLs, it is
>> vanishingly unlikely that a DLL will be loaded at its preferred
>> address. Therefore, every DLL gets relocated to a different place
>> in each process.
>>
>> *1: However, "nobody" is an exaggeration since MS are fairly diligent
>> about arranging system libraries within the large (384MB) range of
>> addresses that they've reserved for themselves. (I say large, because
>> they've only reserved 128MB for all other vendors to fight over.)
>> Since MS do, in practice, produce nearly all the executable code on
>> the average Windows box, they avoid paging stress by "force of
>> management" rather than PIC.

Well, they at least produce the majority of DLLs running on the system.
A typical Win32 app may need several dozen OS DLLs just to function, and
maybe a couple of its own for maintenance reasons or plug-ins.

>> *2: In which case, perhaps MS were justified in ignoring PIC for
>> Win32. I note that the Linux community might have trouble exercising
>> the same managerial oversight and that Linux's ELF uses PIC.

Note that PIC was one of the reasons Linux switched from its old a.out
format to ELF a few years ago. That was a nightmare during the
transition, but it was eventually worth it.

> The Win32 image file format allows for a number of different
> relocation mechanisms depending on your platform. However since
> there is really only the x86 you are limited to what it can do.
>
> The x86 has relative branch and call instructions so code addresses
> do not need to be patched on relocation.
> If a called routine is located in a dll, the linker targets the call
> instruction to a jump table in the current linkage unit that
> trampolines to the dll routine. The jump table is patched on dll/exe
> load but this limits the patching to just a few pages. Such patched
> pages must be handled as CopyOnWrite, and are potentially different
> for each process as each process could relocate a section
> differently, but that is not a big deal.

Worse, the Win32 dynamic linker creates lazy trampolines. When
initialized, all the trampolines do is call into the linker code to find
the real address of the function. That code then modifies the
trampoline code to call the real function directly and then jumps to it,
so that subsequent calls to the same function are more efficient (though
that hurts, rather than helps, if you only call a function once, which
is somewhat common). Self-modifying code is evil, but considering how
much lazy linking saves on start-up time, it's probably a net win.

> The x86 has no program counter relative address mode (x64 does),
> nor any cheap and easy way to get a copy of the current program
> counter to use as a base address [1]. That makes it pretty much
> impossible to have PIC data references, which forces the loader
> to patch all data addresses which causes all the patched pages
> to be CopyOnWrite'ed on program start up, including for dll's
> that might have otherwise been shared between processes.

PIC data references are definitely possible, but they're a hack. The
Linux x86 ABI uses EBX to find the GOT, and all PIC data loads are
doubly-indirected through the GOT. Yuck. That also means that PIC
burns another one of the scarce x86 GPRs, in addition to the stack and
frame pointers. This results in a lot of spilling EBX (and sometimes
EBP) to and from the stack, which further hurts performance.
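
The double indirection through the GOT can be sketched in plain C. This is an illustrative model, not compiler output: the `got` parameter stands in for the table address held in EBX, and the slot name `GOT_ERRNO` and function name are hypothetical.

```c
#include <stddef.h>

/* Hypothetical GOT layout: one slot per externally visible datum. */
enum { GOT_ERRNO = 0, GOT_SLOTS };

/* A PIC data load: `got` plays the role of the base register (EBX in
 * the ABI described above). Two dependent loads are needed. */
static int load_via_got(int *const *got, size_t slot)
{
    int *addr = got[slot];  /* first load: fetch the datum's address */
    return *addr;           /* second load: fetch the datum itself   */
}
```

Only the table is written at load time, so the code pages stay identical across processes; the price is the extra dependent load and the register tied up holding the table's address.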

x64 has RIP-relative addressing, which solves most PIC data load
problems, and a relative wealth of GPRs, so PIC is the default for that
ABI, even for statically-linked code.

> As I understand it, indirect address modes cause pipelined processors
> to have kittens, but it seems to me that an OoO machine should handle
> it as the indirection sequencing should be isolatable to the
> load/store function unit. There is only one level of indirection so
> there is only one extra TLB translate for such instruction, and
> store->load forwarding with replay traps would handle all the
> aliasing issues.

Indirect addressing isn't so bad, though it does have problems. At
least you can predict those reasonably well and modern processors are
starting to optimize cases where aliasing may happen.

They're not so good at doubly-indirected accesses, though. Pointer
chasing is murder on even the best CPUs, and the only solution anyone's
found is to use more, less-pipelined cores and/or SMT to hide the
problem.

S

--
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
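
The lazy-trampoline behaviour described in this post can be approximated in C with a patched function pointer. This is a sketch under assumptions: it patches a data slot rather than rewriting trampoline code, the "lookup" is hard-wired, and every name (`import_slot`, `lazy_stub`, `real_square`) is invented for illustration.

```c
typedef int (*func_ptr)(int);

/* Stand-in for the routine that lives in the shared library. */
static int real_square(int x) { return x * x; }

static int resolve_count;            /* how often the slow path ran */
static int lazy_stub(int x);
static func_ptr import_slot = lazy_stub;  /* starts out unresolved */

/* First call lands here: resolve the symbol, patch the slot so later
 * calls bypass the resolver, then forward the original call. */
static int lazy_stub(int x)
{
    resolve_count++;
    import_slot = real_square;       /* hard-wired "symbol lookup" */
    return import_slot(x);
}

/* What the caller's trampoline does: jump through the slot. */
static int call_import(int x)
{
    return import_slot(x);
}
```

The first `call_import` pays for the resolution; every later call goes straight to the target, which is why a function called only once gains nothing from the scheme.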
From: John Dallman on 7 Dec 2006 16:41

In article <el8r2e$mlk$1$8300dec7(a)news.demon.co.uk>,
K.Hagan(a)thermoteknix.co.uk (Ken Hagan) wrote:
> On Win32, the standard executable format is non-PIC (*2) and so each
> DLL has a preferred base address. Since nobody (*1) bothers to specify
> a non-default preference when linking their DLLs, it is vanishingly
> unlikely that a DLL will be loaded at its preferred address.

My employers' main product is a DLL (or .so on UNIX). We used to specify
a non-default preferred DLL address on Windows. We originally did this
because of a bug in an early version of Alpha NT that happened when a
large executable reached past the default DLL load address. Yes, the
really elementary error. It stayed because the customer that noticed
felt paranoid about having it changed.

But then we hit a problem. If every DLL has the default load address,
then they all get relocated, the first one at the default load address,
and the others at steadily increasing addresses on 4KB boundaries. So at
least memory is not fragmented. If you set a non-default load address
significantly above the default, and below the MS reserved area, then
you've put down a barrier in the middle of the address space that would
be used for the heap. This makes the customers whose code uses huge
chunks of memory, and can't use lots of small ones instead, very
unhappy.

So we changed back to the default. The load-time impact is trivial, even
for a 25MB DLL.

---
John Dallman jgd(a)cix.co.uk
"Any sufficiently advanced technology is indistinguishable from a
well-rigged demo"
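
The fragmentation effect can be put into rough numbers with a small C helper. The address range and load addresses used with it are hypothetical, chosen only to illustrate the arithmetic; the function name is also an assumption.

```c
#include <stdint.h>

/* Largest contiguous free span left in the range [lo, hi) once a single
 * block [base, base + size) has been placed inside it.
 * Assumes lo <= base and base + size <= hi. */
static uint32_t largest_free_span(uint32_t lo, uint32_t hi,
                                  uint32_t base, uint32_t size)
{
    uint32_t below = base - lo;           /* free space under the block */
    uint32_t above = hi - (base + size);  /* free space over the block  */
    return below > above ? below : above;
}
```

For example, with an assumed usable range of 0x10000000 to 0x70000000 and a 25MB (0x01900000 byte) DLL, loading at the bottom of the range leaves one span of 0x5E700000 bytes, while loading at 0x40000000 leaves at most 0x30000000 bytes in one piece.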