From: Peter J. Holzer on
On 2010-07-31 23:40, Ilya Zakharevich <nospam-abuse(a)ilyaz.org> wrote:
> On 2010-07-31, Peter J. Holzer <hjp-usenet2(a)hjp.at> wrote:
>>> E.g., Generally speaking, on "well-designed" 32-bit architecture, one
>>> would be able to address 8GB of memory (4GB of data, and 4GB of
>>> code).
>
>> And we could split off the stack into a different segment, too and then
>> address 12 GB of memory.
>
> Not with C.

I used to think so, but it's not quite true.

> The same subroutine should accept stack data pointers and
> heap data pointers.

Yes, but

* automatic variables don't have to be on the "hardware stack".
* only those variables which actually are accessed via pointer
need to be in a pointer-accessible space.

So a compiler could put stuff like

* return addresses
* non-array automatic variables which don't have their address taken
* function arguments and return values
* temporary variables

into the stack segment and automatic variables which do have their
address taken into the data segment.
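
To make this concrete, here is a small C sketch. The placement in the
comments is hypothetical - it describes what a compiler for such a
segmented target could do, not what any real flat-mode compiler does:

#include <stdio.h>

void f(void)
{
    int counter;    /* address never taken: could stay in the stack
                       segment, out of reach of any pointer */
    char buf[256];  /* address is taken (passed to fgets), so it would
                       have to live in the pointer-addressable data
                       segment */

    if (fgets(buf, sizeof buf, stdin))      /* &buf[0] escapes here */
        for (counter = 0; counter < 3; counter++)
            fputs(buf, stdout);
}

int main(void)
{
    f();
    return 0;
}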

Among other things this means that return addresses are not accessible
with a pointer and can't be overwritten by a buffer overflow. It also
means that the size of the stack segment will almost always be very
small (arrays will (almost) never be there) and that function call and
return are more expensive (you need to maintain a second "stack").
So I'm not sure whether the advantages outweigh the disadvantages.

But that's moot. I don't expect any new segmented architectures, and
existing ones are either obsolete or used in "flat" mode.

>> the code also much, much smaller than 4GB
>
> This is not what I experience with my system (which has less than 4GB
> memory, though). The monsters like Mozilla take more text memory than
> data memory (unless one loads a LOT of HTML into the browser).

Mozilla is a monster, but it still uses only about 40 MB of code memory,
which is about 1% of 4 GB:

% perl -e 'while (<>) {                          # one mapping per line
      my ($b, $e, $p) = /^(\w+)-(\w+) (\S+)/;    # start, end, permissions
      $s{$p} += hex($e) - hex($b);               # total bytes per permission set
  }
  for (keys %s) { printf "%s %9d\n", $_, $s{$_} / 1024 }  # kB per permission
  ' /proc/18752/maps | sort -n -k 2
---p 64
rwxp 192
r--s 496
rw-s 768
r-xp 40656
r--p 98524
rw-p 279960

(this is Firefox 3.6 running on 32-bit Linux for about a day)

So if you moved that code into a different segment, you could use 4GB
instead of 3.96GB for data. Doesn't seem like much of an improvement.
(especially if you consider that on most 32-bit OSs the address space is
limited to 2 or 3 GB anyway - lifting that limit would have a much
larger effect).


>> I see the smiley but I'd like to clarify for our young readers that
>> 32bit Linux uses near pointers. On the 386, a far pointer would be 48
>> bits.
>
> ... but only if you round up to a multiple of 8bit; otherwise 46bit.
>
>>> [*] AFAIK, Solaris (tries to?) separate code from data AMAP. On
>>> Solaris/i386, are they in different segments?
>
>> I don't think so. Sun likes Java. Java uses JIT compilers. JIT compilers
>> and separated address spaces for code and data don't mesh well.
>
> Do not see how this would be related.

A JIT compiler needs to generate executable code which is immediately
executed by the same process. This is hard to do if the JIT compiler
can't put the code into a place where it can be executed.

> AFAI suspect (basing on the sparse info I have seen), the only way to
> load code on solaris is to write an executable module on disk, and
> dlopen() it.

That would absolutely kill the performance of a JIT compiler. If
Solaris/x86 uses separate code and data segments (which I doubt) then
there is probably some way (maybe with mmap) to map a region of memory
into both the data and the code segment. More likely they use a common
address space and just use mprotect to prevent execution of data which
isn't meant to be code.
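
For what it's worth, here is a minimal sketch of that mmap + mprotect
pattern on a flat address space. Nothing in it is Solaris-specific; the
generated bytes assume x86 ("mov eax, 42; ret"):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    /* machine code for a function returning 42 (x86: mov eax, 42; ret) */
    unsigned char code[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

    /* get an anonymous, writable, page-aligned region */
    void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return 1;

    memcpy(buf, code, sizeof code);          /* "compile" into it */

    /* flip the region to read + execute */
    if (mprotect(buf, 4096, PROT_READ | PROT_EXEC) != 0)
        return 1;

    int (*fn)(void) = (int (*)(void))buf;
    printf("%d\n", fn());                    /* prints 42 */
    return 0;
}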

hp

From: Peter J. Holzer on
On 2010-08-01 23:13, Ilya Zakharevich <nospam-abuse(a)ilyaz.org> wrote:
> On 2010-08-01, Peter J. Holzer <hjp-usenet2(a)hjp.at> wrote:
>> Mozilla is a monster, but it still uses only about 40 MB of code memory,
>> which is about 1% of 4 GB:
>
> I suspect your system has 4K virtual address space granularity.

Yes.

> Mine has 64K.

So that would increase the average internal fragmentation per code
region from 2 kB to 32 kB (half the granularity - of course that depends
on the size distribution, but it's good enough for a back-of-the-envelope
calculation). On Linux, Firefox maps 132 code regions into memory (the
GNOME people have a serious case of shared-libraryritis). So that's 132
* (32 kB - 2 kB) = 3960 kB, or about 4 MB more. Noticeable, but probably
less than the effects of other differences between OS/2 and Linux.

> What is important is the ratio of data/text.

No. What is important is the ratio between code and the usable address
space.

> In your case, it is less than 10. (With more memory, you run more of
> OTHER monsters. ;-)

Yes, but those other monsters get their own virtual address space, so
they don't matter in this discussion.


>> instead of 3.96GB for data. Doesn't seem like much of an improvement.
>> (especially if you consider that on most 32-bit OSs the address space is
>> limited to 2 or 3 GB anyway - lifting that limit would have a much
>> larger effect).
>
> No, the effect would be the opposite: 40M/2G is LARGER than 40M/4G. ;-)

No, you misunderstood. If you now have an address space of 2 GB for
code+data, and you move the code to a different segment, you win 40MB
for data. But if the OS is changed to give each process a 4 GB address
space, then you win 2 GB, which is a lot more than 40 MB.

hp

From: Peter J. Holzer on
On 2010-08-02 21:19, Ilya Zakharevich <nospam-abuse(a)ilyaz.org> wrote:
> On 2010-08-02, Peter J. Holzer <hjp-usenet2(a)hjp.at> wrote:
>>> What is important is the ratio of data/text.
>>
>> No. What is important is the ratio between code and the usable address
>> space.
>
> I see (below) that we discuss different scenarios.
>
>>> In your case, it is less than 10. (With more memory, you run more of
>>> OTHER monsters. ;-)
>
>> Yes, but those other monsters get their own virtual address space, so
>> they don't matter in this discussion.
>
> They do on OS/2: the DLL's-related memory is loaded into shared
> address region. (This way one does not need any "extra"
> per-process-context patching or redirection of DLL address accesses.)

Sounds a bit like the pre-ELF shared library system in Linux. Of course
that was designed when 16 MB was a lot of RAM and abandoned when 128 MB
became normal for a server (but then I guess the same is true for OS/2).

I'd still be surprised if anybody ran an application mix on OS/2 where
the combined code size of all DLLs exceeds 1 GB. Heck, I'd be surprised
if anybody did it on Linux (by code I really mean code - many systems
put read-only data into the text segment of an executable, but you
couldn't move that to a different address space, so it doesn't count
here).


>> No, you misunderstood. If you now have an address space of 2 GB for
>> code+data, and you move the code to a different segment, you win 40MB
>> for data. But if the OS is changed to give each process a 4 GB address
>> space, then you win 2 GB, which is a lot more than 40 MB.
>
> I do not see how one would lift this limit (without a segmented
> architecture ;-).

If you can move code to a different segment you obviously have a
segmented architecture. But even without ...

> I expect that (at least) this would make context switch majorly
> costlier...

I don't see why the kernel should need a large address space in the same
context as the running process. When both the size of physical RAM and
the maximum VM of any process could realistically be expected to be much
smaller than 4GB, a fixed split between user space and kernel space
(traditionally 2GB + 2GB in Unix, but 3GB + 1GB in Linux) made some
sense: Within a system call, the kernel could access the complete
address space of the calling process and the complete RAM without
fiddling with page tables. But when physical RAM exceeded the kernel
space, that was no longer possible anyway, so there was no longer a
reason to reserve a huge part of the address space of each process for
the kernel. But of course making large changes for a factor of at most 2
doesn't make much sense in a world governed by Moore's law, and anybody
who needed the space moved to 64-bit systems anyway.

hp