From: Morten Reistad on
In article <693h87-kn7.ln1(a)ntp.tmsw.no>,
Terje Mathisen <terje.mathisen(a)tmsw.no> wrote:
>Morten Reistad wrote:
>> Now, can we attack this from a simpler perspective; can we make
>> the L2-memory interaction more intelligent? Like actually make
>> a paging system for it? Paging revolutionised the disk-memory
>> systems, remember?
>
>Morten, I've been preaching these equivalences for more than 5 years:
>
>Old Mainframe: cpu register -> memory -> disk -> tape
>Modern micro: cpu register -> cache -> ram -> disk

So have I.

>Current cache-ram interfaces work in ~128-byte blocks, just like the
>page size of some of the earliest machines with paging (PDP10/11 ?).
>
>RAM needs to be accessed in relatively large blocks, since the hardware
>is optimized for sequential access.
>
>Current disks are of course completely equivalent to old tapes: Yes, it
>is possible to seek randomly, but nothing but really large sequential
>blocks will give you close to theoretical throughput.
>
>Tape is out of the question now simply because the time to do a disaster
>recovery rollback of a medium-size (or larger) system is measured in
>days or weeks, instead of a couple of hours.

We do have the problem of software bloat, so the cache keeps
thrashing. But we need to test out these ideas.

So here is a suggestion:

Have someone build a simplish, historical cpu that is reasonably
amenable to a modern implementation. The PDP11 seems like a nice
target. Then build an 8-way PDP11, including a few tens of megabytes
of "L2 cache" ram on-chip, and have the memory interface look like a
disk.

Then fire up an old unix and measure performance, swapping to
memory. The cpu could also be an 80286.

Or, give us a Xeon where the MMU can be reconfigured to handle
L2 cache as memory, and memory as disk. We could try some
low-footprint OS, like OpenBSD or QNX, on that, and fire up some
applications and look at the results.

There was a revolution from 1969 onwards when Denning's paper[1]
on working sets was implemented for paging, replacing various other
strategies. I suspect re-applying it to the boundary between on-chip
static memory and the off-chip dynamic memory behind the mmu would
be a similar win.
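
As a sketch of what I mean by re-applying Denning: track, per page,
the time of last reference, and keep resident in the fast memory only
the pages referenced within the window tau. All names and sizes below
are illustrative assumptions, not a real interface.

#include <stdio.h>

#define NPAGES 64          /* pages of the slow backing store        */
#define TAU     8          /* working-set window, in references      */

static unsigned long last_ref[NPAGES];   /* time of last reference  */
static unsigned long clock_now;          /* virtual reference clock */

/* A page is in the working set W(t, TAU) iff it was referenced
   within the last TAU references. */
static int in_working_set(int page)
{
    return last_ref[page] != 0 && clock_now - last_ref[page] < TAU;
}

static void reference(int page)
{
    clock_now++;
    if (!in_working_set(page))
        printf("fault: fetch page %d into fast memory\n", page);
    last_ref[page] = clock_now;
    /* A real implementation would also evict resident pages that
       have dropped out of the working set. */
}

int main(void)
{
    int trace[] = { 1, 2, 1, 3, 1, 2, 9, 9, 1 };
    for (unsigned i = 0; i < sizeof trace / sizeof trace[0]; i++)
        reference(trace[i]);
    return 0;
}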

-- mrr


[1] http://cs.gmu.edu/cne/pjd/PUBS/Workingsets.html


From: Morten Reistad on
In article <hp5254$r1m$1(a)news.eternal-september.org>,
Stephen Fuld <SFuld(a)Alumni.cmu.edu.invalid> wrote:
>On 4/2/2010 5:07 AM, Terje Mathisen wrote:
>> Morten Reistad wrote:

>While this is all, at least sort of, true, the question is what do you
>want to do about it. ISTM that the salient characteristics of the paging, i.e.
>memory to disk, interface are that it requires OS interaction in order
>to optimize, that the memory to disk interfaces have been getting
>narrower (i.e. SATA, Fibre Channel and serial SCSI) not wider, and that
>the CPU doesn't directly address the disk. Do you want to narrow the
>CPU's addressing range to just include the cache? Do you want the
>software to get involved in cache miss processing?

I don't want to narrow the addressing range at all. You still have
the same virtual addresses as before. It is just that we use only
the "L2 cache" (on-chip, fast, static-ish memory) as "memory", and
use paging/swapping to address "main memory" (off-chip, high-latency
dynamic memory). And then we throw the established theory at this
bottleneck, and see what happens.

User programs should run unmodified. We need to make some drivers
for our common operating systems, and have some new hardware to
support this. It should be doable with the correct mmu.
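
To make the mechanism concrete, here is a minimal sketch of that
"paging to main memory" path: 4 KB pages are copied between a small
fast arena (standing in for on-chip L2) and a big slow arena
(standing in for off-chip dram). The arena sizes, the names, and
the trivial fault path are all assumptions for illustration only.

#include <string.h>
#include <stdint.h>

#define PAGE_SIZE   4096
#define FAST_FRAMES ((6u << 20) / PAGE_SIZE)    /* ~6 MB "memory"  */
#define SLOW_PAGES  ((100u << 20) / PAGE_SIZE)  /* ~100 MB "disk"  */

static uint8_t  fast_arena[FAST_FRAMES][PAGE_SIZE];
static uint8_t  slow_arena[SLOW_PAGES][PAGE_SIZE];
static unsigned frame_owner[FAST_FRAMES]; /* slow page held per frame */

/* On a fault: write back the frame's current occupant, then pull
   the wanted slow page into that frame. Victim selection (e.g. by
   working set) is left out of the sketch. */
void page_fault(unsigned frame, unsigned slow_page)
{
    memcpy(slow_arena[frame_owner[frame]], fast_arena[frame], PAGE_SIZE);
    memcpy(fast_arena[frame], slow_arena[slow_page], PAGE_SIZE);
    frame_owner[frame] = slow_page;
}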

We can then build "memory boxes" of L2 memory via hyperchannel or
similar low-latency, fast interfaces, with a hundred meg or so per
memory box. The code to handle all of this is still in the OSes in
the Open Source world; we just have to map the usage onto new
hardware.

So, we may yet see systems with 100k page faults per second. :-/

>This is all to say that, as usual, the devil is in the details. :-(

We can build the system disks etc. in memory too, but we have to
be careful to get persistent storage right.

If we think about this very carefully, we can have snapshot
states of the system committed to persistent storage at
regular intervals, every few seconds or so.
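
A sketch of that snapshot loop, where the dirty-page bookkeeping
and the write_page() persistence path are both inventions of this
example: every few seconds, push the dirty pages out and mark them
clean. A real system would need copy-on-write or a brief freeze to
make each snapshot consistent.

#include <stdio.h>
#include <stdbool.h>

#define NPAGES 1024

static bool dirty[NPAGES];         /* set by the write-fault path   */

static void write_page(unsigned p) /* stand-in for persistent store */
{
    printf("flush page %u\n", p);
}

/* Called on a timer, every few seconds. */
void snapshot(void)
{
    for (unsigned p = 0; p < NPAGES; p++) {
        if (dirty[p]) {
            write_page(p);
            dirty[p] = false;
        }
    }
}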

-- mrr
From: Morten Reistad on
In article <cf02372e-6462-4ef7-80e1-35996dba5bce(a)q15g2000yqj.googlegroups.com>,
Robert Myers <rbmyersusa(a)gmail.com> wrote:
>On Apr 2, 11:23 am, Stephen Fuld <SF...(a)alumni.cmu.edu.invalid> wrote:
>
>> While this is all, at least sort of, true, the question is what do you
>> want to do about it. ISTM that the salient characteristics of the
>> paging, i.e. memory to disk, interface are that it requires OS
>> interaction in order to optimize, that the memory to disk interfaces
>> have been getting narrower (i.e. SATA, Fibre Channel and serial SCSI)
>> not wider, and that the CPU doesn't directly address the disk. Do you
>> want to narrow the CPU's addressing range to just include the cache?
>> Do you want the software to get involved in cache miss processing?
>>
>> This is all to say that, as usual, the devil is in the details. :-(
>
>On-die memory isn't yet big enough to be fussing over the details, although
>I assume we will get there.
>
>The better model to look at might be graphics cards that carry a large
>enough amount of super-fast memory to be interesting as a model for
>general computation.

I don't care what you call it, GPU, CPU, mill, whatever.

We can run a decent system like QNX, OpenBSD etc. in about 6
megabytes of ram; except that it will start taking page faults as
soon as you do something interesting.

If the 6 MB (or 12, or 24) of ram is superfast, has a minimal mmu,
at least capable of process isolation and address translation, and
does the "paging" to main memory, then you could run one of these
minimal, general-purpose operating systems inside each
gpu/cpu/whatever, and live with the page faults. It will be several
orders of magnitude faster and lower latency than the swapping and
paging we normally love to hate. We already have that "swapping",
except we call it "memory access".

The theory is old, stable and well validated. The code is done, and
still in many operating systems. We "just need drivers".

-- mrr




From: Terje Mathisen on
Stephen Fuld wrote:
> On 4/2/2010 5:07 AM, Terje Mathisen wrote:
>> Old Mainframe: cpu register -> memory -> disk -> tape
>> Modern micro: cpu register -> cache -> ram -> disk
>>
>> Current cache-ram interfaces work in ~128-byte blocks, just like the
>> page size of some of the earliest machines with paging (PDP10/11 ?).
>>
>> RAM needs to be accessed in relatively large blocks, since the hardware
>> is optimized for sequential access.
>>
>> Current disks are of course completely equivalent to old tapes: Yes, it
>> is possible to seek randomly, but nothing but really large sequential
>> blocks will give you close to theoretical throughput.
>>
>> Tape is out of the question now simply because the time to do a disaster
>> recovery rollback of a medium-size (or larger) system is measured in
>> days or weeks, instead of a couple of hours.
>
> While this is all, at least sort of, true, the question is what do you
> want to do about it. ISTM that the salient characteristics of the paging, i.e.
> memory to disk, interface are that it requires OS interaction in order
> to optimize, that the memory to disk interfaces have been getting
> narrower (i.e. SATA, Fibre Channel and serial SCSI) not wider, and that
> the CPU doesn't directly address the disk. Do you want to narrow the
> CPU's addressing range to just include the cache? Do you want the
> software to get involved in cache miss processing?

Not at all!

I use my argument as a lead-in to tell programmers they had better study
the algorithms developed for 30-40 year old mainframes with limited
memory, because unless they can make their algorithms fit this model,
performance will really suffer.

I.e. I don't suggest they should try to do anything at the OS level,
but rather take the performance steps as a given and work
around/within those limitations.
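
A canonical example of fitting the model: process a big array in
tiles that stay cache-resident, exactly the way an old mainframe
program blocked its work to fit core. The sizes below are
illustrative assumptions.

#include <stddef.h>

#define N    1024
#define TILE 64   /* 64*64 doubles = 32 KB per tile, cache-resident */

static double a[N][N], b[N][N];

/* Transpose b into a, tile by tile, so both the source rows and the
   destination columns of the current tile stay in cache. */
void transpose_blocked(void)
{
    for (size_t ii = 0; ii < N; ii += TILE)
        for (size_t jj = 0; jj < N; jj += TILE)
            for (size_t i = ii; i < ii + TILE; i++)
                for (size_t j = jj; j < jj + TILE; j++)
                    a[j][i] = b[i][j];
}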
>
> This is all to say that, as usual, the devil is in the details. :-(

Indeed.

Terje
From: nmm1 on
In article <u8cj87-q7e.ln1(a)ntp.tmsw.no>,
Terje Mathisen <terje.mathisen(a)tmsw.no> wrote:
>Stephen Fuld wrote:
>>
>> While this is all, at least sort of, true, the question is what do you
>> want to do about it. ISTM that the salient characteristics of the paging, i.e.
>> memory to disk, interface are that it requires OS interaction in order
>> to optimize, that the memory to disk interfaces have been getting
>> narrower (i.e. SATA, Fibre Channel and serial SCSI) not wider, and that
>> the CPU doesn't directly address the disk. Do you want to narrow the
>> CPU's addressing range to just include the cache? Do you want the
>> software to get involved in cache miss processing?
>
>Not at all!
>
>I use my argument as a lead-in to tell programmers they had better study
>the algorithms developed for 30-40 year old mainframes with limited
>memory, because unless they can make their algorithms fit this model,
>performance will really suffer.
>
>I.e. I don't suggest they should try to do anything at the OS level,
>rather take the performance steps as a given and work around/within
>those limitations.

The thing that pisses me off is having to explain to them that they
ALSO need to take the design deficiencies of the less clueful
mainframe architectures and operating systems into account, because
that is the level at which modern systems map to them :-(

Stephen's points are a prime example of this. We learnt that that
was NOT how to handle virtual memory back in the 1960s, but the new
kid on the block (IBM mainframe division) wouldn't be told anything,
and things have gone downhill from there :-(

I keep being told that TLB misses aren't important, because modern
TLBs are so large, and programmers don't need to know about memory
banking designs. Yeah. Right. Now, back in the real world ....


Regards,
Nick Maclaren.