From: Peter Olcott on

"Pete Delgado" <Peter.Delgado(a)> wrote in message
> "Peter Olcott" <NoSpam(a)> wrote in message
> news:q9mdncghSqVDTTbWnZ2dnUVZ_rCdnZ2d(a)
>> You have to look for additional page faults after all of
>> the data is loaded, that is why I explicitly put a little
>> Hit any key, pause in.
> You didn't answer any of my questions...
> -Pete

You aren't paying attention.

> Are you:
> a) Running under the debugger by any chance?
> b) Allowing the system to hibernate during your 12 hour
> run?
> c) Doing anything special to lock the data in memory?

No, No, No

From: Joseph M. Newcomer on
See below...
On Thu, 25 Mar 2010 14:09:06 -0500, "Peter Olcott" <NoSpam(a)> wrote:

>> ****
>> Rubbish. You told us you had something like 27,000 page
>> faults while you were loading
>> your data! And you have completely missed the point here,
>> which is the page faults are
>> AMORTIZED over ALL processes! So if you have 8
>> processes, it is like each page fault
>Nope, the 27,000 page faults were just for my process. I
>don't care about page faults when the data is loaded. I want
>to (and apparently have) eliminate page faults during OCR
Duh! Didn't we just tell you that memory-mapped files DO THIS FOR YOU! But you have your
fixation that it doesn't work this way (and we know how it DOES work, but you won't
listen), so nothing we can tell you seems to have any impact.
>> counts as 1/8 of a page fault. And they only happen
>> during the loading phase, which
>> doesn't change anything!
>> ****
>>>It continues to work (in practice) the way that I need it
>>>to work, and I have never seen it work according to Joe's
>>>theories. Whenever there is plenty of excess RAM (such as
>>>GB more than anything needs) there are no page-faults in
>>>process. I even stressed this out a lot and had four
>>>processes taking 1.5 GB each (of my 8 GB) and still zero
>>>page faults in any of the four processes.
>> ****
>> I don't have theories. I am talking about practice. You
>> talk about having "no" page
>> faults, but do you know if those pages have been written
>> to the paging file? No, you
>> don't. And with a MMF, you will quickly converge on the
>> same zero page faults; by what
>So if I end up at the same place, then it makes no sense to
>learn another way to end up at the same place.
Hmmm. If I have two approaches that generate a million dollars, and one generates a
million dollars and stops, and the other generates a million dollars and continues to do
so, there is no reason to learn the second method, because the first method and the second
method both generate a million dollars!

What part of "scalability" did you fail to comprehend? You are saying that if you have
one process, and one thread, and it generates N page faults, and the other method
generates N page faults, then since N==N, there is nothing to be gained by the second
method.

But we have REPEATEDLY told you that if you have multiple processes, say K processes,
your method ALWAYS generates K*N page faults, and each process uses B + N pages, and our
suggestion generates N page faults independent of K. Each process uses EFFECTIVELY B +
N/K pages (the value N is amortized over all K processes). And in the single-process
single-thread case, it generates M page faults where M < N. Is this so hard to
comprehend? It involves second-grade arithmetic.
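
The arithmetic in the paragraph above can be sketched in a few lines of C++. This is only an illustrative model, not code from the thread: N is the number of pages of shared read-only data, B is the number of other pages each process uses, and K is the number of processes. The struct and function names are hypothetical, and the model assumes K >= 1.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the page-fault arithmetic described in the post.
struct Footprint {
    std::uint64_t page_faults;        // total faults needed to bring the data in
    std::uint64_t pages_per_process;  // pages effectively charged to each process
};

// Every process loads its own private copy of the N data pages.
Footprint private_copies(std::uint64_t N, std::uint64_t B, std::uint64_t K) {
    assert(K >= 1);
    return Footprint{K * N, B + N};
}

// All processes share one mapping: the N pages are faulted in once,
// and their cost is amortized over the K processes.
Footprint shared_mapping(std::uint64_t N, std::uint64_t B, std::uint64_t K) {
    assert(K >= 1);
    return Footprint{N, B + N / K};
}
```

With N = 1000, B = 10 and K = 4, private_copies gives 4000 faults and 1010 pages per process, while shared_mapping gives 1000 faults and an effective 260 pages per process.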

Joseph M. Newcomer [MVP]
email: newcomer(a)
MVP Tips:
From: Joseph M. Newcomer on
See below...
On Thu, 25 Mar 2010 14:10:48 -0500, "Peter Olcott" <NoSpam(a)> wrote:

>> And what did you miss about "scalability"? Oh, that's
>> right, you will just throw more
>> hardware at it. And rely on your ISP to provide
>> load-balancing. Have you talked to them
>> about how they do load-balancing when you have multiple
>> servers?
>> joe
>My whole focus was to leverage memory to gain speed.
If your only tool is a hammer, all your problems look like nails. Guess what: we tried
to explain to you how to leverage memory to gain speed, using a different perspective, but
you don't want to pay attention to us. Multithreaded single process and
memory-mapped-file multiprocess BOTH leverage memory usage to gain speed. This is the
consequence of deep understanding of reality.
From: Peter Olcott on

"Joseph M. Newcomer" <newcomer(a)> wrote in
message news:5blnq511d9covqn67717b0arbrpd6bng3g(a)
> See below...
> On Thu, 25 Mar 2010 10:09:29 -0500, "Peter Olcott"
> <NoSpam(a)> wrote:
>>"Joseph M. Newcomer" <newcomer(a)> wrote in
>>message news:agqmq5hvh7d7e99ekhbrjp1snta9hm630p(a)
>>> See below...
>>> On Thu, 25 Mar 2010 00:07:00 -0500, "Peter Olcott"
>>> <NoSpam(a)> wrote:
>>>>"Joseph M. Newcomer" <newcomer(a)> wrote in
>>>>> See below...
>>>>> On Tue, 23 Mar 2010 15:53:36 -0500, "Peter Olcott"
>>>>> <NoSpam(a)> wrote:
>>>>>>> Run a 2nd instance and you begin to see faults. You
>>>>>>> saw
>>>>>>> that. You proved that. You told us that. It is why
>>>>>>> this
>>>>>>> thread got started.
>>>>>>Four instances of 1.5 GB RAM and zero page faults after
>>>>>>data is loaded.
>>>>>>You never know a man with a billion dollars in the bank
>>>>>>might panic and sell all of his furniture just in case he
>>>>>>loses the billion dollars and won't be able to afford
>>>>>>his electric bill.
>>>>> ****
>>>>> There are people who behave this way. Custodial care
>>>>> and
>>>>> psychoactive drugs (like
>>>>> lithium-based drugs) usually help them. SSRIs
>>>>> sometimes
>>>>> help (selective serotonin
>>>>> reuptake inhibitors). I don't know what an SSRI or
>>>>> lithium equivalent is for an app that
>>>>> becomes depressed.
>>>>Ah so then paging out a process or its data when loads of
>>>>RAM is still available is crazy right?
>>> ****
>>> No, lots of operating systems do it. Or did you miss
>>> that
>>> part of my explanation of the
>>It has never occurred with my process.
> ****
> And you know this because? Oh, the "Page anticipation
> counter", which is a completely
> different concept than the page fault counter (I don't
> know if there is such a thing, but
> note that pre-pageout is NOT something that causes a page
> fault, but you clearly missed
> the distinction here!)
> ****
>>> two-timer linux page-marking method?
>>> You still persist in believing your fantasies.
>>> Essentially, what the OS is doing is the equivalent of
>>> putting its money into an
>>> interest-bearing account! It is doing this while
>>> maximizing the liquidity of its assets.
>>> That isn't crazy. NOT doing it is crazy! But as
>>> operating systems programmers, we
>>If the most RAM it can possibly need is 1 GB, and it has 4
>>GB then it seems crazy to page anything out. How is this
> *****
> Because I just cited an operating system (linux) that DOES
> pre-pageout pages.
> You may call it crazy, but some designer somewhere
> discovered empirically that it improves
> overall system performance. The difference is all you
> have is a p-baked opinion (for p <
> 0.1) of what you think an operating system should do, and
> the people who did this have
> real data that supports this as a strategy.
> ****
>>> learning this in the 1970s. We even wrote papers about
>>> it. And books. I not only read
>>> those papers and books, I helped write some of them.
>>> You
>>> will find me acknowledged in
>>> some of them.
>>Sure and back then 64K was loads of RAM. I worked on an
>>application that calculated the water bills for the City of
>>Council Bluffs IA, on a machine with 4K RAM.
> ****
> 64K was loads of RAM on small machines. But in that era,
> I was working on machines that
> had between 1 and 8 megabytes of memory (when you can
> afford to spend $250,000 on your
> memory add-on, you can put LOTS of memory on your
> computer). Sadly, you are confusing toy
> personal computers with real mainframes. Sorry, I was
> there. Our IBM/360/67 (1967) had

Not a toy, not a mainframe either, 4K RAM.

> 8MB of RAM, our first DECSYSTEM-10 (1969) had 1MB of RAM
> (well, 256K WORDS, which is how
> we measured it then). The first 4Meg memory system that
> I was responsible for buying
> required a forklift to remove it from the truck, and cost
> about $300,000. (Ampex add-on
> for a DECSystem-20, 1982). Actually, that was 4Meg WORDS,
> or about 16MB. So I really
> don't care about 4K systems (e.g., the IBM 1401/1440
> series entry-level systems; ours had
> 16K) or toys that ran DR-DOS. Because we had real systems
> with real memory, and they were
> paged, we had to build real software that ran on them.
> And the paging disks were DEAD
> SLOW compared to modern hard drives (50ms seek time for
> 1-cylinder seek, for example); DEC
> RP02 drives that held 40MB or RP03s that held 80MB, not to
> mention the klunky 2MB drives
> on PDP-11s. So we HAD to squeeze every iota of
> performance out of these systems, and I
> spent a nontrivial part of my life figuring out how to do
> this. I even wrote a
> performance tool that could accurately measure page
> transitions on code so we could link
> functions so the calls happened in the same page. I
> rewrote it for MS-DOS with extended
> memory and code paging back in 1987. So don't tell me I
> don't understand virtual memory
> or how it works. I've been doing it since 1967. TSS/360,
> OS/360, DECSystem-10,
> DECSystem-20, Mac, Apollo under the Aegis system,
> Vax/VMS, PDP-11, including PDP-11s under
> our Hydra multiprocessor operating system (which I helped
> write) and getting reports on
> the PDP-11s under STAROS (a NUMA-based system of 128
> Micro-11s and its competing project
> whose name I forget), x86s under MS-DOS, Win16, and Win32;
> Unix on a variety of platforms
> (PDP-11, Vax, RISC/6000 AIX). I've seen a LOT of
> different algorithms for paging, and
> studied a lot more, and Peter Denning pretty much nailed
> it in his classic paper on
> paging; and Laszlo Belady's famous LRU paper, not to
> mention the more recent work done by
> IBM Research, John Ousterhout's work on file systems (John
> and I were students together at
> CMU) or the work of Mahadev Satyanarayanan at CMU on the
> Andrew File system and his
> successor work.
> (Satya and I were also students together at CMU, and I
> used AFS in 1990-1991). But never
> mind, you know a lot more about this because you say you
> THINK about ideal behavior but
> never, ever actually DO anything. Note that all the
> people I'm citing here did REAL
> experiments and collected REAL data, then fed this back
> into the operating system design
> process. But why should you believe them; they only are
> the key players in the history of
> the field, people who set out the design goals that all
> modern operating systems and file
> systems now use? What could they POSSIBLY have learned by
> collecting REAL data instead of
> just sitting thinking about what they might have read in
> some book on operating system
> design?
> joe
> ****
>>> Sadly, you persist in believing what you want to believe
>>> instead of understanding how real
>>> systems work.
>>> joe
>>> ****

From: Joseph M. Newcomer on
See below...
On Thu, 25 Mar 2010 10:12:56 -0500, "Peter Olcott" <NoSpam(a)> wrote:

>"Joseph M. Newcomer" <newcomer(a)> wrote in
>message news:00rmq5hctllab7ursv8q64pq5eiv8s82ad(a)
>> See below...
>> On Thu, 25 Mar 2010 00:01:37 -0500, "Peter Olcott"
>> <NoSpam(a)> wrote:
>>>"Joseph M. Newcomer" <newcomer(a)> wrote in
>>>message news:rdqlq5dv2u8bh308se0td53rk7lqmv0bki(a)
>>>> Make sure the addresses are completely independent of
>>>> where the vector appears in memory.
>>>> Given you have re-implemented std::vector (presumably as
>>>> peter::vector) and you have done
>>>> all the good engineering you claim, this shouldn't take
>>>> very much time at all. Then you
>>>> can use memory-mapped files, and share this massive
>>>> footprint across multiple processes,
>>>> so although you might have 1.5GB in each process, it is
>>>> the SAME 1.5GB because every
>>>> process SHARES that same data with every other process.
>>>> Seriously, this is one of the exercises in my Systems
>>>> Programming course; we do it
>>>> Thursday afternoon.
>>>> joe
>>>But all that this does is make page faults quicker right?
>>>Any page faults at all can only degrade my performance.
>> ***
>> Denser than depleted uranium. Fewer page faults, quicker.
>> For an essay, please explain
>> in 500 words or less why I am right (it only requires
>> THINKING about the problem) and why
>> these page faults happen only ONCE even in a multiprocess
>> usage! Compare to the ReadFile
>> solution. Compare and contrast the two approaches. Talk
>> about storage allocation
>> bottlenecks.
>> I'm sorry, but you keep missing the point. Did you think
>> your approach has ZERO page
>> faults? You even told us it doesn't!
>I was making a conservative estimate, actual measurement
>indicated zero page faults after all data was loaded, even
>after waiting 12 hours.
And a memory-mapped file would not show the same performance? You know this HOW?
>> Why do you think a memory-mapped file is going to
>> be different? Oh, I forgot, you don't WANT to understand
>> how they work, or how paging
>> works!
>Not if testing continues to show that paging is not

Your method does not scale; our suggestions give you scalability, maximize throughput, and
probably make it possible to meet your wonderful 500ms goal consistently.
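
The advice quoted earlier in this post, to make the addresses independent of where the vector appears in memory, can be illustrated with a minimal sketch. It is not the code from the thread or from the course mentioned: BlockHeader, make_block and element_at are hypothetical names, and the layout (a small header followed by 32-bit elements) is an assumption chosen for brevity. The idea is to store only offsets from the start of the block, never raw pointers, so the same bytes can be read correctly at whatever base address they end up.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical layout: a header followed by the elements in one contiguous
// block. Only offsets from the block start are stored, never pointers.
struct BlockHeader {
    std::uint64_t count;        // number of elements
    std::uint64_t data_offset;  // byte offset of the first element from the block start
};

// Build a block holding `values`.
std::vector<unsigned char> make_block(const std::vector<std::uint32_t>& values) {
    BlockHeader h{static_cast<std::uint64_t>(values.size()),
                  static_cast<std::uint64_t>(sizeof(BlockHeader))};
    const std::size_t payload = values.size() * sizeof(std::uint32_t);
    std::vector<unsigned char> block(sizeof h + payload);
    std::memcpy(block.data(), &h, sizeof h);
    if (payload != 0)
        std::memcpy(block.data() + h.data_offset, values.data(), payload);
    return block;
}

// Read element i from a block located at `base`, wherever that happens to be.
// The caller must ensure i is less than the stored count.
std::uint32_t element_at(const unsigned char* base, std::size_t i) {
    BlockHeader h;
    std::memcpy(&h, base, sizeof h);
    std::uint32_t v;
    std::memcpy(&v, base + h.data_offset + i * sizeof v, sizeof v);
    return v;
}
```

In the setting discussed here, `base` would be the address at which each process sees its view of the mapped file (for example, the pointer returned by MapViewOfFile on Windows), which may differ from process to process. Copying the block to a second buffer and reading it through element_at is a simple way to check that nothing depends on the original address.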

>> joe
>> ****