Can extra processing threads help in this case? [MFC]

From: Joseph M. Newcomer on 22 Mar 2010 22:20

See below...
On Mon, 22 Mar 2010 19:45:31 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:

>
>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
>message news:661gq5l01bg7rf539dgotj1b69fiq3re18(a)4ax.com...
>> See below...
>> On Mon, 22 Mar 2010 18:46:24 -0500, "Peter Olcott"
>> <NoSpam(a)OCR4Screen.com> wrote:
>>
>>>
>>>"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in
>>>message
>>>news:exGTQtgyKHA.3884(a)TK2MSFTNGP06.phx.gbl...
>>>> Peter Olcott wrote:
>>>>
>>>>> You tell me all about page faults, yet the process
>>>>> monitor reports zero page faults, and you continue to
>>>>> claim that it's all about page faults and virtual memory.
>>>>
>>>>
>>>> It's not a claim - it's a fact.
>>>>
>>>>> Page faults indicate virtual memory usage, right?
>>>>
>>>>
>>>> It shows when your PROCESS is asking for more than the system
>>>> can provide to you all in memory - it has to virtualize it.
>>>>
>>>>> A lack of page faults indicates a lack of virtual
>>>>> memory usage, right?
>>>>
>>>>
>>>> No. If it's zero or not changing and I know your process
>>>> is not, it means that your process working set is not
>>>> demanding more than it can handle or OTHER processes
>>>> have not chewed up memory, limiting your available memory.
>>>
>>>OK so zero page faults does not mean that virtual memory
>>>is
>>>not being used?
>> ***
>> OF COURSE virtual memory is being used; there is NO OTHER
>> KIND OF MEMORY for a process.
>> What it means is that all of the virtual memory has
>> remained resident, something that was not demonstrable
>> before you published these results. You have
>> demonstrated that it is not being paged out and the pages
>> reused, at least under your test scenario.
>> ***
>>>(1) YES zero page faults means that virtual memory is not
>>>active on this process
>> ****
>> But given that there is only virtual memory, you cannot
>> assert that zero page faults mean
>> it is not being used, only that the virtual pages have
>> remained in memory, a useful piece
>> of knowledge. And if you had a clue about memory-mapped
>> files, this would tell you that
>> using a named, shared segment would improve performance
>> of multiple processes using MMF
>> to get the data in, and it might also mean you wouldn't
>> see the several-minute startup
>> transient. You certainly wouldn't see it on the second or
>> higher processes.
>
>And of course you know that a second thread would work just
>fine because you know that my process is not memory
>bandwidth intensive.
***
No, I don't KNOW this. What I have said was, until you have run the experiment, YOU DON'T
KNOW EITHER! But when you have a multicore system, with complex caching going on, the
ONLY valid way to find out what is going on is to RUN REAL EXPERIMENTS and OBSERVE the
result. For example, suppose in a multithreading environment, any one thread runs 50%
slower because of interference from the other threads. So instead of taking 100ms, suppose
it takes 150ms. BUT, for 8 threads on an 8-core machine, it means that the expected time
to completion for ANY thread is 150ms, whereas the single-threaded solution says that the
expected time of completion is 400ms (worst-case is 800ms) and for the multithreaded
solution, the worst case is 150ms, not 800ms. Duh! So there's interference, and it
slows the threads down, but look at the overall throughput! And you insist that you can
predict this behavior, and don't need to run any tests; and I say, your theory that you
are using to make this prediction is flawed, and from it there can never be a closed-form
solution. Since you have no idea what is really going on, the ONLY way to figure out what
is going on is to actually run a REAL EXPERIMENT. Screw your p-baked theory, you have NO
IDEA of what is REALLY going to happen. You might be absolutely right; you might be dead
wrong, but until you have run the experiment you have NO IDEA what is going to happen. I
don't, and I don't need to know how your app runs to know that I don't know the answer.
joe
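
The arithmetic in the example above can be checked with a short sketch. It assumes eight requests that all arrive at time zero, 100ms of work each, and a 1.5x slowdown when all eight run at once; the function names are hypothetical and not from any post in this thread. Under those assumptions the exact mean completion time for the serial case comes out at 450ms, in the same range as the round 400ms figure, with the same 800ms worst case.

```cpp
#include <cassert>
#include <vector>

// Completion times (ms) when `jobs` requests, all queued at time zero,
// run one after another on a single thread.
std::vector<double> serialCompletionTimes(int jobs, double jobMs) {
    assert(jobs > 0);
    std::vector<double> done;
    for (int i = 1; i <= jobs; ++i)
        done.push_back(i * jobMs);   // job i finishes after i back-to-back runs
    return done;
}

// Completion times when every job has its own core but each one runs
// `slowdown` times longer because of interference (1.5 means 50% slower).
std::vector<double> parallelCompletionTimes(int jobs, double jobMs, double slowdown) {
    assert(jobs > 0);
    return std::vector<double>(jobs, jobMs * slowdown);
}

// Arithmetic mean of the completion times.
double mean(const std::vector<double>& v) {
    assert(!v.empty());
    double sum = 0.0;
    for (double x : v) sum += x;
    return sum / v.size();
}

// Largest completion time, i.e. when the last request finishes.
double worst(const std::vector<double>& v) {
    assert(!v.empty());
    double m = v.front();
    for (double x : v)
        if (x > m) m = x;
    return m;
}
```

Whether the real slowdown is 1.5x, 1.05x, or 4x is exactly the number that only a measurement can supply; the sketch only shows how the comparison is made once that number is known.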

>
>>
>> Perhaps you can explain what you mean by "virtual memory is
>> not active". Alas, for your
>
>If there are no pages going in and out of physical memory it
>is active in the same way that a parked car is active. Maybe
>you call it active if the engine is still running, even if
>it's not going anywhere.
****
No, this is not what is meant. By "virtual memory is not active", any OS programmer would
mean "memory translation is disabled". That is nonsense. What you mean is "the virtual
memory is inducing no paging behavior" which is a COMPLETELY DIFFERENT statement.
For example, to be completely accurate, on an Intel chip, the notion that virtual memory
is active means that both the low-order bit of control register CR0 (PE, protection
enable) and its high-order bit (PG, paging) are set to 1. And that is
the ONLY mode in which the chip runs when executing Windows. So "virtual memory" which
means "memory mapping" is ALWAYS in use. It is not even something you can discuss! There
IS NO OTHER OPTION! So you have to be PRECISE about these questions, or you are merely
showing off your lack of understanding.

Now, given that you have virtual memory enabled, the GDTR (Global Descriptor Table
Register) and LDTR (Local Descriptor Table Register) must also be loaded with the correct
pointers to the descriptor tables, and CR3 with the address of the top-level page table.
There's a lot more than that going on to transition from real mode to virtual mode, but
this happens before the first pixel is displayed after the BIOS boot screen (which is
displayed by code running in Real Mode, that is, no translation, CR0<0> == 0). So you go
into virtual mode sometime in the first second after the BIOS boot stops displaying, and
you STAY there. So there is NEVER a case when "Virtual memory is not active" once the boot
process starts.

You have obviously confused the concept of virtual memory with the concept of paging. VM
allows you to "oversubscribe" physical memory. In that case, the "VM Manager" comes into
play to apportion the physical memory among the various memory maps that are each thinking
they own it. It accomplishes this by moving page contents out to disk and marking the
associated Page Table Entry (PTE) as "invalid" and encoding retrieval information for the
page in place of the translated address value. An attempt to access memory through this
invalid PTE generates a page fault, which the OS handles by obtaining an empty page frame
in physical memory (perhaps by moving an existing page out, perhaps by selecting a frame
which has already been paged out), reading the data from the paging file into that page
frame, and modifying the PTE to indicate the virtual-to-physical address translation, then
"restarting" the failed instruction all over again; this time, it works because the page
being accessed is really there.

Even when nothing is paged out, the translation itself costs something. With a two-level
page table and no cached translation, three memory accesses are the minimum for each
memory request generated by your app: one to the first-level memory map table, one to the
second-level memory map table, and one to the translated address.
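
The two table lookups described above can be illustrated with a small sketch. It assumes the classic 32-bit x86 layout with 4 KB pages and no PAE (10 bits of first-level index, 10 bits of second-level index, 12 bits of offset); 64-bit Windows uses more levels. The PTE is simplified to a present bit plus a frame address, and the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// The three fields the hardware extracts from a 32-bit virtual address
// under the assumed two-level, 4 KB-page layout.
struct VirtualAddressParts {
    std::uint32_t directoryIndex;  // bits 31..22: entry in the first-level table
    std::uint32_t tableIndex;      // bits 21..12: entry in the second-level table
    std::uint32_t offset;          // bits 11..0: byte within the 4 KB page
};

VirtualAddressParts splitVirtualAddress(std::uint32_t va) {
    return { va >> 22, (va >> 12) & 0x3FFu, va & 0xFFFu };
}

// Simplified PTE: bit 0 is the "present" flag, the upper 20 bits hold the
// physical frame address. Real PTEs carry more flag bits than this.
bool pteIsPresent(std::uint32_t pte) {
    return (pte & 1u) != 0;
}

// Combine the frame address from a present PTE with the page offset.
std::uint32_t physicalAddress(std::uint32_t pte, std::uint32_t offset) {
    assert(pteIsPresent(pte));  // a non-present PTE would raise a page fault instead
    return (pte & 0xFFFFF000u) | offset;
}
```

Each index selects an entry that must itself be read from memory, which is where the two extra accesses per untranslated reference come from.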

Concepts like "working set" help manage these oversubscriptions while maximizing total
system performance.

And consider the poor TLB. I've asked you to understand this, and you have not indicated
you understand how memory mapping works or what it costs. You can't talk as if VM is a
zero-cost option, or can be "turned off". This is why I point out that your questions are
nonsensical, and your selection of answers even more so.

joe
****

>>>
>> Joseph M. Newcomer [MVP]
>> email: newcomer(a)flounder.com
>> Web: http://www.flounder.com
>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm

From: Hector Santos on 22 Mar 2010 22:22

Peter Olcott wrote:

> The only thing that could
> make them less than completely moot would be proof that my
> process can not remain resident in RAM all the time.

Run a 2nd process and watch your paging go crazy and your system
degrade! Why do you think it took much longer to finish?

Your MEMORY is always VIRTUALIZED. Microsoft says so. You continue to
ignore this:

http://support.microsoft.com/kb/555223

In modern operating systems, including Windows, application
programs and many system processes *ALWAYS* reference memory using
virtual memory addresses which are automatically translated to real
(RAM) addresses by the hardware. Only core parts of the operating
system kernel bypass this address translation and use real memory
addresses directly.

Virtual Memory is always in use, *EVEN* when the memory required
by all running processes does not exceed the amount of RAM
installed on the system.

So go ahead and run a 2nd instance and see why Windows does things that
will defy your logic.

--
HLS

From: Peter Olcott on 22 Mar 2010 23:26

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:OYxY7%23iyKHA.3884(a)TK2MSFTNGP06.phx.gbl...
> Peter Olcott wrote:
>
>> The only thing that could make them less than completely
>> moot would be proof that my process can not remain
>> resident in RAM all the time.
>
> Run a 2nd process and watch your paging go crazy and your
> system degrade! Why do you think it took much longer to
> finish?
>
> Your MEMORY is always VIRTUALIZED. Microsoft says so. You
> continue to ignore this:
>
> http://support.microsoft.com/kb/555223
>
> In modern operating systems, including Windows, application
> programs and many system processes *ALWAYS* reference memory
> using virtual memory addresses which are automatically
> translated to real (RAM) addresses by the hardware. Only core
> parts of the operating system kernel bypass this address
> translation and use real memory addresses directly.
>
> Virtual Memory is always in use, *EVEN* when the memory
> required by all running processes does not exceed the amount
> of RAM installed on the system.
>
> So go ahead and run a 2nd instance and see why Windows does
> things that will defy your logic.
>
> --
> HLS

I don't care if you are right or not any more. It has
already cost me too much time. I will implement it as a
single thread that is connected to a FIFO queue. There may
be a time in the future that I test it with more than one
thread.

Thanks again for your verifiably excellent advice on how to
web enable my app by hooking it to a webserver.
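
A single worker thread fed by a FIFO queue, as described in this post, might look roughly like the sketch below. It uses only the C++11 standard library rather than MFC or Win32 primitives, and the class name `SingleWorkerQueue` and its interface are assumptions for illustration, not code from the thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// One worker thread draining a FIFO queue of jobs in arrival order.
class SingleWorkerQueue {
public:
    SingleWorkerQueue() : worker_([this] { run(); }) {}

    ~SingleWorkerQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_one();
        worker_.join();  // the worker drains any queued jobs, then exits
    }

    // Append a job to the back of the queue and wake the worker.
    void submit(std::function<void()> job) {
        assert(job);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        ready_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // only reached once stopping_ is set
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so submit() never waits on a job
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the members above exist before it starts
};
```

Because the worker count is fixed in one place, trying a second or third worker later is a small change to this class, which keeps the multi-thread experiment cheap to run.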

From: Peter Olcott on 22 Mar 2010 23:38

Try running your process again using a std::vector<unsigned int>.
Make sure that you initialize every element to its own subscript
in the init loop.
Make sure that the process monitor shows that the amount of
memory you are allocating is the same amount that total
memory is reduced by.
Make sure that you only use 1/2 of total memory or less.
Make a note of the page fault behavior.
I will try the same thing.
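
The allocation and initialization steps of the experiment above could be sketched as follows. The helper names are hypothetical; the page-fault and memory figures themselves still have to be read from the process monitor, since standard C++ has no portable way to query them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Allocate `count` elements and set each one to its own subscript, so every
// page of the buffer is written at least once before any timing starts.
std::vector<unsigned int> makeIndexedBuffer(std::size_t count) {
    assert(count > 0);
    std::vector<unsigned int> data(count);
    for (std::size_t i = 0; i < count; ++i)
        data[i] = static_cast<unsigned int>(i);
    return data;
}

// Re-read the buffer and confirm every element still holds its subscript.
bool verifyIndexedBuffer(const std::vector<unsigned int>& data) {
    for (std::size_t i = 0; i < data.size(); ++i)
        if (data[i] != static_cast<unsigned int>(i)) return false;
    return true;
}

// Bytes occupied by the elements; this is the figure to compare against the
// drop in available memory shown by the process monitor.
std::size_t bufferBytes(const std::vector<unsigned int>& data) {
    return data.size() * sizeof(unsigned int);
}
```

For the half-of-memory rule, `count` would be chosen so that `bufferBytes` stays at or below half of the installed RAM on the test machine.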

From: Peter Olcott on 22 Mar 2010 23:50

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:uoo4M8iyKHA.2552(a)TK2MSFTNGP04.phx.gbl...
> Peter Olcott wrote:
>
>>>
>>>> And of course you know that a second thread would work
>>>> just fine because you know that my process is not
>>>> memory bandwidth intensive.
>>>
>>> yes, we know that. The simulator, real code with shared
>>> memory and multiple threads, proved this and if you took
>>> the time to explore it, you will see for yourself.
>>>
>>
>> void Process()
>> {
>>     KIND num;
>>     for (int r = 0; r < repeat; r++)
>>         for (DWORD i = 0; i < size; i++)
>>             num = data[i];   // sequential read of every element
>> }
>>
>> Not at all representative of my process, thus proves
>> nothing about my process.
>
>
> This is a MAXIMUM MEMORY ACCESS you can ever reach. Your
> application memory access will be less stressful.
>
>> Your process could derive pure spatial locality of
>> reference whereas mine would not.
>
>
> and I followed up with a RANDOM memory access:
>
> void Process()
> {
>     KIND num;
>     for (int r = 0; r < repeat; r++)
>         for (DWORD i = 0; i < size; i++) {
>             DWORD j = (rand() % size);   // random subscript
>             num = data[j];
>         }
> }
>

rand() may be too CPU intensive to accurately represent my
process. It could be used to fill a lookup table.
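
The lookup-table idea can be sketched as below: the random subscripts are generated once, outside the timed region, and the timed loop only reads them back. This is an illustration with hypothetical names, and it swaps `rand()` for `std::mt19937`, because `RAND_MAX` may be as small as 32767, in which case `rand() % size` never reaches most of a large array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Build the table of random subscripts once, before any timing starts.
std::vector<std::uint32_t> makeIndexTable(std::size_t entries,
                                          std::uint32_t dataSize,
                                          std::uint32_t seed) {
    assert(dataSize > 0);
    std::mt19937 gen(seed);
    std::uniform_int_distribution<std::uint32_t> pick(0, dataSize - 1);
    std::vector<std::uint32_t> table(entries);
    for (auto& idx : table) idx = pick(gen);  // uniform over the whole array
    return table;
}

// The part to time: one table read plus one scattered data read per step.
// Returning the sum gives the reads a visible effect.
std::uint64_t readViaTable(const std::vector<std::uint32_t>& data,
                           const std::vector<std::uint32_t>& table) {
    std::uint64_t sum = 0;
    for (std::uint32_t idx : table) sum += data[idx];
    return sum;
}
```

The table is itself read sequentially and occupies cache lines, so it is not free either; a fixed seed at least makes every run touch the same addresses.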

> and provided all the results on that to SHOW that
> randomness, which is closer to your unpredictable memory
> access theory, produced better results. I even gave you
> some tips on using Pareto's principle because I don't
> believe YOUR application is as unpredictable as YOU seem to
> think it is.
>
>> I do not move to the next sequential memory location, my
>> memory access (from the cache point of view) is nearly
>> purely random.
>
>
> See above. Again, the serialized access simulation
> represents the worst case scenario that will contradict
> your theory that there is a major bottleneck with memory
> access contention with multiple threads.
>
>> If you had a list of 10,000 memory locations that are all
>> very far from each other, then your process would
>> approximate mine.
>
>
> The simulator had a DWORD (4 bytes) array of MAXULONG/6
> items, ~1.4GB for a 2GB machine which is 75% of memory
> capacity, you only have a 50% memory need - FOR 1 PROCESS.
> So this simulator is a FAR worse case than yours for MEMORY
> ACCESS.
>
>> You might also look at the generated code, the optimizer
>> tends to eliminate code such as your test case.
>
>
> Not the case here, and EVEN THEN, there are still 10 loops,
> MAXULONG/6 items accessed.
>
> The FACT is, it is being read because the MEMORY LOAD and
> the working set increase.
>
> The bottom line: the code shows your process is scalable
> when coded properly to leverage the technology in the
> Windows OS with multi-core hardware.
>
> Your design presumption that it is memory bound for
> multi-thread processing was incorrect.
>
> --
> HLS
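
On the optimizer question raised in the quoted exchange, one common way to keep a read-only test loop from being discarded is to accumulate what is read and publish the result. The sketch below does that for several threads reading one shared array and reports the wall-clock time. It is not the simulator discussed in this thread; `readWithThreads` and `RunResult` are hypothetical names, and it uses `std::thread` rather than Win32 threads.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

struct RunResult {
    std::vector<std::uint64_t> sums;  // one checksum per thread
    double elapsedMs;                 // wall-clock time until all threads finish
};

// Start `threads` threads that each read the whole shared array `repeat` times.
RunResult readWithThreads(const std::vector<std::uint32_t>& data,
                          unsigned threads, int repeat) {
    assert(threads > 0 && repeat > 0);
    RunResult result{std::vector<std::uint64_t>(threads, 0), 0.0};
    const auto start = std::chrono::steady_clock::now();

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&data, &result, t, repeat] {
            std::uint64_t sum = 0;  // thread-local accumulator
            for (int r = 0; r < repeat; ++r)
                for (std::uint32_t v : data) sum += v;
            result.sums[t] = sum;   // storing the sum gives the reads an observable effect
        });
    }
    for (auto& th : pool) th.join();

    const auto stop = std::chrono::steady_clock::now();
    result.elapsedMs = std::chrono::duration<double, std::milli>(stop - start).count();
    return result;
}
```

Running it with 1, 2, 4, and 8 threads on the target machine and comparing `elapsedMs` gives a direct reading of how much the threads interfere with one another for this access pattern; the checksums confirm that every thread really did the work.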
