From: Hector Santos on
Geoff wrote:

> On Sat, 20 Mar 2010 09:52:33 -0500, "Peter Olcott"
> <NoSpam(a)OCR4Screen.com> wrote:
>
>> Maximum total processing time is 1/10 second for a whole
>> page of text. My initial implementation (for testing
>> purposes) may simply refuse larger requests. The final
>> implementation will place large requests in a separate lower
>> priority queue.
>>
>
> Your "memory bandwidth intensive" requirement is the bottleneck to
> multithreading or multiprocessing. If your big memory chunk is
> read-only, your problem with the DFA is that it lacks locality of
> reference to that data. You end up hitting the RAM instead of being
> able to utilize the data in the CPU caches. Multiple threads end up
> contending with each other for access to RAM memory, hence the
> slowdown. Compute-intensive applications benefit from multi-threading
> by being able to stay off the RAM bus and utilize the caches in each
> core.


Threads will benefit from reduced context switching.

The point in all this is that we are taking Pete's poor engineering,
his limited WINTEL understanding, and his DFA software design as the
limits, and thereby under-utilizing the power of a WINTEL QUAD 8GB
Windows 7 machine.

In other words, he really doesn't know what his boundary conditions
are. Until he has tried using memory-mapped files for his font
library - which is read-only, mind you, not written, so contention is
as low as it gets - I am not convinced this has to be a single-process,
FIFO-queue-only standalone application.
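
Something like this is all it takes on Win32 to map that data
read-only and share it (a rough sketch only - the file name and the
data layout are placeholders, not his actual code):

#include <windows.h>
#include <cstddef>

// Map a (hypothetical) read-only data file. Every process or thread
// that maps the same file shares the same physical pages; nothing
// gets copied per worker.
const void* MapReadOnlyData(const char* path, size_t& size)
{
    HANDLE hFile = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ,
                               NULL, OPEN_EXISTING,
                               FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE) return NULL;

    LARGE_INTEGER li;
    GetFileSizeEx(hFile, &li);
    size = (size_t)li.QuadPart;

    HANDLE hMap = CreateFileMappingA(hFile, NULL, PAGE_READONLY,
                                     0, 0, NULL);
    if (hMap == NULL) { CloseHandle(hFile); return NULL; }

    // Read-only view: no write contention, paged in on demand.
    const void* view = MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 0);

    // The view keeps the mapping alive; the handles can be closed.
    CloseHandle(hMap);
    CloseHandle(hFile);
    return view;
}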

This is a simple engineering problem with a simple solution. He just
hasn't realized it.

Even then, degradation does not have to be linear with each process
started, as he suggests. The load requirements per thread are much
different from the per-process load, which is all he is seeing now.
Threads sharing the data would prove highly efficient memory-wise,
especially on a multi-CPU machine. Single CPU? Context switching
gets in the way. On a multi-CPU machine, you have less context
switching.
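
Here is a rough sketch of the threaded version (illustrative only -
the table contents and the worker loop are stand-ins, not his code):
one table loaded ONCE, walked by every worker, no locks on reads.

#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// One read-only table, loaded ONCE, shared by every worker thread.
// Pure reads need no locks, and nothing is duplicated per worker.
static std::vector<uint32_t> g_table;

static void Worker(int id)
{
    uint64_t sum = 0;
    // Every thread walks the same shared table; no per-thread copy.
    for (std::size_t i = static_cast<std::size_t>(id);
         i < g_table.size(); i += 4)
        sum += g_table[i];
    (void)sum;
}

int main()
{
    g_table.resize(100u * 1024u * 1024u);  // stand-in for the real data load

    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i)            // one worker per core on a quad
        workers.emplace_back(Worker, i);
    for (std::thread& t : workers)
        t.join();
}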


--
HLS
From: Hector Santos on
Peter Olcott wrote:

>
> Geoff has explained this better than I have.


And I don't agree with him - not one iota.

Until you redesign your software MEMORY USAGE, your current code is
not optimized for your WINTEL box or for any web service worth its
salt. You might as well get a DOS machine to reduce all the Windows
overhead, especially the graphical overhead. Recompile your code to
use DPMI and you will be better off than what you have now.

--
HLS
From: Peter Olcott on

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:%23Xe$Z9GyKHA.2644(a)TK2MSFTNGP04.phx.gbl...
> Geoff wrote:
>
>> On Sat, 20 Mar 2010 09:52:33 -0500, "Peter Olcott"
>> <NoSpam(a)OCR4Screen.com> wrote:
>>
>>> Maximum total processing time is 1/10 second for a whole
>>> page of text. My initial implementation (for testing
>>> purposes) may simply refuse larger requests. The final
>>> implementation will place large requests in a separate
>>> lower priority queue.
>>
>> Your "memory bandwidth intensive" requirement is the
>> bottleneck to
>> multithreading or multiprocessing. If your big memory
>> chunk is
>> read-only, your problem with the DFA is that it lacks
>> locality of
>> reference to that data. You end up hitting the RAM
>> instead of being
>> able to utilize the data in the CPU caches. Multiple
>> threads end up
>> contending with each other for access to RAM memory,
>> hence the
>> slowdown. Compute-intensive applications benefit from
>> multi-threading
>> by being able to stay off the RAM bus and utilize the
>> caches in each
>> core.
>
>
> Threads will benefit from reduced context switching.
>

You missed this part:
Multiple threads end up contending with each other for
access to RAM memory, hence the slowdown.

If you only have X memory cycles per second, and one
process (or thread) uses up all X cycles, adding another
process (or thread) can only slow things down, not speed
them up.
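
For example (illustrative numbers, not measurements): if the memory
bus sustains roughly 10 GB/s and one DFA thread already streams close
to 10 GB/s of transition data, then two threads just split that same
10 GB/s; each runs at about half speed and total throughput does not
improve.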


> The point in all this is that we are taking Pete's poor
> engineering, his limited WINTEL understanding, and his DFA
> software design as the limits, and thereby under-utilizing
> the power of a WINTEL QUAD 8GB Windows 7 machine.
>
> In other words, he really doesn't know what his boundary
> conditions are. Until he has tried using memory-mapped
> files for his font library - which is read-only, mind you,
> not written, so contention is as low as it gets - I am not
> convinced this has to be a single-process, FIFO-queue-only
> standalone application.
>
> This is a simple engineering problem with a simple
> solution. He just hasn't realized it.
>
> Even then, degradation does not have to be linear with
> each process started, as he suggests. The load
> requirements per thread are much different from the
> per-process load, which is all he is seeing now. Threads
> sharing the data would prove highly efficient memory-wise,
> especially on a multi-CPU machine. Single CPU? Context
> switching gets in the way. On a multi-CPU machine, you
> have less context switching.
>
>
> --
> HLS


From: Peter Olcott on

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:O%23cPxAHyKHA.984(a)TK2MSFTNGP05.phx.gbl...
> Peter Olcott wrote:
>
>>
>> Geoff has explained this better than I have.
>
>
> And I don't agree with him - not one iota.
>

Let's see where Joe weighs in on this.

> Until you redesign your software MEMORY USAGE, your
> current code is not optimized for your WINTEL box or for
> any web service worth its salt. You might as well get a
> DOS machine to reduce all the Windows overhead, especially
> the graphical overhead. Recompile your code to use DPMI
> and you will be better off than what you have now.
>
> --
> HLS

I won't be running on WINTEL; I will be running on Linux on
Intel. I won't need any GUI.


From: Hector Santos on
Peter Olcott wrote:

>> Threads will benefit from reduced context switching.

> You missed this part:
> Multiple threads end up contending with each other for
> access to RAM memory, hence the slowdown.


No, I didn't miss this at all, and it's certainly not something YOU
should worry about. You are MOST definitely OVER-engineering this
into an unreasonable restriction, one that quite simply defies
engineering logic.

When you implement sharable, atomic, read-only memory for multi-core
threads, you don't have write contention and you will not be swapping
here. It will be FASTER than multiple single processes each LOADING
redundant DATA BLOCKS, putting MORE pressure on the system to manage
not 4GB, but 8GB of memory. OF COURSE your system degrades, as you've
seen, by just starting two EXE copies! DUH!!

And again, it is NOT going to degrade at a LINEAR rate. It is not a
locked-read scheme where one thread has to WAIT until the other
thread finishes reading the memory.

You are completely, absolutely wrong about this. And again, we are
talking about Scaling Efficiency - you can't measure that by just
starting TWO EXE copies! That is completely wrong - not when each has
to load 4GB redundantly.
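
If you want Scaling Efficiency numbers, measure them inside ONE
process with threads sharing one copy of the data - a rough sketch
(the table and the walk are stand-ins, not his code):

#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <thread>
#include <vector>

// 64M entries (~256MB) of shared, read-only data - a stand-in for
// the real table.
static std::vector<uint32_t> g_table(64u * 1024u * 1024u);

static void Walk(uint64_t* out)
{
    uint64_t sum = 0;
    for (uint32_t v : g_table)   // pure reads, no locking
        sum += v;
    *out = sum;
}

// Run n concurrent readers over the SAME table and return the
// elapsed seconds, so scaling is measured instead of assumed.
static double TimeReaders(int n)
{
    std::vector<uint64_t> results(static_cast<std::size_t>(n));
    std::vector<std::thread> threads;
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i)
        threads.emplace_back(Walk, &results[static_cast<std::size_t>(i)]);
    for (std::thread& t : threads)
        t.join();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

int main()
{
    for (int n = 1; n <= 4; ++n)
        std::printf("%d reader(s): %.3f seconds\n", n, TimeReaders(n));
}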

> If you only have X memory cycles per second, and one
> process (or thread) uses up all X cycles, adding another
> process (or thread) can only slow things down, not speed
> them up.


Again, you are assuming LINEAR degradation, and that is SIMPLY not
the case.

--
HLS