From: Geoff on
On Sun, 21 Mar 2010 10:31:19 -0500, "Peter Olcott"
<NoSpam(a)OCR4Screen.com> wrote:

>
>"Geoff" <geoff(a)invalid.invalid> wrote in message
>news:7egbq5pif7se5jmkbc0idv3hg6umlt0uo8(a)4ax.com...
>> On Sun, 21 Mar 2010 00:16:08 -0500, "Peter Olcott"
>> <NoSpam(a)OCR4Screen.com> wrote:
>>
>>>Given (as in Geometry) as an immutable premise (to be
>>>taken
>>>as true even if it is false) that my process takes
>>>essentially ALL of the memory bandwidth, then within the
>>>specific context of this immutable premise (to be taken as
>>>true even if it is false) could adding another such
>>>process
>>>speed things up or would it slow them down?
>>>
>>
>> A thread is not a process.
>>
>>>> What I am saying is this, suppose you have 10 lines of
>>>> DFA
>>>> C code, the compiler creates OP CODES for these 10
>>>> lines.
>>>> Each OP CODE has a fixed frequency cycle. When the
>>>> accumulated frequency reaches a QUANTUM (~15ms), you
>>>> will
>>>> get a context switch - in other words, your code is
>>>> preempted (stopped), swapped out, and Windows will give all
>>>> other threads a chance to run.
>>>
>>>This does not occur on my quad-core machine. I
>>>consistently
>>>get every bit of all of the CPU cycles of a single core.
>>
>> The other three cores sit idle.
>> You are getting 100% use of 25% of the machine capacity.
>
>Also when I add another process (I add a process rather than
>a thread because the code can already do this without
>changes) the sum of both processes becomes substantially
>less than the prior single process. The prior single process
>took 25% of the CPU time, the two processes now take 11 + 7
>= 19% of the CPU time.

The two processes contend for memory and/or other resources and the
system thrashes between them. Absolutely not a valid test of what
multithreading can do for your application. You have not made an
objective measurement.

>
>Even though these are processes rather than threads, from
>what I understand of the difference between them this test
>would tend to indicate that adding another thread would have
>comparable results.
>

Then you misunderstand threading technology. If your application is
running "a single tight loop" as you claim then spawning a new thread
for each core in your system with shared memory would allow SMP to run
them with a massive improvement in response time per request or to
allow your process to analyze 4 items from the queue instead of one at
a time. You are wasting 3/4 of your machine.
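
As a concrete illustration (my own sketch of the approach described
above, NOT Peter's actual code; LoadDfaTable and ProcessOneRequest
are hypothetical names), this is roughly what "one worker thread per
core over shared read-only data" looks like with the Win32 API:

#include <windows.h>
#include <process.h>

struct SharedState {
    const unsigned char *dfa;   /* read-only DFA/font data, loaded ONCE */
    /* a shared request queue would also live here */
};

static unsigned __stdcall Worker(void *arg)
{
    SharedState *s = (SharedState *)arg;
    for (;;) {
        /* pop one request from the shared queue and run the DFA on it;
           every worker reads the same s->dfa, nothing is duplicated */
        /* ProcessOneRequest(s); */
        Sleep(1);               /* placeholder so the skeleton doesn't spin */
    }
    return 0;
}

int main()
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);         /* si.dwNumberOfProcessors = core count */

    SharedState s;
    s.dfa = 0;                  /* s.dfa = LoadDfaTable();  (hypothetical) */

    HANDLE h[MAXIMUM_WAIT_OBJECTS];
    DWORD n = si.dwNumberOfProcessors;
    if (n > MAXIMUM_WAIT_OBJECTS) n = MAXIMUM_WAIT_OBJECTS;
    for (DWORD i = 0; i < n; ++i)
        h[i] = (HANDLE)_beginthreadex(0, 0, Worker, &s, 0, 0);

    WaitForMultipleObjects(n, h, TRUE, INFINITE);
    return 0;
}

The data is loaded once and every core works on it. Compare that
with starting a second PROCESS, which duplicates the whole load.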

It is very clear to me that you are not going to be convinced by any
discussion and you are not inclined to make changes to your
"optimized" code, therefore it's a waste of time to discuss this with
you further.
From: Hector Santos on
Peter Olcott wrote:

> "Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
>
> I am going for a second opinion from another set of groups.
> I made a copy of this other post and posted it to this
> forum. I can't spend the time reading up on a whole lot of
> different things that would seem to be moot at this point.


Make sure you describe your application, and you will see that any
multi-threading expert or experienced engineer in this area will tell
you the same things. You have an artificial limit based on a 4GB RAM
load which you don't fully get anyway, and clearly you won't get any
benefit by duplicating this 4GB RAM without making any of it
sharable. You just can't use that DOUBLE loading of redundant memory
as a basis to suggest that multi-core/multi-processor, multi-threaded
machines do not help these high performance needs. There is just no
comparison.
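
To make the point concrete, here is a minimal sketch (my own
illustration of "sharable", not anything from Pete's code) of mapping
the read-only DFA/font data file so that every thread - and even
every process - that opens it shares ONE physical copy instead of
each loading its own:

#include <windows.h>

const void *MapDfaReadOnly(const char *path)
{
    HANDLE file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, 0,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
    if (file == INVALID_HANDLE_VALUE) return 0;

    const void *view = 0;
    HANDLE map = CreateFileMappingA(file, 0, PAGE_READONLY, 0, 0, 0);
    if (map) {
        view = MapViewOfFile(map, FILE_MAP_READ, 0, 0, 0);
        CloseHandle(map);       /* the mapped view stays valid */
    }
    CloseHandle(file);
    return view;  /* read-only pages, backed by the file, shared system-wide */
}

That is what a memory mapped file buys you: the pages are demand
loaded and shared, so the "duplicated per process" objection goes
away for read-only data.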

But....


> I am convinced that an HTTP based webserver directly hooked
> to my code is the best approach to making my app web
> enabled,


If that is what you want, that's fine, and it can work too - when
done right. But with a FIFO Single Task Handling (FSTH) approach, you
also need to recognize the engineering constraints it creates and how
much more complicated your design and costs will be.

What you don't realize is that even with FSTH, you have no choice but
to take thread synchronization and reader/writer design methods into
account anyway. It isn't as simple as plugging a non-thread-ready OCR
program into a thread-ready server. But that's just a matter of
having a thread-safe, reader/writer-ready FIFO collection class (see
the sketch below). Then again, given that your loading will probably
be very low anyway, you can probably choose not to consider this and
use a "cross my fingers" approach, hoping you don't run into a
corruption or pointer conflict based on random incidences and timing
that might never occur for you.
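
For what it's worth, such a class is small. Here is a minimal sketch
(my assumption of what the server would hand the OCR code - the
Request type is a placeholder, not Pete's design) using a critical
section plus a semaphore:

#include <windows.h>
#include <queue>

struct Request { /* image data, reply socket/handle, etc. */ };

class RequestFifo {
public:
    RequestFifo() {
        InitializeCriticalSection(&cs_);
        sem_ = CreateSemaphore(0, 0, 0x7FFFFFFF, 0);
    }
    ~RequestFifo() { CloseHandle(sem_); DeleteCriticalSection(&cs_); }

    void Push(const Request &r)          /* called by the HTTP listener */
    {
        EnterCriticalSection(&cs_);
        q_.push(r);
        LeaveCriticalSection(&cs_);
        ReleaseSemaphore(sem_, 1, 0);    /* wake one waiting worker */
    }

    Request Pop()                        /* called by the OCR worker(s) */
    {
        WaitForSingleObject(sem_, INFINITE);  /* block until queued */
        EnterCriticalSection(&cs_);
        Request r = q_.front();
        q_.pop();
        LeaveCriticalSection(&cs_);
        return r;
    }

private:
    CRITICAL_SECTION cs_;
    HANDLE sem_;
    std::queue<Request> q_;
};

The listener pushes, the worker(s) pop, and the locking is confined
to one class instead of being sprinkled through the OCR code.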

Or you can learn the C/C++ design principles for threaded designs,
like hundreds of thousands of people, if not millions, have done (I
guess they knew it had a benefit), code it right, and all these
issues become moot - you can do what you want above, the issues are
resolved, and at the same time you reduce your TCO by offering some
level of scalability and top customer service.

--
HLS
From: Hector Santos on
Geoff wrote:

>> Even though these are processes rather than threads, from
>> what I understand of the difference between them this test
>> would tend to indicate that adding another thread would have
>> comparable results.
>>
>
> Then you misunderstand threading technology. If your application is
> running "a single tight loop" as you claim then spawning a new thread
> for each core in your system with shared memory would allow SMP to run
> them with a massive improvement in response time per request or to
> allow your process to analyze 4 items from the queue instead of one at
> a time. You are wasting 3/4 of your machine.
>
> It is very clear to me that you are not going to be convinced by any
> discussion and you are not inclined to make changes to your
> "optimized" code, therefore it's a waste of time to discuss this with
> you further.


You know what's funny about this? With all this talk of "intensive"
processing and "limits" so constraining that, in Pete's mind, no
current computer technology at any level will help, I had to see how
"intensive" this really is.

I searched, found

http://www.freeocr.net/

There is already a WEB SITE that does this:

http://asv.aso.ecei.tohoku.ac.jp/tesseract/

VERY IMPRESSIVE! WITH IMMEDIATE responses!

And I downloaded the open source OCR software "Tesseract" at google code:

http://code.google.com/p/tesseract-ocr

and it was pretty impressive in what it can do.

This thing is SO simple, any WEB MASTER can run it without any C/C++
coding - just run the already compiled EXE as a CGI. I was seeing 1
to 2 seconds of processing time or less for full paragraphs of text
images.
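
To show how little glue is needed, here is a toy CGI front end (my
own sketch, assuming the upload handler has already saved the image
to a temp file whose path arrives in the query string - not
production code, and there is no input validation):

#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

int main()
{
    const char *qs = getenv("QUERY_STRING");  /* e.g. "img=/tmp/up123.tif" */
    std::string img = (qs && strncmp(qs, "img=", 4) == 0) ? qs + 4 : "";

    printf("Content-Type: text/plain\r\n\r\n");
    if (img.empty()) { printf("no image\n"); return 0; }

    /* run the stock EXE: "tesseract <image> <outputbase>" writes <outputbase>.txt */
    std::string cmd = "tesseract " + img + " /tmp/out123";
    system(cmd.c_str());

    FILE *f = fopen("/tmp/out123.txt", "r");  /* stream the text back */
    if (f) {
        char buf[4096];
        size_t n;
        while ((n = fread(buf, 1, sizeof buf, f)) > 0)
            fwrite(buf, 1, n, stdout);
        fclose(f);
    }
    return 0;
}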

I'm sure Peter will say his patented DFA/OCR process is more intensive
and better, that it can't be run more than once!


--
HLS
From: Geoff on
On Sun, 21 Mar 2010 15:19:58 -0400, Hector Santos
<sant9442(a)nospam.gmail.com> wrote:

>Geoff wrote:
>
>>> Even though these are processes rather than threads, from
>>> what I understand of the difference between them this test
>>> would tend to indicate that adding another thread would have
>>> comparable results.
>>>
>>
>> Then you misunderstand threading technology. If your application is
>> running "a single tight loop" as you claim then spawning a new thread
>> for each core in your system with shared memory would allow SMP to run
>> them with a massive improvement in response time per request or to
>> allow your process to analyze 4 items from the queue instead of one at
>> a time. You are wasting 3/4 of your machine.
>>
>> It is very clear to me that you are not going to be convinced by any
>> discussion and you are not inclined to make changes to your
>> "optimized" code, therefore it's a waste of time to discuss this with
>> you further.
>
>
>You know what's funny about this? With all this talk of "intensive"
>processing and "limits" so constraining that, in Pete's mind, no
>current computer technology at any level will help, I had to see how
>"intensive" this really is.
>
>I searched, found
>
> http://www.freeocr.net/
>
>There is already a WEB SITE that does this:
>
> http://asv.aso.ecei.tohoku.ac.jp/tesseract/
>
>VERY IMPRESSIVE! WITH IMMEDIATE responses!
>
>And I downloaded the open source OCR software "Tesseract" at google code:
>
> http://code.google.com/p/tesseract-ocr
>
>and it was pretty impressive in what it can do.
>
>This thing is SO simple, any WEB MASTER can run it without any C/C++
>coding - just run the already compiled EXE as a CGI. I was seeing 1
>to 2 seconds of processing time or less for full paragraphs of text
>images.
>
>I'm sure Peter will say his patented DFA/OCR process is more intensive
>and better, that it can't be run more than once!

Open source, unpatented, free. Easy setup. In a variety of languages.
Beautiful.
From: Joseph M. Newcomer on
See below...

On Sat, 20 Mar 2010 16:07:03 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:

>
>"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
>news:%23Xe$Z9GyKHA.2644(a)TK2MSFTNGP04.phx.gbl...
>> Geoff wrote:
>>
>>> On Sat, 20 Mar 2010 09:52:33 -0500, "Peter Olcott"
>>> <NoSpam(a)OCR4Screen.com> wrote:
>>>
>>>> Maximum total processing time is 1/10 second for a whole
>>>> page of text. My initial implementation (for testing
>>>> purposes) may simply refuse larger requests. The final
>>>> implementation will place large requests in a separate
>>>> lower priority queue.
>>>
>>> Your "memory bandwidth intensive" requirement is the
>>> bottleneck to
>>> multithreading or multiprocessing. If your big memory
>>> chunk is
>>> read-only, your problem with the DFA is that it lacks
>>> locality of
>>> reference to that data. You end up hitting the RAM
>>> instead of being
>>> able to utilize the data in the CPU caches. Multiple
>>> threads end up
>>> contending with each other for access to RAM memory,
>>> hence the
>>> slowdown. Compute-intensive applications benefit from
>>> multi-threading
>>> by being able to stay off the RAM bus and utilize the
>>> caches in each
>>> core.
>>
>>
>> Threads will benefit by reducing context switching.
>>
>
>You missed this part:
>Multiple threads end up contending with each other for
>access to RAM memory, hence the slowdown.
***
And the part YOU missed is that the L3 cache of the i7-based machines is SHARED among all
cores, thus your notion of "contention" can be replaced with "multiple threads end up
sharing the prefetched cached data, hence an effective speedup".

Unless you have actually MEASURED multiple threads running on multiple cores concurrently,
you have no concept of what might actually be happening with your memory.

Please stop offering opinions which have no basis in reality. Until you have MEASURED the
performance (and I mean SCIENTIFICALLY MEASURED, which means you know the mean AND the
standard deviation of your measurements, across hundreds or thousands of measurements, so
you can say "I measured the time as 3.73±0.3 minutes for N=100", and not "I did two
experiments, and they differed infinitesimally, and from that I conclude..."), you are
asking us to guess what is going to happen, and I for one would not guess. I know that
only actual measured data matters. Note that you have to run a multithreaded experiment
on multiple cores to get valid data.
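
For the record, a measurement like that is not hard to produce. A small sketch (my own
illustration; DoOneOcrRequest is a hypothetical stand-in for whatever operation you are
timing) that reports mean and standard deviation over many runs:

#include <windows.h>
#include <cmath>
#include <cstdio>

static double TimeOnce()               /* one timed run of the work under test */
{
    LARGE_INTEGER f, t0, t1;
    QueryPerformanceFrequency(&f);
    QueryPerformanceCounter(&t0);
    /* DoOneOcrRequest(); */           /* hypothetical: the operation being measured */
    QueryPerformanceCounter(&t1);
    return double(t1.QuadPart - t0.QuadPart) / double(f.QuadPart);
}

int main()
{
    const int N = 1000;                /* hundreds or thousands of runs */
    double sum = 0, sumsq = 0;
    for (int i = 0; i < N; ++i) {
        double t = TimeOnce();
        sum += t;
        sumsq += t * t;
    }
    double mean = sum / N;
    double sd   = sqrt(sumsq / N - mean * mean);
    printf("time = %.6f +/- %.6f s (N=%d)\n", mean, sd, N);
    return 0;
}

Run that with one thread, then with four threads on four cores, and compare the numbers;
anything less is guessing.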
joe
****
>
>If you only have X cycles of memory per second, and one
>process (or thread) uses up all X cycles, adding another
>process (or thread) can only slow things down, not speed
>them up.
>
>
>> The point in all this is that we are taking Pete's poor
>> engineering, WINTEL understanding, and software design
>> for his DFA as limits, and under-utilizing the power of a
>> WINTEL QUAD 8MB Windows 7 machine.
>>
>> In other words, he really doesn't know what his boundary
>> conditions are, and until he has tried to use memory-mapped
>> files for his read-only (mind you, not write - minimizing
>> what contention you can get) font library of files, I am not
>> convinced it has to be a single-process, FIFO-queue-only
>> standalone application.
>>
>> This is a simple engineering problem with a simple solution.
>> He just hasn't realized it.
>>
>> Even then, degradation does not have to be linear, as he
>> suggests, with each process started. The load requirements
>> per thread would be much different than they are per process,
>> which is all he sees now. Threads sharing the data
>> would prove to be highly efficient memory-wise, especially
>> on a multi-CPU machine. Single CPU? Context switching
>> gets in the way. With multiple CPUs, you have less context
>> switching.
>>
>>
>> --
>> HLS
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm