From: Peter Olcott on

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:uXH53tDyKHA.3304(a)TK2MSFTNGP06.phx.gbl...
> Hector Santos wrote:
>
>> Good Morning.
>>
>> Jerry is correct, you do need to step back and get
>> something done and then measure.
>>
>> But there are some basic fundamentals that you are
>> missing. I don't think it has been working perfectly -
>> just enough to satisfy some proof of concept for you. If
>> you want this to be a single thread FIFO queue processor,
>> the OCR application is not ready for a web server which
>> will spawn a thread for each request. If you are
>> measuring a 3.7 min processing time, well, not only does
>> that change the paradigm of your client UI, you have a
>> 3.7*(1+Q) minute completion time, where Q is the number of
>> items in the queue ahead of each new request.
>>
>> You need to focus first on making sure you can scale the
>> OCR application. Making it thread ready and safe is a
>> first step.
>
> I should note that if you can't scale up (more
> threads/cpus), then you have no choice but to scale out
> (more machines).
>
> I strongly doubt that you can't run multiple threads of
> your OCR application.
>
> --
> HLS

I already tested running two separate instances of my OCR
process on a quad-core machine; each process doubled its
execution time. Since memory access time is the bottleneck,
this makes perfect sense.

Maximum total processing time is 1/10 second for a whole
page of text. My initial implementation (for testing
purposes) may simply refuse larger requests. The final
implementation will place large requests in a separate lower
priority queue.
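
A minimal sketch of that two-queue arrangement, assuming a byte-size
threshold decides what counts as a "large" request. The names
OcrRequest, TwoTierQueue and pageLimitBytes are made up for
illustration, and the class is not thread safe on its own; it would
need a lock around push() and pop() if web server threads call it
directly.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical request record; field names are illustrative only.
struct OcrRequest {
    std::string id;
    std::size_t bytes;  // size of the uploaded image
};

// Two FIFO queues: page-sized requests are always served before large ones.
class TwoTierQueue {
public:
    explicit TwoTierQueue(std::size_t pageLimitBytes)
        : pageLimit_(pageLimitBytes) {}

    // Requests at or under the limit go to the normal queue, others to the
    // lower-priority queue.
    void push(OcrRequest r) {
        if (r.bytes <= pageLimit_) normal_.push_back(std::move(r));
        else large_.push_back(std::move(r));
    }

    // Copies the next request into `out`; returns false if both are empty.
    bool pop(OcrRequest& out) {
        std::deque<OcrRequest>& q = !normal_.empty() ? normal_ : large_;
        if (q.empty()) return false;
        out = std::move(q.front());
        q.pop_front();
        return true;
    }

private:
    std::size_t pageLimit_;
    std::deque<OcrRequest> normal_;
    std::deque<OcrRequest> large_;
};
```

With this layout a large request waits until no page-sized request is
pending, so a steady stream of small requests can starve it; some
aging rule would be needed if that matters.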


From: Hector Santos on
Peter Olcott wrote:

> Oh now I see where you are getting the 3.7 minutes from. The
> actual maximum time per user request is 1/10 second for a
> whole page of data. Requests larger than a page of data will
> be placed into a lower priority queue. The system recognizes
> 72,000 characters per second. My OCR does not need to change
> much, it just needs to get its requests from a FIFO queue.
> The webserver will have multiple threads to append to this
> queue.
>
>> You need to focus first on making sure you can scale the
>> OCR application. Making it thread ready and safe is a
>> first step.
>
> It does not need to be thread ready, it only needs to be
> able to read from a FIFO queue.

Ok, I'll try this again (I already posted the points here but the mail
was freaking lost), try it again in simple terms:

THINK TRANSACTION TIME!

What are your boundary conditions for an HTTP request transaction
time, both minimum and maximum?

Based on what you said below:

minimum: 100 ms per PAGE OF META DATA
maximum: DELAYED PROCESSING

Look, you are touching on all sorts of common-practice engineering
design principles in this area. How about this, so we can cut down on
the back-and-forth in this thread?

What is your functional specification, NOT TECHNICAL, functional
specification of the OCR application?

In short, you either have:

- fast *semi-interactive* client/server framework
- delayed store and forward client/server framework

This changes (or rather defines) all sorts of things. This all goes
back to my original input:

You need to work out the OCR state machine protocol.
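
As a rough idea of what such a state machine might look like, here is
a sketch with made-up states. The thread does not define the real
protocol, so the state names and the allowed transitions below are
assumptions for illustration only.

```cpp
#include <cassert>

// Hypothetical request states; not taken from any real OCR protocol.
enum class OcrState { Received, Queued, Processing, Done, Rejected };

// Returns true if moving from `from` to `to` is allowed in this sketch.
bool canTransition(OcrState from, OcrState to) {
    switch (from) {
        case OcrState::Received:
            // A new request is either accepted into the queue or refused.
            return to == OcrState::Queued || to == OcrState::Rejected;
        case OcrState::Queued:
            return to == OcrState::Processing;
        case OcrState::Processing:
            return to == OcrState::Done;
        case OcrState::Done:
        case OcrState::Rejected:
            return false;  // terminal states
    }
    return false;
}
```

Writing the transitions down like this forces the questions about
what the client sees while a request is Queued and what happens on
refusal.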

From there, you can put together the solution you need. I had a
basic idea of what you needed from the beginning, that is why I guided
you to the CSPServer class at codeproject.com

But even then, you still need to make it all thread ready and safe,
regardless of whether it is interactive or store and forward, FIFO,
LIFO, or what have you.

In other words, before you can even begin to talk about having a
party, you need to get the house ready for many guests.

--
HLS


From: Peter Olcott on

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:%23gh1e5DyKHA.5292(a)TK2MSFTNGP06.phx.gbl...
> Peter Olcott wrote:
>
>> Oh now I see where you are getting the 3.7 minutes from.
>> The actual maximum time per user request is 1/10 second
>> for a whole page of data. Requests larger than a page of
>> data will be placed into a lower priority queue. The
>> system recognizes 72,000 characters per second. My OCR
>> does not need to change much, it just needs to get its
>> requests from a FIFO queue. The webserver will have
>> multiple threads to append to this queue.
>>
>>> You need to focus first on making sure you can scale the
>>> OCR application. Making it thread ready and safe is a
>>> first step.
>>
>> It does not need to be thread ready, it only needs to be
>> able to read from a FIFO queue.
>
> Ok, I'll try this again (I already posted the points here
> but the mail was freaking lost), try it again in simple
> terms:
>
> THINK TRANSACTION TIME!
>
> What are your boundary conditions for an HTTP request
> transaction time, both minimum and maximum?
>
> Based on what you said below:
>
> minimum: 100 ms per PAGE OF META DATA
> maximum: DELAYED PROCESSING
>
> Look, you are touching on all sorts of common-practice
> engineering design principles in this area. How about
> this, so we can cut down on the back-and-forth in this
> thread?
>
> What is your functional specification, NOT TECHNICAL,
> functional
> specification of the OCR application?
>
> In short, you either have:
>
> - fast *semi-interactive* client/server framework
> - delayed store and forward client/server framework
>
> This changes (or rather defines) all sorts of things.
> This all goes back to my original input:
>
> You need to work out the OCR state machine protocol.
>
> From there, you can put together the solution you need. I
> had a basic idea of what you needed from the beginning,
> that is why I guided you to the CSPServer class at
> codeproject.com
>
> But even then, you still need to make it all thread ready
> and safe, regardless of whether it is interactive or store
> and forward, FIFO, LIFO, or what have you.
>
> In other words, before you can even begin to talk about
> having a party, you need to get the house ready for many
> guests.
>
> --
> HLS

I did not fully understand much of the above.

My OCR process has the following requirements:
(1) Must be a single thread on a machine dedicated to OCR
processing.
(2) Its data must remain resident in RAM
(3) Its input will be a less than 100K 24-bit color PNG file
coming over HTTP.
(4) Its output will be less than 10K of UTF-8 text.
(5) OCR processing time is less than 100 milliseconds.
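
Requirement (1) combined with the FIFO idea could be sketched roughly
as below: several producer threads stand in for web server threads,
and one consumer thread stands in for the single-threaded OCR process.
This assumes standard C++11 threading rather than MFC or Win32
primitives, and the names BlockingFifo and runDemo are invented for
the example. The OCR step itself is left as a comment.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking FIFO: many producers may push, a consumer pops in order.
template <typename T>
class BlockingFifo {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push_back(std::move(item));
        }
        cv_.notify_one();
    }

    // Blocks until an item is available or close() has been called.
    // Returns false only when the queue is closed and fully drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop_front();
        return true;
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<T> q_;
    bool closed_ = false;
};

// Runs `producers` producer threads and one consumer; returns items consumed.
std::size_t runDemo(int producers, int perProducer) {
    BlockingFifo<std::string> fifo;
    std::size_t consumed = 0;  // touched only by the consumer until join()

    std::thread consumer([&] {
        std::string req;
        while (fifo.pop(req)) ++consumed;  // real code would run OCR here
    });

    std::vector<std::thread> pool;
    for (int p = 0; p < producers; ++p)
        pool.emplace_back([&fifo, p, perProducer] {
            for (int i = 0; i < perProducer; ++i)
                fifo.push("req-" + std::to_string(p) + "-" + std::to_string(i));
        });
    for (auto& t : pool) t.join();

    fifo.close();     // no more requests: let the consumer drain and exit
    consumer.join();
    return consumed;
}
```

Note that only the queue has to be thread safe here; the consumer
code between pop() calls runs on one thread and needs no locking of
its own.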



From: Peter Olcott on

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:%23gh1e5DyKHA.5292(a)TK2MSFTNGP06.phx.gbl...
> Peter Olcott wrote:
>
>> Oh now I see where you are getting the 3.7 minutes from.
>> The actual maximum time per user request is 1/10 second
>> for a whole page of data. Requests larger than a page of
>> data will be placed into a lower priority queue. The
>> system recognizes 72,000 characters per second. My OCR
>> does not need to change much, it just needs to get its
>> requests from a FIFO queue. The webserver will have
>> multiple threads to append to this queue.
>>
>>> You need to focus first on making sure you can scale the
>>> OCR application. Making it thread ready and safe is a
>>> first step.
>>
>> It does not need to be thread ready, it only needs to be
>> able to read from a FIFO queue.
>
> Ok, I'll try this again (I already posted the points here
> but the mail was freaking lost), try it again in simple
> terms:

All the posts that I am seeing and all the posts that I am
making are showing up on Google Groups:

http://groups.google.com/group/microsoft.public.vc.mfc/browse_thread/thread/f6146804be18f451/153bc7d9580b3741?hl=en&q=group%3Amicrosoft.public.vc.mfc&lnk=nl&

>
> THINK TRANSACTION TIME!
>
> What are your boundary conditions for an HTTP request
> transaction time, both minimum and maximum?
>
> Based on what you said below:
>
> minimum: 100 ms per PAGE OF META DATA
> maximum: DELAYED PROCESSING
>
> Look, you are touching on all sorts of common-practice
> engineering design principles in this area. How about
> this, so we can cut down on the back-and-forth in this
> thread?
>
> What is your functional specification, NOT TECHNICAL,
> functional
> specification of the OCR application?
>
> In short, you either have:
>
> - fast *semi-interactive* client/server framework
> - delayed store and forward client/server framework
>
> This changes (or rather defines) all sorts of things.
> This all goes back to my original input:
>
> You need to work out the OCR state machine protocol.
>
> From there, you can put together the solution you need. I
> had a basic idea of what you needed from the beginning,
> that is why I guided you to the CSPServer class at
> codeproject.com
>
> But even then, you still need to make it all thread ready
> and safe, regardless of whether it is interactive or store
> and forward, FIFO, LIFO, or what have you.
>
> In other words, before you can even begin to talk about
> having a party, you need to get the house ready for many
> guests.
>
> --
> HLS


From: Hector Santos on
Peter Olcott wrote:

> "Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
> news:%23gh1e5DyKHA.5292(a)TK2MSFTNGP06.phx.gbl...
>> Peter Olcott wrote:
>>
>>> Oh now I see where you are getting the 3.7 minutes from.
>>> The actual maximum time per user request is 1/10 second
>>> for a whole page of data. Requests larger than a page of
>>> data will be placed into a lower priority queue. The
>>> system recognizes 72,000 characters per second. My OCR
>>> does not need to change much, it just needs to get its
>>> requests from a FIFO queue. The webserver will have
>>> multiple threads to append to this queue.
>>>
>>>> You need to focus first on making sure you can scale the
>>>> OCR application. Making it thread ready and safe is a
>>>> first step.
>>> It does not need to be thread ready, it only needs to be
>>> able to read from a FIFO queue.
>> Ok, I'll try this again (I already posted the points here
>> but the mail was freaking lost), try it again in simple
>> terms:
>>
>> THINK TRANSACTION TIME!
>>
>> What are your boundary conditions for an HTTP request
>> transaction time, both minimum and maximum?
>>
>> Based on what you said below:
>>
>> minimum: 100 ms per PAGE OF META DATA
>> maximum: DELAYED PROCESSING
>>
>> Look, you are touching on all sorts of common-practice
>> engineering design principles in this area. How about
>> this, so we can cut down on the back-and-forth in this
>> thread?
>>
>> What is your functional specification, NOT TECHNICAL,
>> functional
>> specification of the OCR application?
>>
>> In short, you either have:
>>
>> - fast *semi-interactive* client/server framework
>> - delayed store and forward client/server framework
>>
>> This changes (or rather defines) all sorts of things.
>> This all goes back to my original input:
>>
>> You need to work out the OCR state machine protocol.
>>
>> From there, you can put together the solution you need. I
>> had a basic idea of what you needed from the beginning,
>> that is why I guided you to the CSPServer class at
>> codeproject.com
>>
>> But even then, you still need to make it all thread ready
>> and safe, regardless of whether it is interactive or store
>> and forward, FIFO, LIFO, or what have you.
>>
>> In other words, before you can even begin to talk about
>> having a party, you need to get the house ready for many
>> guests.
>>
>> --
>> HLS
>
> I did not fully understand much of the above.
>
> My OCR process has the following requirements:
> (1) Must be a single thread on a machine dedicated to OCR
> processing.


Why? I think you made a decision based on a poor implementation of
your software. See #5

> (2) Its data must remain resident in RAM


Same as #1

> (5) OCR processing time is less than 100 milliseconds.


This doesn't make sense. If an OCR process time is 100 ms, which is
pretty darn fast and SHORTER than most HTTP applications, then why
can't you have multiple threads?
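
To put numbers on it, here is a small sketch of the T*(1+Q)
completion-time estimate from earlier in the thread, extended to
several workers. It assumes every request takes the same time, that
all workers are free at time zero, and that each extra worker runs at
full speed, which is exactly the point in dispute here. The function
name completionMs is invented for the example.

```cpp
#include <cassert>
#include <cstddef>

// Completion time in milliseconds for a new request that arrives behind
// `queued` waiting requests, each taking `serviceMs`, served FIFO by
// `workers` identical workers. Simplified, deterministic model.
long completionMs(long serviceMs, std::size_t queued, std::size_t workers) {
    assert(workers > 0);
    // Requests ahead of the new one are served in rounds of `workers`.
    std::size_t roundsAhead = queued / workers;
    return serviceMs * static_cast<long>(roundsAhead + 1);
}
```

With one worker this reduces to serviceMs*(1+Q): at 100 ms per
request and 9 requests already queued, the new request completes after
1000 ms. Under the same assumptions four workers would bring that down
to 300 ms.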

Again, I think that you have imposed unreasonable design constraints,
most likely based on a poor understanding of WINTEL and on an
implementation of your application on a high-powered WINTEL box with
a quad core, 8GB and an advanced NT-based OS.


--
HLS