From: Peter Olcott on

"Jerry Coffin" <jerryvcoffin(a)yahoo.com> wrote in message
news:MPG.260e29d84d7b961d989854(a)news.sunsite.dk...
> In article
> <SYqdncgSB6lQ0z_WnZ2dnUVZ_sednZ2d(a)giganews.com>,
> NoSpam(a)OCR4Screen.com says...
>
> [ ... ]
>
>> Why is any of this complexity actually necessary in actual
>> practice? It would be absurd to be required these days when
>> so much RAM is available. If I always have 4.0 GB more than
>> I need either your whole point becomes moot, or OS design is
>> atrocious.
>
> It's probably not, but you've given a rather oddball request, and
> people are trying to answer it.
>
> I think you should step back from all the optimization you're working
> on, and instead deal with simply getting the thing working to some
> degree. Then you can _measure_ its performance and where it's
> spending its time. Only after you've done that do you stand a decent
> chance of accomplishing much in terms of optimization -- chances are
> that right now you're spending a lot of time and effort on things
> that will never matter, and overlooking others that will end up being
> crucial.
>
> --
> Later,
> Jerry.

My OCR technology has been working perfectly for a few years
now, and it does require that its data remains in RAM. This
whole thread is about converting an MFC desktop application
into a web application. With Hector and Joe's help this now
seems to be pretty easy. The current design is to get a
copy of mongoose and drop it into my OCR application, thus
making my OCR application a webserver.


From: Hector Santos on
Peter Olcott wrote:

> "Jerry Coffin" <jerryvcoffin(a)yahoo.com> wrote in message


>> I think you should step back from all the optimization
>> you're working on, and instead deal with simply getting the
>> thing working to some degree. Then you can _measure_
>> its performance and where it's spending its time. Only
>> after you've done that do you stand a decent
>> chance of accomplishing much in terms of optimization --
>> chances are that right now you're spending a lot of time
>> and effort on things that will never matter, and
>> overlooking others that will end up being crucial.

>
> My OCR technology has been working perfectly for a few years
> now, and it does require that its data remains in RAM. This
> whole thread is about converting an MFC desktop application
> into a web application. With Hector and Joe's help this now
> seems to be pretty easy. The current design is to get a
> copy of mongoose and drop it into my OCR application, thus
> making my OCR application a webserver.


Good Morning.

Jerry is correct, you do need to step back and get something done and
then measure.

But there are some basic fundamentals that you are missing. I don't
think it has been working perfectly - just enough to satisfy some
proof of concept for you. If you want this to be a single-threaded FIFO
queue processor, the OCR application is not ready for a web server
which will spawn a thread for each request. If you are measuring a
3.7 min processing time, well, not only does that change the paradigm of
your client UI, you have a 3.7*(1+Q) minute completion time for each new
request, where Q is the number of items already in the queue.
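
To make the 3.7*(1+Q) estimate concrete, here is a minimal sketch in C. It
assumes a strictly serial FIFO with one worker and a fixed processing time
per request. The function name fifo_completion_minutes is made up for
illustration and does not come from the thread.

```c
#include <assert.h>

/* Hypothetical helper, not from the thread: estimated minutes until a
 * newly queued request finishes when one worker drains a FIFO queue
 * serially. per_request_min is the processing time of one request and
 * queued_ahead is Q, the number of requests already waiting. */
double fifo_completion_minutes(double per_request_min, int queued_ahead)
{
    assert(per_request_min >= 0.0 && queued_ahead >= 0);
    /* The new request waits for Q earlier requests, then runs itself. */
    return per_request_min * (1 + queued_ahead);
}
```

With a 3.7 minute request and four requests already queued, the estimate
comes to about 18.5 minutes for the newest arrival.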

You need to focus first on making sure you can scale the OCR
application. Making it thread ready and safe is a first step.


--
HLS


From: Peter Olcott on

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:eUEo4l%23xKHA.5364(a)TK2MSFTNGP05.phx.gbl...
> Peter Olcott wrote:
>
>> "Hector Santos" <sant9442(a)nospam.gmail.com> wrote in
>> message news:%23YrAhQ%23xKHA.3408(a)TK2MSFTNGP06.phx.gbl...
>>> Peter Olcott wrote:
>>>
>>>> I just want to understand how to handle the most
>>>> malicious user in the most robust way. If I can block
>>>> an IP address without costing me any bandwidth, then I
>>>> think I have one significant aspect of blocking the
>>>> most malicious user.
>>> As Joe puts it, you are "obsessing" again :)
>>>
>>> It's very simple with the server model:
>>>
>>> s = socket()    - create a socket handle
>>> bind(s)         - associate a local IP with socket handle s
>>> listen(s)       - start listening for connections
>>> accept(s,c)     - wait for clients, new socket c for each connection
>>>
>>> CHECK PEER IP ADDRESS OF C IN FILTER LIST,
>>> IF FOUND, CLOSE SOCKET AND GO BACK TO ACCEPT
>>>
>>> recv(c)         - receive whatever
>>> send(c)         - send whatever
>>> shutdown(c)     - tell remote I'm about to close
>>> closesocket(c)  - close socket handle
>>> go back to accept
>>>
>>> --
>>> HLS
>>
>> That does seem pretty simple. Would I only need a single
>> recv(c) for a 10 MB file?
>
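> No, you call recv() in a loop and read up to 8K at a time:
>
> char buf[8*1024];
> FILE *fv = fopen(szImageFileName,"wb");
> if (fv != NULL)
> {
>   for (;;)
>   {
>     /* recv() returns 0 on orderly close, <0 on error */
>     int len = recv(c,buf,sizeof(buf),0);
>     if (len <= 0) break;
>     /* write exactly len bytes; stop on a short write */
>     if (fwrite(buf,1,len,fv) != (size_t)len) break;
>   }
>   fclose(fv);
> }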
>
> But the above is very simplistic.
>
> I haven't verified it, but I trust that Mongoose.c
> (otherwise it isn't a web server) will save the POSTed data
> in some temporary file for the transaction (while the
> connection is alive). That FILE NAME is then passed to your
> CGI script or whatever is going to process the posted
> data.
>
> So it should be done for you. You just need to learn how
> Mongoose implements this fundamental web server idea.
>
> Again, if it doesn't handle POSTed data for you, then GET
> RID of it; it isn't a real web server of any kind.
>
> But I know it does because it's fundamental and an HTTP
> standard requirement to handle POST correctly, and that
> includes saving the data somewhere for the session to
> process.
>
>
> --
> HLS

That is great Hector. It looks like the basic architectural
design is set. I won't limit myself to mongoose in the long
run.
http://en.wikipedia.org/wiki/Comparison_of_lightweight_web_servers
One of these might prove to be better in the long run.
Another design constraint is very high security. That will
require another whole learning curve.
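
The "CHECK PEER IP ADDRESS OF C IN FILTER LIST" step in the server outline
quoted above could be sketched roughly as follows. This is only an
illustration: the function name ip_is_blocked and the plain array of
address strings are assumptions, not anything Mongoose or the thread
provides. On a real socket the peer address text would typically come from
getpeername() followed by inet_ntop().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical filter check for the "CHECK PEER IP ADDRESS" step.
 * peer_ip is the textual address of the connecting client and blocked
 * is the filter list. Returns 1 if the peer is listed, 0 otherwise. */
int ip_is_blocked(const char *peer_ip,
                  const char *const blocked[], size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(peer_ip, blocked[i]) == 0)
            return 1;   /* caller should close c and go back to accept */
    }
    return 0;           /* not listed: carry on with recv/send */
}
```

Note that accept() only returns once the TCP handshake has completed, so a
check at this point still costs a few packets per rejected connection.
Dropping the traffic earlier than that is a job for a firewall rather than
the application.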


From: Peter Olcott on

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:ea9FzcDyKHA.2644(a)TK2MSFTNGP04.phx.gbl...
> Peter Olcott wrote:
>
>> "Jerry Coffin" <jerryvcoffin(a)yahoo.com> wrote in message
>
>
>>> I think you should step back from all the optimization
>>> you're working on, and instead deal with simply getting
>>> the thing working to some degree. Then you can _measure_
>>> its performance and where it's spending its time. Only
>>> after you've done that do you stand a decent
>>> chance of accomplishing much in terms of optimization --
>>> chances are that right now you're spending a lot of time
>>> and effort on things that will never matter, and
>>> overlooking others that will end up being crucial.
>
>>
>> My OCR technology has been working perfectly for a few
>> years now, and it does require that its data remains in
>> RAM. This whole thread is about converting an MFC desktop
>> application into a web application. With Hector and Joe's
>> help this now seems to be pretty easy. The current
>> design is to get a copy of mongoose and drop it into my
>> OCR application, thus making my OCR application a
>> webserver.
>
>
> Good Morning.
>
> Jerry is correct, you do need to step back and get
> something done and then measure.
>

Yes, but, I NEVER EVER do this until the basic architecture
is certainly decided upon.

> But there are some basic fundamentals that you are
> missing. I don't think it has been working perfectly -
> just enough to satisfy some proof of concept for you. If
> you want this to be a single-threaded FIFO

Consistent 100% accuracy on all font instances.

> queue processor, the OCR application is not ready for a
> web server which will spawn a thread for each request. If
> you are measuring a 3.7 min processing time, well, not
> only does that change the paradigm of your client UI, you
> have a 3.7*(1+Q) minute completion time for each new
> request, where Q is the number of items already in the queue.


Oh now I see where you are getting the 3.7 minutes from. The
actual maximum time per user request is 1/10 second for a
whole page of data. Requests larger than a page of data will
be placed into a lower priority queue. The system recognizes
72,000 characters per second. My OCR does not need to change
much, it just needs to get its requests from a FIFO queue.
The webserver will have multiple threads to append to this
queue.
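
Taking the two figures above at face value, 72,000 characters per second
and 1/10 second per page work out to 7,200 characters per page. The sketch
below uses that derived number as the cut-off between the normal and the
lower priority queue. The 7,200 threshold is my inference from the stated
figures, not something stated in the post, and the names ocr_classify and
ocr_priority are hypothetical.

```c
#include <stddef.h>

/* Figures quoted in the post. */
#define CHARS_PER_SECOND 72000u   /* stated recognition rate          */
#define PAGES_PER_SECOND 10u      /* 1/10 second for one page of data */

/* Derived assumption: one page is at most 7,200 characters. */
#define CHARS_PER_PAGE (CHARS_PER_SECOND / PAGES_PER_SECOND)

enum ocr_priority { OCR_PRIORITY_NORMAL, OCR_PRIORITY_LOW };

/* Hypothetical dispatcher rule: requests larger than one page of data
 * go to the lower priority queue, everything else to the normal one. */
enum ocr_priority ocr_classify(size_t n_chars)
{
    return n_chars > CHARS_PER_PAGE ? OCR_PRIORITY_LOW
                                    : OCR_PRIORITY_NORMAL;
}
```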

>
> You need to focus first on making sure you can scale the
> OCR application. Making it thread ready and safe is a
> first step.

It does not need to be thread ready, it only needs to be
able to read from a FIFO queue.
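
Even with a single OCR thread, the queue itself is shared between that
thread and the web server threads that append to it, so the queue has to
be thread safe. A minimal sketch of such a queue follows. It assumes POSIX
threads purely to keep the example short (on Windows a CRITICAL_SECTION
and a CONDITION_VARIABLE would play the same roles), and every name in it
(request_queue, queue_push, queue_pop, QUEUE_CAPACITY) is hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAPACITY 64   /* arbitrary size for the sketch */

/* Hypothetical request queue: many web-server threads call
 * queue_push(); the single OCR thread calls queue_pop(). */
struct request_queue {
    int items[QUEUE_CAPACITY];  /* request ids stand in for real jobs */
    size_t head, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
};

#define REQUEST_QUEUE_INIT \
    { {0}, 0, 0, PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER }

/* Append a request; returns 0 on success, -1 if the queue is full. */
int queue_push(struct request_queue *q, int request_id)
{
    int rc = -1;
    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_CAPACITY) {
        q->items[(q->head + q->count) % QUEUE_CAPACITY] = request_id;
        q->count++;
        pthread_cond_signal(&q->not_empty);  /* wake the OCR thread */
        rc = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Remove the oldest request, blocking while the queue is empty. */
int queue_pop(struct request_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    int request_id = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return request_id;
}
```

The OCR code stays single threaded in this arrangement; only the hand-off
between the web server threads and the OCR thread is synchronized.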

>
>
> --
> HLS


From: Hector Santos on
Hector Santos wrote:

> Good Morning.
>
> Jerry is correct, you do need to step back and get something done and
> then measure.
>
> But there are some basic fundamentals that you are missing. I don't
> think it has been working perfectly - just enough to satisfy some proof
> of concept for you. If you want this to be a single-threaded FIFO queue
> processor, the OCR application is not ready for a web server which will
> spawn a thread for each request. If you are measuring a 3.7 min
> processing time, well, not only does that change the paradigm of your
> client UI, you have a 3.7*(1+Q) minute completion time for each new
> request, where Q is the number of items already in the queue.
>
> You need to focus first on making sure you can scale the OCR
> application. Making it thread ready and safe is a first step.

I should note that if you can't scale up (more threads/cpus), then you
have no choice but to scale out (more machines).

I very much doubt that you can't run multiple threads of your OCR
application.

--
HLS