From: Jerry Coffin on
In article <L9qdndbIjeoeOCHWnZ2dnUVZ_tKdnZ2d(a)giganews.com>,
NoSpam(a)OCR4Screen.com says...

[ ... ]

> The web server is designed with one thread per HTTP request.

This falls somewhere in the "terrible" to "oh no!" range. You
normally want a fairly small pool of threads with a queue of tasks
for the threads to do. The number of threads in the pool is normally
based on the number of (logical) processors available -- on the order
of 2 to 4 times as many threads as processors is fairly typical. Given that
you're dealing with an extremely I/O heavy application, a few more
than that might make sense, but not a whole lot.

What's most important is that you *not* tie a thread to a request
though -- the number of threads is a tunable parameter of the pool,
independent of the number of HTTP requests.
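
Just to make the shape of that concrete, here is a minimal sketch of a
pool plus queue using C++11 threads (the pool size and the task type
are placeholders for whatever your server actually needs, not a
finished design):

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// A fixed pool of worker threads pulling tasks off one shared queue.
// The pool size is the tunable knob; it is independent of how many
// HTTP requests happen to be in flight at any moment.
class ThreadPool {
public:
    explicit ThreadPool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            workers.emplace_back([this] { run(); });
    }
    ~ThreadPool() {
        { std::lock_guard<std::mutex> lk(m); done = true; }
        cv.notify_all();
        for (std::thread &w : workers) w.join();
    }
    void submit(std::function<void()> task) {
        { std::lock_guard<std::mutex> lk(m); tasks.push(std::move(task)); }
        cv.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(m);
                cv.wait(lk, [this] { return done || !tasks.empty(); });
                if (done && tasks.empty()) return;
                task = std::move(tasks.front());
                tasks.pop();
            }
            task();    // handle one queued HTTP request
        }
    }
    std::vector<std::thread> workers;
    std::queue<std::function<void()>> tasks;
    std::mutex m;
    std::condition_variable cv;
    bool done = false;
};

// e.g.: ThreadPool pool(2 * std::thread::hardware_concurrency());

Each HTTP connection just becomes one more task pushed into the queue;
the worker count never changes because of load.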

> I may have as many as 1,000 concurrent HTTP requests. I am
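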
> thinking that each of these threads could append to a single
> file with no conflict (the OS sequencing these operations)
> as long as the append is immediately flushed or buffering is
> turned off.

I don't know a more polite way to say it, so I'll put it bluntly:
you're wrong. You cannot depend on the OS to order the operations.
Your code needs to enforce the ordering that's necessary. The OS will
supply mechanisms (e.g. mutexes, file locking) you can use to do
that, but the OS isn't going to do much for you at all.

In most cases, ordering tends to be a fairly minor problem. Most of
what you usually need is atomic transactions. Each write becomes an
"all or nothing" proposition: either it completes, or it gets rolled
back as if it had never been attempted. Once a write starts, it runs
to completion without any other thread's write getting in its way or
interleaving its data with the current data.

With atomic transactions, ordering is as simple as including the
sequencing as part of your data. Without atomic transactions, you're
sunk -- you have no hope of it working dependably, but also no hope
of it failing dependably, so what you'll almost inevitably end up
with is a series of problems followed by attempts at workarounds that
don't really work right either.
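
As a minimal sketch of what that looks like (single process owning the
log; the file name and record format are just for illustration):

#include <cstdio>
#include <mutex>
#include <string>

// One mutex serializes the appends, and a sequence number is written
// into every record, so the ordering travels with the data instead of
// being left to whatever the OS happens to do.
class TransactionLog {
public:
    explicit TransactionLog(const char *path) : f(std::fopen(path, "ab")) {}
    ~TransactionLog() { if (f) std::fclose(f); }

    // Returns the sequence number assigned to this record.
    unsigned long append(const std::string &record) {
        std::lock_guard<std::mutex> lk(m);
        unsigned long seq = ++last_seq;
        std::fprintf(f, "%lu\t%s\n", seq, record.c_str());
        std::fflush(f);   // push the whole record out in one piece
        return seq;
    }

private:
    std::FILE *f;
    std::mutex m;
    unsigned long last_seq = 0;
};

If more than one process needs to append, replace the mutex with OS
file locking, but the sequencing idea stays the same.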

> The OCR process(es) would be notified of a new request using
> some sort of IPC (named pipes for now) that also tells it
> the byte offset in the transaction log file to find the
> transaction details. Each transaction will have three
> states:
> (a) Available (Init by web server)
> (b) Pending (updated by OCR process)
> (c) Completed (updated by OCR process)
>
> I am not sure how the OCR process would notify the
> appropriate thread within the web server process of the
> [Completed] event, but, it would use some sort of IPC.

What's the point of doing things this way? Right now, you're planning
to write some data to the transaction log, then send a pointer to
that data to the OCR engine, then the OCR engine reads the
transaction log, does the OCR, updates the transaction log, and
alerts the appropriate thread in the web server.

Instead of sending the data from the input thread to the OCR via the
transaction log (and sending a pointer directly from the input to the
OCR), send the relevant data directly from the input thread to the
OCR engine. Have the OCR engine send the result data directly back
into another queue. Have one pool of threads that takes HTTP
requests, extracts relevant data, and puts it into the request queue.
Have another pool of threads that takes results from the queue and
sends out replies.

If you need a transaction log (e.g. to track usage for billing and
such) insert it into the queuing chain (on either the incoming or
outgoing side, as appropriate). Since it's dealing with queued
requests, it can be single threaded, just writing the relevant data
in order. This avoids all the headaches that arise with multithreaded
access to the database (and usually improves performance, since it
avoids seeking around the disk to read and write data at different
parts of the file).
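
In outline (placeholder names throughout, with do_ocr() and
append_to_log() standing in for your engine and your log writer):

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

// A simple blocking queue: producers push, consumers block until data arrives.
template <typename T>
class BlockingQueue {
public:
    void push(T v) {
        { std::lock_guard<std::mutex> lk(m); q.push(std::move(v)); }
        cv.notify_one();
    }
    T pop() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this] { return !q.empty(); });
        T v = std::move(q.front());
        q.pop();
        return v;
    }
private:
    std::queue<T> q;
    std::mutex m;
    std::condition_variable cv;
};

struct Request { int id; std::string image; };
struct Result  { int id; std::string text; };

// Stand-ins for the real OCR engine and the real log writer.
std::string do_ocr(const std::string &image) { return "text for " + image; }
void append_to_log(const std::string &line) { /* one writer, in order */ }

BlockingQueue<Request>     requests;     // filled by the HTTP input threads
BlockingQueue<Result>      results;      // drained by the reply threads
BlockingQueue<std::string> log_entries;  // drained by exactly one logger thread

// The OCR side: pull a request, do the work, hand the result back.
void ocr_worker() {
    for (;;) {
        Request r = requests.pop();
        log_entries.push("start " + std::to_string(r.id));
        results.push(Result{r.id, do_ocr(r.image)});
        log_entries.push("done " + std::to_string(r.id));
    }
}

// One thread owns the log file, so its writes are single threaded and ordered.
void logger() {
    for (;;) append_to_log(log_entries.pop());
}

The input pool pushes into requests, the reply pool pops from results,
and nothing but the logger thread ever touches the log file.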

--
Later,
Jerry.
From: Peter Olcott on

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:OUUch5q1KHA.3744(a)TK2MSFTNGP04.phx.gbl...
> Hector Santos wrote:
>
>> Peter Olcott wrote:
>>
>>>> That means you can only handle 10 request per second.
>>>
>>> No it does not. 100 ms is the real-time limit, actual
>>> processing time will average much less than this, about
>>> 10 ms.
>>
>>
>> Now you are even more unrealistic. That means for 100 TPS,
>> you now need 100 threads.
>
>
> I misspoke here. If your unrealistic transaction time is
> 10 ms, then your 1 OCR processor would be able to handle
> 100 TPS.

Yes. It has been benchmarked at 100 ms for 7200 glyphs, and
the model of the new algorithm benchmarked ten-fold faster.

One page of data (based on the size of the image) is the
maximum size of a single unit of work. Larger requests can
be submitted but are placed in a lower priority queue.
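
The queueing I have in mind is roughly this (just a sketch of the two
priority levels, nothing more):

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>

struct Job { std::string image; };

// Two queues; a worker always drains the normal-priority queue before
// it will touch an oversized (lower priority) job.
class TwoLevelQueue {
public:
    void push(Job j, bool oversized) {
        std::lock_guard<std::mutex> lk(m);
        (oversized ? low : high).push(std::move(j));
        cv.notify_one();
    }
    Job pop() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this] { return !high.empty() || !low.empty(); });
        std::queue<Job> &q = high.empty() ? low : high;
        Job j = std::move(q.front());
        q.pop();
        return j;
    }
private:
    std::queue<Job> high, low;
    std::mutex m;
    std::condition_variable cv;
};

One thing to watch with this shape is that the oversized jobs can
starve for as long as the normal queue stays busy.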

>
> But 10 ms processing time is very unrealistic.
>
>> But I want you to lookup the term Thread Quantum.
>
>
> And please do look this up. Its very important Peter.

1 ms under Linux
http://216.154.219.151/tutorials/threads/thread_scheduling_2.shtml
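
The Linux counterpart of Hector's GetTickCount test, just to measure
what a requested 1 ms sleep actually costs on my box (plain POSIX
calls; this measures timer granularity, not the scheduler quantum
itself):

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec req = {0, 1000000};   /* ask for 1 ms */
    struct timespec t1, t2;

    clock_gettime(CLOCK_MONOTONIC, &t1);
    nanosleep(&req, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t2);

    long us = (t2.tv_sec - t1.tv_sec) * 1000000L +
              (t2.tv_nsec - t1.tv_nsec) / 1000L;
    printf("requested 1000 us, slept %ld us\n", us);
    return 0;
}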

>
>> In short, what you are claiming is that you complete a
>> request and its processing in 1 CPU cycle of context
>> switching. A quantum is around ~15 ms on
>> multi-core/processors.
>>
>>>> No matter how you configure it, 10 threads in 1
>>>> process, 10 processes on 1 machine or across machines,
>>>> you need at least 10 handlers to handle the 100 TPS
>>>> with 100 ms transaction times.
>>>
>>> 10 ms transaction time
>>
>>
>> Unrealistic. Dreaming.
>>
>
>
> To prove the point, here is a simple code to show it:
>
> #include <windows.h>
>
> void main(int argc, char *argv[])
> {
>     DWORD t1 = GetTickCount();
>     Sleep(1); // sleep 1 millisecond
>     DWORD t2 = GetTickCount();
> }
>
> You will see that t2-t1 is around ~15 ms. It's called a
> QUANTUM; your sleeps are in factors of quantums:
>
> Sleep(16) --> 2 quantums or ~30 ms
> Sleep(32) --> 3 quantums or ~45 ms
>
> #include <windows.h>
> #include <stdio.h>
>
> void main(int argc, char *argv[])
> {
>     DWORD t1 = GetTickCount();
>     Sleep(1); // sleep 1 millisecond
>     DWORD t2 = GetTickCount();
>     printf("Sleep Efficiency: %d\n", t2-t1);
>
>     t1 = GetTickCount();
>     Sleep(16);
>     t2 = GetTickCount();
>     printf("Sleep Efficiency: %d\n", t2-t1);
>
>     t1 = GetTickCount();
>     Sleep(32);
>     t2 = GetTickCount();
>     printf("Sleep Efficiency: %d\n", t2-t1);
> }
>
> What it means is that in code that does not do any
> preemption on its own (which will slow it down), just
> natural code, the CPU and OS will preempt you every
> QUANTUM.
>
> I sincerely doubt you can do your OCR processing in less
> than 1 QUANTUM, let alone 10 ms.
>
> Now, here's the thing:
>
> If indeed you can achieve processing in less than 1
> quantum or even 2 quantums, then you really should not
> worry about anything else, because your OCR system would be
> among the fastest applications in the world!
>
>
> --
> HLS


From: Peter Olcott on

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:ejwoa%23q1KHA.1420(a)TK2MSFTNGP02.phx.gbl...
> Peter Olcott wrote:
>
>>> Now you can use ANY web server that supports CGI, use a
>>> web server with an CMS in it so you can manage user
>>> accounts, etc.
>>
>> I am going to use a web server that has source-code so I
>> won't need any kind of CGI.
>
>
> true, you don't have to call CreateProcess("OCR.EXE") and
> worry about the std I/O redirection overhead, which is
> good, but your request would be handled the same way,
> dynamically.
>
> Even with the benefit of the doubt of 10 ms processing
> time (which is unrealistic), you can do a true CGI very
> effectively (which mongoose supports) and you now have a
> benchmark to see how much you can dynamically handle.

True CGI probably has 10 ms of extra overhead over the
already resident web server.
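
A rough way to put a number on that spawn cost (GetTickCount is
coarse, so this is only a ballpark, and "child.exe" is a stand-in for
any small program):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    STARTUPINFOA si = { sizeof(si) };
    PROCESS_INFORMATION pi;
    char cmd[] = "child.exe";       // placeholder child process

    DWORD t1 = GetTickCount();
    if (CreateProcessA(NULL, cmd, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi)) {
        WaitForSingleObject(pi.hProcess, INFINITE);   // wait for it to exit
        CloseHandle(pi.hThread);
        CloseHandle(pi.hProcess);
    }
    DWORD t2 = GetTickCount();

    printf("spawn + run + exit: ~%lu ms\n", (unsigned long)(t2 - t1));
    return 0;
}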

>
> Anyway, you get the idea. It's up to you now to program it
> and learn for yourself what's possible.
>
> --
> HLS


From: Hector Santos on
Joseph M. Newcomer wrote:

>> DWORD t1 = GetTickCount();
>> Sleep(1); // sleep 1 millisecond
> ***
> Actually, this is EXACTLY the same as writing Sleep(15);
> ****


You did not read further.

>> DWORD t2 = GetTickCount();
> ****
> This is meaningless, because it is subject to what is called "gating error". While
> generally you will see the difference as being 15ms, it really deals with how the timer
> tick count is updated relative to the scheduler. I do not believe this is defined.
> ****


What's meaningless? The difference showing that it will not be a 1 ms
sleep? Not just generally - always! The only way to change that is by
using the multimedia timers; see timeBeginPeriod(), which lets you set
the finest resolution the hardware supports - 1 ms. That still doesn't
change your quantum.
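
For example (winmm multimedia timer calls; this only changes the timer
resolution, it does not change the quantum):

#include <windows.h>
#include <mmsystem.h>
#include <stdio.h>
#pragma comment(lib, "winmm.lib")

int main(void)
{
    DWORD t1, t2;

    // Default resolution: Sleep(1) comes back on a ~15 ms boundary.
    t1 = timeGetTime();
    Sleep(1);
    t2 = timeGetTime();
    printf("default resolution: %lu ms\n", (unsigned long)(t2 - t1));

    // Request 1 ms resolution; Sleep(1) now returns much closer to 1 ms.
    timeBeginPeriod(1);
    t1 = timeGetTime();
    Sleep(1);
    t2 = timeGetTime();
    printf("1 ms resolution:    %lu ms\n", (unsigned long)(t2 - t1));
    timeEndPeriod(1);        // always pair Begin with End

    return 0;
}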

>> What it means is that in code that does not do any preemption on its
>> own (which will slow it down), just natural code, the CPU and OS will
>> preempt you every QUANTUM.
> ****
> Not quite true. It is actually far more complex and subtle than this.


It is very true, Joe, and for pedro it's good enough to show the key
point: his dream of a complete transaction time of 10 ms is below a
QUANTUM!

> For example, if
> there is another compute-bound interactive thread running, you will get quite different
> numbers. And if you create the following app
>
> void main()
> {
> for(;;) {}
> }
>
> and run it at priority 15, you will get some really UGLY results when you run your above
> example at normal priority.
> ****


Right, because if there are no other equal-priority threads, the OS
will immediately make it active again, producing very high context
switching overhead.

But we are talking about equal-priority threads, regardless of class,
with no preemption on your part, just natural quantum-based switching,
where the scheduler will attempt to provide equal time slicing to
equal-priority threads - period. Nothing untrue about that.

Anyway, the main point is that even with pure code - no hardware
interrupts (which he won't be able to avoid anyway), no sleeps,
nothing that forces extra context switching, purely natural
quantum-based switching - I doubt his OCR code is fast enough to
complete in less than 1 quantum.

He's totally unrealistic.

--
HLS
From: Peter Olcott on

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
message news:6aspr5dr3kb4npe47j9mu26kbl2ib4s28v(a)4ax.com...
> On Wed, 7 Apr 2010 10:07:02 -0500, "Peter Olcott"
> <NoSpam(a)OCR4Screen.com> wrote:
>
>>
>>Sure, so another way to solve this problem is, in the rare
>>cases when you do lose a customer's money, to simply take
>>their word for it and provide a refund. This also would hurt
>>the reputation, though, because it requires the customer to
>>find a mistake that should not have occurred.
> ****
> Incredibly elaborate mechanisms to solve non-problems.
> Simple mechanisms (e.g., "resubmit
> your request") should suffice. Once your requirements
> state what failure modes are

You are not paying attention. I am talking about a server
crash with loss of data after the customer has added money
to their account, but before this financial transaction has
been saved to offsite backup. They add ten bucks to their
account and I lose track of it because the server crashed
and it was not yet time for my periodic backup.
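
What I want is for the credit to be on disk before the customer is
ever told it went through, something along these lines (a POSIX
sketch; the path and record format are placeholders):

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

/* Append one payment record and force it to disk before acknowledging.
   If this returns 1, a crash afterwards cannot lose the credit; the
   offsite copy can then be made on its own schedule. */
int record_payment(const char *path, const char *customer, int cents)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (fd < 0)
        return 0;

    char line[256];
    int n = snprintf(line, sizeof(line), "%s\t%d\n", customer, cents);

    int ok = n > 0 &&
             write(fd, line, (size_t)n) == (ssize_t)n &&  /* append the record */
             fsync(fd) == 0;                              /* make it durable   */
    close(fd);
    return ok;
}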