From: Hector Santos on
Peter Olcott wrote:

> "Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
> news:O2mJq3d2KHA.2284(a)TK2MSFTNGP06.phx.gbl...
>> Peter Olcott wrote:
>>
>> Joe, at least so far we got him to:
>>
>> - Admit to lack of understanding of memory and he
>> himself reduced
>> the loading requirement rather than code for any large
>> memory
>> efficiency methods.
>
> No. Joe was and continues to be wrong that a machine with
> plenty of extra RAM ever needs to page out either a process
> or its data.


No, Joe never said or implied that at all. No one did. The only thing
you "said" you found is that if you load a simple test program (one
that doesn't do any real testing at all), you see ZERO FAULTS after
the initial faults have settled down.

The fact that you got initial faults should TELL you that you are
still a candidate for faulting, especially when the system is LOADED.
You have not loaded your system.

>
>> - Admit that his 100 TPS was unrealistic for a 10 ms
>> throughput
>> that lacked consideration for the interfacing
>> processing time
>> outside the vapor ware OCR processor. So he added
>> another
>> 10 ms and reduced the TPS now to 50.
>
> No, the latest analysis indicates that I am back up to 100
> because the webserver and the OCR execute in parallel.


No, it shows EXACTLY what the simple equation

TPS = N * 1000 / WORK LOAD (ms)

and the charts I provided to you are SAYING: if you want 100
TPS with a 20 ms WORK LOAD, you need N=2 handlers!

If you want to do this with one handler, then you can only do 50 TPS.

But again, this is an idealized, equalized loading system - a single
queue with two handlers, one request coming in at a time. That is
not reality unless you synchronize the incoming queuing and perform
load balancing.

But knowing how you think, you will say, that each one is its own WEB
SERVER.

So what? How do you control the requests that are coming in? You said
you want 100 TPS, but that load can come in within 500 msecs! Now your
simple equation is:

100 requests / 500 ms = N / 20 ms work load

Solve for N and N = 4 handlers - threads, separate processes, who
cares how they run concurrently for that 500 ms time span,
hyperthreaded or each on its own CPU or machine - you need 4
handlers - period!

>> He basically does not see the queue accumulation!
>
> The only way this site is going to ever get too long of a
> queue is if too many free jobs are submitted. Do you really
> think that this site is ever going to be making $10.00 per
> second? If not then I really don't have to worry about queue
> length. In any case I will keep track of the average and
> peak loads.


Fine, if you are going to do thread delegation and load balancing,
fine. All I am pointing out in this lesson is that your modeling is
flawed for the work load you expect to get, and it will not work using
this Many Threads to 1 FIFO queuing framework.

Even without my software experience - I'm a chemical engineer - this is
UNIT OPS 101. College freshman understanding. Even for accountants!

IN = OUT

must be conserved to obtain any level of steady state operation -
otherwise you begin to get chaos, pressures, overflows, EXPLOSIONS!

Here is a quick plan:

1) Get any web server with CGI or PHP script mapping support.

2) Design logic for shared map READ-ONLY meta data so you don't have
the 30-60 second load time.

3) Use simple PHP to process your OCR. The OCR can still be your
SINGLE PROCESS non-threaded compiled code. PHP will give you all the
scripting power and support for any SQL engine, direct file access or
any logging you need, including delegation to one of the OCR
processors.

GET YOUR MEASUREMENTS BASED ON THIS and begin the next step, if
necessary, as you might be surprised to find you can handle a good
enough TPS just to get you started for your presentations.

Best of all, you can do the above under LINUX!

If it pays off to continue and you need better scalability, then you
can begin to explore many of the ideas discussed here to improve it.

Overall, you need to redesign and measure how the OCR processor can
work as a multi-threaded processor. You won't get very far until you
do this.

--
HLS
From: Peter Olcott on

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:O%23HmXKf2KHA.5212(a)TK2MSFTNGP04.phx.gbl...
> Peter Olcott wrote:
>
>> http://en.wikipedia.org/wiki/Priority_inversion
>> If there are no shared resources then there is no
>> priority inversion.
>> Try and provide a valid counter-example of priority
>> inversion without shared resources.
>
>
> You don't have to have a deadlock to reveal problems. You can
> get Race Conditions with classic SYNC 101 mistakes like
> this that depend on time synchronization:
>
> if (NumberOfHighPriorityJobsPending !=0)
> nanosleep(20);

20 milliseconds

>
> Since you like wikipedia, read:
>
> http://en.wikipedia.org/wiki/Race_condition
>
> What's the point of the above? Are you expecting that the
> value will turn 0 during the nanosleep(20), which is wrong
> anyway? Is that 20 seconds or 20 nanoseconds? Did you
> really mean:
>
> if (NumberOfHighPriorityJobsPending !=0)
> usleep(20);
>
> In either case, you are in for a RUDE awakening with
> that.
>
> You probably mean:
>
> while (NumberOfHighPriorityJobsPending !=0)
> usleep(20);
>
> which COULD be fine, but you should use an optimized
> kernel object here to wait on.
>
> if (WaitForSingleObject(hPriorityEvent, INFINITE) ==
> WAIT_OBJECT_0) {
> /// do whatever
> } else {
> /// Not what I expected
> }
>
> When you wait on a kernel object, you won't be spinning
> your thread like you do above.

Event driven is better. I would prefer that the high
priority jobs have absolute priority over the lower priority
jobs. Even better would be if this could be done
efficiently. I think that process priority would work well
enough; that would depend on how the kernel scheduler works
and on the frequency and duration of the time slices.

>> You are not explaining with much of any reasoning why you
>> think that one alternative is better than another, and
>> when I finally do get you to explain, it is only that
>> your alternative is better than your misconception of my
>> design, not the design itself.
>
>
> No, your problem is that you are stuck with a framework

One design constraint that won't be changed until system
load requires it is that we must assume a single core
processor with hyperthreading.

> Many Threads to 1 FIFO/OCR process
>
> and everyone is telling you it's flawed and why. I tried
> different

When they finally get to the why part I point out their
false assumptions. A priority queue may be a great idea with
multiple cores; I will not have those.

> ways using your WORK LOAD which you accepted and began to
> change your TPS.
>
> But you are still going to overflow your Many to 1 design,
> especially if you expect to use TIME to synchronize
> everything.

This is not a given, but using time to synchronize is not
the best idea; it could waste a lot of CPU. So then: four
processes, with one getting an 80% relative share and the
other three getting about 7% each.

>> Exactly what are these ways, and precisely what have I
>> failed to account for?
>
>
> You have been told in a dozen ways why it will fail! You are
> OFF in your timing of everything for the most part. You
> think you can achieve what you want with a Many Thread to
> 1 OCR process design at the TPS rates and work load you
> think you can get.
>
> You can't!

Four processes four queues each process reading only from
its own queue. One process having much more process priority
than the rest. Depending upon the frequency and size of the
time slices this could work well on the required single core
processor.

On a quad-core it would have to be adapted possibly using a
single priority queue so that the high priority jobs could
possibly be running four instances at once.

>> I know full well that the biggest overhead of the
>> process is going to be disk access. I also know full
>> well that tripling the number of disk accesses would
>> likely triple overhead. I am not sure that SQLite can't
>> do a record-number-based seek without requiring
>> an index. Even if SQLite can't do a
>> record seek without an index, it might still be fast
>> enough.
>
>
> This is what I am saying, WE TOLD YOU WHAT THE LIMITS OF
> SQLITE are and you are not listening. You can do a ROW
> lookup, but you can't do a low level FILE RECORD POSITION
> AND BYTE OFFSET like you think you need, but really don't.

As long as the ROW lookup maps to the file byte offset we
are good. If the ROW lookup must read and maintain an index
just to be able to get to the rows in sequential order, this
may not be acceptable.

> I also told you that while you UPDATE an SQLITE database,
> all your READS are locked!
>
> You refuse to comprehend that.

I knew this before you said it the first time. The practical
implication of this is that SQLite can't handle nearly as
many simultaneous updates as other row-locking systems.
Their docs said 500 transactions per second.

> Again, you can SELECT a row in your table using the proper
> query, but it isn't a direct FILE ACCESS with BYTE OFFSET
> idea and again, SQLITE3 will lock your database during
> updates so all your REQUEST SERVER will be locked in
> reading/writing any table while it is being updated by
> ANYONE.

If it doesn't require a separate index to do this, then the
record number maps to a byte offset. But since record numbers
can be sequenced out of order, in at least this instance it
must have something telling it where to go, probably an
index. Hopefully it does not always build an index just in
case someone decides to insert records out of sequence.

>> You (and Hector) are definitely right on some things.
>
>
> We are right on EVERYTHING discussed here. There has been
> nothing you stated or posted that indicates any error in
> all suggestions to you.

You and Joe are most often wrong by making false assumptions
about the details of my design and its requirements.

> IDEAL: Many Threads to Many Threads
> WORST: Many Threads to 1 thread

I guess that I am currently back to alternative two, which is
many threads (or a web server) feeding four OCR processes via
four FIFOs on a single-core machine, one process having much
higher process priority than the others.

A multi-core processor would probably involve the same thing
except with a single priority queue in between.


From: Jerry Coffin on
In article <LuidnT3tuaC7p1_WnZ2dnUVZ_gCdnZ2d(a)giganews.com>,
NoSpam(a)OCR4Screen.com says...

[ ... ]

> Alternative (a) There are four processes with four queues
> one for each process. These processes only care about
> executing the jobs from their own queue. They don't care
> about the jobs in any other queue. The high priority process
> is given a relative process priority that equates to 80% of
> the CPU time of these four processes. The remaining three
> processes get about 7% each. This might degrade the
> performance of the high priority jobs more than the next
> alternative.

There is no such thing with any OS of which I'm aware. At least with
a typical OS, the highest priority task is the *only* one that will
run at any given time. Windows (for one example) does attempt to
prevent starvation of lower priority threads by waking one lower
priority thread every four seconds.

Though the specific details differ, Linux works reasonably similarly.

Neither, however, provides any set of priorities that will give
anything similar to what you've described. It just doesn't exist.

> Alternative (b) each of the low priority jobs checks to see
> if a high priority job is in the queue or is notified by a
> signal that a high priority job is waiting. If a high
> priority job is waiting then each of these low priority jobs
> immediately sleeps for a fixed duration. As soon as they
> wake up these jobs check to see if they should go back to
> sleep or wake up.

This requires that each of those tasks is aware of its own process
scheduling AND of the scheduling of other processes of higher
priority. Worse, without a lot of care, it's subject to race
conditions -- e.g. if a high priority task shows up, for this scheme
to work, it has to stay in the queue long enough for every other task
to check the queue and realize that it needs to sleep, *before* you
start the high priority task -- otherwise, the task that's supposed
to have lower priority will never see that it's in the queue, and
will continue to run.

Bottom line: you're ignoring virtually everything the world has
learned about process scheduling over the last 50 years or so. You're
trying to start over from the beginning on a task that happens to be
quite difficult.

> These processes could even simply poll a shared memory
> location that contains the number of high priority jobs
> currently in the queue. From what the hardware guys have
> told me memory writes and reads can not possibly garble each
> other.

This has the same problem outlined above. It adds the requirement for
a shared memory location, and adding polling code to the OCR tasks.
See above about ignoring what the world has learned about process
scheduling over the last 5 decades or so.

[ ... ]

> Neither of these designs has any of the behavior that you
> mentioned.

No -- they're substantially worse. At least all that did was
occasionally start a lower-priority task out of order.

[ ... ]

> I already figured out a way around that. Everyone must have
> their own user account that must be created by a live human.
> All users are always authenticated against this user
account. I don't see any loopholes in this one single form of
> protection.

Not even close, and you clearly don't understand the problem at all
yet. The problem is that to authenticate the user you've *already*
created a thread for his connection. The fact that you eventually
decide not to do the OCR for him doesn't change the fact that you've
already spawned a thread. If he makes a zillion attempts at
connecting, even if you eventually reject them all, he's still gotten
you to create a zillion threads to carry out the attempted
authentication for each, and then reject it.

Of course, that also ignores the fact that doing authentication well
is non-trivial itself.

--
Later,
Jerry.
From: Jerry Coffin on
In article <abidnZAfALb5HF_WnZ2dnUVZ_jydnZ2d(a)giganews.com>,
NoSpam(a)OCR4Screen.com says...

[ ... ]

> No. Joe was and continues to be wrong that a machine with
> plenty of extra RAM ever needs to page out either a process
> or its data.

It's not a question of whether it *needs* to -- it's a simple fact
that with both Windows and Linux, it will *try* to whether it needs
to or not. Windows it's called the working set trimmer -- it's a task
that does nothing but attempt to remove pages from a process' working
set. Offhand, I don't remember the name of the equivalent under
Linux, but it has one (runs as a daemon if memory serves). If you
want to avoid that, you probably want to try to find a provider
running one of the *BSD systems for the server instead -- I haven't
re-checked recently, but at least as of a few years ago, *BSD (at
least most of them) did not have an equivalent of a working set
trimmer.

[ ... ]

> No, the latest analysis indicates that I am back up to 100
> because the webserver and the OCR execute in parallel.

On a single core machine? There are a few pieces that can execute in
parallel (the OCR can use the CPU while the network adapter is
reading or writing data), but with only one core, very little really
happens in parallel -- the whole point of multiple cores (or multiple
processors) is to allow them to *really* do things in parallel,
instead of just switching between processes quickly enough for it to
*look* like they're running in parallel.

[ ... ]

> The only way this site is going to ever get too long of a
> queue is if too many free jobs are submitted. Do you really
> think that this site is ever going to be making $10.00 per
> second? If not then I really don't have to worry about queue
> length. In any case I will keep track of the average and
> peak loads.

Keeping track of average vs. peak load is easy -- dealing with it
(given a task as processor intensive as you've suggested) is not.

Seriously, you'd be a lot better off with a "cloud" computing
provider than one that gives you only a single core. Assuming your
OCR really works, there's a pretty fair chance that the work pattern
will be substantially different than you seem to imagine -- instead
of a page or two at a time, you're (reasonably) likely to receive
scans of an entire book at a time.

This is a scenario where something like Amazon's EC2 would work well
-- you pay only for processor time you actually use, but if you get a
big job, it can run your task on dozens or even thousands of
processors so (for example) all the pages from a book are OCRed in
parallel, the results put back together, and your customer gets his
result back quickly. Then your system goes idle again, and you quit
paying for *any* processor time until another task comes along.

--
Later,
Jerry.
From: Hector Santos on
On Apr 12, 1:57 am, Jerry Coffin wrote to Peter:

> Seriously, you'd be a lot better off with a "cloud" computing
> provider than one that gives you only a single core. Assuming your
> OCR really works, there's a pretty fair chance that the work pattern
> will be substantially different than you seem to imagine -- instead
> of a page or two at a time, you're (reasonably) likely to receive
> scans of an entire book at a time.
>
> This is a scenario where something like Amazon's EC2 would work well
> -- you pay only for processor time you actually use, but if you get a
> big job, it can run your task on dozens or even thousands of
> processors so (for example) all the pages from a book are OCRed in
> parallel, the results put back together, and your customer gets his
> result back quickly. Then your system goes idle again, and you quit
> paying for *any* processor time until another task comes along.

Interesting proposal.

But this won't eliminate the coding requirements, will it? He only
leverages the computing power and scaling, which is good - one less
(big) issue to deal with - but he still needs to frame his code around
the same principles. In fact, this current framework of Many Threads
to 1 FIFO queue might cost him more, based on my reading. He will
need to make it load dynamically for EC2 to minimize his cost. Right?

--
HLS