From: Peter Olcott on

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:%23aTn4cf2KHA.3568(a)TK2MSFTNGP04.phx.gbl...
> Peter Olcott wrote:

>> No, the latest analysis indicates that I am back up to
>> 100 because the webserver and the OCR execute in
>> parallel.
>
>
> No, it shows EXACTLY what the simple equation
>
> TPS = N * 1000 / WORK LOAD (ms)
>
> and the charts I provided to you are SAYING: if you
> want 100 TPS with a 20 ms WORK LOAD, you need N=2
> Handlers!
>

hyperthreading = two handlers
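Plugging the numbers into that formula (a minimal sketch of
the arithmetic in C; my code, not code from the thread):

    /* TPS = N * 1000 / work_load_ms  =>  N = ceil(TPS * work_load_ms / 1000) */
    #include <stdio.h>

    int main(void)
    {
        double tps = 100.0;          /* target transactions per second */
        double work_load_ms = 20.0;  /* work load per request, in ms   */
        int n = (int)((tps * work_load_ms + 999.0) / 1000.0); /* ceiling */
        printf("handlers needed: %d\n", n);  /* prints: handlers needed: 2 */
        return 0;
    }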

> But again, this is an idealize equalized loading system -
> a single queue with two handlers. One request coming in
> at a time. That is not reality unless you synchronize the
> incoming queuing and perform load balancing.

A single queue with two handlers will not by itself provide
the prioritization that I need.

> So what? How do you control the requests that are coming
> in? You said you want 100 TPS, but that load can come in
> within 500 ms! Now your simple equation is:
>
> 100 requests / 500 ms = N / 20 ms work load
>
> Solve for N and N = 4 handlers, threads, separate
> processors, who cares how they are concurrently running
> for that 500 ms time span, hyperthreaded or each on their
> own CPU or machine - you need 4 handlers - period!

Most of the wall clock time is in the web server,
communicating with the client; the web server creates one
thread per HTTP request. As far as the OCR processor is
concerned it must finish its high priority jobs in 10 ms,
one at a time in FIFO order. It is not even aware of the
HTTP delays.

>> The only way this site is going to ever get too long of a
>> queue is if too many free jobs are submitted. Do you
>> really think that this site is ever going to be making
>> $10.00 per second? If not then I really don't have to
>> worry about queue length. In any case I will keep track
>> of the average and peak loads.
>
>
> Fine, if you are going to do thread delegation and load
> balancing, fine. All I am pointing out in this lesson is
> that your modeling is flawed for the work loading you
> expect to get and will not work using this
> Many Threads to 1 FIFO queuing framework.

I still see four different queues as a better solution for a
single core processor. It is both simpler and more
efficient. One of the types of jobs will take 210,000 ms and
this job absolutely positively can not screw up my maximum
100 ms real time threshold for my high priority jobs. Joe's
solution is simply broken in this case.

> 1) Get any web server with CGI or PHP script mapping
> support.

I am not going to learn a whole new computer language to do
something that I can already do better and in less time
otherwise.


From: Peter Olcott on

"Jerry Coffin" <jerryvcoffin(a)yahoo.com> wrote in message
news:MPG.262bf58bf8ba735989863(a)news.sunsite.dk...
> In article
> <LuidnT3tuaC7p1_WnZ2dnUVZ_gCdnZ2d(a)giganews.com>,
> NoSpam(a)OCR4Screen.com says...
>
> [ ... ]
>
>> Alternative (a) There are four processes with four
>> queues, one for each process. These processes only care
>> about executing the jobs from their own queue. They
>> don't care about the jobs in any other queue. The high
>> priority process is given a relative process priority
>> that equates to 80% of the CPU time of these four
>> processes. The remaining three processes get about 7%
>> each. This might degrade the performance of the high
>> priority jobs more than the next alternative.
>
> There is no such thing with any OS of which I'm aware. At
> least with a typical OS, the highest priority task is the
> *only* one that will run at any given time. Windows (for
> one example) does attempt to prevent starvation of lower
> priority threads by waking one lower priority thread
> every four seconds.

The alternative that you show quoted above is called time
slicing and has been available for many decades.
>
> Though the specific details differ, Linux works reasonably
> similarly.
>
> Neither, however, provides any set of priorities that
> will give anything similar to what you've described. It
> just doesn't exist.

Here it is:
http://en.wikipedia.org/wiki/Nice_(Unix)
I will have to run my own tests to see how the process
priority numbers map to the relative process priorities
that I provided above. Ultimately the scheduling algorithm
boils down to essentially the frequency and duration of a
time slice. There is no need to map to the exact percentage
numbers that I provided.
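A minimal sketch of such a test (my own code, assuming the
POSIX setpriority()/getpriority() calls that nice(1) wraps):

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        /* nice values range from -20 (highest priority) to 19 (lowest) */
        if (setpriority(PRIO_PROCESS, 0, 19) != 0)  /* 0 = this process */
            perror("setpriority");
        printf("running at nice %d\n", getpriority(PRIO_PROCESS, 0));
        /* ... the low priority OCR work would go here ... */
        return 0;
    }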


>> Alternative (b) each of the low priority jobs checks to
>> see if a high priority job is in the queue, or is
>> notified by a signal that a high priority job is
>> waiting. If a high priority job is waiting then each of
>> these low priority jobs immediately sleeps for a fixed
>> duration. As soon as they wake up these jobs check to
>> see if they should go back to sleep or wake up.
>
> This requires that each of those tasks is aware of its
> own process scheduling AND of the scheduling of other
> processes of higher priority. Worse, without a lot of
> care, it's subject to race conditions -- e.g. if a high
> priority task shows up, for this scheme to work, it has
> to stay in the queue long enough for every other task to
> check the queue and realize that it needs to sleep,
> *before* you start the high priority task -- otherwise,
> the task that's supposed to have lower priority will
> never see that it's in the queue, and will continue to
> run.

My scheduler will signal all of the low priority jobs that
they need to sleep now. When the high priority queue is
empty, and all of the high priority jobs are completed the
low priority jobs get a signal to wake up now.
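One way that could look with signals (a sketch under my own
assumptions: the scheduler knows the worker PIDs, and
SIGSTOP/SIGCONT do the suspend/resume in the kernel, so the
workers need no polling code at all):

    #include <signal.h>
    #include <sys/types.h>

    /* Scheduler side: suspend every low priority worker process. */
    void pause_low_priority(const pid_t workers[], int count)
    {
        for (int i = 0; i < count; i++)
            kill(workers[i], SIGSTOP);
    }

    /* Scheduler side: resume them once the high priority queue is empty. */
    void resume_low_priority(const pid_t workers[], int count)
    {
        for (int i = 0; i < count; i++)
            kill(workers[i], SIGCONT);
    }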

> Bottom line: you're ignoring virtually everything the
> world has learned about process scheduling over the last
> 50 years or so. You're trying to start over from the
> beginning on a task that happens to be quite difficult.

I see no other way to provide absolute priority to the high
priority jobs (paying customers) over the low priority jobs
(free users). Also I see no way that this would not work
well. If I get enough high priority jobs that the lower
priority jobs never ever get a chance to run that would be
fantastic. The whole purpose of the free jobs is to get more
paying jobs.

If you see something specifically wrong with this approach
please point out the specific dysfunctional aspect. I see no
possible dysfunctional aspects with this design.

> This has the same problem outlined above. It adds the
> requirement for a shared memory location, and adds
> polling code to the OCR tasks. See above about ignoring
> what the world has learned about process scheduling over
> the last 5 decades or so.

I already changed this to signals; they don't waste as much
time.

> No -- they're substantially worse. At least all that did
> was occasionally start a lower-priority task out of
> order.

I see no possible sequence of events where this would ever
occur; if you do, please point it out detail by detail.

>> I already figured out a way around that. Everyone must
>> have their own user account that must be created by a
>> live human. All users are always authenticated against
>> this user account. I don't see any loopholes in this
>> one single form of protection.
>
> Not even close, and you clearly don't understand the
> problem at all yet. The problem is that to authenticate
> the user you've *already* created a thread for his
> connection. The fact that you eventually decide not to
> do the OCR for him doesn't change the fact that you've
> already spawned a thread. If he makes a zillion attempts
> at connecting, even if you eventually reject them all,
> he's still gotten you to create a zillion threads to
> carry out the attempted authentication for each, and
> then reject it.

Block the IP long before that.

>
> Of course, that also ignores the fact that doing
> authentication well
> is non-trivial itself.
>
> --
> Later,
> Jerry.


From: Peter Olcott on

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:eLnb3%23f2KHA.4716(a)TK2MSFTNGP06.phx.gbl...
> Peter Olcott wrote:
>
>>> You don't to have a dead lock to reveal problems. You
>>> can get Race Conditions with classic SYNC 101 mistakes
>>> like this that depends on time synchronizations:
>>>
>>> if (NumberOfHighPriorityJobsPending !=0)
>>> nanosleep(20);
>>
>> 20 milliseconds
>
>
> Ok, fair enough, you honestly cited the wrong function,
> but we get

No I didn't:
http://linux.die.net/man/2/nanosleep
I did provide the wrong parameters, though; the call takes
a struct timespec, and 20,000,000 nanoseconds = 20 ms.
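For the record, the corrected 20 ms sleep looks like this:

    #include <time.h>

    int main(void)
    {
        struct timespec ts = { 0, 20 * 1000 * 1000 }; /* 0 s + 20,000,000 ns = 20 ms */
        nanosleep(&ts, NULL); /* NULL: don't care how much time remained on EINTR */
        return 0;
    }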

> Who does the HTTP delegation? Or will you have FOUR
> different web servers with four different FORM pages?
> How do you control the cross domain security and business
> thread entry points where someone is trying to cheat you
> out of a few dollars?

(1) Everyone authenticates
(2) Some are free users, and some are people with money in
the OCR4Screen account.
(3) Web server determines which are which, as well as the
size of the job.
(4) Web server routes the request to the correct queue.
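Step (4) as a sketch (the queue names and size thresholds
are my own illustration, not from this thread):

    /* Route a request to one of the four queues by account type and job size. */
    typedef enum { QUEUE_PAID, QUEUE_FREE_SMALL,
                   QUEUE_FREE_MEDIUM, QUEUE_FREE_LARGE } queue_id;

    queue_id route(int has_funds, long job_bytes)
    {
        if (has_funds)
            return QUEUE_PAID;            /* paying jobs: high priority */
        if (job_bytes < 64 * 1024)        /* assumed thresholds */
            return QUEUE_FREE_SMALL;
        if (job_bytes < 1024 * 1024)
            return QUEUE_FREE_MEDIUM;
        return QUEUE_FREE_LARGE;          /* e.g. the 210,000 ms jobs */
    }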

> Remember you said 4 processors. One for each request type.
> I know you

No that is not what I said. I did not say processors
(unless spell check screwed it up), I said processes: the
Unix/Linux concept of an independent running program with
its own address space.

> also confusingly said 2 processors to handle 100 TPS
> again, but does that mean each processor can handle any
> type of request type?

One physical CPU with a single core and two hyperthreads.

> Do you see the conflicts all based on your own
> ever-changing descriptions?

No, but I do see the conflicts based on your misreading of
what I said.

>> One design constraint that won't be changed until system
>> load requires it is that we must assume a single core
>> processor with hyperthreading.
>
>
> A single CORE does not have HYPERTHREADING!!! That is
> only possible with a multi-core CPU.

Not according to Intel terminology. Hyperthreading is what
Intel came out with before they came out with multiple
cores. Now that multi-core CPUs exist, some of them also
have two hyperthreads per core, and some do not.

>> A priority queue may be a great idea with multiple
>> cores; I will not have those.
>
>
> So you will have a multiple-core machine where each
> process affinity is set, or a single CPU machine where
> there is no hyperthreading?

No that is not what I said and yet another false assumption
on your part.

>> This is not a given, but using time to synchronize is
>> not the best idea.
>
>
> It is never a good idea. So why did you show it?

It would probably have worked well enough, but changing it
is better. I always design completely before I begin
coding; it only takes a few minutes to change a design,
while it can take many weeks to change the code. I am now
going with signals for this alternative.

>> It could possibly waste a lot of CPU. So then four
>> processes with one getting 80% of the relative share
>> and the other three getting about 7% each.
>
>
> Not related to your design solution, but a cause
> contributing towards your design problem.

So then, are there specific dysfunctional aspects of the
simple time slicing that I proposed immediately above that
you can point out?

>> Four processes four queues each process reading only from
>> its own queue. One process having much more process
>> priority than the rest. Depending upon the frequency and
>> size of the time slices this could work well on the
>> required single core processor.
>
>
> Well, until we get straight what you THINK a "single core"
> processor means, it depends. I think you mean each process
> is assigned a CPU affinity on a multi-core machine.

PENTIUM 4, it might not even have hyperthreading.

> But as I pointed out, they are not as isolated as you
> think:
>
> - HTTP request delegator or proxy?

The web server does this.

> - common customer database for authentication and ACL
> ideas
> (access control list) or four separate databases?

single customer database
>
> - One log for each one or one single log?

A single log, and additionally a separate log for each paid
job stored in the customer database with the output data.

>> On a quad-core it would have to be adapted possibly using
>> a single priority queue so that the high priority jobs
>> could possibly be running four instances at once.
>
>
> So each OCR can handle any job type. Fine. But you need a
> single source manager as indicated above which negates
> your design.

Not at all. The web server delegates the jobs to queues.

> Don't forget the HTTP response requirements - which is
> another bottleneck since you have to update/log state
> points.

OCR process tells the web server when a specific job is
completed.

>>> This is what I am saying, WE TOLD YOU WHAT THE LIMITS OF
>>> SQLITE are and you are not listening. You can do a ROW
>>> lookup, but you can't do a low level FILE RECORD
>>> POSITION AND BYTE OFFSET like you think you need, but
>>> really don't.
>>
>> As long as the ROW lookup maps to the file byte offset we
>> are good.
>
>
> Then get another data manager because SQLITE will not give
> you this.

And you know this how?

>> If the ROW lookup must read and maintain an index just to
>> be able to get to the rows in sequential order, this may
>> not be acceptable.
>
>
> Huh, your SELECT gives you the order you want - you
> declare it in the SELECT. But it's not a FILE OFFSET
> thing.

Eventually it has to always be a file byte offset thing in
the underlying physical implementation, because Unix/Linux
only knows about files as sequences of bytes. If I tell the
file manager that I want record number 12463 and each
record has exactly 12 bytes, then it must seek to byte
offset (12463 - 1) * 12.
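That arithmetic as a direct seek (a sketch; assumes an
already-open file descriptor and the 12-byte records
described above):

    #include <unistd.h>

    #define RECORD_SIZE 12L

    /* Seek to the start of a 1-based record number. */
    off_t seek_record(int fd, long record_number)
    {
        return lseek(fd, (record_number - 1) * RECORD_SIZE, SEEK_SET);
    }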

>> I knew this before you said it the first time. The
>> practical implication of this is that SQLite can't
>> handle nearly as many simultaneous updates as other
>> row locking systems. Their docs said 500 transactions
>> per second.
>
>
> But you won't be doing 500 BULK transactions which is what
> they are saying.
>
> BEGIN TRANSACTION
> 1st SQL command ....
> 2nd SQL command ....
> ....
> 500th SQL command
> END TRANSACTION
>
> In SQLITE, this is SUPER FAST! But if you don't use the
> BEGIN/END, you will be SLOW! Very SLOW! Those BEGIN/END
> commands put a 100000% exclusive lock on the database!
> No other thread or process can touch the database for
> any reason.
>
> Since your HTTP requests are not bulk, each one is
> handled separately. At the rate you expect for the TPS,
> you will be MURDERED.

It will merely be reduced to the maximum number of
transactions that can be written to disk. Most of this time
would be disk drive seek time. My cheap drive has about
9 ms seek time; that's about 111 seeks per second. So
SQLite might be a bottleneck unless a single transaction
takes a single seek.
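Hector's BEGIN/END point, expressed against the SQLite C
API (a sketch; the jobs table is my own illustration): one
transaction around many statements costs one disk sync
instead of one sync per statement.

    #include <sqlite3.h>

    void bulk_insert(sqlite3 *db)
    {
        sqlite3_exec(db, "BEGIN TRANSACTION;", 0, 0, 0);
        for (int i = 0; i < 500; i++)
            sqlite3_exec(db, "INSERT INTO jobs (state) VALUES (0);", 0, 0, 0);
        sqlite3_exec(db, "COMMIT;", 0, 0, 0); /* the single disk sync happens here */
    }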

> Again, it is explained to you, yet you refused to listen.
> SQLITE is not what you want as a shared SQL database among
> multiple WEB and OCR accessors.

The transaction write time may still be a bottleneck for any
other SQL processor.


>>> Again, you can SELECT a row in your table using the
>>> proper query, but it isn't a direct FILE ACCESS with
>>> BYTE OFFSET idea, and again, SQLITE3 will lock your
>>> database during updates, so your REQUEST SERVER will
>>> be locked out of reading/writing any table while it
>>> is being updated by ANYONE.
>>
>> If it doesn't require a separate index to do this, then
>> the record number maps to a byte offset. Since record
>> numbers can be sequenced out-of-order, in at least this
>> instance it must have something telling it where to go,
>> probably an index. Hopefully it does not always make an
>> index just in case someone decides to insert records
>> out-of-sequence.
>
>
> You don't know what you are talking about. By
> definition, record numbers are not out of sequence. And
> what are the indices you are

Record numbers can be, and indeed are, inserted
out-of-sequence in SQLite.

> referring to? What (fields) are they based on?

The record number.

>>> IDEAL: Many Threads to Many Threads
>>> WORST: Many Threads to 1 thread
>>
>> I guess that I am currently back to alternative two,
>> which is the many threads of a web server feeding four
>> OCR processes via four FIFOs on a single core machine,
>> one process having much more process priority than the
>> others.
>
>
> You see, "Single Core?" you are using the wrong terms.
> Yet, joe and I have a good idea of what you trying to say.

PENTIUM 4
I have been using the right terms all along and you keep
insisting on ignoring them.
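For what it's worth, the "four FIFOs" would be POSIX named
pipes (a sketch; the paths are my own illustration):

    #include <sys/stat.h>

    int main(void)
    {
        /* One named pipe per queue; the web server writes, the OCR processes read. */
        mkfifo("/tmp/ocr_high", 0600);
        mkfifo("/tmp/ocr_med",  0600);
        mkfifo("/tmp/ocr_low",  0600);
        mkfifo("/tmp/ocr_bulk", 0600);
        return 0;
    }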



From: Peter Olcott on

"Jerry Coffin" <jerryvcoffin(a)yahoo.com> wrote in message
news:MPG.262c6206211fe8d9989864(a)news.sunsite.dk...
> In article
> <abidnZAfALb5HF_WnZ2dnUVZ_jydnZ2d(a)giganews.com>,
> NoSpam(a)OCR4Screen.com says...
>
> [ ... ]
>
>> No. Joe was and continues to be wrong that a machine with
>> plenty of extra RAM ever needs to page out either a
>> process
>> or its data.
>
> It's not a question of whether it *needs* to -- it's a
> simple fact that with both Windows and Linux, it will
> *try* to whether it needs to or not. On Windows it's
> called the working set trimmer -- it's a task

(1) It makes no sense at all that when the system has an
extra 4 GB of RAM it would need to page out processes or
their data to disk.
(2) In fact it does not do this paging to disk, according
to two twelve-hour empirical tests, one on Windows and one
on Linux.


>> No, the latest analysis indicates that I am back up to
>> 100
>> because the webserver and the OCR execute in parallel.
>
> On a single core machine? There are a few pieces that
> can execute in parallel (the OCR can use the CPU while
> the network adapter is reading or writing data), but
> with only one core, very little really happens in
> parallel -- the whole point of multiple cores (or
> multiple processors) is to allow them to *really* do
> things in parallel, instead of just switching between
> processes quickly enough for it to *look* like they're
> running in parallel.

A PENTIUM 4 is the design constraint.
After further analysis I may still get 100 TPS because the
heavy loads are:
(1) Writing the transactions to disk
(2) The CPU intensive OCR processing

> Keeping track of average vs. peak load is easy --
> dealing with it (given a task as processor intensive as
> you've suggested) is not.

There are many simple ways to deal with it. Postpone all
non-critical jobs to off-peak periods. If this is not
enough then I will use the $10 per second to buy a better
solution.

> Seriously, you'd be a lot better off with a "cloud"
> computing provider than one that gives you only a single
> core. Assuming your OCR really works, there's a pretty
> fair chance that the work pattern will be substantially
> different than you seem to imagine -- instead of a page
> or two at a time, you're (reasonably) likely to receive
> scans of an entire book at a time.

If these are sent as individual pages they are given the
highest priority; if they are sent as one very large PNG
file, they may wait for off-peak processing.

>
> This is a scenario where something like Amazon's EC2
> would work well -- you pay only for processor time you
> actually use, but if you get a big job, it can run your
> task on dozens or even thousands of processors so (for
> example) all the pages from a book are OCRed in
> parallel, the results put back together, and your
> customer gets his result back quickly. Then your system
> goes idle again, and you quit paying for *any* processor
> time until another task comes along.

I don't think my current service provider offers enough
storage for this model.

>
> --
> Later,
> Jerry.


From: Joseph M. Newcomer on
see below...
On Sun, 11 Apr 2010 18:36:52 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:

>
>"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
>news:%23%23iEA1c2KHA.4332(a)TK2MSFTNGP02.phx.gbl...
>> Peter Olcott wrote:
>>
>>>> So how do your HTTP requests get delegated? Four
>>>> separate IP addresses, sub domains?
>>>>
>>>> free.peter.com
>>>> 1penny.peter.com
>>>> nickle.peter.com
>>>> peso.peter.com
>>>>
>>>> What happens when cross domain attempts occur?
>>>
>>> I would not be using the complex design that you are
>>> referring to. One domain, one web server, four OCR
>>> processes.
>>
>>
>> So you're back to a Many to One FIFO queue. And what happens
>> with the HTTP responses?
>
>They have another FIFO queue in the opposite direction.
>
>>
>>>>
>>>> Right, two different memory locations can not possibly
>>>> overflow each other.
>>>
>>> One writer of a single unsigned 32-bit integer at a fixed
>>> shared memory location and three readers.
>>> if (NumberOfHighPriorityJobs != 0)
>>> nanosleep(20);
>>
>>
>> But the web server needs to do a +1 and one of the OCR has
>> to do a -1.
>
>No not on this memory location. This memory location is a
>copy of another memory location that does these things to
>the original value. I don't want to slow down the read of
>this value by using a lock because it will be read in a very
>tight loop.
****
This is the kind of bizarre statement that proves you are clueless. For example, in the
worst case you would be using a spin lock, whose delays are measured in NANOseconds, but
in fact to add or subtract 1 means you use InterlockedIncrement or InterlockedDecrement,
which are SINGLE INSTRUCTIONS!

But your fundamental IDEA that "synchronization is too expensive" shows a lack of ability
that is mind-boggling. Someone once told you that synchronization is expensive, and
consequently you now believe ANY synchronization is unacceptable (never mind that it is
essential for correctness...) There is a spectrum of synchronization techniques, from the
Interlocked operations (nanoseconds) and spin locks (typically, nanoseconds) through
mutexes (microseconds) to file locks (tens of milliseconds). By having a mindless
knee-jerk reaction to the term "synchronization" you show that you really don't understand
anything about design or implementation.
*****
>
>In other words the original memory location may be read or
>written to as often as once every 10 ms. It has to be
>locked to make sure that it is updated correctly. This copy
>of the memory location could easily be read a million times
>a second or more; I don't want to slow this down with a lock.
*****
Updated how? For increment and decrement, you use the hardware (the Interlocked class of
operations). Given that these run in < 5ns, you can update them 200,000,000/sec using the
Interlocked primitives, so a mere million times a second is TWO ORDERS OF MAGNITUDE LESS
than the rate the hardware can sustain. That's a lot of headroom!
*****
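What Joe describes, in a sketch (Win32 Interlocked primitives; the counter name comes
from the thread):

    #include <windows.h>

    volatile LONG NumberOfHighPriorityJobs = 0;

    /* Each of these compiles to a single atomic instruction; no mutex needed. */
    void job_arrived(void)  { InterlockedIncrement(&NumberOfHighPriorityJobs); }
    void job_finished(void) { InterlockedDecrement(&NumberOfHighPriorityJobs); }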
>
>>
>> No conflicts, no reader/locker locks? No Interlock
>> increments and decrements?
>
>As long as a simultaneous read and write can not garble each
>other there is no need for any of these things, on this copy
>of the original memory location.
****
Sounds like another amateur "I know how to do synchronization better than anyone else"
design; we see these all the time in the driver world, and they are somewhere between dead
wrong and ROTFLMAO wrong. Sadly, the people who think they know how to invent
synchronization usually have no clue. And invariably get it wrong. As soon as someone
explains something like you have, red flags start waving, and when I ask to see the code,
the code is so obviously incorrect that it is nearly impossible to understand how it
could be made right. Other than throwing it out. Remember, I've been doing concurrency
since either 1975 (in real operating systems) or 1968 (in exercises) and I've seen most of
the bad designs and most of the problems, and the bad designs are almost always preceded
by some p-baked rationale like you just gave. So I'm conditioned to reject such
explanations out-of-hand until I see actual synchronization code (so I can point and laugh
at it, which is what always happens...)
joe
****
>
>>
>>
>> --
>> HLS
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm