From: Joseph M. Newcomer on
See below...
On Sat, 10 Apr 2010 15:20:51 -0400, Hector Santos <sant9442(a)nospam.gmail.com> wrote:

>Peter Olcott wrote:
>
>
>> Also I will go ahead and use a transacted database for all
>> persistent storage. SQLite is supposed to be good for up to
>> 500 transactions per second, and I only need 100. I just
>> have to make sure that the reliability caveats that SQLite
>> mentions are covered. I am guessing that SQLite might even
>> be smart enough to cover the one efficiency aspect that I
>> was concerned about. It may be able to directly seek using
>> record number and record size to derive the file byte
>> offset. In any case I won't worry about this either.
>
>Take a step back into the balcony.
>
>1) 100 TPS means your *total work load* is 10 ms per request
>
>You will not be able to get this done in 10 ms. If you don't believe
>this, then you are in for a big heartbreaking surprise.
>
>2) SQLITE locks the data file, hence the database, during updates, so
>all new incoming requests will be BLOCKED during updates from requests
>already in process. That right there gives you more delays and
>contention issues.
>
>3) SQLITE is a SQL database system. Even though behind the scenes it
>uses an ISAM/BTREE system, YOU don't have access to it. You might be
>able to write some hooks using their virtual access API, but I
>sincerely doubt it, and working at the record and BYTE level that way
>would BE STUPID anyway. You might as well use a pure ISAM file for
>this. Get another database management API for this. SQLITE3 is not
>what you want if you need file level access.
>
>Your problem is that you are stuck with a 100 TPS requirement, which
>is FAR too much for you.
>
>100 TPS is 6,000 per minute, 360,000 per hour, 2,880,000 per 8 hour
>work day! You are OUT of your mind if you think your 10-year-outdated,
>low-demand OCR idea is that desirable today.
****
This is the problem with many business plans: overoptimistic revenue projections.

Frankly, I think the whole thing is being overengineered to handle an anticipated flood
of usage that will never materialize, or will not be sustained regardless of the response
time. We did not even worry about performance of our server manager (which used
transacted databases) until a customer came to us with a specific requirement as a
condition of sale: being able to handle 400 tpm. Once I demonstrated that I could handle
1300 tpm when I saturated my then-10-base-T network, we had a sale; that was also the
first time we actually needed measured performance (in the past, the transacted database
overhead was not even noticeable, and it was fast enough for practical server farms).
Perhaps I should redo the experiment now that the entire office network backbone is 1Gb
copper. But it doesn't matter.
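
(For the curious: the 500-transactions-per-second figure quoted above is trivially
measurable rather than something to take on faith. Below is a minimal C sketch against the
real sqlite3 API -- the file name and schema are invented for illustration, and error
checking is elided. Note that PRAGMA journal_mode=WAL also speaks to the
writer-blocks-readers complaint, since in WAL mode readers proceed concurrently with the
single writer.

#include <stdio.h>
#include <time.h>
#include <sqlite3.h>

int main(void)
{
    sqlite3 *db;
    if (sqlite3_open("bench.db", &db) != SQLITE_OK) return 1;

    sqlite3_exec(db, "PRAGMA journal_mode=WAL;", 0, 0, 0);
    sqlite3_exec(db, "CREATE TABLE IF NOT EXISTS jobs"
                     "(id INTEGER PRIMARY KEY, payload TEXT);", 0, 0, 0);

    enum { N = 1000 };                 /* one durable commit per request */
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; i++) {
        sqlite3_exec(db, "BEGIN;", 0, 0, 0);
        sqlite3_exec(db, "INSERT INTO jobs(payload) VALUES('x');", 0, 0, 0);
        sqlite3_exec(db, "COMMIT;", 0, 0, 0);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%d commits in %.2f s = %.0f TPS\n", N, s, N / s);
    sqlite3_close(db);
    return 0;
}

On a disk that honors the flush to the platter, the result is dominated by rotational
latency, which is rather the point.)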

But note also that the Magic Morphing Requirements that forbade any disk access at all
have been amended to require transacted file access, without apparently any chagrin that
this new requirement is in direct opposition to the old requirement. And then, to
guarantee worst possible response time, the multiqueue-multiserver architecture is
proposed as the only possible solution, so apparently throughput and response time are
actually no longer issues to be discussed, so don't worry about the numbers. I did some
third-grade arithmetic to show why a single-queue multiserver approach makes more sense,
since apparently I have to hold his hand at every step of the "reasoning" process instead
of pointing out problems and letting him use simple arithmetic to convince himself that
the ideas suck. Next thing, I'll be explaining how to do long division!
****
>
>The point is not really whether you can reach this, but that it means
>you need a 10 ms total turnaround time per transaction! And that is the
>10 ms you say is the OCR processing time alone - you totally ignored
>the time required for everything else - INCLUDING the SQL engine. Pick
>any engine; there is NO WAY you can do all of that in less than 1
>quantum. The hardware interrupts alone will break your 10 ms theory.
>
>So you need to get REAL and stop this fantasy of an ideal design with
>10 ms turnaround.
****
This requires that (a) disk access take 0ms "wall clock" time (for example, using disks
that have 0ms rotational delay and 0ms seek time); (b) the database interface (passing the
SQL query in, parsing it, executing it, returning the record set) take 0ms; (c) TCP/IP
connect and HTTP protocol take 0ms; (d) billing transaction time take 0ms; (e) a context
swap be instantaneous (somewhere between 0us and 0ns); (f) the scheduler consume 0% of the
CPU time; (g) the kernel entry time for any API call be 0ns; (h) there be no other tasks
running on the machine, including the file system, system maintenance tasks, the SQL
server, etc.; and (i) there be no interrupts from any device that might require attention,
such as a keyboard, network card, disk, etc.
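
Do the arithmetic on (a) alone: a 7200 RPM disk spins at 120 revolutions per second, so
one revolution takes about 8.3ms and the average rotational delay is half that, roughly
4.2ms; add a typical 8-9ms average seek and one random disk access already costs some
12-13ms. The entire 10ms budget is gone before items (b) through (i) get a single
microsecond.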

Other than those minor details, I'm sure it is achievable.

Key here: does it really matter? The horrible performance of the multiqueue/multiserver
(MQMS) architecture may not even matter, because the percentage utilization will be so low
that it cannot possibly make the response worse. (I was just yanking his chain about the
need for a single-queue multiserver (SQMS) architecture, because that only makes sense if
performance, most particularly response time, matters. If worst-possible response time
does not deviate from best-possible response time because the data load is close to zero,
then pretty much anything will work; but he seems really concerned with response time, so
I was just having fun with him, pointing out that his MQMS design guarantees worst
possible response times, uniformly, when there is a real load!)
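
Concretely: with four dedicated queues, three jobs stuck behind one slow request in queue
2 will sit there while servers 1, 3, and 4 idle; with one shared queue, no server ever
idles while work is waiting. That single sentence is the whole argument.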
joe
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Jerry Coffin on
In article <0KSdnUkKA-hkXVzWnZ2dnUVZ_sWdnZ2d(a)giganews.com>,
NoSpam(a)OCR4Screen.com says...

[ ... ]

> Since I only need to be able to process 100 transactions per
> second the speed of these overhead sort of things should not
> be too critical. I am guessing that the sum total of these
> overhead sort of things will only take about 10 ms per
> transaction most of this being drive head seek time.

As I said before, you really need to build *something* that works to
at least some degree and start doing some measuring. Right now, your
guesses seem to lack any basis beyond what result you'd like to get.

> The two most unresolved issues with my design:
> (1) The best means to provide one higher level process with
> much higher and possibly absolute priority over three other
> types of jobs.
> The two current proposals are:
> (a) Assign a much higher process priority to the high
> priority jobs. The jobs are executed from four different
> FIFO queues.
> (b) Have each of three lower priority jobs explicitly put
> themselves to sleep as soon as a high priority job becomes
> available. The lower priority jobs could be notified by a
> signal.

I don't see where anybody who has a clue has proposed either of these
-- both of them are fairly poor. The idea of four separate queues is
pointless and stupid. As both Joe and I have pointed out, what you
want is a priority queue. With four separate queues, race conditions
are almost inevitable -- for example, you check the top-priority
queue for a job first and find it empty, so you check the second
priority and then the third, and (just for the sake of argument)
we'll assume you find a job there and start it -- but you didn't
notice that while you were doing the other checking, a job was
inserted into the top-priority queue, so you end up doing a
lower-priority job ahead of a higher-priority one.
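
Just to make the fix concrete: the scan and the removal have to
happen under one lock. Here is a minimal pthreads sketch of the
kind of priority queue I mean (the job struct and the count of
four priority levels are placeholders, not anything from your
code):

#include <pthread.h>
#include <stddef.h>

#define NPRIO 4                       /* 0 = highest priority */

typedef struct job { struct job *next; int prio; /* payload */ } job_t;

static job_t *head[NPRIO], *tail[NPRIO];
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;

void submit(job_t *j)                 /* producer side */
{
    pthread_mutex_lock(&lock);
    j->next = NULL;
    if (tail[j->prio]) tail[j->prio]->next = j;
    else head[j->prio] = j;
    tail[j->prio] = j;
    pthread_cond_signal(&nonempty);
    pthread_mutex_unlock(&lock);
}

job_t *take(void)                     /* any worker thread */
{
    pthread_mutex_lock(&lock);
    for (;;) {
        /* Scan and removal happen under the same lock, so a
           top-priority job can never be inserted unseen
           between the check and the dequeue. */
        for (int p = 0; p < NPRIO; p++) {
            if (head[p]) {
                job_t *j = head[p];
                head[p] = j->next;
                if (!head[p]) tail[p] = NULL;
                pthread_mutex_unlock(&lock);
                return j;
            }
        }
        pthread_cond_wait(&nonempty, &lock);
    }
}

Four priorities in one structure, one lock, and the race I
described above simply cannot happen.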

> (2) The best way(s) to provide inter process communication
> between a web server that inherently has one thread per HTTP
> connection and up to four OCR processes (with a single
> thread each) one for each level of processing priority.

This one is simple: switch to a web server that isn't completely
broken. Seriously, something that's built around a thread-per-
connection model simply has no chance whatsoever of working worth
anything -- ever. Just for one obvious point, this model makes a DoS
attack really trivial -- somebody can just create several thousand
connections, and your system will spend its time switching between
the thousands of connection threads, and (virtually) stop any real
work from getting done.
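
For contrast, here is roughly what the core of a non-broken
server looks like on Linux: one thread multiplexing all sockets
with epoll, so a few thousand idle connections cost file
descriptors, not context switches. (A bare-bones sketch; the
port number is arbitrary and all error handling is elided.)

#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port = htons(8080) };
    bind(lfd, (struct sockaddr *)&addr, sizeof addr);
    listen(lfd, 128);

    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = lfd };
    epoll_ctl(ep, EPOLL_CTL_ADD, lfd, &ev);

    for (;;) {
        struct epoll_event evs[64];
        int n = epoll_wait(ep, evs, 64, -1);  /* one thread, N sockets */
        for (int i = 0; i < n; i++) {
            if (evs[i].data.fd == lfd) {      /* new connection */
                int c = accept(lfd, NULL, NULL);
                struct epoll_event cev = { .events = EPOLLIN,
                                           .data.fd = c };
                epoll_ctl(ep, EPOLL_CTL_ADD, c, &cev);
            } else {                          /* request data ready */
                char buf[4096];
                ssize_t r = read(evs[i].data.fd, buf, sizeof buf);
                if (r <= 0) close(evs[i].data.fd);
                /* else parse the request and queue the work */
            }
        }
    }
}

A connection flood still consumes sockets, but it no longer
consumes a thread and a stack per connection, and the scheduler
is not dragged into it.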

--
Later,
Jerry.
From: Hector Santos on
Peter Olcott wrote:

> "Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
> news:%237Nwl2Y2KHA.6048(a)TK2MSFTNGP06.phx.gbl...
>> Peter Olcott wrote:
>>
>>>> No it doesn't. It's reality. You're the one with a whole
>>>> set of design assumptions based on ignorance. I speak
>>>> with engineering experience.
>>> Now you are being asinine. If every little thing takes 1
>>> ms, then it would be at least several days before a
>>> machine was finished rebooting. I guess there is no sense
>>> in paying attention to you any more.
>>
>> Because YOU can't handle the TRUTH and I keep proving it
>> at each and every step in this polluted thread. Joe
>> tells you the truth and you can't handle it. I tell you
>> the truth and you can't handle it.
>>
>> You have a MANY-THREADS-to-1 FIFO QUEUE design:
>>
>> - IGNORANCE EQUAL PRESSURE
>>
>> You want to use SQLITE in a WAY it was not designed to
>> work, with a completely unrealistic freedom for read/write
>> I/O that was never DESIGNED into SQLITE:
>>
>> - IGNORANCE EQUAL PRESSURE
>>
>> You think that because LINUX allows for 1 ms clock ticks
>> that you can get 1 ms UNINTERRUPTED QUANTUMS. It doesn't
>> mean you can get 1 ms of real time - but 1 ms of TOTAL
>> time - and that is not real time.
>>
>> - IGNORANCE EQUAL PRESSURE
>>
>> You think you have FOUR smooth named pipes with the
>> ever-changing design you have.
>>
>> - IGNORANCE EQUAL PRESSURE
>>
>> You think you have control of PAGING and MEMORY
>> VIRTUALIZATION when you go back and forth on minimizing
>> data loss and maximizing crash recovery:
>>
>> - IGNORANCE EQUAL PRESSURE
>>
>> You have no idea of what's going on, you need to buy 10,000
>> pages worth of books that you can't follow anyway, and still
>> have a 25 year old OS book whose 2nd half you forgot to read
>> but want to finish now, thinking it still applies:
>>
>> - IGNORANCE EQUAL PRESSURE
>>
>> And what's funny about all this: you won't be able to code
>> for THREADS even though you go back and forth on whether
>> you will or not. And you can't code for memory maps
>> either. So it's all a PIPED DREAM.
>>
>> - IGNORANCE EQUAL PRESSURE
>
> Like I told Joe, it is beginning to look like reading the
> 10,000 pages of books that I recently bought is going to be
> a much more efficient and effective way to proceed from
> here. One of these books provides the details of the
> internals of the Linux kernel.

No 1, 2, 4, 10, 20, or 30,000 pages are going to help you.

- IGNORANCE EQUAL PRESSURE

--
HLS
From: Peter Olcott on

"Jerry Coffin" <jerryvcoffin(a)yahoo.com> wrote in message
news:MPG.262bdaa1b1d36c65989861(a)news.sunsite.dk...
> In article
> <0KSdnUkKA-hkXVzWnZ2dnUVZ_sWdnZ2d(a)giganews.com>,
> NoSpam(a)OCR4Screen.com says...
>
> [ ... ]
>
>> Since I only need to be able to process 100 transactions per
>> second the speed of these overhead sort of things should not
>> be too critical. I am guessing that the sum total of these
>> overhead sort of things will only take about 10 ms per
>> transaction most of this being drive head seek time.
>
> As I said before, you really need to build *something* that
> works to at least some degree and start doing some measuring.
> Right now, your guesses seem to lack any basis beyond what
> result you'd like to get.
>
>> The two most unresolved issues with my design:
>> (1) The best means to provide one higher level process with
>> much higher and possibly absolute priority over three other
>> types of jobs.
>> The two current proposals are:
>> (a) Assign a much higher process priority to the high
>> priority jobs. The jobs are executed from four different
>> FIFO queues.
>> (b) Have each of three lower priority jobs explicitly put
>> themselves to sleep as soon as a high priority job becomes
>> available. The lower priority jobs could be notified by a
>> signal.
>
> I don't see where anybody who has a clue has proposed either
> of these -- both of them are fairly poor. The idea of four
> separate queues is pointless and stupid. As both Joe and I
> have pointed out, what you want is a priority queue. With
> four separate queues, race conditions are almost inevitable
> -- for example, you check the top-priority queue for a job
> first and find it empty, so you check the second priority and
> then the third, and (just for the sake of argument) we'll
> assume you find a job there and start it -- but you didn't
> notice that while you were doing the other checking, a job
> was inserted into the top-priority queue, so you end up doing
> a lower-priority job ahead of a higher-priority one.

Each of the four processes only checks its own single
queue.

I think that it only seems stupid because you did not
understand what I am saying. That may be my fault; I may not
have explained it well enough.

Alternative (a) There are four processes with four queues,
one for each process. These processes only care about
executing the jobs from their own queue. They don't care
about the jobs in any other queue. The high priority process
is given a relative process priority that equates to 80% of
the CPU time of these four processes. The remaining three
processes get about 7% each. This might degrade the
performance of the high priority jobs more than the next
alternative.
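
A rough sketch of what alternative (a) amounts to on Linux,
using the standard setpriority() call (the nice values are
illustrative, not anything final):

#include <sys/resource.h>

/* Called by each OCR process at startup. Nice levels give
   relative scheduler weights, not fixed shares: each step is
   roughly a 1.25x weight ratio, so an 11-step gap (-5 vs +6)
   yields about 11.6:1, i.e. roughly 80% for the high priority
   process when all four are busy. Negative nice values need
   root or CAP_SYS_NICE. */
static void set_job_priority(int is_high_priority)
{
    setpriority(PRIO_PROCESS, 0, is_high_priority ? -5 : 6);
}

Note the caveat: the split only materializes when all four
processes are runnable; an idle high priority process donates
its share to the others.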

Alternative (b) Each of the low priority jobs checks to see
if a high priority job is in the queue, or is notified by a
signal that a high priority job is waiting. If a high
priority job is waiting then each of these low priority jobs
immediately sleeps for a fixed duration. As soon as they
wake up, these jobs check to see whether they should go back
to sleep or resume working.

These processes could even simply poll a shared memory
location that contains the number of high priority jobs
currently in the queue. From what the hardware guys have
told me, memory writes and reads cannot garble each other.
Because of this, the shared memory location would not even
need to be locked. One writer and three readers.
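
The hardware guys are right that aligned word-sized reads and
writes don't tear on x86, but the compiler is still free to
cache or reorder plain loads and stores, so in C the counter
should go through C11 atomics to be safe. A rough sketch using
POSIX shared memory (the segment name and function names are
invented, error handling elided):

#include <fcntl.h>
#include <stdatomic.h>
#include <sys/mman.h>
#include <unistd.h>

static atomic_int *high_jobs;     /* lives in shared memory */

/* Called once in each of the four processes. */
static void attach_counter(void)
{
    int fd = shm_open("/ocr_high_jobs", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, sizeof(atomic_int));
    high_jobs = mmap(NULL, sizeof(atomic_int),
                     PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
}

/* Writer side: the process that queues the work. */
static void high_job_queued(void) { atomic_fetch_add(high_jobs, 1); }
static void high_job_done(void)   { atomic_fetch_sub(high_jobs, 1); }

/* Reader side: each low priority worker polls this in its
   loop and sleeps while it is nonzero. */
static int should_sleep(void) { return atomic_load(high_jobs) > 0; }

One writer and three readers, no lock, and still well-defined.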

Neither of these designs has any of the behavior that you
mentioned.

>
>> (2) The best way(s) to provide inter process communication
>> between a web server that inherently has one thread per HTTP
>> connection and up to four OCR processes (with a single
>> thread each) one for each level of processing priority.
>
> This one is simple: switch to a web server that isn't
> completely broken. Seriously, something that's built around a
> thread-per-connection model simply has no chance whatsoever
> of working worth anything -- ever. Just for one obvious
> point, this model makes a DoS attack really trivial --
> somebody can just create several thousand connections, and
> your system will spend its time switching between the
> thousands of connection threads, and (virtually) stop any
> real work from getting done.

I already figured out a way around that. Everyone must have
their own user account that must be created by a live human.
All users are always authenticated against this account. I
don't see any loopholes in this one single form of
protection.

>
> --
> Later,
> Jerry.


From: Jerry Coffin on
In article <sc44s5ljun4c26c0i8ptofg4gc8d0euonj(a)4ax.com>,
newcomer(a)flounder.com says...

[ ... ]

> Key here: does it really matter? The horrible performance of the
> multiqueue/multiserver (MQMS) architecture may not even matter,
> because the percentage utilization will be so low that it cannot
> possibly make the response worse. (I was just yanking his chain
> about the need for a single-queue multiserver (SQMS) architecture,
> because that only makes sense if performance, most particularly
> response time, matters. If worst-possible response time does not
> deviate from best-possible response time because the data load is
> close to zero, then pretty much anything will work; but he seems
> really concerned with response time, so I was just having fun with
> him, pointing out that his MQMS design guarantees worst possible
> response times, uniformly, when there is a real load!)

While all true, the real reason he should use a single priority queue
is that there's existing code he can start from, and getting it to
work right isn't particularly difficult. The improved response time
is mostly just a bonus for him.

--
Later,
Jerry.