From: Peter Olcott on

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
message news:00jur51advnvpktje1f529vocjgan09u67(a)4ax.com...
> See below....
> On Thu, 8 Apr 2010 20:24:48 -0500, "Peter Olcott"
> <NoSpam(a)OCR4Screen.com> wrote:
>
>>OK what is the simplest possible way to make sure that I
>>never ever lose the customer's ten bucks, even if the
>>server
>>crashes before my next backup and this data is lost in the
>>crash?
> ***
> I would use a transacted database system, and record
> several states of the job submitted
> "In the queue", "being processed". "processing completed",
> "results succesfully sent to
> customer", "billing completed". Perhaps fewer states are
> required. Then, upon recovery,
> I would examing these states and determine what recovery
> action was required. Note that
> what I'm talking about here is the specification document;
> the requirements document
> merelly says "Shall not bill customer for undelivered
> results" and stops right there.
>
> Then, in the specification document, the state machine for
> billing would be laid out, and
> a suggestion of a transacted database to maintain the
> state, and so on. Only when you
> got to the implementation would you worry about a
> transacted database as the
> implementation strategy, or the details of the state
> management and error recovery.
> joe

That is all well and good, and required, but not exactly
the scenario that I was envisioning. What you proposed would
probably provide good recovery in the event of a power
failure.

The scenario that I was envisioning is all that great
transaction stuff that you just said, and then the server
crashes, overwriting the great transacted database data with
garbage data. It does this in between periodic backups of
this data. How do I protect against this scenario?
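
To make the transacted-state suggestion concrete, here is a minimal sketch, assuming SQLite as the "transacted database system" (the table, the column names, job id 42, and the $10.00 amount are all hypothetical). With journaling and synchronous commits enabled, a state change either commits completely or not at all, so a crash mid-update rolls back to the last committed state rather than leaving garbage; protecting against the database file itself being destroyed still requires copies the crashing machine cannot touch (off-machine backups or replication).

    // build: g++ job_state.cpp -lsqlite3
    #include <sqlite3.h>
    #include <cstdio>
    #include <string>

    // Run one SQL statement, reporting any error (error handling kept minimal).
    static bool run(sqlite3 *db, const std::string &sql)
    {
        char *err = nullptr;
        if (sqlite3_exec(db, sql.c_str(), nullptr, nullptr, &err) != SQLITE_OK) {
            std::fprintf(stderr, "SQL error: %s\n", err ? err : "unknown");
            sqlite3_free(err);
            return false;
        }
        return true;
    }

    int main()
    {
        sqlite3 *db = nullptr;
        if (sqlite3_open("jobs.db", &db) != SQLITE_OK) return 1;

        // Durability settings: write-ahead journal plus synchronous commits,
        // so a committed transaction is on disk before COMMIT returns.
        run(db, "PRAGMA journal_mode=WAL;");
        run(db, "PRAGMA synchronous=FULL;");

        run(db, "CREATE TABLE IF NOT EXISTS job ("
                " id INTEGER PRIMARY KEY,"
                " state TEXT NOT NULL,"   // queued / processing / results_sent / billed
                " amount_cents INTEGER NOT NULL);");
        run(db, "CREATE TABLE IF NOT EXISTS ledger ("
                " job_id INTEGER, amount_cents INTEGER);");

        // The state transition and the billing entry commit atomically:
        // either both are recorded or neither is, even if we crash mid-way.
        run(db, "BEGIN IMMEDIATE;");
        run(db, "UPDATE job SET state='billed' WHERE id=42 AND state='results_sent';");
        run(db, "INSERT INTO ledger VALUES (42, 1000);");  // hypothetical $10.00 charge
        run(db, "COMMIT;");

        sqlite3_close(db);
        return 0;
    }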



From: Peter Olcott on

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
message news:4ijur5dsfdt3h4i509jq1pdrtcnl3ado2t(a)4ax.com...
> See below...
> On Thu, 08 Apr 2010 22:21:31 -0400, Hector Santos
> <sant9442(a)nospam.gmail.com> wrote:
>
>>Joseph M. Newcomer wrote:
>>
>>> The same way ANY CGI-based process notifies its parent
>>> that it has completed. I believe
>>> this is by closing stdout.
>>
>>
>>Ultimately, the process ending (which must end) is the
>>deciding factor
>>which the parent is waiting on (with an idle timeout in
>>redirected data).
> ****
> As I indicated, it is whatever the normal criterion is.
> My recollection of CGI was that the
> closing of stdout meant that there could be no future
> output, and that was the determining
> factor, but I haven't looked at this in a decade. The
> problem with the OP is that since he
> doesn't actually look into the details, he hypothesizes
> how the details MUST work by

I will be looking into all these details; I bought a bunch
of books. I want to narrow down exactly which details I
need to look into.

> falling back on some form of mysticism, and if he sees a
> blank wall, makes some guesses
> based on inadequate information and if the guesses don't
> work, figures there must not be
> such a mechanism. Hence the overconcern with a
> non-problem, notifying a Web server that
> the process has completed its action. The VERY FIRST
> instance of a Web server launching a
> script had such a mechanism, but since he doesn't see it,
> it must be an unsolved problem!
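
For concreteness, the mechanism being described looks roughly like this on POSIX (a minimal sketch; the ./ocr_cgi child program is hypothetical): the parent hands the child a pipe as its stdout, and a read() that returns 0 -- end of file, once the child closes stdout or exits -- is the completion signal.

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <cstdio>

    int main()
    {
        int fd[2];                       // fd[0] = read end, fd[1] = write end
        if (pipe(fd) != 0) return 1;

        pid_t pid = fork();
        if (pid == 0) {                  // child: becomes the CGI program
            dup2(fd[1], STDOUT_FILENO);  // the child's stdout is the pipe
            close(fd[0]);
            close(fd[1]);
            execl("./ocr_cgi", "ocr_cgi", (char *)nullptr);  // hypothetical CGI binary
            _exit(127);                  // only reached if exec fails
        }

        close(fd[1]);                    // parent keeps only the read end
        char buf[4096];
        ssize_t n;
        while ((n = read(fd[0], buf, sizeof buf)) > 0)
            fwrite(buf, 1, (size_t)n, stdout);   // forward the CGI output to the client

        // n == 0 here: the child closed stdout (or exited), so its output is complete.
        close(fd[0]);
        int status = 0;
        waitpid(pid, &status, 0);        // reap the child and collect its exit status
        return 0;
    }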

I want there to be good enough communication among the
multiple threads of the web server, the multiple
processes completing the jobs, and the client's web browser
that I can know with almost complete certainty that the
client actually received their data product. That is the
point at which I deduct the cost of the transaction. I want
errors in this sequence to be as close to impossible as can
be devised.

>
> There must be a Platonic Ideal mechanism; he guesses about
> its existence from flickering
> shadows on the wall. The rest of us read the
> documentation or the code.
>
> I get so tired of this...
> joe
> ****


From: Joseph M. Newcomer on
See below...
On Thu, 8 Apr 2010 19:58:33 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:

>
>"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
>news:%23pvVPm31KHA.4832(a)TK2MSFTNGP04.phx.gbl...
>> Peter Olcott wrote:
>>
>>>
>>> Was your trouble with Windows named pipes? (I won't be
>>> using those).
>>> What IPC did you end up choosing? (I like named pipes
>>> because their buffer can grow to any length).
>>
>> All sorts of methods, beginning with a simple straight
>> shared file.
>
>I am beginning with a simple shared file. The only purpose
>of the other IPC is to inform the process of the event that
>the file has been updated at file offset X, without the need
>for the process to poll for updates. File offset X will
>directly pertain to a specific process queue.
****
Ohh, so you NO LONGER CARE about either power failure or operating system crash?

A "simple shared file" raises so many red flags that I cannot begin to say that "this is
going to be a complete disaster if there is any failure of the app or the operating
system"

But hey, if it gives the illusion of working under ideal conditions, it MUST be robust
and reliable under all known failure modes, right?
****
>
>>
>> But you can use:
>>
>> TCP/IP  <<--- What we use for IPC
>> UDP     <<--- What we use for IPC
>> HTTP    <<--- What we use for IPC
>> RPC     <<--- What we use for IPC
>> DCOM
>
>But as I understand it these will not automatically grow a
>queue to any arbitrary length.
****
Named pipes under Windows do NOT "grow the buffer", or perhaps you have failed to read the
documentation. RTFM is always a useful exercise. The Linux documentation does not
actually state that this is the behavior, so I'm not sure what you are basing this on.
And it is not clear why it matters.
joe
****
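
For what it is worth, a minimal sketch of the Win32 side (the pipe name is hypothetical): the in/out buffer sizes are supplied once, at CreateNamedPipe time, as advisory hints that the system rounds to fixed quotas; the pipe does not grow without bound, and a writer in blocking mode simply waits once the buffer is full.

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        // Buffer sizes are fixed at creation (advisory hints, rounded by the
        // system); the pipe does not grow without bound as data queues up.
        HANDLE hPipe = CreateNamedPipeA(
            "\\\\.\\pipe\\ocr_jobs",          // hypothetical pipe name
            PIPE_ACCESS_INBOUND,              // server reads, clients write
            PIPE_TYPE_MESSAGE | PIPE_READMODE_MESSAGE | PIPE_WAIT,
            1,                                // one instance
            64 * 1024,                        // out-buffer size, in bytes
            64 * 1024,                        // in-buffer size, in bytes
            0,                                // default time-out
            nullptr);                         // default security attributes
        if (hPipe == INVALID_HANDLE_VALUE) {
            std::fprintf(stderr, "CreateNamedPipe failed: %lu\n", GetLastError());
            return 1;
        }
        // A ConnectNamedPipe / ReadFile loop would go here; once the buffer is
        // full, a client's WriteFile blocks (PIPE_WAIT) until the server reads.
        CloseHandle(hPipe);
        return 0;
    }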
>
>>
>> and others networking protocols:
>>
>> http://msdn.microsoft.com/en-us/library/ee663291(v=VS.85).aspx
>>
>> Here is what MS says about Named Pipes vs TCP/IP
>>
>> http://msdn.microsoft.com/en-us/library/aa178138(SQL.80).aspx
>>
>> And a 2003 Dr. Dobbs article on how to handle named pipes
>> correctly, even though it seems so "simple":
>>
>> http://www.drdobbs.com/architecture-and-design/184416624;jsessionid=BVL3ABP0UVUSJQE1GHPSKH4ATMY32JVN
>>
>
>OK, so the Unix/Linux people say that it is well known that MS
>named pipes are borked, yet they have never had any problem
>with Unix/Linux named pipes.
****
Well, if you are comparing apples and chocolate cupcakes, they are pretty much the same.
Any comparison of linux "named pipes" to Windows "named pipes" has to take into
consideration that they are TOTALLY DIFFERENT mechanisms. Shall I tell my set of Unix
security jokes, or just say "Unix security", which is a joke all by itself? So I tend
not to find ANY comparisons valid. They are two completely different systems, which look
alike only if you stand back a few hundred feet and squint. (Windows has a file system;
linux has a file system; MS-DOS had a file system. They are identical only insofar as
they allow a program to name sequences of bytes stored on a disk.) But I've NEVER lost a
file in a Windows crash, and it was common to lose a file, and EVERY TRACE of the file, on
a Unix crash, to the point where I always kept a separate directory of files in the hopes
it would survive the crash. I lost far too many hours due to the unreliability of the
Unix "file system" (if one can dignify anything so unreliable with that name). But since
you know that the file system is utterly reliable, good luck.
joe
****
>
>> Again, remember your bottleneck, which you believe WILL NOT
>> EXIST with an unrealistic 10ms swag calculation; once you
>> get to 11ms or more, you have a build-up with your
>> 1-thread FIFO pipe design - regardless of what method you
>> use.
>
>It's looking more like four processes, with one having much
>more priority than the others, each reading from one of four
>FIFO queues.
>(1) Paying customer small job (one page of data). This is
>the 10 ms job.
>(2) Paying customer large job (more than one page of data)
>(3) Building a new recognizer
>(4) Free trial customer
****
As I pointed out earlier, mucking around with thread priorities is very, very dangerous
and should NOT be used as a method to handle load balancing. I would use a different
approach, such as a single queue kept in sorted order, and because the free trial jobs are
small (rejecting any larger jobs) there should not be a problem with priority inversion.
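
A minimal sketch of that alternative (the job-class names are hypothetical): one queue, ordered first by job class and then by arrival, so paying work always comes out ahead of free-trial work and jobs within a class stay in FIFO order.

    #include <cstdint>
    #include <queue>
    #include <string>
    #include <vector>

    // Job classes, most urgent first (hypothetical names).
    enum class JobClass { PaidSmall = 0, PaidLarge = 1, Rebuild = 2, FreeTrial = 3 };

    struct Job {
        JobClass      cls;
        std::uint64_t seq;     // arrival order, assigned when the job is queued
        std::string   payload; // stand-in for the request data
    };

    // Lower class value wins; within a class, earlier arrival wins (FIFO).
    struct JobOrder {
        bool operator()(const Job &a, const Job &b) const {
            if (a.cls != b.cls) return a.cls > b.cls;  // priority_queue is a max-heap
            return a.seq > b.seq;
        }
    };

    int main()
    {
        std::priority_queue<Job, std::vector<Job>, JobOrder> queue;
        std::uint64_t next_seq = 0;

        queue.push({JobClass::FreeTrial, next_seq++, "trial page"});
        queue.push({JobClass::PaidSmall, next_seq++, "paid page"});
        queue.push({JobClass::PaidSmall, next_seq++, "another paid page"});

        // Workers always pop the most urgent job: both paid pages come out,
        // in arrival order, before the free-trial page.
        while (!queue.empty()) {
            Job j = queue.top();
            queue.pop();
            (void)j;  // ... hand j to an OCR worker here ...
        }
        return 0;
    }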

It is a common misconception that manipulating thread priorities is the way to control
priority, and largely this is a myth. It seems "intuitively obvious", but anything which is
intuitively obvious is probably neither obvious nor correct.
joe
****
>
>
>>
>> --
>> HLS
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Joseph M. Newcomer on
See below...
On Thu, 08 Apr 2010 22:16:14 -0400, Hector Santos <sant9442(a)nospam.gmail.com> wrote:

>Peter Olcott wrote:
>
>> "Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
>> news:ec0DFN41KHA.224(a)TK2MSFTNGP06.phx.gbl...
>>> Peter Olcott wrote:
>>>
>>>>> But you can use:
>>>>>
>>>>> TCP/IP  <<--- What we use for IPC
>>>>> UDP     <<--- What we use for IPC
>>>>> HTTP    <<--- What we use for IPC
>>>>> RPC     <<--- What we use for IPC
>>>>> DCOM
>>>> But as I understand it these will not automatically grow
>>>> a queue to any arbitrary length.
>>>
>>> Your queue is as good and fast as you request it, pipe or
>>> otherwise.
>>
>> Some of the above have fixed queue lengths don't they?
>
>
>No, because the question doesn't apply, and I doubt you understand it,
>because you have a very primitive understanding of queuing concepts.
>No matter what is stated, you don't seem to go beyond a layman's basic
>abstract thinking - FIFO. And your idea of how this "simplicity" is
>applied is flawed because of the lack of basic understanding.
***
Note that I agree absolutely with this! The concept that a fixed-size queue matters at
all shows total cluelessness.
***
>
>Again, if you are working with Windows, then you need to understand all
>the networking protocols to make any judgment.
>
>>> You didn't do a Google search, did you? Figures you would
>>> ignore it.
>
>
>
>> I did and one of the links says something like there aren't
>> any problems with named pipes.
>
>
>There were plenty of links where people had issues - even for LINUX.
****
If you ignore the issue of what happens if either side of the pipe fails, or the operating
system crashes. But hey, reliability is not NEARLY as important as having buffer lengths
that grow (if this is actually true of linux named pipes). This is part of the Magic
Morphing Requirements, where "reliability" got replaced with "pipes that don't have fixed
buffer sizes".
****
>
>For what you want to use it for, my engineering sense based on
>experience tells me you will have problems, especially YOU for this
>flawed design of yours. Now you have 4 Named Pipes that you have to
>manage. Is that under 4 threads? But you are not designing for
>threads. One message yes, another no. Is the 1 OCR process going to
>handle all four pipes? Or 4 OCR processes? Does each OCR have its
>own Web Server? Did you work out how the listening servers will bind
>the IPs? Are you using virtual domains? Sub-domains? A multi-homed IP
>machine?
****
The implementation proposals have so many holes in them that they would be a putter's
dream, or possibly a product of Switzerland. This design guarantees maximum conflict for
resources and maximum unused resources, but what do maximum resource utilization and
minimum response time have to do with the design? It guarantees priority inversion,
guarantees maximum response time to incoming requests, and any simple queueing simulation
would demonstrate how absolutely insane this design is. But realtime design theory and
queueing theory clearly have no place in this design. Actually, a simple closed-form
analytic model of queueing would show that this design maximizes expected response
time, but even where there ARE closed-form analytic solutions to problems, they must not
interfere with the "let's toss out some buzzwords and connect them together with some
flawed ideas and call the result satisfactory" approach.
****
>
>You really don't know what you are doing, right?
****
Hasn't this been clear for a couple weeks?
****
>
>>> But even then, I can understand the success. Unix is
>>> not traditionally known to work with threads, and the
>>> piping has permanent storage - your DISK - making it easy
>>> to allow for recovery. Simple.
>>
>> The data is not supposed to ever actually hit the disk. I
>> started a whole thread on just that one point.
>
>
>But linux pipes are part of the disk, or did you miss that part,
>forget it, or wish not to believe it?
****
But the pipes can grow to any size, and thus memory can accommodate a pipe of unlimited
size.

Here's a little exercise:

Suppose the expected processing time is 10ms.

Suppose the expected interarrival time of requests into the queue is 10ms.

What is the expected queue size?

[Hint: if you answer other than "infinite" you are wrong].
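
Worked out, for what it is worth, under the standard M/M/1 assumptions (Poisson arrivals, exponential service times):

    service rate       mu     = 1 job per 10 ms
    arrival rate       lambda = 1 job per 10 ms
    utilization        rho    = lambda / mu = 1
    expected backlog   E[N]   = rho / (1 - rho)  ->  unbounded as rho -> 1

At 100% utilization the backlog has no steady-state bound, so no finite pipe, buffer, or queue is ever "large enough".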

But since we KNOW that linux pipes do not actually go to disk, it is clear that the
machine is capable of expanding kernel memory infinitely to accommodate the incoming queue
size. And the scheduler guarantees anti-starvation, so there can never be a situation
where any queue needs to grow beyond size 1. (Hint: if queue sizes are no longer infinite,
and in fact cannot exceed 1, why does the ability to grow a queue infinitely matter? And
if the queue sizes can grow beyond 1, why is the upper bound not infinity?)

I suppose passing a Ph.D. qualifier in simulation and queueing theory has hopelessly
biased me about what to expect.
joe

Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Joseph M. Newcomer on
See below...
On Thu, 8 Apr 2010 21:51:37 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:

>
>"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
>news:ealujq41KHA.260(a)TK2MSFTNGP05.phx.gbl...
>> Peter Olcott wrote:
>> For what you want to use it for, my engineering sense
>> based on experience tells me you will have problems,
>> especially YOU for this flawed design of yours. Now you
>> have 4 Named Pipes that you have to manage. Is that under
>> 4 threads? But you are not designing for threads. One
>> message yes, another no. Is the 1 OCR process going to
>> handle all four pipes? Or 4 OCR processes? Does each
>> OCR have its own Web Server? Did you work out how the
>> listening servers will bind the IPs? Are you using
>> virtual domains? Sub-domains? A multi-homed IP machine?
>
>(1) One web server that, by its own design, inherently has one
>thread per HTTP request
>(2) Four named pipes corresponding to four OCR processes,
>one of which has much higher process priority than the rest.
***
In other words, a design which GUARANTEES maximum response time and fails utterly to
provide ANY form of concurrency on important requests! WOW! Let's see if it is possible
to create an even WORSE design (because creating a better design would be so much easier
that it is no fun).
****
>(3) The web server threads place items in each of the FIFO
>queues.
***
Silly. A priority-ordered queue with anti-priority-inversion policies makes a LOT more
sense!
****
>(4) The OCR processes work on one job at a time from each of
>the four queues.
****
Let's see, can we make this worse? I don't see how, given how bad a design this is, but
why take the challenge away? Or, make a better design: multiple servers, a single
priority-ordered queue. No, that is too simple and too obvious! All it does is minimize
response time and maximize concurrency, and what possible value could that have in a
design?
****
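
A minimal sketch of that shape (the names are hypothetical): several identical workers draining one shared queue, so whichever worker is free takes the next job instead of each job class being pinned to its own dedicated process. Ordering the queue by job class, as sketched earlier, drops in by swapping std::queue for the priority-ordered queue.

    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <string>
    #include <thread>
    #include <vector>

    struct Request { std::string page; };          // stand-in for an OCR request

    std::mutex              g_m;
    std::condition_variable g_cv;
    std::queue<Request>     g_queue;               // one shared queue for all workers
    bool                    g_done = false;

    // Each worker blocks until work (or shutdown) is available, then serves it.
    void worker()
    {
        for (;;) {
            std::unique_lock<std::mutex> lk(g_m);
            g_cv.wait(lk, [] { return g_done || !g_queue.empty(); });
            if (g_queue.empty()) return;           // shutdown requested, queue drained
            Request r = g_queue.front();
            g_queue.pop();
            lk.unlock();
            // ... run OCR on r.page here, outside the lock ...
        }
    }

    int main()
    {
        std::vector<std::thread> pool;
        for (int i = 0; i < 4; ++i) pool.emplace_back(worker);  // 4 identical workers

        {   // a web-server thread enqueues a request and wakes one free worker
            std::lock_guard<std::mutex> lk(g_m);
            g_queue.push({"page 1"});
        }
        g_cv.notify_one();

        {   // shutdown: tell the workers to drain the queue and exit
            std::lock_guard<std::mutex> lk(g_m);
            g_done = true;
        }
        g_cv.notify_all();
        for (auto &t : pool) t.join();
        return 0;
    }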
>
>>
>> You really don't know what you are doing, right?
>>
>>>> But even then, I can understand the success. Unix is
>>>> not traditionally known to work with threads, and the
>>>> piping has permanent storage - your DISK - making it
>>>> easy to allow for recovery. Simple.
>>>
>>> The data is not supposed to ever actually hit the disk. I
>>> started a whole thread on just that one point.
>>
>>
>> But linux pipes are part of the disk, or did you miss
>> that part, forget it, or wish not to believe it?
>
>Just the pipe name itself is part of the disk; nothing else
>hits the disk. There are many messages about this on the
>Unix/Linux groups; I started a whole thread on this:
****
And the pipe grows until what point? It runs out of memory? Ohh, this is a new
interpretation of "unlimited pipe growth" of which I have been previously unaware!

ALL such implementations have limits. You choose to ignore this fact.
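
The limit is easy to observe, for what it is worth. A minimal sketch using the Linux-specific F_GETPIPE_SZ / F_SETPIPE_SZ fcntl operations: the capacity is a fixed, finite number of bytes (65536 by default), adjustable only up to the /proc/sys/fs/pipe-max-size ceiling for unprivileged processes, and it never grows on its own as data queues up.

    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE 1    // for F_GETPIPE_SZ / F_SETPIPE_SZ (Linux-specific)
    #endif
    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdio>

    int main()
    {
        int fd[2];
        if (pipe(fd) != 0) return 1;

        // Every Linux pipe has a fixed capacity (65536 bytes by default).
        std::printf("default capacity: %d bytes\n", fcntl(fd[0], F_GETPIPE_SZ));

        // The capacity can be raised, but only up to /proc/sys/fs/pipe-max-size
        // for unprivileged processes; it is never expanded automatically.
        if (fcntl(fd[0], F_SETPIPE_SZ, 1 << 20) == -1)
            std::perror("F_SETPIPE_SZ");
        std::printf("after F_SETPIPE_SZ: %d bytes\n", fcntl(fd[0], F_GETPIPE_SZ));

        // Once the capacity is full, write() on fd[1] blocks (or fails with
        // EAGAIN in nonblocking mode) until a reader drains the pipe.
        close(fd[0]);
        close(fd[1]);
        return 0;
    }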

And pipes are not robust under crash scenarios of any form. You choose to ignore this
fact, also.
****
>
>Do named pipes have disk I/O ??
****
Believing that this matters is what led to this flawed decision.
joe
****
>
>>
>> --
>> HLS
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm