From: Joseph M. Newcomer on 8 Apr 2010 21:45
On Thu, 8 Apr 2010 18:19:31 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:
>"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
>> Answer my questions and I'll answer yours (which was done
>> a few times already).
>> If you are going to design for Linux, then;
>> Why are you trolling in a WINDOWS development forum?
>> Why are you here asking/stating design methods that
>> defy logic under Windows when YOU think this logic is
>> sound under UNIX?
>> If you are going to design for Windows, then you better
>> learn how to follow WINDOWS technology and deal with its
>> OS and CPU design guidelines.
>I am concurrently carrying on conversations in multiple
>groups. I am talking here because I am getting useful advice
>here. I took Joe's advice about the issues related to file
>I/O buffers and specifically got the answers that I needed
>about these. The most difficult issue with buffers is the
>disk drive's own onboard cache, and it looks like the most
>reliable solution for this issue is to simply turn off write
>caching.
>Was your trouble with Windows named pipes? (I won't be using
>Windows named pipes.)
>What IPC did you end up choosing? (I like named pipes
>because their buffer can grow to any length).
Actually, nowhere does it say this. In fact, the linux documentation seems to suggest
that a fifo can be blocking on write. And if any process connected to the pipe fails,
what happens to the data in the pipe? (Hint: it is lost) So it is not
a particularly "reliable" mechanism unless there is the equivalent of transactions
confirming receipt of a block of information.
Note that in named pipes in Windows, the buffer sizes are fixed, and an attempt to
WriteFile may succeed but write fewer bytes than the buffer count (which is why, with
WriteFile to a named pipe, it is essential to test for the number of bytes actually
written).
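The same short-write hazard exists with POSIX write() on a pipe or fifo, so the handling is worth spelling out. A minimal sketch (the function name is illustrative, not from anyone's actual code) of a loop that keeps writing until every byte is accepted:

```cpp
#include <unistd.h>
#include <cerrno>
#include <cstddef>

// Write all `len` bytes to fd, retrying on short writes and EINTR.
// Returns true on success, false on an unrecoverable error.
bool write_all(int fd, const char* buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            return false;                  // real error
        }
        buf += n;   // advance past the bytes that were accepted
        len -= n;   // a short write leaves len > 0, so we loop again
    }
    return true;
}
```

The point either way: the byte count returned by the write call is part of the contract, not an optional detail.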
>> Peter Olcott wrote:
>>> "Hector Santos" <sant9442(a)nospam.gmail.com> wrote in
>>> message news:uhFwgp11KHA.140(a)TK2MSFTNGP05.phx.gbl...
>>>> Peter Olcott wrote:
>>>>> I think that many of these issues may go away by using
>>>>> two half-duplex named pipes one in each direction. No
>>>>> one has yet pointed out any issues with Unix/Linux
>>>>> named pipes. I like named pipes because they implement
>>>>> the FIFO intuitively with minimal learning curve.
>>>> I can only hope that one day you will actually begin
>>>> your work, so you can see how great it will work.
>>>> Google: named pipe problems
>>>> When our multi-million dollar server was first under
>>>> design back in the mid 90s, named pipes were going to be
>>>> used. We saw almost immediately how unreliable they were
>>>> for a high-end, high-throughput, highly multi-threaded
>>>> WAN/LAN network server.
>>> First of all are you talking about named pipes in Windows
>>> or Unix/Linux?
>>>> Not saying you can't make it work, but you will spend more
>>>> time on getting that right than anything else and for
>>>> what? A fifo? When there are so many other more
>>>> reliable methods and simpler methods?
>>> What simpler more reliable methods are you referring to
>>> that can provide event-based notification between
>>> processes?
>>>> But hey, it will probably work for you because I
>>>> sincerely doubt you will have the work load you predict
>>>> you will have. You are basing this on a 10ms
>>>> throughput and you won't have that. You can't. Even if
>>>> your OCR is isolated to pure 10 ms computation, which
>>>> Linux will give you, its surrounding world is YOUR enemy
>>>> that you can't avoid, like your fifo receiver, like file
>>>> I/O logging, your PIPE is a FILE on UNIX as well, which
>>>> has hardware interrupts, like generating results, etc.
>>>> Live and learn. Which leads to the questions, if you are
>>>> going to design for Linux, then;
>>>> Why are you trolling in a WINDOWS development forum?
>>>> Why are you here asking/stating design methods that
>>>> defy logic
>>>> under Windows when YOU think this logic is sound
>>>> under UNIX?
>>>> If you are going to design for Windows, then you better
>>>> learn how to follow WINDOWS technology and deal with its
>>>> OS and CPU design guidelines.
Joseph M. Newcomer [MVP]
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Peter Olcott on 8 Apr 2010 22:10
"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
> See below...
> On Thu, 8 Apr 2010 18:19:31 -0500, "Peter Olcott"
> <NoSpam(a)OCR4Screen.com> wrote:
>>Was your trouble with Windows named pipes? (I won't be using
>>Windows named pipes.)
>>What IPC did you end up choosing? (I like named pipes
>>because their buffer can grow to any length).
> Actually, nowhere does it say this. In fact, the linux
> documentation seems to suggest that a fifo can be blocking
> on write. And if any process connected to the pipe fails,
> what happens to the data in the pipe? (Hint: it is lost)
> So it is not a particularly "reliable" mechanism unless
> there is the equivalent of transactions confirming receipt
> of a block of information.
Yes, that is it. I don't even acknowledge receipt of the
request until it is committed to the transaction log.
Anything at all that prevents this write also prevents the
acknowledgement of receipt. So basically I never say "I
heard you" until the point where nothing can prevent
completing the transaction.
Then in the event that I do not receive the HTTP
acknowledgement of final receipt of the output data, I roll
the whole transaction back. If the reason for the failure is
anything at all on my end I roll the charges back, but, let
the customer keep the output data for free. If the
connection was lost, then this data is waiting for them the
next time they log in.
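Peter's commit-before-acknowledge idea can be sketched in a few lines. This is a hypothetical illustration (the log path and record format are made up, not from his code), and note that fsync() flushes the OS buffers but not necessarily the drive's onboard write cache, which is exactly why disabling that cache came up earlier:

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <string>

// Append a request record to a transaction log and force it to stable
// storage *before* acknowledging receipt. Only if this returns true is
// it safe to send "I heard you" back to the client.
bool commit_then_ack(const char* log_path, const std::string& record) {
    int fd = open(log_path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) return false;
    bool ok = write(fd, record.data(), record.size())
                  == (ssize_t)record.size();
    if (ok) ok = (fsync(fd) == 0);   // push OS buffers toward the disk
    close(fd);
    return ok;                        // now the acknowledgement may go out
}
```

Anything that prevents the write also prevents the return value from being true, so the acknowledgement never outruns the log.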
One of the Linux/Unix people is recommending MySQL InnoDB
storage engine because it has very good crash recovery.
> Note that in named pipes in Windows, the buffer sizes are
> fixed, and an attempt to WriteFile may succeed but write
> fewer bytes than the buffer count (which is why, with
> WriteFile to a named pipe, it is essential to test for the
> number of bytes actually written).
From: Joseph M. Newcomer on 8 Apr 2010 22:13
On Wed, 7 Apr 2010 22:02:13 -0600, Jerry Coffin <jerryvcoffin(a)yahoo.com> wrote:
>In article <L9qdndbIjeoeOCHWnZ2dnUVZ_tKdnZ2d(a)giganews.com>,
>[ ... ]
>> The web server is designed with one thread per HTTP request.
>This falls somewhere in the "terrible" to "oh no!" range. You
>normally want a fairly small pool of threads with a queue of tasks
>for the threads to do. The number of threads in the pool is normally
>based on the number of (logical) processors available -- on the order
>of 2 to 4 times as many as processors is fairly typical. Given that
>you're dealing with an extremely I/O heavy application, a few more
>than that might make sense, but not a whole lot.
The traditional HTTP server/CGI interface lets the operating system manage the "pool of
threads" by creating new threads when one is needed (using the old Unix
one-thread-per-process model, this means "launch a new instance of the program" and lets
the scheduler deal with the resulting load). The FASTCGI technique keeps the processes
around so the launch cost does not exist. Apache used a process-pool model where a pool
of recently-used programs is kept running "just in case" there is a need for them.
IIS used a thread pool and ISAPI, which meant a DLL was loaded by a thread in the thread
pool; the downside of this was that if the ISAPI extension corrupted the heap or took some
kind of failure such as an access fault, the whole of IIS went down (let's hear applause for
the winner of the Dumbest Web Server Design Ever Created award). This has been supplanted by
using CLR components, because the protected object model makes it impossible to corrupt
the heap, and if a component fails, it throws an exception that aborts the execution but
can be caught and handled gracefully by the invoker. But most Web servers do not
implement queues of tasks in the way you suggest.
However, your implementation is easily realizable by using an I/O Completion Port as a
thread queue and setting the maximum concurrency to be the number of CPU cores. You might
have more threads, but a thread that gets blocked on I/O is removed from the active count,
and the port can release another thread to take its place.
>What's most important is that you *not* tie a thread to a request
>though -- the number of threads is a tunable parameter of the pool,
>independent of the number of HTTP requests.
>> I may have as many 1,000 concurrent HTTP requests. I am
>> thinking that each of these threads could append to a single
>> file with no conflict (the OS sequencing these operations)
>> as long as the append is immediately flushed or buffering is
>> turned off.
>I don't know a more polite way to say it, so I'll put it bluntly:
>you're wrong. You cannot depend on the OS to order the operations.
This is one of those "magical mechanisms" he is so fond of invoking to solve serious
problems. Ignore reality, and attribute to whatever mechanism is under discussion some
property that will make the problem go away. Such as the mystical "the OS sequencing
these operations" [how? I know of no mechanism that can do this....] I have run some tests
with 1000 threads just to prove a point (to some dweeb who asserted without any proof that
Windows could not support more than 64 threads because there were only 64 Thread Local
Storage slots, proving that he was dumber than a box of rocks).
>Your code needs to enforce the ordering that's necessary. The OS will
>supply mechanisms (e.g. mutexes, file locking) you can use to do
>that, but the OS isn't going to do much for you at all.
>In most cases, ordering tends to be a fairly minor problem. Most of
>what you usually need is atomic transactions. Each write becomes an
>"all or nothing" proposition, so it either completes, or it gets
>rolled back so it's as if it was never even attempted, and once it
>starts it runs to completion without any other write from any other
>thread getting in its way or interleaving its data with the
>current one.
Frankly, I'm curious what "ordering" has to happen here. He did suggest a two-queue model
for low and high priority tasks, showing a complete lack of understanding of realtime
scheduling, introducing the possibility of priority inversion or unused resources.
But this tends to follow the Magic Morphing Requirements model of deciding what the
requirements are, and driving the requirements by buzzword-fixation, senseless use of Tarot
Cards or a Ouija board to predict actual performance, choosing some mechanism,
attributing to it magical properties (including those which preclude its usage), and
driving the requirements from the implementation details.
>With atomic transactions, ordering is as simple as including the
>sequencing as part of your data. Without atomic transactions, you're
>sunk -- you have no hope of it working dependably, but also no hope
>of it failing dependably, so what you'll almost inevitably end up
>with is series of problems followed by attempts at workarounds that
>don't really work right either.
Keep trying; if enough of us say the same thing long enough (days and days), maybe, just
maybe, the ideas will sink in.
>> The OCR process(s) would be notified of a new request using
>> some sort of IPC (named pipes for now) that also tells it
>> the byte offset in the transaction log file to find the
>> transaction details. Each transaction will have three
>> states:
>> (a) Available (Init by web server)
>> (b) Pending (updated by OCR process)
>> (c) Completed (updated by OCR process)
>> I am not sure how the OCR process would notify the
>> appropriate thread within the web server process of the
>> [Completed] event, but, it would use some sort of IPC.
>What's the point of doing things this way? Right now, you're planning
>to write some data to the transaction log, then send a pointer to
>that data to the OCR engine, then the OCR engine reads the
>transaction log, does the OCR, updates the transaction log, and
>alerts the appropriate thread in the web server.
The same way ANY CGI-based process notifies its parent that it has completed. I believe
this is by closing stdout. Alternatively, if the service is multithreaded, and embedded
in the server, then any interthread mechanism will work well, but these are just way, way
too obvious. Instead, he assumes that there is no way for one process to notify another
(in linux, signal() accomplishes this nicely!) so this falls into the
failure-to-understand-means-you-can't-use-the-mechanism magical lack of mechanisms.
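The close-stdout notification is easy to demonstrate: the parent reads the child's output until EOF, and the child signals completion simply by exiting, which closes its end of the pipe. A minimal POSIX sketch, where the child stands in for the CGI/OCR process:

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <string>

// Fork a child, collect everything it writes, and treat EOF on the
// pipe (read() returning 0) as the completion notification.
std::string run_child_and_collect() {
    int fds[2];
    if (pipe(fds) != 0) return "";
    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return ""; }
    if (pid == 0) {                   // child: the "CGI" side
        close(fds[0]);
        const char out[] = "result";
        write(fds[1], out, sizeof out - 1);
        close(fds[1]);                // closing the write end says "done"
        _exit(0);
    }
    close(fds[1]);                    // parent must close its write end too
    std::string result;
    char buf[64];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        result.append(buf, n);        // read() == 0 is the EOF signal
    close(fds[0]);
    waitpid(pid, nullptr, 0);         // reap the child
    return result;
}
```

No extra IPC machinery is needed for the "finished" event; the pipe's own lifetime carries it.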
>Instead of sending the data from the input thread to the OCR via the
>transaction log (and sending a pointer directly from the input to the
>OCR), send the relevant data directly from the input thread to the
>OCR engine.
If the OCR engine is embedded, an I/O Completion Port works just dandy. In linux, other
queuing mechanisms can exist.
>Have the OCR engine send the result data directly back
>into another queue. Have one pool of threads that takes HTTP
>requests, extracts relevant data, and puts it into the request queue.
>Have another pool of threads that takes results from the queue and
>sends out replies.
We pointed out the obvious a couple weeks ago.
>If you need a transaction log (e.g. to track usage for billing and
>such) insert it into the queuing chain (on either the incoming or
>outgoing side, as appropriate). Since it's dealing with queued
>requests, it can be single threaded, just writing the relevant data
>in order. This avoids all the headaches that arise with multithreaded
>access to the database (and usually improves performance, since it
>avoids seeking around the disk to read and write data at different
>parts of the file).
And a crash might mean the customer doesn't get billed. As I said in an earlier message,
the simplest implementation that meets the requirements that no customer pays for
undelivered goods should be sufficient. A solution where no result is unbilled is also
nice, but that may be harder.
Joseph M. Newcomer [MVP]
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Hector Santos on 8 Apr 2010 22:16
Peter Olcott wrote:
> "Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
>> Peter Olcott wrote:
>>>> But you can use:
>>>> TCP/IP <<--- What we use for IPC
>>>> UDP <<--- What we use for IPC
>>>> HTTP <<--- What we use for IPC
>>>> RPC <<--- What we use for IPC
>>> But as I understand it these will not automatically grow
>>> a queue to any arbitrary length.
>> Your queue is as good and fast as you request it, pipe or
>> otherwise.
> Some of the above have fixed queue lengths don't they?
No, because the question doesn't apply and I doubt you understand it,
because you have a very primitive understanding of queuing concepts.
No matter what is stated, you don't seem to go beyond basic layman
abstract thinking - FIFO. And your idea of how this "simplicity" is
applied is flawed because of the lack of basic understanding.
Again, if you are working for windows, then you need to understand all
the networking protocols to make any judgment.
>> You didn't do a Google, did you? Figures you would
>> ignore it.
> I did and one of the links says something like there aren't
> any problems with named pipes.
There were plenty of links where people had issues - even for Linux.
For what you want to use it for, my engineering sense based on
experience tells me you will have problems, especially YOU for this
flawed design of yours. Now you have 4 Named Pipes that you have to
manage. Is that under 4 threads? But you are not designing for
threads. One message yes, another no. Is the 1 OCR process going to
handle all four pipes? Or 4 OCR processes? Does each OCR have their
own Web Server? Did you work out how the listening servers will bind
the IPs? Are you using virtual domains? sub-domains? Multi-homed IPs?
You really don't know what you are doing, right?
>> But even then, I can understand why it succeeds there. Unix is
>> not traditionally known to work with threads, and the
>> piping has permanent storage - your DISK - making it easy
>> to allow for easy recovery. Simple.
> The data is not supposed to ever actually hit the disk. I
> started a whole thread on just that one point.
But linux pipes are part of the disk, or did you miss that part,
forget, or wish not to believe it?
From: Hector Santos on 8 Apr 2010 22:21
Joseph M. Newcomer wrote:
> The same way ANY CGI-based process notifies its parent that it has completed. I believe
> this is by closing stdout.
Ultimately, the process ending (which must end) is the deciding factor
which the parent is waiting on (with an idle timeout in redirected data).