Can extra processing threads help in this case? [MFC]

From: Peter Olcott on 8 Apr 2010 19:19

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:uBpFdU21KHA.5828(a)TK2MSFTNGP02.phx.gbl...
> Answer my questions and I'll answer yours (which was done
> a few times already).
>
> If you are going to design for Linux, then;
>
> Why are you trolling in a WINDOWS development forum?
>
> Why are you here asking/stating design methods that
> defy logic
> under Windows when YOU think this logic is sound under
> UNIX?
>
> If you are going to design for Windows, then you better
> learn how to follow WINDOWS technology and deal with its
> OS and CPU design guidelines.
>

I am concurrently carrying on conversations in multiple
groups. I am talking here because I am getting useful advice
here. I took Joe's advice about the issues related to file
I/O buffers and specifically got the answers that I needed
about these. The most difficult issue with buffers is the
disk drive's own onboard cache, and it looks like the most
reliable solution for this issue is to simply turn off write
buffering.

Was your trouble with Windows named pipes? (I won't be using
those).
What IPC did you end up choosing? (I like named pipes
because their buffer can grow to any length).
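
The claim that a pipe buffer can grow to any length can be checked directly. On Linux, pipe(7) documents a limited capacity for both pipes and FIFOs: once the buffer is full, a write blocks (or fails with EAGAIN in non-blocking mode) until a reader drains it. The sketch below is only an illustration. It uses an anonymous pipe rather than a named one, and the function name and chunk size are assumptions made for this example.

```python
import os


def measure_pipe_capacity(chunk_size=4096):
    """Fill an unread pipe with non-blocking writes; return how many bytes fit."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # make a full pipe raise instead of hanging
    chunk = b"x" * chunk_size
    total = 0
    try:
        while True:
            try:
                total += os.write(write_fd, chunk)
            except BlockingIOError:
                # The kernel buffer is full: nothing more fits until someone reads.
                break
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return total


if __name__ == "__main__":
    print("bytes accepted before the pipe filled:", measure_pipe_capacity())
```

If the queue must hold more than the pipe accepts, the backlog has to live somewhere else (for example in the shared file), with the pipe carrying only small notifications.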

> --
> HLS
>
> Peter Olcott wrote:
>
>> "Hector Santos" <sant9442(a)nospam.gmail.com> wrote in
>> message news:uhFwgp11KHA.140(a)TK2MSFTNGP05.phx.gbl...
>>> Peter Olcott wrote:
>>>
>>>
>>>> I think that many of these issues may go away by using
>>>> two half-duplex named pipes one in each direction. No
>>>> one has yet pointed out any issues with Unix/Linux
>>>> named pipes. I like named pipes because they implement
>>>> the FIFO intuitively with minimal learning curve.
>>>
>>> I can only hope that one day you will actually begin
>>> your work, so you can see how great it will work.
>>>
>>> Google: named pipe problems
>>>
>>> http://www.google.com/search?q=named+pipe+problems&start=0&ie=utf-8&oe=utf-8&client=firefox-a&rls=org.mozilla:en-US:official
>>>
>>> When our multi-million dollar server was first under
>>> design back in the mid 90s, named pipes were going to be
>>> used. We saw almost immediately how unreliable they were
>>> for a high end, high throughput, high multi-thread
>>> WAN/LAN network server.
>>
>> First of all are you talking about named pipes in Windows
>> or Unix/Linux?
>>
>>> Not saying you can't make it work, but you will spend more
>>> time on getting that right than anything else and for
>>> what? A fifo? When there are so many other more
>>> reliable methods and simpler methods?
>>>
>>
>> What simpler more reliable methods are you referring to
>> that can provide event based notification between
>> processes?
>>
>>> But hey, it will probably work for you because I
>>> sincerely doubt you will have the work load you predict
>>> you will have. You are basing this on a 10ms
>>> throughput and you won't have that. You can't. Even if
>>> your OCR is isolated to pure 10 ms computation, which
>>> LINUX will give you, its surrounding world is YOUR enemy
>>> that you can't avoid, like your fifo receiver, like file
>>> I/O logging, your PIPE is a FILE on UNIX as well, which
>>> has hardware interrupts, like generating results, etc.
>>>
>>> Live and learn. Which leads to the questions, if you are
>>> going to design for Linux, then;
>>>
>>> Why are you trolling in a WINDOWS development forum?
>>>
>>> Why are you here asking/stating design methods that
>>> defy logic
>>> under Windows when YOU think this logic is sound
>>> under UNIX?
>>>
>>> If you are going to design for Windows, then you better
>>> learn how to follow WINDOWS technology and deal with its
>>> OS and CPU design guidelines.
>>>
>>> --
>>> HLS
>>

From: Hector Santos on 8 Apr 2010 20:13

Peter Olcott wrote:

>
> Was your trouble with Windows named pipes? (I won't be using
> those).
> What IPC did you end up choosing? (I like named pipes
> because their buffer can grow to any length).

All sorts of methods, beginning with a simple straight shared file.

But you can use:

TCP/IP <<--- What we use for IPC
UDP <<--- What we use for IPC
HTTP <<--- What we use for IPC
RPC <<--- What we use for IPC
DCOM

and other networking protocols:

http://msdn.microsoft.com/en-us/library/ee663291(v=VS.85).aspx

Here is what MS says about Named Pipes vs TCP/IP

http://msdn.microsoft.com/en-us/library/aa178138(SQL.80).aspx

And a 2003 Dr. Dobb's article on how to handle named pipes correctly,
even though it seems so "simple":

http://www.drdobbs.com/architecture-and-design/184416624;jsessionid=BVL3ABP0UVUSJQE1GHPSKH4ATMY32JVN

Again, remember your bottleneck, which you believe WILL NOT EXIST based
on an unrealistic SWAG calculation of 10 ms. Once you get to 11 ms or
more, you have a build-up with your 1 thread FIFO pipe design,
regardless of what method you use.
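
The build-up described here can be worked through with simple arithmetic. The sketch below assumes one server thread and jobs arriving at a perfectly regular interval, which is an idealization chosen for illustration and not a measurement of any real system. The function and parameter names are hypothetical.

```python
def queue_waits(n_jobs, arrival_interval_ms, service_ms):
    """Return the time (ms) each job waits before service in a one-server FIFO."""
    waits = []
    server_free_at = 0
    for i in range(n_jobs):
        arrival = i * arrival_interval_ms
        start = max(arrival, server_free_at)  # wait if the server is still busy
        waits.append(start - arrival)
        server_free_at = start + service_ms
    return waits


if __name__ == "__main__":
    # One job arrives every 10 ms; compare 10 ms and 11 ms of work per job.
    for service_ms in (10, 11):
        waits = queue_waits(1000, 10, service_ms)
        print(service_ms, "ms service: last of 1000 jobs waits", waits[-1], "ms")
```

Under these assumptions a 10 ms job keeps up exactly, while at 11 ms each job waits 1 ms longer than the one before it, so the backlog never stops growing while the load lasts.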

--
HLS

From: Peter Olcott on 8 Apr 2010 20:58

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:%23pvVPm31KHA.4832(a)TK2MSFTNGP04.phx.gbl...
> Peter Olcott wrote:
>
>>
>> Was your trouble with Windows named pipes? (I won't be
>> using those).
>> What IPC did you end up choosing? (I like named pipes
>> because their buffer can grow to any length).
>
> All sorts of methods, beginning with a simple straight
> shared file.

I am beginning with a simple shared file. The only purpose
of the other IPC is to inform the process of the event that
the file has been updated at file offset X, without the need
for the process to poll for updates. File offset X will
directly pertain to a specific process queue.
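
A minimal sketch of this arrangement, assuming a POSIX system: the producer appends a record to the shared file and then sends only the record's offset and length through a FIFO, so the consumer blocks on the FIFO instead of polling the file. The file names, the 12-byte message layout, and the use of a thread to stand in for the second process are all assumptions made for this example.

```python
import os
import struct
import tempfile
import threading

HEADER = struct.Struct("<QI")  # hypothetical message: (file offset, record length)


def run_demo(records):
    """Append records to a shared file and announce each one over a FIFO."""
    received = []
    with tempfile.TemporaryDirectory() as workdir:
        data_path = os.path.join(workdir, "jobs.dat")
        fifo_path = os.path.join(workdir, "jobs.fifo")
        open(data_path, "wb").close()
        os.mkfifo(fifo_path)

        def consumer():
            data_fd = os.open(data_path, os.O_RDONLY)
            try:
                with open(fifo_path, "rb") as fifo:  # blocks until a writer opens
                    while True:
                        msg = fifo.read(HEADER.size)  # blocks; no polling loop
                        if len(msg) < HEADER.size:
                            break  # writer closed the FIFO
                        offset, length = HEADER.unpack(msg)
                        received.append(os.pread(data_fd, length, offset))
            finally:
                os.close(data_fd)

        worker = threading.Thread(target=consumer)
        worker.start()
        offset = 0
        with open(data_path, "ab") as data, open(fifo_path, "wb") as fifo:
            for record in records:
                data.write(record)
                data.flush()  # hand the record to the kernel before announcing it
                fifo.write(HEADER.pack(offset, len(record)))
                fifo.flush()
                offset += len(record)
        worker.join()
    return received


if __name__ == "__main__":
    print(run_demo([b"job one", b"job two"]))
```

The notifications are only as durable as the FIFO: if either side dies, unread notifications are gone, so a restarted consumer would still need to rescan the file.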

>
> But you can use:
>
> TCP/IP <<--- What we use for IPC
> UDP <<--- What we use for IPC
> HTTP <<--- What we use for IPC
> RPC <<--- What we use for IPC
> DCOM

But as I understand it these will not automatically grow a
queue to any arbitrary length.

>
> and other networking protocols:
>
> http://msdn.microsoft.com/en-us/library/ee663291(v=VS.85).aspx
>
> Here is what MS says about Named Pipes vs TCP/IP
>
> http://msdn.microsoft.com/en-us/library/aa178138(SQL.80).aspx
>
> And a 2003 Dr. Dobb's article on how to handle named pipes
> correctly, even though it seems so "simple":
>
> http://www.drdobbs.com/architecture-and-design/184416624;jsessionid=BVL3ABP0UVUSJQE1GHPSKH4ATMY32JVN
>

OK, so the Unix/Linux people say that it is well known that MS
named pipes are borked, yet they have never had any problem
with Unix/Linux named pipes.

> Again, remember your bottleneck which you believe WILL NOT
> EXIST with a swag 10ms unrealistic calculation, once you
> get to 11ms or more, you have a build up with your 1
> thread FIFO pipe design - regardless of what method you
> use.

It's looking more like four processes with one having much
more priority than the others each reading from one of four
FIFO queues.
(1) Paying customer small job (one page of data) This is
the 10 ms job
(2) Paying customer large job (more than one page of data)
(3) Building a new recognizer
(4) Free trial customer
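
The four queues above can be sketched as follows. This is a single-process simplification: it serves the queues in strict priority order in the sequence listed, whereas the plan above is four separate processes with only the first given extra priority. The queue names and the class are hypothetical.

```python
from collections import deque

# Highest priority first, following the order of the list above.
QUEUE_NAMES = ("paid_small", "paid_large", "build_recognizer", "free_trial")


class PriorityDispatcher:
    """Four FIFO queues; the next job always comes from the highest non-empty one."""

    def __init__(self):
        self.queues = {name: deque() for name in QUEUE_NAMES}

    def submit(self, queue_name, job):
        self.queues[queue_name].append(job)

    def next_job(self):
        for name in QUEUE_NAMES:
            if self.queues[name]:
                return name, self.queues[name].popleft()
        return None  # every queue is empty


if __name__ == "__main__":
    dispatcher = PriorityDispatcher()
    dispatcher.submit("free_trial", "trial page")
    dispatcher.submit("paid_small", "one page job")
    print(dispatcher.next_job())
    print(dispatcher.next_job())
```

With strict priority, a steady stream of small paid jobs would starve the lower queues entirely, which may or may not be acceptable for free trials.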

>
> --
> HLS

From: Joseph M. Newcomer on 8 Apr 2010 20:59

See below...
On Thu, 8 Apr 2010 08:54:38 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:

>
>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
>message news:6aspr5dr3kb4npe47j9mu26kbl2ib4s28v(a)4ax.com...
>> On Wed, 7 Apr 2010 10:07:02 -0500, "Peter Olcott"
>> <NoSpam(a)OCR4Screen.com> wrote:
>>
>>>
>>>Sure so another way to solve this problem is on the rare
>>>cases when you do lose a customer's money you simply take
>>>their word for it and provide a refund. This also would
>>>hurt
>>>the reputation though, because this requires the customer
>>>to
>>>find a mistake that should not have occurred.
>> ****
>> Incredibly elaborate mechanisms to solve non-problems.
>> Simple mechanisms (e.g., "resubmit
>> your request") should suffice. Once your requirements
>> state what failure modes are
>
>You are not paying attention. I am talking about a server
>crash with loss of data after the customer has added money
>to their account, but, before this financial transaction has
>been saved to offsite backup. They add ten bucks to their
>account and I lose track of it because the server crashed
>and it was not yet time for my periodic backup.
****
Actually, I AM paying attention; you are not paying attention. I suggest creating the
MINIMUM amount of complexity that guarantees that the customer is not charged for a
failure; you are attempting to create incredibly elaborate mechanisms that give you the
illusion of 100% reliability. I say: fail and don't charge, or fail and refund, and
implement the smallest, simplest system that satisfies this design.
joe
****
>
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm

From: Joseph M. Newcomer on 8 Apr 2010 21:16

See below...
On Thu, 8 Apr 2010 14:57:19 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:

>
>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
>message news:32qpr51j8nlveopnqnj84isof55m4ut00q(a)4ax.com...
>> See below...
>>
>>>Plus what happens if the machine crashes? You lose what
>>>hasn't been
>>>flushed. The difference is you lower any issues at the
>>>expense of some
>>>speed and open/close overhead which will be very minor for
>>>you.
>> ****
>> Part of the issue here is what is meant by "flush", and a
>> difference between the buffers
>> held in the application and the buffers held in the
>> kernel.
>
>> But this does not seem to be of much concern to our OP,
>> who "knows" that there must be a
>> way to force the caches to the platters (my evidence says
>> it ain't so, or at least it
>> wasn't so with vehement opposition to ever making it so),
>> and so this magical mechanism
>> will solve all the problems.
>> ****
>
>If my understanding is correct fsync() is supposed to handle
>both.
> http://linux.die.net/man/2/fsync
>It might be the case that I must use the low level open()
>command so that there are no application buffers.
****
fflush() will flush application buffers if you are using stdio, and fsync() will flush
the kernel buffers, if it is implemented (did you see the section of that SQLite
discussion that says that it is not always implemented correctly?)

You may not be able to turn off the onboard disk cache buffering. That's part of the
problem I was referring to. (And yes, it kills hard drive performance)
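
The two levels of flushing can be sketched in a few lines. Python is used here for brevity: file.flush() plays the role of fflush(), and os.fsync() is a thin wrapper over fsync(). The function names and the log file name are assumptions made for this example. Whether the data has actually reached the platters when fsync() returns still depends on the drive and on how fsync() is implemented, as noted above.

```python
import os
import tempfile


def durable_append(path, record):
    """Append one record and push it through both buffer layers."""
    with open(path, "ab") as f:
        f.write(record)
        f.flush()             # application buffer -> kernel buffers
        os.fsync(f.fileno())  # kernel buffers -> storage device, as far as the OS knows


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "transactions.log")
        durable_append(path, b"credit 10.00\n")
        durable_append(path, b"debit 0.10\n")
        with open(path, "rb") as f:
            return f.read()


if __name__ == "__main__":
    print(demo())
```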
****
>
>Also the experts seem to be saying that the drive's own
>onboard cache is not much of an issue if there is UPS.
>There are some ways to force some drives to empty their
>onboard cache. The only way that is supposed to always work
>is to turn write buffering off. This can really hurt
>hard-drive performance.
****
Power failure is not much of an issue if you have a UPS, so worrying about what happens
under power failure is not a really high priority in real life.
****
>
>>>It helps to have a single point I/O controller, but how
>>>are you
>>>planning to use this thread? How will you talk to it?
>>>IOW, now you
>>>really need to make sure you have synchronization.
>> ****
>> If one thread handles the file, then no "synchronization"
>> is required because all requests
>> serialize through this one thread. It is an approach
>> called the "agent pattern".
>
>It looks like clarification from the Linux/Unix experts
>indicate that this would be required for my transaction log.
****
Of course, you still have to flush application buffers and flush kernel buffers; putting
it in a single thread still does not guarantee transactional integrity.

You have to decide where your "start transaction" and "end transaction" points are.
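
A minimal sketch of the single-thread agent approach, with an explicit end-of-transaction point: one thread owns the log file, every request is serialized through a queue, and a caller is released only after its record has been flushed and fsync()ed. The class name, the file name, and the choice of "one record equals one transaction" are assumptions made for this example. It does not address the drive's onboard cache.

```python
import os
import queue
import tempfile
import threading


class LogAgent:
    """One thread owns the file; callers hand it records through a queue."""

    def __init__(self, path):
        self._requests = queue.Queue()
        self._file = open(path, "ab")
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def append(self, record):
        done = threading.Event()
        self._requests.put((record, done))
        done.wait()  # returns only after the record was flushed and fsync()ed

    def close(self):
        self._requests.put(None)  # sentinel: stop the agent thread
        self._thread.join()
        self._file.close()

    def _run(self):
        while True:
            item = self._requests.get()
            if item is None:
                break
            record, done = item
            self._file.write(record)
            self._file.flush()
            os.fsync(self._file.fileno())
            done.set()  # "end transaction" point for this record


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "transactions.log")
        agent = LogAgent(path)
        workers = [threading.Thread(target=agent.append, args=(b"job %d\n" % i,))
                   for i in range(8)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        agent.close()
        with open(path, "rb") as f:
            return sorted(f.read().splitlines())


if __name__ == "__main__":
    print(demo())
```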

Oh yes, it really is hard on the disk drive; I killed one disk drive by running a large
number of tests on a transacted database; it just stopped seeking. But during the tests,
it was seeking ferociously as it made sure the directory blocks were consistent with the
file contents.
****
>
>> This is another confusion; apparently an atomic append is
>> sufficient to guarantee
>> transactional integrity in his fantasy world. In the real
>> world, an atomic append is
>> guaranteed to atomically append data. Whether or not this
>> constitutes a transaction is
>> problematic, and probably incorrect. It can perfectly
>> well guarantee that the in-memory
>> disk-cache image is atomically appended to without EVER
>> saying that it guarantees the
>> commit of this data to the drive. In fact, pwrite
>> documentation is completely silent on
>> this point!
>> *****
>
>>>So is every other IPC concept. For your need, named pipes
>>>is more
>>>complex and can be unreliable and very touchy if you don't
>>>do it
>>>right. I mean, your I/O needs to be 100% precise and that
>>>can't be
>>>done in less than 20-30 lines of code, and for what you
>>>need, 3-4
>>>lines code is sufficient.
>> ****
>> Apparently, he thinks that a database can't be a FIFO queue
>> because he once read that SQLite
>> doesn't have a record number, or something else silly like
>> that. He missed the idea that
>> a FIFO queue is a FIFO queue and ANY stream-oriented
>> protocol (including TCP/IP to the
>> local machine!) could be a valid implementation; instead,
>> he fastened on one
>
>And its buffer would automatically grow to any required
>length and automatically shorten as items are removed?
****
Yep. That's EXACTLY what happens. And only at undefined and indeterminate points does
the file system manage to get these updated blocks out to the hard drive (unless you have
a way to force synchronization of the buffers with the magnetic surfaces). So imagine
that you have deleted records in page 1 and added records to page 7. When you delete
records, the other records are "shuffled down" to fill the space. These pages are
committed to disk in opportunistic order, so what is on the platters represents a snapshot
of the in-memory buffers at random states, and the pages on the disk may be inconsistent
with the pages in memory. So you can end up with duplicate records, missing records at
the end, etc.
****
>
>> implementation, "named pipe" (which in linux means
>> something completely different from the
>> Windows concept) and instantly fallen in love with it, to
>> the exclusion of the
>> consideration of any other method. He even asserted,
>> without any substantiating data,
>> that implementing a FIFO using SQLite would have
>> unacceptable overheads because it
>> couldn't compute a seek address directly! (DUH! Like that
>> matters! NO DATA == NONSENSE;
>> if there is any concern about performance, ONLY
>> MEASUREMENT WILL PRODUCE MEANINGFUL DATA,
>> but he is so enamoured of his "think" system, or Tarot
>> cards, or Ouija board, or whatever
>> he is using, that real data is not a consideration)
>> *****
>>>
>>>Unless you get Named Pipe class that will do all the work,
>>>error
>>>checking, like error 5/32 sharing violation timings, etc,
>>>exceptions,
>>>proper full duplex communications, you can certainly run
>>>into an ugly
>>>mess. I don't recommend it for you. You don't need it.
>> ****
>
>I think that many of these issues may go away by using two
>half-duplex named pipes one in each direction. No one has
>yet pointed out any issues with Unix/Linux named pipes. I
>like named pipes because they implement the FIFO intuitively
>with minimal learning curve.
****
No, in fact, NONE of them change at all, whether you are using two half-duplex pipes
(which is all linux supports, even as named pipes) or a full-duplex pipe (as is supported
in Windows).

If either the server app or the app it spawns fails, the contents of the named pipe will be
lost. Just because nobody bothered to point out the obvious does not mean the problem
does not exist. Low learning curve does not immediately map to robust transacted data
transfer!
joe
****
>
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
