From: Peter Olcott on

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
message news:gbs1r5tr89f31ut0jvnovu8nvu2i7qpaph(a)4ax.com...
> See below...
> On Mon, 29 Mar 2010 10:58:56 -0500, "Peter Olcott"
> <NoSpam(a)OCR4Screen.com> wrote:
>
>>> Why do you suddenly add "fault tolerance", then add
>>> "fast
>>> interprocess communication that
>>> is also fault tolerant"? I think you are totally and
>>> utterly clueless here. You do not
>>> understand ANY of the concepts involved.
>>>
>>> If you have fault tolerance, why does it have to be in
>>> the
>>> IPC mechanism? In fact, it
>>> would probably be a Really Bad Idea to try that.
>>
>>Although it may be temporary ignorance on my part that
>>suggested such a thing, I was thinking that it might be
>>simpler to do it this way because every client request
>>will
>>be associated with a financial transaction. Each financial
>>transaction depends upon its corresponding client request
>>and each client request depends upon its corresponding
>>financial transaction. With such mutual dependency it only
>>seemed natural for the underlying representation to be
>>singular.
> ****
> There are several possible approaches here, with different
> tradeoffs.
> (a) get PayPal acknowledgement before starting the
> transaction
> (b) Because the amount is so small, extend credit, and do
> the PayPal processing "offline"
> out of the main processing thread; in fact, don't even
> request the PayPal debiting until
> the transaction has completed, and if it is refused, put
> the PayPal processing FIRST for
> their next request (thus penalizing those who have had a
> refused transaction); you might
> lose a dime here and there, but you have high performance
> for everyone except those who
> weren't cleared.
> (c) if the transaction fails, and you have already debited
> the account, have a background
> process credit the account for the failed transaction.

None of the above, although (b) is the closest. They pay me
in advance in at least $1 increments, and this amount is
placed in their local server account file. The real-time
transaction goes against this local file.

>
> You are confusing IPC with a robustness mechanism. IPC is
> purely and simply a transport
> mechanism; anything about robustness has to be implemented
> external to the IPC.
> joe
>
>>
>>> I guess I'm biased again, having built
>>> several fault-tolerant systems.
>>> joe
>>> ****
>>>>
>>> Joseph M. Newcomer [MVP]
>>> email: newcomer(a)flounder.com
>>> Web: http://www.flounder.com
>>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>>
> Joseph M. Newcomer [MVP]
> email: newcomer(a)flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm


From: Joseph M. Newcomer on
See below...
On Mon, 29 Mar 2010 16:35:54 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:


>
>It is not that I am stupid; it is that for some reason (I am
>thinking intentionally) you fail to get what I am saying.
>I needed something that PREVENTS page faults; MMF does not
>do that, VirtualLock() does.
****
I never promised that memory mapped files would give you ZERO page faults; I only pointed
out that it can reduce the total number of page faults, and distribute the cost of them
differently than your simplistic model that takes several thousand page faults to load the
data. And I said that in any real world experiment, it is essential to gather the data to
show the exact impact of the architecture.
****
>
>A thing can ONLY be said to be a thing that PREVENTS page
>faults if that thing makes page faults impossible to occur
>by whatever means.
****
That's assuming you establish that "zero page faults" is essential for meeting your
high-level requirement. You have only said that if you have tens of thousands of page
faults, you cannot meet that requirement, and if you have zero, you have no problem. You
have not established if the limit of page faults is zero, one hundred, two hundred and
thirty-seven, or three thousand. All you know is that your initial startup is massively
slow and you attribute this to the large number of page faults you see. You may be
correct. But without detailed analysis, you have no basis for making the correlation.
****
>
>It is like I am saying I need some medicine to save my life,
>and you say, here is Billy Bob, he is exactly what you
>need because Billy Bob does not kill people.
*****
No, but if you want "medicine to save your life" do you take 5mg once a day or 100mg ten
times a day? We are not talking alternatives here, but dosages. And do you take it with
some sort of buffering drug to suppress side effects, or take it straight (try sitting
with a friend undergoing chemo, for three hours each trip, and drive him home, and you
will appreciate what buffering means). You have only established two extreme endpoints
without trying to understand what is really going on. Do you have detailed performance
measurement of the internals of your program? (If not, why not?) Or do you optimize
using the by-guess-and-by-golly method that people love to use (remember my basic
principle, derived over 15 years of performance measurement: "Ask a programmer where the
performance bottleneck is and you will get a wrong answer"? That principle NEVER failed
me in 15 years of doing performance optimization). You actually DON'T have any detailed
performance numbers; only some guesses where you have established two samples and done
nothing to understand the values between the endpoints! This isn't science, this is as
scientific as tossing darts over your head at the listing behind you and optimizing
whatever subroutine the dart lands on.
****
>
>Everyone else here (whether they admit it or not) can also
>see your communication errors. I don't see how anyone as
>obviously profoundly brilliant as you could be making this
>degree of communication error other than intentionally.
****
When I tell you "you are using the language incorrectly" and explain what is going on, and
give you citations to the downloadable Intel manual, I expect that you will stop using the
language incorrectly, and not continue to insist that your incorrect usage is the correct
usage. You foolishly think that "virtual memory" necessarily means "paging activity", and
in spite of several attempts by Hector and me to explain why you are wrong, you still
insist on using "virtual memory" in an incorrect fashion. Where is the communication
failure here? Not on my side, not on Hector's side (he pointed you to the Russinovich
article). And then you come back, days later, and STILL insist that "virtual memory" ==
"paging activity"; it is really hard to believe we are talking to an intelligent human
being. And you still don't have any data to prove that paging is your bottleneck, or to
what degree it is a problem. Instead, you fall back on somebody's four-color marketing
brochure and equate "meeting a realtime window" (and a HUGE one) with "absolute
determinism", which sounds more like a philosophical principle, and insist that without
absolute determinism you cannot meet a realtime window, which I tried to explain is
nonsense. Paging is only ONE possible factor in performance, and you have not even
demonstrated that it matters (you did demonstrate that running two massive processes on a
single core slows things down, which surprises no one).
****
>
>Notice this I am not resorting to ad hominem attacks.
>
****
I've given up trying to be polite. It didn't work. If I explain something ONCE and you
insist on coming back and saying I'm wrong, persist in using technical language
incorrectly, try to justify your decisions by citing scientifically unsupportable
evidence, and tell us we don't know what we're talking about when you have expended zero
effort to read about what we've told you, you are not behaving rationally.

Learn how to do science. Learn what "valid experiment" means. Learn that "engineering"
means, quite often, deriving your information by performing valid experiments, not
thinking that real systems are perfect reflections of oversimplified models described in
textbooks, and that you can infer behavior by just "thinking" about how these systems
work. This ignores ALL good principles of engineering, particularly of software
engineering: build it, measure it, improve it. And by MEASURE I mean "RUN VALID
EXPERIMENTS!" You have run two that do not give any guidance to optimization; they just
prove that certain extreme points work or don't work.

Guy L. Steele, Jr., decided that he needed to produce a theoretical upper bound on sorting
(we know the theoretical lower bound for comparison sorts is O(n log n)). He invented
"bogo-sort", which is essentially "52-pickup". What you do is randomly exchange elements
of the array, then look at it and see if it is in order. If it is in order, you are done;
otherwise, try the random rearrangement again until the vector is in sorted order.

So you have done the equivalent of running qsort (n log n) and bogo-sort and this tells
you nothing about how bubble sort is going to perform. You ran an experiment that
overloaded your machine, and one which had zero page faults, and from this you infer that
ANY paging activity is unacceptable. This is poor science, and anyone who understands
science KNOWS it is poor science. Until you have determined where the "performance knee"
is, you have NO data, nor do you know where your problems are, nor do you know where to
optimize. So you take a simplified model, run a single test, and you have STILL not
derived anything useful; in fact, your current model is subject to priority inversion and
does not guarantee maximum throughput, even in the ABSENCE of page faults. For those of
us who spent years optimizing code, this is obvious, and I've tried to tell you that your
data is bad, and instead of listening, you insist that your two extreme points are the
only points you need to understand what is going on. Not true, not true at all.
joe


****
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Joseph M. Newcomer on
See below...
On Mon, 29 Mar 2010 12:12:13 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:

>
>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
>message news:mcl1r597pthv9priqa6vla6np19l6p0ic1(a)4ax.com...
>> See below...
>> On Mon, 29 Mar 2010 09:57:59 -0500, "Peter Olcott"
>> <NoSpam(a)OCR4Screen.com> wrote:
>>
>>>
>>>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
>>>message news:oesvq55safqsrg8jih8peiaah4uiqt0qi3(a)4ax.com...
>>>> Well, I know the answer, and I think you are behaving in
>>>> yet another clueless fashion. And
>>>> in my earlier reply I told you why. You want "fault
>>>> tolerance" without even understanding
>>>> what that means, and choosing an implementation whose
>>>> fundamental approach to fault
>>>
>>>The only fault tolerance that I want or need can be
>>>provided
>>>very simply. The original specification of fault tolerance
>>>that I provided was much more fault tolerance than would
>>>be
>>>cost-effective. If I really still wanted this level of
>>>fault
>>>tolerance then many of your comments on this subject would
>>>not be moot. Since this degree of fault tolerance has been
>>>determined to never be cost-effective, then any details of
>>>providing this level of fault tolerance become moot.
>>>
>>>The other Pete had greater insights into my own needs than
>>>I
>>>did myself. I will paraphrase what he said. I only need to
>>>avoid losing transactions. When a client makes a request,
>>>I
>>>only need to avoid losing this request until it is
>>>completed. Any faults in-between can be restarted from the
>>>beginning.
>> ****
>> Your total cluelessness about TCP/IP comes to the fore
>> again. Suppose you have
>> established a connection to the machine. The machine
>> reboots. What happened to that
>> connection? Well, IT NO LONGER EXISTS! So you can't
>> reply over it! Even if you have
>> retained the information about the data to be processed,
>> YOU HAVE NO WAY TO COMMUNICATE TO
>> THE CLIENT MACHINE!
>
>False assumption. A correct statement would be: I have no
>way to communicate to the client that you are aware of (see
>below).
****
Email is not the same thing as getting the result back from the server. And users will
not expect to get email if they get a "connection broken" response unless you tell them,
and this requires a timeout a LOT larger than your 500ms magic number.
****
>
>> In what fantasy world does the psychic plane allow you to
>> magically
>> re-establish communication with the client machine?
>
>That one is easy. All users of my system must provide a
>verifiably valid email address. If at any point after the
>client request is fully received the connection is lost, the
>output is sent to the email address.
****
Which violates the 500ms rule, by several orders of magnitude.

I'm curious how you get a "verifiably valid" email address. You might get AN email
address, but "verifiably valid" is a LOT more challenging. There are some hacks that
increase the probability that the email address is valid, but none which meet the
"verifiably valid" criterion.
***
>
>>
>> And don't tell me you can use the IP address to
>> re-establish connectivity. If you don't
>> understand how NAT works, both at the local level and at
>> the ISP level, you cannot tell me
>> that retaining the IP address can work, because I would
>> immediately know you were wrong.
>> ****
>>>
>>>The key (not all the details, just the essential basis for
>>>making it work) to providing this level of fault tolerance
>>>is to have the webserver only acknowledge web requests
>>>after
>>>the web request has been committed to persistent storage.
>> ****
>> Your spec of dealing with someone pulling the plug, as I
>> explained, is a pointless
>> concern.
>
>And I have already said this preliminary spec has been
>rewritten.
****
So what is it? How can we give any advice on how to meet a spec when we don't even know
what it is any longer?
****
>
>> So why are you worrying about something that has a large
>> negative exponent in
>> its probability (10**-n for n something between 6 and 15)?
>> There are higher-probability
>> events you MIGHT want to worry about.
>> ****
>>>
>>>The only remaining essential element (not every little
>>>detail just the essence) is providing a way to keep track
>>>of
>>>web requests to make sure that they make it to completed
>>>status in a reasonable amount of time. A timeout threshold
>>>and a generated exception report can provide feedback
>>>here.
>> ****
>> But if you have a client timeout, the client can resubmit
>> the request, so there is no need
>> to retain it on the server. So why are you desirous of
>> expending effort to deal with an
>> unlikely event? And implementing complex mechanisms to
>> solve problems that do not require a
>
>Every request costs a dime. If the client re-submits the
>same request it costs another dime. Once a request is
>explicitly acknowledged as received, the acknowledgement
>response will also inform them that resubmitting will
>incur an additional charge.
****
Oh, I get it: "we couldn't deliver, but we are going to charge you anyway". Not a good
business model. You have to make sure that email was received before you charge. Not
easy. We got a lot of flack at the banking system when we truncated instead of rounding,
which the old system did, and people would complain that they only got $137.07 in interest
when they expected to get $137.08. And you would not BELIEVE the flack we got when we had
to implement new Federal tax laws on paychecks, and there were additional "deductions"
(the pay was increased by $0.50/payroll to cover the $0.50 additional charge the
government required), but again the roundoff meant we were getting complaints from people
who got $602.37 under the new system when under the old hand-written checks they got
$602.38. So you had better be prepared, under failure scenarios, to PROVE you delivered
the result they paid for, even for $0.10, because SOMEBODY is going to be tracking it!

It will be LOTS of fun!
*****

>
>> solution on the server side? And at no point did you talk
>> about how you do the PayPal
>> credit, and if you are concerned with ANY robustness,
>> THAT's the place you have to worry
>> about it!
>>
>> And how does PayPal and committed transactions sit with
>> your magical 500ms limit and the
>> "no paging, no disk access, ever" requirements?
>> ****
>>>
>>>Please make any responses to the above statement within
>>>the
>>>context of the newly defined much narrower scope of fault
>>>tolerance.
>> ****
>> If by "fault tolerance" you mean "recovering from pulling
>> the plug from the wall" my
>
>No not anymore. Now that I have had some time to think about
>fault tolerance (for the first time in my life) it becomes
>obvious that this will not be the benchmark, except for the
>initial request / request acknowledgement part of the
>process.
***
So what IS your requirements document? SHOW US!
****
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Peter Olcott on

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
message news:hrs1r5l7j283dhcha8oa9n96u1fag61jdu(a)4ax.com...
> See below...
> On Mon, 29 Mar 2010 09:27:15 -0500, "Peter Olcott"
> <NoSpam(a)OCR4Screen.com> wrote:
>
>>I know the difference between threads and a process, and
>>see no reason why threads would not work if processes do
>>work, the converse not necessarily being true.
> ****
> Memory interference patterns will be completely different
> in a multithreaded single
> process.
> ****
>>
>>What difference is there between threads and processes
>>that
>>is the basis for even the possibility that threads may not
>>work whereas processes do work?
> ****
> It is sad you have to ask this question. See above
> comment. There are substantial
> differences in how the memory system is handled in
> inter-process context switching versus
> intra-process context switching. And the consequences are
> quite different. So MAYBE you
> will get comparable performance, MAYBE it will be a lot
> better, and MAYBE it will not be as
> good. I have no idea. But *I* would have run the
> experiment so I would KNOW, and not
> just be guessing based on the wishful-thinking approach
> based on one irrelevant experiment.
> ****

Basically, faster access because of less overhead, or the
same access because of the same overhead. Yeah, so basically
you are saying that you are not sure and I should test it.
Of course I will when the time comes. If the time ever comes
where I really need to upgrade to more than a single-core
server, at my dime-a-transaction prices I could hire a team
to solve this problem for me.

>>Please do not cite a laundry list of the differences
>>between
>>threads and processes; please cite at least one
>>difference between threads and processes along with
>>reasoning to explain why threads might not work whereas
>>processes worked correctly.
> ****
> See above.
> ****
>>
>>Here are two crucial assumptions why I think that threads
>>must work if processes do work:
>>(1) Threads can share memory with each other with most
>>likely less overhead than processes.
> ****
> And higher potential interference. You don't really know.

I would think that the issue would be cache or bus
contention, and I see no specific differences in the way
that threads access memory and the way that processes access
memory that could account for a difference between these two
types of contention.

> ****
>>(2) Threads can be scheduled on multiple processor cores,
>>just like processes.
> ****
> See previous comment about memory interference patterns.
> The point is, I DON'T KNOW, but I
> don't believe in extrapolating from unrelated experiments.
> GET THE &%$#ING DATA!
> ****

It sure is close enough for now, and of course I would test
before rolling out another production server.

>>
>>>
>>> You are going to have to come up with a better proposal
>>> than one that uses the words "some
>>> sort" in it.
>>> joe
>>
>>All that I was saying is that my mind is still open to
>>alternatives than the ones that I originally suggested.
> ****
> It sounded to me like you have not been open to ANY
> alternatives we have suggested, and
> you don't write specifications using phrases like "some
> sort" in them. You write VERY
> specific requirements, and from those requirements,
> generate VERY specific implementation

That is not the way that good architecture is developed.
Back in the early days of structured systems analysis they
called this getting prematurely physical. If you get too
specific too early you lose most any chance of a nearly
optimal design; you have committed the garden-path error.

> strategies. Then, those strategies are reviewed and might
> be rejected or accepted, based
> on both technical feasibility and whether or not they
> satisfy the requirements. And you
> have demonstrated that technical feasibility is not one of
> your strongest points.

Some of these things are brand new to me, and this is the
very first time that I have even thought about them.
Categorically exhaustive reasoning can derive a near optimal
solution to most any problem, but, this takes time. Keeping
the focus on categories of ideas instead of detailed
specifics is a much more efficient way to interpolate upon
the near optimal solution. These categories are gradually made
increasingly more specific.

>
> First, though, you need a very precise requirement, one
> which incorporates everything that
> is essential and eliminates anything that is now "moot".
> So we have something to work
> with that is actually self-consistent, and not morphing on
> every restatement. Then a
> detailed implementation specification which tells exactly
> how you plan to meet that
> requirement.
> joe
> ****

See what I just said about categorically exhaustive
reasoning. This same reasoning, through you, me, and Hector,
narrowed down the focus to using the HTTP protocol as the
best choice for turning my OCR engine into a web
application. I don't think that a better category could have
possibly been chosen. Categorically exhaustive reasoning
pares down the decision tree most efficiently.

>>
>>>
>>>>
>>> Joseph M. Newcomer [MVP]
>>> email: newcomer(a)flounder.com
>>> Web: http://www.flounder.com
>>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>>
> Joseph M. Newcomer [MVP]
> email: newcomer(a)flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm


From: Peter Olcott on

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
message news:55u1r5pogli39eiiumdckpbk8fvm0jfnh3(a)4ax.com...
> On Sun, 28 Mar 2010 23:23:09 -0500, "Peter Olcott"
> <NoSpam(a)OCR4Screen.com> wrote:
>
>>
>>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
>>message news:ehcvq55h4l9lrobkpabb2m31ve22nvffd4(a)4ax.com...
>>>I had forgotten that! And I DID tell him about
>>>VirtualLock!
>>>
>>> Denser than depleted uranium.
>>>
>>> What is amazing is his persistence in using technically
>>> incorrect terminology even after
>>> it has been carefully explained to him what is going on.
>>> For example, a desire to
>>> allocate contiguous physical memory should have been the
>>> first clue that he had no idea
>>> what virtual memory was. And his persistence in
>>> equating
>>> virtual memory (which, at its
>>
>>If all that other stuff that is going on under the covers
>>is
>>sufficiently analogous to the simpler case, then it is a
>>communication error to specify these extraneous and
>>essentially irrelevant details. These extra details impede
>>the communication process.
> ***
> ANd you accuse ME of having communication problems! But
> you have serious communication
> problems in that once you are told you are using the
> technical language incorrectly, you
> PERSIST in using it incorrectly, thus impeding any form of
> effective communication.
>
> It is NOT "analogous". You asked a VERY SPECIFIC and
> carefully-worded question, and were
> told that this was not possible. At which point you began
> insisting that it HAD to work
> as you described. These details are NOT "irrelevant" but
> in fact change the nature of the
> problem considerably. A fact which we keep trying to
> explain to you!
> ***
>>
>>In other words the simpler less exactly precise terms are
>>more correct (within the goal of effective communication)
>>than the complex precisely correct terms because the
>>precisely correct terms impede the clarity and conciseness
>>of communication.
> ****
> But the precisely correct terms described what is REALLY
> going on and the slovenly use of
> terms impedes the communication because you are, for
> example, asking for the impossible.

I am primarily an abstract thinker, I almost exclusively
think in terms of abstractions. This has proven to be both
the most efficient and most effective way to process complex
subjects mentally. I scored extremely high on an IQ test in
this specific mode of thinking, something like at least
1/1000. My overall IQ is only as high as the average MD
(much lower than 1/1000). I would not be surprised if your
overall IQ is higher than mine, actually I would expect it.
Your mode of thinking and my mode of thinking seem to be at
opposite ends of the spectrum of abstract versus concrete
thinking.

> If you insist on asking for something that is impossible,
> you aren't going to get it, no
> matter how much you want to. And if you insist on the
> impossible, you demonstrate that
> you are clueless, particularly after it has been explained
> to you that you have asked for
> the impossible. If you say "I must have X" and someone says
> "that is never going to happen"
> then you have to accept that you are not going to get "X"
> and stop insisting that you must
> have "X". Those of us who make their livings satisfying
> the needs of customers have, as
> part of our responsibility, making sure that customer
> doesn't have unrealistic
> expectations, and therefore, when we fail to deliver "X"
> means they will say "I wanted
> "X", you did not give me "X", and therefore I am not
> going to pay you". Been there,
> done that, thirty years ago, and I don't make the same
> mistake twice. Instead, I simply
> say "You are never going to get "X" because that is not
> how the system works. Now here's
> what you ARE going to get, that meets your
> requirements..." Which is why a coherent set
> of requirements is mandatory (a friend of mine has an
> interesting approach to pricing. He
> asks for the requirements document first. If it is
> well-written, his rate is $N/hr. If
> it is badly written, or nonexistent, his rate is
> $1.5*N/hr. He is very upfront about
> this. Based on your morphing spec, I'd suspect he'd
> charge $k*N/hr, for k some multiple
>>= 2, if he had to implement something for you.)
> joe
> ****
> Joseph M. Newcomer [MVP]
> email: newcomer(a)flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm