From: Hector Santos on
Peter Olcott wrote:

> "Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
> news:uKv1yAJ1KHA.3412(a)TK2MSFTNGP05.phx.gbl...
>> Peter Olcott wrote:
>>
>>> Sure right everyone knows that it is very easy to pay the
>>> 50-100% (per annum) rate of return that venture vultures
>>> want from the remaining 3% ownership of the company. I am
>>> sure that this happens hundreds of times every day.
>>
>> If that is what you were told that you had to pay them,
>> then they saw you coming from 100 miles away and basically
>> wanted to politely scare you away. Taking $15 grand from
>> you for a worthless patent can make even the worst people
>> feel guilty, especially over the holidays and need to face
>> family.
>>
>> --
>> HLS
>
> Your ignorance of venture capital (not HTTP protocol) is
> astounding!


Actually, I'm ignorant of many things, including HTTP. Which type of
VC did you go to? Patent Trolls Are Us? It might explain why you
have 100% of nothing.

OK, you can have the last stamp in this thread. Good luck with your
vaporware!


--
HLS



From: Peter Olcott on

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:%23BOJwwN1KHA.6108(a)TK2MSFTNGP06.phx.gbl...
> Peter Olcott wrote:
>
>> "Hector Santos" <sant9442(a)nospam.gmail.com> wrote in
>> message news:uKv1yAJ1KHA.3412(a)TK2MSFTNGP05.phx.gbl...
>>> Peter Olcott wrote:
>>>
>>>> Sure right everyone knows that it is very easy to pay
>>>> the 50-100% (per annum) rate of return that venture
>>>> vultures want from the remaining 3% ownership of the
>>>> company. I am sure that this happens hundreds of times
>>>> every day.
>>>
>>> If that is what you were told that you had to pay them,
>>> then they saw you coming from 100 miles away and
>>> basically wanted to politely scare you away. Taking $15
>>> grand from you for a worthless patent can make even the
>>> worst people feel guilty, especially over the holidays
>>> and need to face family.
>>>
>>> --
>>> HLS
>>
>> Your ignorance of venture capital (not HTTP protocol) is
>> astounding!
>
>
> Actually, I'm ignorant of many things, including HTTP.
> Which type of VC did you go to? Patent Trolls Are Us? It
> might explain why you have 100% of nothing.

It is common knowledge that venture capitalists must seek
enormous returns for their investment to make up for the
fact that most of these investments fail.

You are a very smart man with lots of valuable knowledge, yet
your continued insistence on presumption makes you look
foolish. Presumption is always a foolish act.

>
> Ok, you can have the last stamp in this thread. Good Luck
> with your vapor ware!
>
>
> --
> HLS
>
>
>
> --
> HLS


From: Joseph M. Newcomer on
See below...
On Sat, 3 Apr 2010 18:27:00 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:

>
>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
>message news:8vdfr5l11iu9e6k3fbp5im74r2hneqc5gb(a)4ax.com...
>> See below...
>> On Fri, 2 Apr 2010 14:32:07 -0500, "Peter Olcott"
>> <NoSpam(a)OCR4Screen.com> wrote:
>>
>>>>>What about FTP? I could do my on-the-fly backup using
>>>>>FTP.
>>>> ***
>>>> OK, and you have now introduced another complex
>>>> mechanism
>>>> into the validation of the state
>>>> diagram. How is it that FTP increases the reliability
>>>> without introducing what I might
>>>> call "gratuitous complexity". Note that your state
>>>> diagram now has to include the entire
>>>> FTP state diagram as a subcomponent. Have you looked at
>>>> the FTP state diagram? Especially
>>>> for the new, restartable, FTP protocols?
>>>> joe
>>>
>>>Yeah right the whole thing is so complex that no one could
>>>ever possibly accomplish it so I might as well give up,
>>>right?
>> ****
>> No, it is doable, and vastly more complex things are done
>> every day. But they are done by
>> people who spend the time to LEARN what they are doing,
>> and don't design by throwing darts
>> at a wall full of buzzwords.
>>
>> What is flawed is your design process, in which you
>> identify a problem, and then propose
>> some completely-off-the-wall solution which may or may not
>> be appropriate, and in the case
>> of FTP, is probably the WRONG approach, because it adds
>> complexity without improving
>> reliability or robustness.
>
>This is my most efficient learning mode, and it does
>eventually result in some very excellent designs when
>completed.
****
No, creating off-the-wall solutions when you understand neither the problem domain nor the
relationship of the buzzword to reality does not constitute a good design methodology.

And I cannot think of a single instance in my career (which is longer than FTP has been
around -- I was using one of the first versions of FTP, when the protocol was first
invented), when I would have thought "AHA! FTP is the solution to robustness and having
reliable persistent storage!" Of course, understanding something about what FTP is and
does helps a lot in rejecting it as a solution.

Perhaps if you used FTP from a command-line interface, where you have to supply login
credentials (you, of course, have a way of protecting the secret of the password!), issue
a ton of commands before you can start the transfer, deal with the transfer, etc., you
would appreciate how complex this actually is. By using "FTP", I presume you mean that
you will interact with an FTP server on the other side and you will be an FTP client. If
you mean something else, of course, you would not have said "FTP". But if, by "FTP", you
meant "storing the files on a remote server using a proprietary protocol of my own
invention, or re-implementing a dedicated FTP-like server running on a different,
proprietary, port, but implementing the FTP protocol" then the answer is that you have
added a massive amount of complexity to a problem that transacted databases already
accomplish locally, without actually adding any reliability. Of course, to get more
reliability, using mirrored remote transacted databases would be an approach, but FTP
would be the wrong mechanism, and inventing your own mechanism (given how little you seem
to know about file system robustness) would be a very expensive, lengthy, and unsatisfying
proposition.

(By the way, you did not mention the Secure FTP protocol, and the general rule is that if you
use standard FTP, you have immediately compromised password security; in fact there are
script-kiddie scripts that simply do packet sniffing waiting for FTP password packets to
go by)


Ultimately, we work with probabilities. If you are solving a case that happens once in
10**15 cases, it probably isn't worth worrying about. If it happens once in 10 cases, it
is critical that you solve it. In between, it's an engineering decision.
****
>
>>
>> I and others have design systems that had to keep running
>> after crashes, and I did it by
>> using a transacted database to keep track of the pending
>> operations. And I spent WEEKS
>> testing every possible failure mode before I released the
>> software to my client (who has
>> yet to find a problem in my design, which has now been
>> selling for ten years). I did NOT
>> toss wild concepts like "pwrite" and "FTP" around as if
>> they would solve the problem;
>
>Heh, but, this was not the very first time that you ever
>designed such a system was it? How would you have approached
>this design if you instead only had some rough ideas about
>how things worked?
****
As I said, I NEVER would have thought of FTP as a solution, because I know its
limitations. And, I would know that if I proposed it, I would be laughed out of the
meeting.

Instead, I would realize that a transacted database was the essential component, and would
instead have looked into transacted database technology, choosing one among a variety of
candidates based on some requirements (including cost) that I had produced.
****
>
>> instead, I analyzed what needed to be handled, and built
>> mechanisms that solved those
>> problems, based on carefully-analyzed state diagrams (I
>> described them to you) and
>> fundamentally reliable mechanisms like a transacted
>> database system at the core of the
>> solution.
>
>I like to fully understand the underlying infrastructure
>before I am fully confident of a design. For example, I now
>know the underlying details of exactly how SQLite can fully
>recover from a power loss. Pretty simple stuff really.
*****
Ask if it is a fully-transacted database and what recovery techniques are implemented in
it. Talk to a MySQL expert. Look into what a rollback of a transaction means. These
are specified for most databases (my experience in looking at these predates MySQL, so I
don't know what it does; I haven't looked at this technology since 1985 or 1986).

That's all the understanding you need. Intellectual curiosity may suggest that you
understand how they implement this, but such understanding is not critical to the decision
process.
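
For concreteness, here is a minimal sketch of what a rollback looks like through the
SQLite C API; the "accounts" table is invented for illustration, and real code would
bind parameters rather than format SQL with snprintf:

#include <sqlite3.h>
#include <cstdio>

// Debit one account and credit another as one atomic unit. If any step
// fails, ROLLBACK restores the database to its state before BEGIN.
bool transfer(sqlite3 *db, int from_id, int to_id, int cents)
{
    char *err = 0;
    if (sqlite3_exec(db, "BEGIN IMMEDIATE;", 0, 0, &err) != SQLITE_OK) {
        sqlite3_free(err);
        return false;
    }

    char sql[256];
    std::snprintf(sql, sizeof sql,
        "UPDATE accounts SET balance = balance - %d WHERE id = %d;"
        "UPDATE accounts SET balance = balance + %d WHERE id = %d;",
        cents, from_id, cents, to_id);

    if (sqlite3_exec(db, sql, 0, 0, &err) != SQLITE_OK) {
        std::fprintf(stderr, "update failed: %s\n", err ? err : "?");
        sqlite3_free(err);
        sqlite3_exec(db, "ROLLBACK;", 0, 0, 0);   // undo the partial work
        return false;
    }

    // COMMIT is the atomic step: both updates become durable, or neither does.
    return sqlite3_exec(db, "COMMIT;", 0, 0, &err) == SQLITE_OK;
}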
****
>
>>>
>>>I ALWAYS determine feasibility BEFORE proceeding with any
>>>further analysis.
>> ****
>> No, you have been tossing buzzwords around as if they are
>> presenting feasible solutions,
>> without justifying why you think they actually solve the
>> problem!
>
>On-the-fly transaction by transaction offsite backups may
>still be a good idea, even if it does not fit any
>pre-existing notions of conventional wisdom.
****
Actually, it does, and "remote mirrored transactions" covers the concept. This is a very
old idea, and right now major New York investment firms I know of are mirroring every
transaction on servers 50 miles away, just in case of another 9/11 attack. And they were
doing this in the 1990s (the ones who weren't are now doing it!). So the idea is very
old, and you are just discovering it. So why not investigate what is available in
mirrored database support (it costs!)?
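
A common-sense sketch of the idea (not any particular product): commit locally, record
the committed transaction in an "outbox", and let a separate shipper process replay
outbox rows against the remote mirror. The table names here are invented:

#include <sqlite3.h>

// Record the business row and the outbox row in ONE local transaction, so a
// crash can never leave a transaction that exists locally but will never be
// mirrored (or vice versa). A separate process ships rows WHERE shipped = 0
// to the mirror and only then sets shipped = 1.
void record_transaction(sqlite3 *db, const char *txn_body)
{
    sqlite3_exec(db, "BEGIN IMMEDIATE;", 0, 0, 0);

    sqlite3_stmt *ins = 0;
    sqlite3_prepare_v2(db, "INSERT INTO ledger(body) VALUES(?1);", -1, &ins, 0);
    sqlite3_bind_text(ins, 1, txn_body, -1, SQLITE_STATIC);
    sqlite3_step(ins);
    sqlite3_finalize(ins);

    sqlite3_prepare_v2(db, "INSERT INTO outbox(body, shipped) VALUES(?1, 0);", -1, &ins, 0);
    sqlite3_bind_text(ins, 1, txn_body, -1, SQLITE_STATIC);
    sqlite3_step(ins);
    sqlite3_finalize(ins);

    sqlite3_exec(db, "COMMIT;", 0, 0, 0);  // ledger row and outbox row appear together
}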
****

>I start with
>the most often false premise that all conventional wisdom is
>pure hooey.
****
So you invent new hooey in its place?
****
>As this conventional wisdom proves itself item
>by item point by point, I accept the validity of this
>conventional wisdom only on those items and points that it
>specifically proved itself.
****
Let's see if I have this right:

(a) assume everyone else is wrong
(b) propose bizarre designs based on superficial understanding and buzzword gathering
(c) wait for someone to refute them

At which point, you forgot
(d) accuse the people who refute me of being in refute mode and not listening to what I am
saying.
****
>This process makes those aspects
>of conventional wisdom that have room for improvement very
>explicit.
****
I have no idea what "conventional wisdom" is here; to me, the obvious situation is
solvable by a transacted database, and if you want to have 100% recovery in the face of
incredibly unlikely events (e.g., power failure), you have to use more and more complex
(and expensive) solutions to address these low-probability events.

Perhaps in your world, power failures matter; in my world, they happen once a year, under
carefully controlled conditions that allow for graceful shutdown (the once-a-decade
windstorm or once-a-century blizzard that drop me back to battery backup power, at which
point I execute graceful shutdowns; nearby lightning hits that take out the entire block,
or something else that is going to last for an hour or more...the 1-second failures that
earned our power company the nickname "Duquesne Flicker & Flash" are covered by my UPS
units)

Why have you fastened on the incredibly-low-probability event "power failure" and why have
you decided to treat it as the most common catastrophe?
****
>
>>
>> I use a well-known and well-understood concept, "atomic
>> transaction", you see the word
>> "atomic" used in a completely different context, and latch
>> onto the idea that the use you
>> saw corresponds to the use I had, which is simply not
>> true. An atomic file operation does
>
>I understood both well. My mind was not fresh on the
>atomicity of transaction until I thought about it again for
>a few minutes.
****
It isn't because we haven't tried to explain it to you.
****
>
>> NOT guarantee transactional integrity. File locks provide
>> a level of atomicity with
>> respect to record updates, but they do not in and of
>> themselves guarantee transactional
>> integrity. THe fundamental issue here is integrity of
>> the file image (which might be in
>
>They do provide one key aspect of exactly how SQLite
>provides transactional integrity.
>
>> the file system cache) and integrity of the file itself
>> (what you see after a crash, when
>> the file system cache may NOT have been flushed to disk!)
>> ****
>
>There are simple ways to force this in Unix/Linux, I don't
>bother cluttering my head with their names, I will look them
>up again when the time comes
****
sync

Which actually doesn't, if you read it closely and understand what it does and does not
guarantee. I worked in Unix for 15 years; I know something about the reliability of its
file system. And I went to talks by people (Satyanarayanan, Ousterhout) who built reliable
file systems on top of Unix in spite of its fundamental limitations.
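
For the record, the calls being alluded to are fsync()/fdatasync(); a rough sketch of
the usual incantation, with the file name invented, and with the caveat above that even
this only guarantees what the file system and the drive choose to honor:

#include <fcntl.h>
#include <unistd.h>

// Append a record and push it toward the platter as hard as POSIX allows.
// A drive doing write-back caching may still hold the data in its own RAM.
bool durable_append(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd < 0) return false;

    bool ok = write(fd, buf, len) == (ssize_t)len
           && fsync(fd) == 0;               // flush file data and metadata
    close(fd);

    // For a newly created file, the directory entry must also be synced (here
    // "." stands in for the containing directory), or a crash can lose the
    // file even though its data was flushed.
    int dfd = open(".", O_RDONLY);
    if (dfd >= 0) { fsync(dfd); close(dfd); }
    return ok;
}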
****
>There are even ways to flush
>the hard drives on-board buffer.
****
And one vendor I talked to at a trade show assured me that they had no way to flush the
onboard hard drive buffers, and when I asked "how do you handle transacted file systems?"
he simply said "We just blame Microsoft." So I know that there is at least one vendor for
which this is not supported. I presume you have talked with the hard drive vendors'
technical support people before you made this statement (given the evidence I have, I
would not trust such a statement until I had verified that the hard drive model we were
using actually supported this capability, and the file system used it, and that the OS had
the necessary SCSI/ATAPI command pass-thru to allow an application to invoke it. But
then, since we had to get a patch to Win2K to make our transacted file system work [the
problem was elevated to a "mission-critical bug" within Microsoft, and the company I
worked for had enough clout to get the patch], maybe I just have a lot more experience in
this area and am consequently a lot more distrustful of silly statements which do not seem
to have a basis in reality)
>
>>>
>>>If FTP is not reliable, then I am done considering FTP. If
>>>FTP is reliable then it might possibly form a way for
>>>transaction by transaction offsite backup.
>> ****
>> FTP works, but that is not the issue. The REAL issue is,
>> will adding the complexity of an
>> FTP protocol contribute to the reliability of your system
>> in a meaningful fashion, or will
>> it just introduce a larger number of states such that you
>> have more cut-points in the
>> state transition diagram that have to be evaluated for
>> reliability? And, will the result
>
>Since I can not count on something not screwing up it seems
>that at least the financial transactions must have off-site
>backup. I would prefer this to be on a transaction by
>transaction basis, rather than once a period-of-time.
****
But the point is that it should have been OBVIOUS to you that this could not work! Because
if you had done the design that I say you have to do, to identify the state machine that
records transactions and identify each of the cut-points, it would be obvious that
implementing another incredibly complex state machine within this would lead only to MORE
COMPLEX recovery, not less complex! Assume your transacted database is completely
reliable, look at its recovery/rollback protocols, and see how well they meet your needs
at the cutpoints that involve the transacted database! Compare the state diagram you get
with FTP to the state diagram you have without FTP! This is pretty elementary design
stuff, which should be derivable from basic principles (you don't need to have built a lot
of systems to understand this).

You know DFAs. Simply express your transaction model as a DFA, and at every state
transition, you add a new state, "failure". Every state can transition to "failure".
Then, your recovery consists of examining the persistent state up to that point, and
deriving a NEW set of states, that essentially return you to a known point in the state
diagram, where you resume computations and attempt to reach a final state. That's all
there is to it.
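
A deliberately tiny sketch of that idea, with the state names and the resume rules
invented for illustration (the persisted state is whatever the transacted store last
recorded for the job):

// Every arc of the job's state machine can fail (crash, power loss, ...).
// On restart, look only at what was durably recorded and decide what to do
// next; work whose completion was never recorded is simply redone.
enum State { RECEIVED, PROCESSED, DELIVERED, CHARGED, MIRRORED, DONE, FAILURE };

State next_step_after_crash(State last_durably_recorded)
{
    switch (last_durably_recorded) {
    case RECEIVED:  return PROCESSED;  // redo the work; nothing was delivered
    case PROCESSED: return DELIVERED;  // a result exists; deliver it again
    case DELIVERED: return CHARGED;    // delivered but never charged: charge now
    case CHARGED:   return MIRRORED;   // charged but the offsite copy never went out
    case MIRRORED:
    case DONE:      return DONE;       // nothing left to do
    default:        return FAILURE;    // unrecognized record: manual attention
    }
}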
joe
****
>
>> be a more effective recovery-from-cut-point or just more
>> complexity? You have failed to
>> take the correct approach to doing the design, so the
>> output of the design process is
>> going to be flawed.
>>
>
>Not at all. I have identified a functional requirement and
>provided a first guess solution. The proposed solution is
>mutable, the requirement is not so mutable.
****
The point is that a requirements document leads to a specification document, and a
specification document does NOT specify the implementation details. A specification
document would include an analysis of all the failure cut-points and the necessary
recovery actions. It is then up to someone to derive an implementation from that
specification. When you start saying "FTP" and "pwrite" you are at the implementation
level!
****
>
>> Build a state machine of the transactions. At each state
>> transition, assume that the
>> machine will fail *while executing that arc* of the graph.
>> Then show how you can analyze
>> the resulting intermediate state to determine the correct
>> recovery procedure. If you do
>> this, concepts like "FTP" become demonstrably
>> inappropriate, because FTP adds dozens of cut
>> points to the state transition diagram, making the
>> recovery that much more complex.
>> *****
>
>More like this:
>(1) I wait until the client gets their final result data.
>(2) Then deduct the dime from their account balance as a
>single atomic transaction.
>(3) Then I send a copy of this transaction to offsite
>backup.
****
Not an unreasonable design. Key here is that you are extending them credit on the
computation, and not charging them until it is delivered; this is akin to having an
inventory you buy in anticipation of sales.
joe
****
>
>>>
>>>It seems that you investigate all of the little nuances of
>>>every detail before even considering feasibility. That is not
>>>very efficient is it?
>> ****
>> But it DOES produce systems that actually WORK and RECOVER
>> from failures.
>
>If instead you would look at this using categories instead
>of details you could get to the same place much quicker, by
>eliminating multitudes of details in one fell swoop. I
>admit that I may not even yet have the categories quite
>right, but, then this level of detailed design on this
>subject is all new to me.
>
>>
>> To give you an idea, one of the projects I worked on had
>> an MTBF of 45 minutes, had no
>> recovery, and failure was indeed catastrophic. A year
>> later, it ran for one of (a) six
>> weeks without failing in a way that impacted users (b)
>> failed once a day but recovered so
>> quickly and thoroughly nobody noticed. Actually, (b) was
>> the real situation; I just
>> examined all the cut-points (there were HUNDREDS) and made
>> sure that it could recover. My
>> fallback was that there was an exception, "throw
>> CATASTROPHIC_RESTART_REQUEST" (no, it
>> wasn't C++, it was Bliss-11) and when things got really
>> messed up, I'd throw that, and
>> this would engage in a five-second restart sequence that
>> in effect restarted the app from
>> scratch, and I rebuilt the state from transactionally
>> stored state. The program listing I
>> got was 3/4" thick; a year later, it was 4" thick. THAT's
>> what it takes to make a program
>> with hundreds of cut-points work reliably. This didn't
>> count the 1/2" of kernel mode code
>> I had to write to guarantee a transactional persistent
>> storage. That exception got thrown
>> about once a day, for reasons we could never discover (it
>> appeared to be memory bus
>> failure returning a NULL pointer, but I could recover even
>> from this!)
>
>My above design seems to have minimal complexity. By waiting
>until everything has completely succeeded before charging
>the customer all the complex transaction roll backs prior to
>this point become unnecessary.
>
>>
>> Those "little nuances" are the ONLY things that make the
>> difference between a design that
>> "runs" and one that is "bulletproof".
>>
>
>Some of the nuances are required, some can be made moot
>through using a simpler design.
>
>> The server managerment system I did a decade ago has a
>> massive amount of code in it that
>> essentially never executes, except in very rare and exotic
>> error recovery circumstances,
>> which almost never happen. Its startup code is thousands
>> of lines that try to figure out
>> where the cut-point was, and once this has been
>> determined, provides the necessary
>> recovery.
>
>I could simply start all over from scratch, as long as I
>could count on the original client request's validity.
>
>>
>> So don't talk to me about how to write bulletproof
>> software. I've done it. It is
>> expensive to build. And I know that your current design
>> approach is doing more to
>> generate buzzword solutions than to produce an actual
>> robust implementation.
>> joe
>> ****
>
>What I am trying to accomplish is inherently much simpler
>than the examples that you provided from your experience. I
>can leverage this greatly reduced inherent simplicity to
>derive a design with very high degrees of fault tolerance
>with a much simpler implementation strategy.
>
>>>
>> Joseph M. Newcomer [MVP]
>> email: newcomer(a)flounder.com
>> Web: http://www.flounder.com
>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Peter Olcott on

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
message news:ro9kr5lk8kad3anflhhcj0iecrvosf381n(a)4ax.com...
> See below...
> On Sat, 3 Apr 2010 18:27:00 -0500, "Peter Olcott"
> <NoSpam(a)OCR4Screen.com> wrote:
>

>>I like to fully understand the underlying infrastructure
>>before I am fully confident of a design. For example, I
>>now
>>know the underlying details of exactly how SQLite can
>>fully
>>recover from a power loss. Pretty simple stuff really.
> *****
> Ask if it is a fully-transacted database and what recovery
> techniques are implemented in
> it. Talk to a MySQL expert. Look into what a rollback
> of a transaction means. These
> are specified for most databases (my experience in looking
> at these predates MySQL, so I
> don't know what it does; I haven't looked at this
> technology since 1985 or 1986)
>
> That's all the understanding you need. Intellectual
> curiosity may suggest that you
> understand how they implement this, but such understanding
> is not critical to the decision
> process.

No. I need a much deeper understanding to approximate an
optimal mix of varied technologies. A transacted database
only solves one aspect of one problem; it does not even
solve every aspect of that one problem.


> ****
>>
>>>>
>>>>I ALWAYS determine feasibility BEFORE proceeding with
>>>>any
>>>>further analysis.
>>> ****
>>> No, you have been tossing buzzwords around as if they
>>> are
>>> presenting feasible solutions,
>>> without justifying why you think they actually solve the
>>> problem!
>>
>>On-the-fly transaction by transaction offsite backups may
>>still be a good idea, even if it does not fit any
>>pre-existing notions of conventional wisdom.
> ****
> Actually, it does, and "remote mirrored transactions"
> covers the concept. This is a very
> old idea, and right now major New York investment firms I
> know of are mirroring every
> transaction on servers 50 miles away, just in case of
> another 9/11 attack. And they were
> doing this in the 1990s (the ones who weren't are now
> doing it!). So the idea is very
> old, and you are just discovering it. So why not
> investigate what is available in
> mirrored database support (it costs!)?
> ****

No need to buy this; it is easy enough to build from scratch.
It may double the complexity of my system, but, then this
system is really pretty simple anyway. Now that you gave me
the right terminology "remote mirrored transactions", I
could do a little search to see if there are any good
pre-existing design patterns. I already have a relatively
simple one in my head. The main piece that I had forgotten
about is the design pattern that SQLite provides on exactly
how to go about protecting against a power loss. I already
knew this one, but forgot about it.
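
For reference, the pattern SQLite documents for its rollback journal is roughly: copy
the pages about to change into a separate journal file and sync it, modify the database
and sync again, then delete the journal; a journal that survives a crash means the last
transaction never finished, and its saved pages are copied back on the next open. A very
rough sketch of the same idea applied to a single record file (file names invented):

#include <fcntl.h>
#include <unistd.h>
#include <string>

static bool write_all(const char *path, const std::string &data)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    bool ok = write(fd, data.data(), data.size()) == (ssize_t)data.size()
           && fsync(fd) == 0;
    close(fd);
    return ok;
}

// 1. Save the old contents in a journal and fsync it.
// 2. Overwrite the real file and fsync it.
// 3. Delete the journal; its absence means the update completed.
bool update_record(const std::string &old_value, const std::string &new_value)
{
    if (!write_all("balance.journal", old_value)) return false;   // step 1
    if (!write_all("balance.dat", new_value))     return false;   // step 2
    return unlink("balance.journal") == 0;                        // step 3
}

void recover_after_crash()
{
    // If balance.journal still exists, step 3 never happened: copy it back
    // over balance.dat, fsync, and only then unlink the journal.
    if (access("balance.journal", F_OK) == 0) {
        /* restore balance.dat from the journal contents, then unlink it */
    }
}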

>
> >I start with
>>the most often false premise that all conventional wisdom is
>>pure hooey.
> ****
> So you invent new hooey in its place?
> ****
>>As this conventional wisdom proves itself item
>>by item point by point, I accept the validity of this
>>conventional wisdom only on those items and points that it
>>specifically proved itself.
> ****
> Let's see if I have this right:
>
> (a) assume everyone else is wrong
> (b) propose bizarre designs based on superficial
> understanding and buzzword gathering
> (c) wait for someone to refute them

Basically, only count on statements to the extent that they
are completely understood. Initially this process is (as you
have seen) quite chaotic. As comprehension grows and the
boundaries are better understood, the process seems much more
reasonable. Eventually a near-optimal solution is
interpolated upon.

>
> At which point, you forgot
> (d) accuse the people who refute me of being in refute
> mode and not listening to what I am
> saying.
> ****

This process necessarily must reject credibility as a basis
of truth. To the extent the supporting reasoning is not
provided, statements must be tentatively rejected.

>>This process makes those aspects
>>of conventional wisdom that have room for improvement very
>>explicit.
> ****
> I have no idea what "conventional wisdom" is here; to me,
> the obvious situation is
> solvable by a transacted database, and if you want to have
> 100% recovery in the face of
> incredibly unlikely events (e.g., power failure), you have
> to use more and more complex
> (and expensive) solutions to address these low-probability
> events.

Yet another false assumption; this is the main source of
your mistakes. SQLite is absolutely free, and its
architecture inherently provides for power-loss fault
recovery.

>
> Perhaps in your world, power failures matter; in my world,
> they happen once a year, under
> carefully controlled conditions that allow for graceful
> shutdown (the once-a-decade
> windstorm or once-a-century blizzard that drop me back to
> battery backup power, at which
> point I execute graceful shutdowns; nearby lightning hits
> that take out the entire block,
> or something else that is going to last for an hour or
> more...the 1-second failures that
> earned our power company the nickname "Duquesne Flicker &
> Flash" are covered by my UPS
> units)

That sounds perfectly reasonable to me, and exactly what I
would expect. If one can also protect against a power-loss
failure, and it only costs a tiny little bit of execution
time, then why not do this?

>
> Why have you fastened on the incredibly-low-probability
> event "power failure" and why have
> you decided to treat it as the most common catastrophe?
> ****

It is one element on the list of possible faults. It might
be helpful if you could provide a list in order of
probability of the most frequently occurring faults. I am
sure that you could do this much better than I could right
now.

>>
>>>
>>> I use a well-known and well-understood concept, "atomic
>>> transaction", you see the word
>>> "atomic" used in a completely different context, and
>>> latch
>>> onto the idea that the use you
>>> saw corresponds to the use I had, which is simply not
>>> true. An atomic file operation does
>>
>>I understood both well. My mind was not fresh on the
>>atomicity of transaction until I thought about it again
>>for
>>a few minutes.
> ****
> It isn't because we haven't tried to explain it to you.
> ****
>>
>>> NOT guarantee transactional integrity. File locks
>>> provide
>>> a level of atomicity with
>>> respect to record updates, but they do not in and of
>>> themselves guarantee transactional
>>> integrity. THe fundamental issue here is integrity of
>>> the file image (which might be in
>>
>>They do provide one key aspect of exactly how SQLite
>>provides transactional integrity.
>>
>>> the file system cache) and integrity of the file itself
>>> (what you see after a crash, when
>>> the file system cache may NOT have been flushed to
>>> disk!)
>>> ****
>>
>>There are simple ways to force this in Unix/Linux, I don't
>>bother cluttering my head with their names, I will look
>>them
>>up again when the time comes
> ****
> sync
>
> Which actually doesn't, if you read it closely and
> understand what it does and does not
> guarantee. I worked in Unix for 15 years, I know
> something about the reliability of its
> file system. And I went to talks by people (Satyanarayanan,
> Ousterhout) who built reliable
> file systems on top of Unix in spite of its fundamental
> limitations.
> ****
>>There are even ways to flush
>>the hard drives on-board buffer.
> ****
> And one vendor I talked to at a trade show assured me that
> they had no way to flush the
> onboard hard drive buffers, and when I asked "how do you
> handle transacted file systems?"
> he simply said "We just blame Microsoft" So I know that
> there is at least one vendor for
> which this is not supported. I presume you have talked
> with the hard drive vendors'
> technical support people before you made this statement
> (given the evidence I have, I
> would not trust such a statement until I had verified that
> the hard drive model we were
> using actually supported this capability, and the file
> system used it, and that the OS had
> the necessary SCSI/ATAPI command pass-thru to allow an
> application to invoke it. But
> then, since we had to get a patch to Win2K to make our
> transacted file system work [the
> problem was elevated to a "mission-critical bug" within
> Microsoft, and the company I
> worked for had enough clout to get the patch], maybe I
> just have a lot more experience in
> this area and am consequently a lot more distrustful of
> silly statements which do not seem
> to have a basis in reality)

One might skip all of this and simply not count a
transaction as completed until another process sees this
transaction in the file. I would estimate from my somewhat
limited knowledge that this might work.

>>Since I can not count on something not screwing up it
>>seems
>>that at least the financial transactions must have
>>off-site
>>backup. I would prefer this to be on a transaction by
>>transaction basis, rather than once a period-of-time.
> ****
> But the point is that it should have been OBVIOUS to you
> that this could not work! Because
> if you had done the design that I say you have to do, to
> identify the state machine that
> records transactions and identify each of the cut-points,
> it would be obvious that
> implementing another incredibly complex state machine
> within this would lead only to MORE
> COMPLEX recovery, not less complex! Assume your
> transacted database is completely
> reliable, look at its recovery/rollback protocols, and see
> how well they meet your needs
> at the cutpoints that involve the transacted database!
> Compare the state diagram you get
> with FTP to the state diagram you have without FTP! This
> is pretty elementary design
> stuff, which should be derivable from basic principles
> (you don't need to have built a lot
> of systems to understand this).


>
> You know DFAs. Simply express your transaction model as a
> DFA, and at every state
> transition, you add a new state, "failure". Every state
> can transition to "failure".
> Then, your recovery consists of examining the persistent
> state up to that point, and
> deriving a NEW set of states, that essentially return you
> to a known point in the state
> diagram, where you resume computations and attempt to
> reach a final state. That's all
> there is to it.
> joe
> ****

Sounds good.

>>
>>> be a more effective recovery-from-cut-point or just more
>>> complexity? You have failed to
>>> take the correct approach to doing the design, so the
>>> output of the design process is
>>> going to be flawed.
>>>
>>
>>Not at all. I have identified a functional requirement and
>>provided a first guess solution. The proposed solution is
>>mutable, the requirement is not so mutable.
> ****
> The point is that a requirements document leads to a
> specification document, and a
> specification document does NOT specify the implementation
> details. A specification
> document would include an analysis of all the failure
> cut-points and the necessary
> recovery actions. It is then up to someone to derive an
> implementation from that
> specification. When you start saying "FTP" and "pwrite"
> you are at the implementation
> level!
> ****

This may generally be the preferred approach. This case is
different. A big part of this whole process is me learning
the boundaries of the set of categories of solutions. Quite
often (in this case) the nature of the solution feeds back
into the requirements, thus changing the requirements. For
example, if one can also protect against a power failure with
little effort or expense, then let's do this too. This only
requires using the SQLite design pattern, or tools that use
this pattern.

I am still considering using a file as the primary means of
inter-process communication, specifically because a file is
persistent. To do this well I must fully understand things
such as the SQLite fault-tolerant design pattern, and many
other things.
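
A rough sketch of the usual way to hand a persistent file from one process to another
without the reader ever seeing a half-written request: write to a temporary name, fsync,
then rename() it into place (rename within one file system is atomic). The paths here
are invented for illustration:

#include <cstdio>
#include <fcntl.h>
#include <unistd.h>
#include <string>

// Writer side of a file-based hand-off: the reader watches the spool
// directory and only ever sees complete request files.
bool post_request(const std::string &spool_dir, const std::string &request_body)
{
    std::string tmp  = spool_dir + "/request.tmp";
    std::string dest = spool_dir + "/request.job";

    int fd = open(tmp.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    bool ok = write(fd, request_body.data(), request_body.size())
                  == (ssize_t)request_body.size()
           && fsync(fd) == 0;
    close(fd);
    if (!ok) return false;

    return rename(tmp.c_str(), dest.c_str()) == 0;  // becomes visible all at once
}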

>>
>>> Build a state machine of the transactions. At each
>>> state
>>> transition, assume that the
>>> machine will fail *while executing that arc* of the
>>> graph.
>>> Then show how you can analyze
>>> the resulting intermediate state to determine the
>>> correct
>>> recovery procedure. If you do
>>> this, concepts like "FTP" become demonstrably
>>> inappropriate, because FTP adds dozens of cut
>>> points to the state transition diagram, making the
>>> recovery that much more complex.
>>> *****
>>
>>More like this:
>>(1) I wait until the client gets their final result data.
>>(2) Then deduct the dime from their account balance as a
>>single atomic transaction.
>>(3) Then I send a copy of this transaction to offsite
>>backup.
> ****
> Not an unreasonable design. Key here is that you are
> extending them credit on the
> computation, and not charging them until it is delivered;
> this is akin to having an
> inventory you buy in anticipation of sales.
> joe
> ****

No, I don't generally extend credit. They must pay in advance,
in at least one-dollar increments. When the transaction
begins I check their balance. Because of sequencing issues
this simple design could result in a negative balance some
of the time. I would rather err on the client's side and on
the side of simplicity, at least initially. In the long run
I would still err on the client's side, but maybe with some
added complexity.

For example, I could deduct the full amount at the beginning
and then possibly have to roll back the transaction at many
possible failure points.
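
One way to keep the simplicity and still avoid the negative balance, sketched against
SQLite with an invented schema: make the deduction conditional, so the statement that
charges the dime also verifies that the funds are there.

#include <sqlite3.h>

// Deduct 'cents' only if the balance covers it; the test and the debit are
// one atomic statement, so two concurrent jobs cannot both spend the same
// dime. Returns true only if the charge was actually applied.
bool charge_if_covered(sqlite3 *db, int user_id, int cents)
{
    sqlite3_stmt *st = 0;
    sqlite3_prepare_v2(db,
        "UPDATE accounts SET balance = balance - ?1 "
        "WHERE id = ?2 AND balance >= ?1;", -1, &st, 0);
    sqlite3_bind_int(st, 1, cents);
    sqlite3_bind_int(st, 2, user_id);
    bool stepped = sqlite3_step(st) == SQLITE_DONE;
    sqlite3_finalize(st);

    return stepped && sqlite3_changes(db) == 1;   // exactly one row was debited
}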


From: Joseph M. Newcomer on
See below...
On Fri, 2 Apr 2010 10:39:45 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:

>
>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
>message news:923cr5t67b0tv6mjc0lpi8mnoo6e89mv69(a)4ax.com...
>> See below...
>> On Thu, 1 Apr 2010 14:16:30 -0500, "Peter Olcott"
>> <NoSpam(a)OCR4Screen.com> wrote:
>>>> || The term "Sparse Matrix" is taken to have the common
>>>> meaning
>>>> || of the term "Sparse" combined with the common
>>>> computer
>>>> science
>>>> || meaning of the term "Matrix", a two dimensional array
>>>> of elements.
>>>
>>> http://en.wikipedia.org/wiki/Sparse_matrix
>>>To be unambiguously distinguished from:
>>>
>>>In the subfield of numerical analysis, a sparse matrix is
>>>a
>>>matrix populated primarily with zeros (Stoer & Bulirsch
>>>2002, p. 619). The term itself was coined by Harry M.
>>>Markowitz.
>> ****
>> Actually, a sparse (2-D, to simplify the discussion)
>> matrix is a matrix of dimension [m*n]
>> whose values at some coordinates [i,j] is not defined.
>> 0.0 is a defined value, and
>> therefore a matrix with 0 values is not "sparse".
>> Essentially, a sparse matrix is a
>> matrix for which the mapping [i x j] -> value is not a
>> total mapping.
>>
>> A numerical analyst is free to redefine this in some other
>> terms, but the essence of a
>> sparse matrix is the failure to have a total mapping. The
>> trick on "comb vectors" is to
>> create an alternate representation that maintains the
>> partial mapping of the original
>> while implementing it as a vector such that the mapping
>> [i]->value is nearly total (the
>> difference represents the ability to do perfect
>> compaction). So in the abstract world, a
>> sparse 2-D matrix is characterized by a partial mapping
>> (forall)i (forall)j [i X j]. I
>> could try to write out the fully general characterization,
>> but without using Microsoft
>> Equation Editor it is clumsy. To interpret this any other
>> way is to misinterpret the
>> definition, or, as the numerical analysts have apparently
>> redefined it, to change the
>> definition to something else.
>> joe
>
>At the time that I wrote the above patent it seemed that the
>computer science definition of the term was comparable to
>the numerical analysis definition. This same definition
>seems to predominate the use of the term when the term
>[sparse matrix] is searched on google.
>
>I hung out in the misc.int.property group while I was
>prosecuting my patent. One of the things that I found there
>was that a patent has to be so clear that even a great
>effort to intentionally misconstrue its meaning will fail. I
>tried to let this standard be my guide.
*****
Yes, and the point is...? This is a requirement of a patent, that between the set of
claims and the descriptions of the preferred embodiments, anyone who is experienced in the
field can reproduce your work. Failure to disclose would invalidate a patent. So what is
new about the fact that you had to produce full disclosure?

All you are saying here is that you have tried to write a patent according to the rules
for writing a patent.

It has nothing to do with sparse matrices, no matter what you think they mean.

Consider the state transition matrix for a parser. Essentially the columns might
represent terminal symbols (including classes of terminals) and the rows represent
nonterminals (for an LR(1) grammar such as is accepted by YACC, this is how the parse
table is represented). Note in the case of parser generators, they might accept an input
specification more general than LR(1) and reduce it to LR(1) internally; a failure to
reduce it to LR(1) is flagged as an error in the grammar and must be fixed.

For a large set of states, being in a particular production and finding a terminal which
does not match it is an error. We represent these by, for example, putting a 0 in the parse
table indicating "not a valid state". Thus, for N a nonterminal and t a terminal, the
index Nxt is a partial mapping which has (many, typically mostly) invalid mappings. An
invalid mapping is a syntax error in the string being parsed. So we have a sparse matrix,
one with incomplete mappings. Because of the limitations of representation, we encode
those invalid mapping in a total matrix as 0 values. It does not change the fact that the
mapping is partial; all it does is change the mapping so we are working in a total mapping
because that is what a rectangular array representation requires. You MUST have a total
mapping for a simple rectangular array, or your code doesn't work. Then, given, that
total mapping, you designate some particular value (e.g., 0) to mean "incomplete mapping
indicator") and as a metaconcept imposed on the total mapping you use that value to
indicate the partial mapping. Now, if you want to properly represent a sparse array in
minimal storage, you encode the detection of invalid state in other ways, so that you end
up with a more compact implementation of the partial mapping. That's what "comb vectors"
were all about: a new encoding of partial matrices. Because the machine does not have
any way to encode a partial mapping on a conventional [i,j]=>value map, we need to
implement an [i,j]=>undefined mapping on top of the rectangular mapping. A numerical
analyst might say the values are "mostly 0" but that is an interpretation of a sparse
matrix according to one encoding in one problem domain. Note that in a system in which
all values are valid, any encoding [i,j]=>value should produce a valid value, so strictly
speaking in a numerical analysis problem there is no way to distinguish a valid 0 from an
error 0. Or maybe it doesn't matter. In parse tables, the 0 value is the "invalid next
state" value, so whether we encode it directly in the mapping or derive it from a more
complex encoding doesn't matter to the upper level. Note that a matrix that encoded the
value 1 most everywhere could ALSO be encoded as a "sparse matrix" where 1 (let's call it
the identity mapping) is the most common value.
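
Put in code, the distinction is just this (sizes and contents invented for illustration):
the rectangular array is a total mapping, and one of its values is reserved to stand for
"the partial mapping is undefined here".

#include <cstdint>

// A dense parse table is a TOTAL mapping [nonterminal x terminal] -> state;
// the partial mapping is layered on top of it by reserving one value.
const int NUM_NONTERMINALS = 4;
const int NUM_TERMINALS    = 6;
const uint8_t NO_STATE     = 0;   // "invalid next state": mapping undefined here

uint8_t parse_table[NUM_NONTERMINALS][NUM_TERMINALS] = { /* mostly NO_STATE */ };

// Returns the next state, or NO_STATE, which the caller reports as a syntax error.
uint8_t next_state(int nonterminal, int terminal)
{
    return parse_table[nonterminal][terminal];
}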

The ideal sparse matrix representation would be a Huffman encoding of the entire
matrix-as-sequential-addresses, but the mapping is now incredibly expensive; to determine
the mapping of [i,j] the matrix compression must be fully decoded each time, but only the
value of the [i,j] has to be kept, so as the decompression proceeds, all values other than
[i,j] are discarded and you don't need tons of extra space. Or you can Huffman-encode the
individual rows. Or use a 2-D mapping much like G4 faxes use for encoding, but they are
easy because the values are binary (black or white). So don't confuse an implementation
or a representation with a concept.

What was interesting about the comb-vector encoding was that it achieved 80% of the
theoretical optimum encoding, had constant-time access for any [i,j] pair, and could be
easily computed from the original matrix; for an m x n matrix it had a space requirement
which was O(m+n+f(m,n)), where f(m,n) represented the packed result and was always < m*n.
I used a "first fit" approach which was computationally fast and gave us between 80% and
90% of the predicted optimum; essentially, I allowed for (by simply recoding one
subroutine) best-fit (which was O(m*n)) and optimum (which was O((m*n)**p) for p >= 2;
essentially, it was in the extreme case the bin-packing problem, which is NP-hard). But
the first-fit was O(m+n) in complexity, good enough for a production piece of code, and
typically reduced the matrix to 10% of its original size, I think 6% in the best case
(that is, using the size compression, I was able to get O(m + n + 0.1(m * n)) == 1.1m +
1.1n space out of m*n). And that was Good Enough.
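
A sketch of the access side of such a row-displacement ("comb vector") encoding, which
is where the constant-time [i,j] lookup comes from; the packing step (first-fit of rows
into the shared vector) is omitted, and the names are illustrative:

#include <vector>

// Each row of the sparse m x n matrix is slid into one packed vector at
// offset disp[row]; the check vector records which row owns each packed
// slot, so slots belonging to other rows still read back as "invalid".
struct CombMatrix {
    std::vector<int> disp;    // one displacement per row
    std::vector<int> value;   // packed entries of all rows
    std::vector<int> check;   // owning row of each packed slot, -1 if unused
    int invalid;              // value reported for an undefined [i,j]

    int at(int i, int j) const
    {
        int k = disp[i] + j;
        if (k < 0 || k >= (int)value.size() || check[k] != i)
            return invalid;   // [i,j] is not in the partial mapping
        return value[k];
    }
};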
joe
****
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm