From: Hector Santos on 5 Apr 2010 19:09
Peter Olcott wrote:
> That is exactly what the SQLite design pattern does, and it
> allows clean restarts even after power failures.
Not quite. See below.
Well, I'm happy to read that you took the suggestion to use SQLITE instead
of MYSQL nearly two weeks ago.
But you continue to misunderstand the main reason I suggested it, or
maybe you did understand, because it was an appropriate simple tool for simple needs.
It was because your application is so primitive that you need a primitive
SQL engine with little overhead, one that only adequately supports
SINGLE ACCESS operations, which is what your primitive FIFO design is.
SQLITE does not support record level operations and it completely
locks the entire database during write or update operations. In other
words, you can't read the database during any writes/updates.
But that's ok for you: you are a single accessor application, you will
only be doing writes with no contention and no need to allow for reading
at the same time, and you would lose more in SQL overhead with
transactional SQL servers like MySQL.
Therefore SQLITE is perfect for your lite-weight, vapor ware application.
However, it behaves just like any other OPEN FILE with memory caching: if your
dog chews your wall plug and your UPS-less machine dies, you will have
integrity issues just like any other opened file when the machine crashes.
Your data is ALWAYS only as good as the last write and flush.
No flushing, and it is open to loss. PERIOD.
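[The write-then-flush discipline described above can be sketched with POSIX-style calls, here via Python's os module; the file name and record contents are purely illustrative:]

```python
import os
import tempfile

# A record is only durable once it has been both written AND flushed.
# os.fsync forces the OS to push its cached pages to the device; without
# it, a crash can lose anything after the last successful fsync.
path = os.path.join(tempfile.mkdtemp(), "balances.dat")
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)

os.write(fd, b"balance=100\n")
os.fsync(fd)  # point of durability: everything above survives a crash

os.write(fd, b"balance=90\n")  # still only in the cache: lost if the machine dies here
os.fsync(fd)  # now this write is durable too
os.close(fd)

with open(path, "rb") as f:
    data = f.read()
```

[The same principle holds whether the writer is SQLite or your own code: the data on disk is only as good as the last completed fsync.]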
From: Hector Santos on 5 Apr 2010 19:46
Peter Olcott wrote:
>> Serializing these operations to a single thread does not
>> guarantee transactional integrity.
>> See my previous discussion of the use of a hypothetical
>> BEGIN_TRANSACTION primitive.
>> If you believe a single thread guarantees transactional
>> integrity, you are wrong.
> It does guarantee that:
> (1) Read balance
> (2) Deduct 10 cents from balance
> (3) Write updated balance
> don't have any intervening updates between steps.
He's right Joe. For a Single Thread Access only SQLITE implementation,
he can guarantee no write contention issues. Perfect for any
application that doesn't have more than one thread. <grin>
But what he doesn't understand is that while step 3 is running, step 1 is
locked out for any other thread. But he doesn't need that, it's a FIFO
based accessor :)
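[The three-step read/deduct/write sequence under discussion can be sketched as a fixed-width record update using pread()/pwrite(), shown here through Python's os module; the record layout and account numbering are hypothetical:]

```python
import os
import struct
import tempfile

RECLEN = 8  # one 64-bit little-endian balance per record (illustrative layout)

def deduct(fd: int, recno: int, cents: int) -> int:
    """Read-modify-write one fixed-width record. This is safe from
    interleaving ONLY because a single thread ever calls it; nothing
    here is crash-safe -- a failure after pwrite but before a flush
    can still lose the update."""
    off = recno * RECLEN
    (balance,) = struct.unpack("<q", os.pread(fd, RECLEN, off))  # step 1: read balance
    balance -= cents                                             # step 2: deduct
    os.pwrite(fd, struct.pack("<q", balance), off)               # step 3: write back
    return balance

path = os.path.join(tempfile.mkdtemp(), "accounts.dat")
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
os.pwrite(fd, struct.pack("<q", 100), 0)  # account 0 starts with 100 cents
remaining = deduct(fd, 0, 10)
os.close(fd)
```

[With one thread there are indeed no intervening updates between the steps; what the single thread does NOT buy, as the rest of the thread argues, is any guarantee about what is on disk after a crash.]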
>> Actually, at the detail level, it IS hard. The assumption
>> is that for some pair of
> I don't see what is hard about using this design pattern,
> and I don't see how this could fail:
It isn't hard at all. That is why I suggested SQLITE for your simple
application idea with a single accessor concept.
> You have a complete audit trail of every detail up to the
> point of failure.
HA! One thing I like about SQLITE is that they keep it real. They know
their place in SQL and will make no excuse for it, and harbor no
incorrect fantasy about what it CAN NOT do.
Basically what it means is that WRITE/FLUSHING is all done at the
same time because, as I said above, the DATAFILE is LOCKED during
write/update operations; hence you get the idea of a "SnapShot" journal
for the integrity of the data file where there is no other
contention. But it doesn't have record-level ideas, so it need not even
CONSIDER sector and cluster operations. It's a WHOLE or NOT at all, 0% or
100%, ALL or Nothing. It can only do that with a SINGLE WRITE ACCESSOR.
Step 3.7 is required - FLUSH. If your machine crashes before that -
you lose integrity! See 3.4 on LOCKING:
3.4 Obtaining A Reserved Lock
A single reserved lock can coexist with multiple shared
locks from other processes. However, there can only be a
single reserved lock on the database file. Hence only a
single process can be attempting to write to the database
at one time.
And one thing it doesn't mention is the OPPOSITE. If a process is
doing a SELECT (read only access), the database is LOCKED for any
write (INSERT, DELETE, UPDATE) access until the SELECT is complete.
Which is good: simple, not hard, easy for 99% of people to
implement (though I doubt you can), and it will work very nicely for a
single accessor FIFO application.
Just in case you don't understand the limitations, read the
Appropriate Usages page:
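[The locking behavior described above can be observed directly with Python's stdlib sqlite3 module; the job-queue table and column names below are illustrative only:]

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "fifo.db")
writer = sqlite3.connect(path, isolation_level=None)  # autocommit; we issue BEGIN ourselves
writer.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, state TEXT)")
writer.execute("INSERT INTO jobs (state) VALUES ('InProcessing')")

# Take the RESERVED lock up front; in classic rollback-journal mode
# no second writer can start until this transaction commits.
writer.execute("BEGIN IMMEDIATE")
writer.execute("UPDATE jobs SET state = 'Completed' WHERE id = 1")

other = sqlite3.connect(path, timeout=0)  # fail immediately instead of waiting
try:
    other.execute("BEGIN IMMEDIATE")      # a second writer must be refused
    blocked = False
except sqlite3.OperationalError:          # "database is locked"
    blocked = True

writer.execute("COMMIT")                  # lock released; other writers may now proceed
state = writer.execute("SELECT state FROM jobs WHERE id = 1").fetchone()[0]
```

[For a single-accessor FIFO design the `blocked` case never arises in practice, which is exactly the point being made.]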
You don't need a real SQL server or RDBMS since you don't have any
need for any one of the following:
- Multi-access Client/Server Application
- High-volume Website
- Very large dataset
- High Concurrency
From: Joseph M. Newcomer on 5 Apr 2010 20:20
On Mon, 5 Apr 2010 17:29:39 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:
>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
>>>In the case where pread() and pwrite() must work together to
>>>provide a single atomic operation, such as the case of
>>>deducting ten cents from a customer's account balance you
>>>are correct. A simpler way to provide this atomicity for
>>>purposes might be to only have a single thread that
>>>the client's balances. In this case there is no need of
>>>contriving a record lock on a system that lacks this
>> Actually, they do not provide a single atomic operation,
>> and this should be obvious. The
>> CORRECT way to provide the atomicity is use a transacted
>> database. Nothing else is going
>> to be reliable, unless, of course, you re-implement
>> transacted databases on your own.
>When one is referring to transactions one is referring to at
>least one read operation combined with at least one write
>operation forming the single atomic transaction unit. This
>is useful when one may be deducting a dime from the
>customer's current account balance.
Huh? In what fantasy world did you come up with this idea? This in essence says a simple
file lock will guarantee transactional integrity, which in fact is NOT true! By declaring
the transaction "atomic" they mean that no matter WHAT happens, including the motherboard
catching fire, that at the end of the day, the operation either COMPLETED AS SPECIFIED or
NEVER HAPPENED AT ALL. There is no intermediate position possible, that is, where some
updates actually happened and others did not, so that the integrity of the database is
compromised. Instead, you have confused atomic actions on a file with atomic
transactions. Note that I can set a lock, read the data, write the data back, and release
the lock, and if the system crashes, the file is corrupted! Yet your silly requirement
for transactional atomicity has been met! Furthermore, if only SOME of the pages have
been written back to the file, and others are still pending, according to your definition,
transactional integrity has been maintained, but in reality the contents of the file might
be nonsense! Why do you persist in using your personal definitions for what are
well-understood technical concepts? I would not recognize a simple file locking protocol
as guaranteeing transactional integrity, nor would any competent DBMS person. Yet you
think this is sufficient! You are WRONG! Get it through your head, TRANSACTIONAL
INTEGRITY IS A COMPLEX PROCESS BY WHICH THE TRUTH OF A DATA CONSISTENCY PREDICATE IS
GUARANTEED IN THE PRESENCE OF UNRELIABLE HARDWARE AND/OR SOFTWARE.
A "transaction" requires that a set of composite updates be treated as an atomic update;
that is, the integrity predicate is maintained on the data. While there may be multiple
reads or multiple writers, the ultimate measure is the ability to maintain the predicates
about data integrity.
So by your definition
Lock(record #)
Read(record #)
....update data
Write(record #)
Unlock(record #)
is a valid transactional operation on a database. But in fact, this is complete nonsense,
because if that record is never written to the disk, you cannot guarantee what is in it
after a system crash at an arbitrary point. The reason is this: consider that operation
as a File Modification Action (FMA)
Then we could ideally issue
and you would think that the data was intact! But no, the truth is that FMAn for any n is
performed ON THE DATA IN THE FILE SYSTEM CACHE and is not necessarily committed to the
hard drive in any order. So suppose the pages for FMA1 and FMA3 are written out,
and before the pages involved in FMA2 are written out, the motherboard explodes into
shards of copper and phenolic resin. So a proper TRANSACTION has not been committed, and
any data in FMA3 or FMA1 whose correctness depends on the results of FMA2 being on disk is
now incorrect. In a transacted file system, we would write
and if there were ANY failure, then in effect NONE of these changes appear. Whether they
are discarded or queued up for retry is a design/implementation detail of the file system
itself; for example, it can be the case that they are queued up and the complete above
sequence is RE-EXECUTED, or it can be totally deleted so NOTHING has changed, but you will
NEVER have a partial change! You would think after I've explained transactional systems
to you several times you would have caught on, but instead, you have confused this with
the much more trivial "file lock" which is a completely unrelated piece of the action.
Also, you have to understand what is meant by "nested transactions" where, even though all
the inner transactions have been successfully committed, they have to be rolled back if
the outer transaction aborts, whether from programmatic request or because of catastrophic
failure.
This is very sad. You aren't even CLOSE to grasping the essence of transacted file
systems. You once again stopped boiling when you saw a thick sludge, and thought you had
the answer.
Locks do not guarantee transactional integrity. Only transacted file systems, or
transaction mechanisms built on top of unreliable file systems, accomplish this.
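[One minimal way to build an all-or-nothing commit on top of an ordinary file system, as a sketch only, is the classic write-temp-then-rename pattern; the file names and record contents below are illustrative:]

```python
import os
import tempfile

def commit_atomically(path: str, payload: bytes) -> None:
    """Minimal 'all or nothing' file update: write a sibling temp file,
    fsync it, then rename() it over the target. POSIX rename is atomic,
    so a crash leaves either the complete old file or the complete new
    one -- never a half-written mix. (A real transacted store does more:
    it also fsyncs the directory and keeps a journal for multi-file
    updates; this shows only the core idea.)"""
    d = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=d)
    try:
        os.write(fd, payload)
        os.fsync(fd)  # payload must be durable BEFORE it becomes visible
    finally:
        os.close(fd)
    os.rename(tmp, path)  # atomic switch-over

target = os.path.join(tempfile.mkdtemp(), "account.rec")
commit_atomically(target, b"balance=100\n")
commit_atomically(target, b"balance=90\n")
with open(target, "rb") as f:
    final = f.read()
```

[This is the kind of mechanism, built atop an unreliable file system, that earns the name "transaction"; a file lock alone provides none of it.]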
>There are other less inclusive forms of atomicity that can
>also be relevant. If one thread needs to write to a record
>in a file (specific byte offset within the file), and
>another thread needs to append a record (fixed length set of
>bytes) to a file it is useful to know that these operations
>can complete without interfering with each other. This level
>of atomicity is sufficient for updating my transaction log,
>yet insufficient for updating the customer's account
>> If you think otherwise, you do not have a clue about what
>> a transacted database does, why
>> they are important, and why there are no alternatives!
>>>In other less complex cases such as updating a status flag
>>>from [InProcessing] to [Completed] it would seem that the
>>>single guarantee that the pwrite() operation itself is
>>>atomic would provide sufficient atomicity for the current
>>>design (only one writer per record).
>> pwrite() does not guarantee an atomic transaction. Your
>> failure to comprehend this after
>It doesn't have to, there are no transactions involved.
>> I have explained it several times is why I keep being
>> forced to apply the term "stupid".
>Because you are not paying enough attention to what I am
>> THERE IS NO GUARANTEE! You did not find one in the
>> documentation, and you will not find
>There is a guarantee that two pwrites() will not collide
>with each other resulting in garbage. This is sufficient for
>the transaction log, yet insufficient for updating the
>customer's account balance.
>> one in the implementation. You have confused a concept
>> (atomic modification to a file)
>> with a much more important concept (a transacted database
>> that DOES NOT LOSE INFORMATION AS
>> A CONSEQUENCE OF ANY ABNORMAL TERMINATION OF THE APP OR
>Which as I have already said, is easily prevented by using
>the design pattern in SQLite.
>> Please explain how pwrite guarantees transactional
>> integrity. You obviously have some
>> information which is not in the documentation and cannot
>> be derived by reading the linux
>> source code.
>The transaction log does not need transactional integrity.
>It does need crash fault tolerance, which might be construed
>as a form of transactional integrity. I could make all of
>this moot by simply having a single process with a single
>thread that handles all disk writes to a specific resource.
>This would inherently have transaction integrity, and crash
>proofing would be as easy as using the SQLite design pattern.
>> Consider the following
>> Let f be a file handle
>> Let f be an empty file when this code starts
>> char buf = '*';
>> for(int i = 0; i < SOME_SIZE; i++)
>> pwrite(f,&buf, 1, i);
>> Now, if the system fails for ANY REASON in the loop, you
>> will, upon restart, find in the
>> file represented by handle f:
>> (1) SOME_SIZE instances of '*'
>> (2) 0 <= i <= SOME_SIZE instances of '*', depending on the
>> exact value of i when the crash
>> (3) 0 instances of '*'
>> (4) Some indeterminate number of instances, m, of '*', 0
>> <= m <= i where i is the value
>> achieved when the system crashed, where m is unrelated to
>> the value of i when the system
>> crashed except that it must be less than or equal to it
>> Hint: the correct answer is 4.
>Which is very easy to handle using the SQLite design pattern.
>> Question: suppose the loop completes, and you are well
>> into some other, much later,
>> computation when the crash occurs.
>> Hint: the correct answer is 4.
>> Now, consider the following, where I hypothesize the
>> existence of two primitives:
>> BEGIN_TRANSACTION starts a transaction on the file handle
>> END_TRANSACTION ends the transaction and commits the
>> changes to the file
>> BEGIN_TRANSACTION(f); // 1
>> for(int i = 0; i < SOME_SIZE; i++) // 2
>> pwrite(...as above...); // 3
>> END_TRANSACTION(f) // 4
>> If there is an abnormal termination before line 4
>> executes completely, the number of
>> instances of * found will be
>> (choose one of the above answers)
>> Hint: the correct answer is 3
>I already know what transaction processing is.
>> In fact, there are points in the implementation of line 4
>> where either the transaction has
>> not been committed, and the correct answer is 3, or the
>> transaction is committed and the
>> correct answer is 1, but there is NEVER any point in the
>> code where any other answer is
>> possible! And that's why transacted systems are HARD to
>> implement!
>> Now consider the following:
>> for(int i = 0; i < SOME_SIZE; i++) { // 1
>> BEGIN_TRANSACTION(f); // 2
>> pwrite(...as above...); // 3
>> END_TRANSACTION(f); // 4
>> } // 5
>> Choose one of the answers.
>> Hint: the correct answer is 2, and the exact value depends
>> on whether or not the crash
>> took place on line 1, 2, 3, 4 or 5. If it took place on
>> any line but 5, then the number
>> of *s will be i - 1. If it took place on line 5, the
>> number of *s will be i.
>> If you do not understand the reasons why wrapping the file
>> operation in a hypothetical
>> transaction makes a difference, go back and understand what
>> a transacted database does. In
>> the case of a transacted database, the "add new record"
>> operation or "update record"
>> operation replaces pwrite.
>That is exactly what the SQLite design pattern does, and it
>allows clean restarts even after power failures.
Joseph M. Newcomer [MVP]
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Hector Santos on 5 Apr 2010 20:27
Hector Santos wrote:
>>> If you believe a single thread guarantees transactional integrity,
>>> you are wrong.
>> It does guarantee that:
>> (1) Read balance
>> (2) Deduct 10 cents from balance
>> (3) Write updated balance
>> don't have any intervening updates between steps.
> He's right Joe. For a Single Thread Access only SQLITE implementation,
> he can guarantee no write contention issues. Perfect for any application
> that doesn't have more than one thread. <grin>
> But what he doesn't understand is that while step 3 is running, step 1 is
> locked out for any other thread. But he doesn't need that, it's a FIFO based
> accessor :)
You know Joe, with his low volume, he really doesn't need any SQL
engine at all!
He can easily just write a single text FILE per request, with a
per-USER-account file system layout.
That will fit his target 100ms single-transaction-at-a-time FIFO
design, and he will have almost 100% crash restart integrity!
I would consider using an old school simple X12-like EDI format for
its transaction codes and user data fields, and he might be able to
sell this idea for his B2B web service considerations with traditional
companies familiar with and using EDI!
And what's good about using POTF (plain old text files) is that he can
leverage existing tools in all OSes:
- He can edit the text files with NOTEPAD or vi.
- He can delete accounts with DEL * or rm *
- He can back it up using zip or cab or gz!
- He can search using dir or ls!
Completely PORTABLE! SIMPLE! CHEAP! FAST! FAULT TOLERANT! NETWORK
SHARABLE! ATOMIC APPENDS! EXCLUSIVE, READ, WRITE FILE LOCKING!
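[The "atomic appends" property leaned on above comes from O_APPEND: the kernel seeks to end-of-file and writes in a single step, so small records from concurrent appenders don't interleave mid-record on ordinary local files. A sketch, with hypothetical EDI-flavored record contents:]

```python
import os
import tempfile

log = os.path.join(tempfile.mkdtemp(), "requests.log")

def append_record(line: str) -> None:
    """Append one record to the per-request log. O_APPEND makes the
    seek-to-end plus write a single atomic step for local files, so
    two processes appending small records never splice bytes together."""
    fd = os.open(log, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, line.encode() + b"\n")
    finally:
        os.close(fd)

append_record("ISA*00*REQ*0001")  # record contents are illustrative only
append_record("ISA*00*REQ*0002")
with open(log) as f:
    lines = f.read().splitlines()
```

[Durability still requires an fsync per the earlier posts; O_APPEND only solves the interleaving problem, not the flushing one.]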
From: Hector Santos on 5 Apr 2010 20:52
Joseph M. Newcomer wrote:
> See below....
> On Mon, 5 Apr 2010 17:29:39 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:
>> When one is referring to transactions one is referring to at
>> least one read operation combined with at least one write
>> operation forming the single atomic transaction unit. This
>> is useful when one may be deducting a dime from the
>> customer's current account balance.
> Huh? In what fantasy world did you come up with this idea? This in essence says a simple
> file lock will guarantee transactional integrity, which in fact is NOT true! By declaring
> the transaction "atomic" they mean that no matter WHAT happens, including the motherboard
> catching fire, that at the end of the day, the operation either COMPLETED AS SPECIFIED or
> NEVER HAPPENED AT ALL. There is no intermediate position possible, that is, where some
> updates actually happened and others did not, so that the integrity of the database is
> compromised. Instead, you have confused atomic actions on a file with atomic
> transactions. Note that I can set a lock, read the data, write the data back, and release
> the lock, and if the system crashes, the file is corrupted! Yet your silly requirement
> for transactional atomicity has been met! Furthermore, if only SOME of the pages have
> been written back to the file, and others are still pending, according to your definition,
> transactional integrity has been maintained, but in reality the contents of the file might
> be nonsense! Why do you persist in using your personal definitions for what are
> well-understood technical concepts? I would not recognize a simple file locking protocol
> as guaranteeing transactional integrity, nor would any competent DBMS person. Yet you
> think this is sufficient! You are WRONG! Get it through your head, TRANSACTIONAL
> INTEGRITY IS A COMPLEX PROCESS BY WHICH THE TRUTH OF A DATA CONSISTENCY PREDICATE IS
> GUARANTEED IN THE PRESENCE OF UNRELIABLE HARDWARE AND/OR SOFTWARE.
Joe, Joe, it's an abstract concept sufficient for patentability! SQLITE's
"None or ALL" file locking is sufficient for a poor man's SQL SnapShot
Journal recording Database System. <g>
> A "transaction" requires that a set of composite updates be treated as an atomic update;
> that is, the integrity predicate is maintained on the data. While there may be multiple
> reads or multiple writers, the ultimate measure is the ability to maintain the predicates
> about data integrity.
But you don't have multiple writers in SQLITE by design, so SQLITE has
dumbed down those time-machine-based predicate requirements.
> So by your definition
> Lock(record #)
> Read(record #)
> ....update data
> Write(record #)
> Unlock(record #)
> is a valid transactional operation on a database.
Good Enuf for Pedro! <g>
> But in fact, this is complete nonsense,
> because if that record is never written to the disk, you cannot guarantee what is in it
> after a system crash at an arbitrary point. The reason is this: consider that operation
> as a File Modification Action (FMA)
Wayyyyyyyyyyyyyyyyyyy too deep for Pedro!
> This is very sad. You aren't even CLOSE to grasping the essence of transacted file
> systems. You once again stopped boiling when you saw a thick sludge, and thought you had
> the answer.
Joe, a raw text file per record is good enough for him! :)