From: Peter Olcott on

"Ian Collins" <ian-news(a)hotmail.com> wrote in message
news:827aifFp8jU17(a)mid.individual.net...
> On 04/ 9/10 12:00 PM, Peter Olcott wrote:
>> "Ian Collins"<ian-news(a)hotmail.com> wrote in message
>> news:8278l3Fp8jU16(a)mid.individual.net...
>>> On 04/ 9/10 11:48 AM, Peter Olcott wrote:
>>>>
>>>> I am trying to have completely reliable writes to a
>>>> transaction log. This transaction log includes financial
>>>> transactions. Even if someone pulls the plug in the
>>>> middle of a transaction I want to lose only this single
>>>> transaction and not have any other missing or corrupted
>>>> data.
>>>>
>>>> One aspect of the solution to this problem is to make
>>>> sure that all disk writes are immediately flushed to the
>>>> actual platters. It is this aspect of the problem that I
>>>> am attempting to solve in this thread.
>>>
>>> Can't you rely on your database to manage this for you?
>>
>> Not for the transaction log because it will not be in a
>> database. The transaction log file will be the primary
>> means of IPC. Named pipes will provide event notification
>> of changes to the log file, and the file offset of these
>> changes.
>
> It sounds very much (here and in other threads) like you
> are trying to reinvent database transactions. Just store
> everything in a database and signal watchers when data is
> updated. Databases had atomic transactions licked decades
> ago!
>
> --
> Ian Collins

I don't want to make the system much slower than necessary
merely to avoid learning how to do completely reliable file
writes.
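
What I have in mind for the flushing part is roughly the
following (a minimal sketch, assuming Linux; the file name,
record layout and error handling are placeholders, and on top
of this the drive's own write cache has to be disabled or
battery-backed for the guarantee to hold):

    /* Append one record and force it to the platters before returning. */
    #include <fcntl.h>
    #include <unistd.h>

    int append_record(const void *rec, size_t len)
    {
        int fd = open("transactions.log",
                      O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0600);
        if (fd < 0)
            return -1;

        ssize_t n = write(fd, rec, len);   /* O_SYNC: write() returns only
                                              after the data reaches the
                                              device */
        if (n != (ssize_t)len || fdatasync(fd) != 0) { /* belt and braces */
            close(fd);
            return -1;
        }
        return close(fd);
    }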

There is too much overhead in a SQL database for this
purpose because SQL has no means to directly seek a specific
record; all the overhead of accessing and maintaining
indices would be required. I want to plan on 100
transactions per second on a single core processor because
that is the maximum speed of my OCR process on a single page
of data. I want to spend an absolute minimum of time on
every other aspect of processing, and file I/O generally
tends to be the primary bottleneck to performance.

The fastest possible persistent mechanism would be a binary
file that is not a part of a SQL database. All access to
records in this file would be by direct file offset.
Implementing this in SQL could have a tenfold degradation in
performance.
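
By direct file offset I mean something along these lines (a
rough sketch; the fixed 64-byte record is made up, not my
real layout):

    #include <stdint.h>
    #include <sys/types.h>
    #include <unistd.h>

    struct txn_record {           /* placeholder layout, 64 bytes */
        int64_t customer_id;
        int64_t amount_cents;
        char    status;
        char    pad[47];
    };

    int read_record(int fd, long recno, struct txn_record *out)
    {
        off_t where = (off_t)recno * sizeof *out;
        return pread(fd, out, sizeof *out, where) ==
               (ssize_t)sizeof *out ? 0 : -1;
    }

    int write_record(int fd, long recno, const struct txn_record *in)
    {
        off_t where = (off_t)recno * sizeof *in;
        return pwrite(fd, in, sizeof *in, where) ==
               (ssize_t)sizeof *in ? 0 : -1;
    }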

I will be using a SQL database for my user login and account
information.


From: Ian Collins on
On 04/ 9/10 12:43 PM, Peter Olcott wrote:
> "Ian Collins"<ian-news(a)hotmail.com> wrote:
>>
>> It sounds very much (here and in other threads) like you
>> are trying to reinvent database transactions. Just store
>> everything in a database and signal watchers when data is
>> updated. Databases had atomic transactions licked decades
>> ago!
>>
> I don't want to make the system much slower than necessary
> merely to avoid learning how to do completely reliable file
> writes.

The magic word there is "necessary". It's not just the file writes but
the whole business with named pipes.

> There is too much overhead in a SQL database for this
> purpose because SQL has no means to directly seek a specific
> record; all the overhead of accessing and maintaining
> indices would be required. I want to plan on 100
> transactions per second on a single core processor because
> that is the maximum speed of my OCR process on a single page
> of data. I want to spend an absolute minimum of time on
> every other aspect of processing, and file I/O generally
> tends to be the primary bottleneck to performance.

100 transactions per second isn't that great a demand. Most databases
have RAM-based tables, so the only file access would be the write-through.
The MySQL InnoDB storage engine is optimised for this.

> The fastest possible persistent mechanism would be a binary
> file that is not a part of a SQL database. All access to
> records in this file would be by direct file offset.
> Implementing this in SQL could have a tenfold degradation in
> performance.

Have you benchmarked this? Even if that is so, it might still be 10x
faster than is required.

> I will be using a SQL database for my user login and account
> information.

So you have the opportunity to do some benchmarking.
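
Something like this (an untested sketch against the MySQL C
API; the table, columns and credentials are made up) would
tell you whether InnoDB can sustain your 100 durable
transactions per second:

    /* Time N inserts; with innodb_flush_log_at_trx_commit=1 (the
       default) each autocommitted insert is flushed to the log at
       commit. */
    #include <mysql/mysql.h>
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        MYSQL *db = mysql_init(NULL);
        if (!mysql_real_connect(db, "localhost", "user", "password",
                                "ocr", 0, NULL, 0)) {
            fprintf(stderr, "connect: %s\n", mysql_error(db));
            return 1;
        }

        int i, n = 1000;
        time_t start = time(NULL);
        for (i = 0; i < n; i++)
            if (mysql_query(db, "INSERT INTO txn_log (page_no, status) "
                                "VALUES (1, 'done')"))
                fprintf(stderr, "insert: %s\n", mysql_error(db));

        printf("%d inserts in %ld seconds\n", n,
               (long)(time(NULL) - start));
        mysql_close(db);
        return 0;
    }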

--
Ian Collins
From: Peter Olcott on

"Ian Collins" <ian-news(a)hotmail.com> wrote in message
news:827d0gFp8jU18(a)mid.individual.net...
> On 04/ 9/10 12:43 PM, Peter Olcott wrote:
>> "Ian Collins"<ian-news(a)hotmail.com> wrote:
>>>
>>> It sounds very much (here and in other threads) like you
>>> are trying to reinvent database transactions. Just store
>>> everything in a database and signal watchers when data is
>>> updated. Databases had atomic transactions licked decades
>>> ago!
>>>
>> I don't want to make the system much slower than
>> necessary merely to avoid learning how to do completely
>> reliable file writes.
>
> The magic word there is "necessary". It's not just the
> file writes but the whole business with named pipes.

Yeah, but why did you bring this up? Aren't named pipes
trivial and fast?
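
The notification side really is only a few lines anyway;
roughly this (a sketch, with the fifo path as a placeholder):

    /* After appending a record, write its file offset to a named pipe
       that the reader blocks on.  A write of sizeof(off_t) bytes is
       well under PIPE_BUF, so it is atomic. */
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    int notify_offset(off_t where)
    {
        (void)mkfifo("/tmp/txn_log_fifo", 0600);      /* EEXIST is fine */
        int fd = open("/tmp/txn_log_fifo", O_WRONLY); /* blocks until a
                                                         reader opens it */
        if (fd < 0)
            return -1;
        ssize_t n = write(fd, &where, sizeof where);
        close(fd);
        return n == (ssize_t)sizeof where ? 0 : -1;
    }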

>
>> There is too much overhead in a SQL database for this
>> purpose because SQL has no means to directly seek a
>> specific record; all the overhead of accessing and
>> maintaining indices would be required. I want to plan on
>> 100 transactions per second on a single core processor
>> because that is the maximum speed of my OCR process on a
>> single page of data. I want to spend an absolute minimum
>> of time on every other aspect of processing, and file I/O
>> generally tends to be the primary bottleneck to
>> performance.
>
> 100 transactions per second isn't that great a demand.
> Most databases have RAM-based tables, so the only file
> access would be the write-through. The MySQL InnoDB
> storage engine is optimised for this.

Exactly how fault tolerant is it with the server's power
cord yanked from the wall?

>
>> The fastest possible persistent mechanism would be a
>> binary file that is not a part of a SQL database. All
>> access to records in this file would be by direct file
>> offset. Implementing this in SQL could have a tenfold
>> degradation in performance.
>
> Have you benchmarked this? Even if that is so, it might
> still be 10x faster than is required.

My time budget is no time at all (over and above the 10 ms
that my OCR process already uses), and I want to get as close
to this as possible. Because of the file caching that you
mentioned, it is possible that SQL might be faster.

If only there were a way to have records numbered in
sequential order and to directly access a specific record
by its record number. It seems so stupid that you have to
build, access and maintain a whole index just to access
records by record number.

>
>> I will be using a SQL database for my user login and
>> account
>> information.
>
> So you have the opportunity to do some benchmarking.
>
> --
> Ian Collins


From: Ian Collins on
On 04/ 9/10 01:21 PM, Peter Olcott wrote:
> "Ian Collins"<ian-news(a)hotmail.com> wrote in message
> news:827d0gFp8jU18(a)mid.individual.net...
>> On 04/ 9/10 12:43 PM, Peter Olcott wrote:
>>> "Ian Collins"<ian-news(a)hotmail.com> wrote:
>>>>
>>>> It sounds very much (here and in other threads) like you
>>>> are trying to reinvent database transactions. Just store
>>>> everything in a database and signal watchers when data is
>>>> updated. Databases had atomic transactions licked decades
>>>> ago!
>>>>
>>> I don't want to make the system much slower than
>>> necessary merely to avoid learning how to do completely
>>> reliable file writes.
>>
>> The magic word there is "necessary". It's not just the
>> file writes but the whole business with named pipes.
>
> Yeah, but why did you bring this up? Aren't named pipes
> trivial and fast?

I don't use them. But I'm sure the time spent on your named pipe thread
would have been plenty of time for benchmarking!

>>> There is too much overhead in a SQL database for this
>>> purpose because SQL has no means to directly seek a
>>> specific record; all the overhead of accessing and
>>> maintaining indices would be required. I want to plan on
>>> 100 transactions per second on a single core processor
>>> because that is the maximum speed of my OCR process on a
>>> single page of data. I want to spend an absolute minimum
>>> of time on every other aspect of processing, and file I/O
>>> generally tends to be the primary bottleneck to
>>> performance.
>>
>> 100 transactions per second isn't that great a demand.
>> Most databases have RAM-based tables, so the only file
>> access would be the write-through. The MySQL InnoDB
>> storage engine is optimised for this.
>
> Exactly how fault tolerant is it with the server's power
> cord yanked from the wall?

As good as any. If you want 5 nines reliability you have to go a lot
further than synchronous writes. My main server has highly redundant
RAID (thanks to ZFS), redundant PSUs and a UPS. I'm not quite at the
generator stage yet; our power here is very dependable :).

>>> The fastest possible persistent mechanism would be a
>>> binary file that is not a part of a SQL database. All
>>> access to records in this file would be by direct file
>>> offset. Implementing this in SQL could have a tenfold
>>> degradation in performance.
>>
>> Have you benchmarked this? Even if that is so, it might
>> still be 10x faster than is required.
>
> My time budget is no time at all (over and above the 10 ms
> that my OCR process already uses), and I want to get as
> close to this as possible. Because of the file caching that
> you mentioned, it is possible that SQL might be faster.
>
> If only there were a way to have records numbered in
> sequential order and to directly access a specific record
> by its record number. It seems so stupid that you have to
> build, access and maintain a whole index just to access
> records by record number.

You don't. The database engine does.
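
With SQLite, for instance, every row already has an integer
rowid that is the table's clustered key, so a lookup by
record number is a single B-tree probe with no separate
index for you to maintain (a rough sketch; the table and
column names are made up):

    #include <sqlite3.h>
    #include <stdio.h>

    int fetch_by_recno(sqlite3 *db, long recno)
    {
        sqlite3_stmt *stmt;
        int rc = sqlite3_prepare_v2(db,
            "SELECT status FROM txn_log WHERE rowid = ?", -1, &stmt, NULL);
        if (rc != SQLITE_OK)
            return rc;

        sqlite3_bind_int64(stmt, 1, recno);
        if (sqlite3_step(stmt) == SQLITE_ROW)
            printf("record %ld: %s\n", recno,
                   (const char *)sqlite3_column_text(stmt, 0));

        return sqlite3_finalize(stmt);
    }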

--
Ian Collins
From: Peter Olcott on

"Ian Collins" <ian-news(a)hotmail.com> wrote in message
news:827evjFp8jU20(a)mid.individual.net...
> On 04/ 9/10 01:21 PM, Peter Olcott wrote:
>> "Ian Collins"<ian-news(a)hotmail.com> wrote in message
>> news:827d0gFp8jU18(a)mid.individual.net...
>>> On 04/ 9/10 12:43 PM, Peter Olcott wrote:
>>>> "Ian Collins"<ian-news(a)hotmail.com> wrote:
>>> 100 transactions per second isn't that great a demand.
>>> Most databases have RAM-based tables, so the only file
>>> access would be the write-through. The MySQL InnoDB
>>> storage engine is optimised for this.
>>
>> Exactly how fault tolerant is it with the server's power
>> cord yanked from the wall?
>
> As good as any. If you want 5 nines reliability you have
> to go a lot further than synchronous writes. My main
> server has highly redundant RAID (thanks to ZFS),
> redundant PSUs and a UPS. I'm not quite at the generator
> stage yet; our power here is very dependable :).
>
>>>> The fastest possible persistent mechanism would be a
>>>> binary file that is not a part of a SQL database. All
>>>> access to records in this file would be by direct file
>>>> offset. Implementing this in SQL could have a tenfold
>>>> degradation in performance.
>>>
>>> Have you benchmarked this? Even if that is so, it might
>>> still be 10x faster than is required.
>>
>> My time budget is no time at all (over and above the 10 ms
>> that my OCR process already uses), and I want to get as
>> close to this as possible. Because of the file caching
>> that you mentioned, it is possible that SQL might be
>> faster.
>>
>> If only there were a way to have records numbered in
>> sequential order and to directly access a specific record
>> by its record number. It seems so stupid that you have to
>> build, access and maintain a whole index just to access
>> records by record number.
>
> You don't. The database engine does.

Does the MySQL InnoDB storage engine have a journal file
for crash recovery, like SQLite does?

>
> --
> Ian Collins