From: Scott Lurndal on 2 Apr 2010 15:25
"Peter Olcott" <NoSpam(a)OCR4Screen.com> writes:
>"Scott Lurndal" <scott(a)slp53.sl.home> wrote in message
>> In your application, I'd frankly avoid file operations in
>> favor of queues or ring-buffers in a MAP_SHARED mmap(2)
>> region. If you need the
>> queues to be persistent, map a file; otherwise map
>> anonymous (linux) or
>This may not be flushed to disk often enough to meet my
>needs. It seems that append can at least be forced to flush
>to disk immediately. Although forcing it to flush to disk
>may be very inefficient, I am estimating that it won't cost
>much if there are very few bytes being written, far less
>than 512 bytes.
If you don't set O_SYNC or O_DSYNC when you open your file,
the data will _not_ be flushed to disk immediately. It may
be delayed by a considerable period unless you call
fsync(2) or fdatasync(2) to force the flush. Note that 'fflush'
does _not_ require the data to be flushed to disk, just from
the user-mode buffers in libc into kernel-mode buffers in the
file cache.
With mmap, you can explicitly call 'msync(2)' on a specific
address range to flush that range to the backing device if
required.
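
A rough sketch of the two flush paths described above, written in Python only because its os and mmap modules wrap the same system calls. The function names and file layout are made up for illustration; file.flush() plays the role of fflush, os.fsync() is fsync(2), and mmap.flush() issues msync(2) on Unix.

```python
import mmap
import os
import tempfile


def durable_append(path, payload):
    """Append payload and return only after fsync(2) has completed."""
    with open(path, "ab") as f:
        f.write(payload)        # lands in the user-space buffer
        f.flush()               # like fflush: user buffer -> kernel file cache
        os.fsync(f.fileno())    # fsync(2): kernel file cache -> storage device


def mapped_store(path, offset, payload):
    """Store payload through a shared file mapping, then msync those pages."""
    page = mmap.PAGESIZE
    with open(path, "r+b") as f:
        # On Unix the default mapping is MAP_SHARED, so stores reach the file.
        with mmap.mmap(f.fileno(), 0) as m:
            m[offset:offset + len(payload)] = payload
            start = (offset // page) * page          # msync wants a page-aligned start
            length = offset + len(payload) - start
            m.flush(start, length)                   # msync(2) on just this range


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "queue.dat")
        durable_append(path, b"\0" * (2 * mmap.PAGESIZE))
        mapped_store(path, mmap.PAGESIZE + 10, b"job-1")
        with open(path, "rb") as f:
            f.seek(mmap.PAGESIZE + 10)
            print(f.read(5))
```

Neither call says anything about the drive's own write cache; they only push data out of the process and out of the kernel's file cache.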
From: Scott Lurndal on 2 Apr 2010 15:30
David W Noon <dwnoon(a)spamtrap.ntlworld.com> writes:
base64 encoded messages are frowned upon, being not supported by many
From: Peter Olcott on 2 Apr 2010 15:49
"Joe Beanfish" <joe(a)nospam.duh> wrote in message
> On 04/01/10 19:23, Peter Olcott wrote:
>> I am trying to convert my proprietary OCR software into a
>> web application. Initially there will be multiple
>> one for each web request, and a single threaded process
>> servicing these web requests. Eventually there may be
>> multiple threads servicing these web requests.
> I'd use a database to maintain the queue. Sometimes you
> use the filesystem to accomplish database like operations.
> file per record. Separate directories for pending and
> jobs. Mail systems often do that. One file for the mail
> msg, one
> for the headers, and maybe another for status info.
> If using the filesystem as a database use "mv" to
> atomic operations:
> write to tmpfile.pid
> mv tmpfile.pid readytogo.img
> queue reader looks for *.img
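
The write-then-mv pattern quoted above might look roughly like this. It is a sketch, not anyone's actual code: the spool directory, the function names and the fsync before the rename are assumptions added for illustration. os.rename() maps to rename(2), which is atomic only when both names are on the same filesystem.

```python
import glob
import os
import tempfile


def publish_job(spool_dir, name, data):
    """Write data to tmpfile.<pid>, then rename it to <name>.img."""
    tmp_path = os.path.join(spool_dir, "tmpfile.%d" % os.getpid())
    final_path = os.path.join(spool_dir, name + ".img")
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())          # make the contents durable before publishing
    os.rename(tmp_path, final_path)   # readers see either no file or the whole file
    return final_path


def pending_jobs(spool_dir):
    """Return published jobs; half-written tmpfile.* entries never match *.img."""
    return sorted(glob.glob(os.path.join(spool_dir, "*.img")))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        publish_job(d, "readytogo", b"image bytes")
        print(pending_jobs(d))
```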
I was going to use a single file with binary data and fixed
length records to keep track of all of the web requests. I
also proposed named pipes as the means of notification of
new web requests, and completed requests.
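
For illustration only, a single file of fixed-length binary records could be handled along these lines. The record layout (request id, status, 16-byte tag) and the function names are invented for the sketch; the real fields are not given in this thread. O_APPEND plus fsync(2) is used so each small record is forced out as it is written.

```python
import os
import struct
import tempfile

# Hypothetical layout: 8-byte request id, 4-byte status, 16-byte tag.
RECORD = struct.Struct("<QI16s")


def append_record(path, request_id, status, tag):
    """Append one fixed-length record and force it to disk."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, RECORD.pack(request_id, status, tag))
        os.fsync(fd)              # without this the record may sit in the file cache
    finally:
        os.close(fd)


def read_record(path, index):
    """Fixed-length records allow direct seeking to record number index."""
    with open(path, "rb") as f:
        f.seek(index * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = os.path.join(d, "requests.dat")
        append_record(log, 1, 0, b"first")
        append_record(log, 2, 0, b"second")
        print(read_record(log, 1))
```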
From: Peter Olcott on 2 Apr 2010 15:50
"Scott Lurndal" <scott(a)slp53.sl.home> wrote in message
> "Peter Olcott" <NoSpam(a)OCR4Screen.com> writes:
>>"Scott Lurndal" <scott(a)slp53.sl.home> wrote in message
>>> In your application, I'd frankly avoid file operations in
>>> favor of queues or ring-buffers in a MAP_SHARED mmap(2)
>>> region. If you need the
>>> queues to be persistent, map a file; otherwise map
>>> anonymous (linux) or
>>This may not be flushed to disk often enough to meet my
>>needs. It seems that append can at least be forced to flush
>>to disk immediately. Although forcing it to flush to disk
>>may be very inefficient, I am estimating that it won't cost
>>much if there are very few bytes being written, far less
>>than 512 bytes.
> If you don't set O_SYNC or O_DSYNC when you open your file,
> the data will _not_ be flushed to disk immediately. It may
> be delayed by a considerable period unless you call
> fsync(2) or fdatasync(2) to force the flush. Note that 'fflush'
> does _not_ require the data to be flushed to disk, just from
> the user-mode buffers in libc into kernel-mode buffers in the
> file cache.
> With mmap, you can explicitly call 'msync(2)' on a specific
> address range to flush that range to the backing device if
> required.
Someone else also brought up the possible issue on flushing
the disk's own onboard buffer. Is this really a problem?
From: Ersek, Laszlo on 2 Apr 2010 16:34
On Fri, 2 Apr 2010, Peter Olcott wrote:
> "Scott Lurndal" <scott(a)slp53.sl.home> wrote in message
>> With mmap, you can explicitly call 'msync(2)' on a specific address
>> range to flush that range to the backing device if required.
> Someone else also brought up the possible issue on flushing the disk's
> own onboard buffer. Is this really a problem?
The Linux manual pages for close(2) and fsync(2) allude to this.
I think your server will run on an UPS that will be able to notify the
kernel (via a serial port or so) to shut down cleanly if power is failing,
so I wouldn't worry about the disk hardware. If you do wish to protect
against hardware failures, that's a different weight class.
2.0 Hardware Assumptions
SQLite does not assume that a sector write is atomic. However, it does
assume that a sector write is linear. By "linear" we mean that SQLite
assumes that when writing a sector, the hardware begins at one end of the
data and writes byte by byte until it gets to the other end. The write
might go from beginning to end or from end to beginning. If a power
failure occurs in the middle of a sector write it might be that part of
the sector was modified and another part was left unchanged. The key
assumption by SQLite is that if any part of the sector gets changed, then
either the first or the last bytes will be changed. So the hardware will
never start writing a sector in the middle and work towards the ends. We
do not know if this assumption is always true but it seems reasonable.
I wouldn't even think of reimplementing this. People have dedicated their
lives to research it.
I didn't re-read the linked-to page now, but since SQLite is used by many
applications run by non-privileged users, I doubt SQLite tries to access
any hardware directly. "SQLite is Transactional" nonetheless
<http://sqlite.org/transactional.html>, so I'd assume you don't need
hardware sync either.
.... What about using SQLite for safe job storage, and using the other
mechanisms only for notification, so you don't have to poll?
<http://www.sqlite.org/threadsafe.html> I apologize if this has already
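
To make the SQLite suggestion concrete, here is a minimal sketch of job storage using Python's standard sqlite3 module. The table name, columns and function names are assumptions made up for this example, and notification (pipes or otherwise) is left out entirely; the point is only that each enqueue and each claim is a committed transaction.

```python
import sqlite3


def open_queue(path):
    """Open (or create) the job database; schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY, "
        "payload BLOB NOT NULL, "
        "state TEXT NOT NULL DEFAULT 'pending')"
    )
    conn.commit()
    return conn


def enqueue(conn, payload):
    """Insert one job; 'with conn' commits on success, rolls back on error."""
    with conn:
        cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    return cur.lastrowid


def claim_next(conn):
    """Mark the oldest pending job as running and return (id, payload)."""
    with conn:
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE state = 'pending' "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        cur = conn.execute(
            "UPDATE jobs SET state = 'running' "
            "WHERE id = ? AND state = 'pending'",
            (row[0],),
        )
        if cur.rowcount != 1:     # another worker claimed it first
            return None
    return row


if __name__ == "__main__":
    q = open_queue(":memory:")
    enqueue(q, b"page-1.img")
    print(claim_next(q))
```

The guarded UPDATE is there so that two workers cannot both claim the same row; with a single servicing process it is simply a no-op safety check.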