From: Scott Lurndal on
"Peter Olcott" <NoSpam(a)OCR4Screen.com> writes:
>
>"Scott Lurndal" <scott(a)slp53.sl.home> wrote in message
>news:Bwrtn.1$rJ2.0(a)news.usenetserver.com...
>> "Peter Olcott" <NoSpam(a)OCR4Screen.com> writes:
>>>
>>>"Scott Lurndal" <scott(a)slp53.sl.home> wrote in message
>>
>>>> In your application, I'd frankly avoid file operations in
>>>> favor of queues or ring-buffers in a MAP_SHARED mmap(2)
>>>> region. If you need the queues to be persistent, map a
>>>> file; otherwise map anonymous (linux) or
>>>
>>>This may not be flushed to disk often enough to meet my
>>>needs. It seems that append can at least be forced to flush
>>>to disk immediately. Although forcing it to flush to disk
>>>may be very inefficient, I am estimating that it won't cost
>>>much if there are very few bytes being written, far less
>>>than 512 bytes.
>>
>> If you don't set O_SYNC or O_DSYNC when you open your file,
>> the data will _not_ be flushed to disk immediately. It may
>> be delayed by a considerable period unless you call
>> fsync(2) or fdatasync(2) to force the flush. Note that
>> 'fflush' does _not_ require the data to be flushed to disk,
>> just from the user-mode buffers in libc into kernel-mode
>> buffers in the file cache.
>>
>> With mmap, you can explicitly call 'msync(2)' on a specific
>> address range to flush that range to the backing device if
>> required.
>>
>>
>> scott
>
>Someone else also brought up the possible issue on flushing
>the disk's own onboard buffer. Is this really a problem?
>

Most server class drives don't buffer writes. Many desktop
drives do, to give the illusion of higher performance.

In linux, the hdparm command can be used to change the write caching
state for ATA/IDE/SATA drives. For scsi drives, a mode-select
program will change the write caching state of the drive.

scott
From: David Schwartz on
On Apr 2, 3:33 am, Vitus Jensen <vi...(a)alter-schwede.de> wrote:

> I'm coming from another platform where the maxim was "a thread should do
> nothing very well" (PF), which I always interpreted as: code your threads
> so that they spend most of their time waiting.  So yes, you need a thread
> to wait.  99% of the time it should do nothing else but wait.

That's pretty dumb. You don't need a thread to do nothing. You can do
nothing without a thread.

> Are threads a sparse resource in linux?  I thought the limit for a typical
> system is well above 1000.

The problem is not that threads are a scarce resource but that CPU
time is a scarce resource. You don't want to waste it forcing the
scheduler to make the "right" thread run.

> And if a thread is waiting for data appearing on a filehandle how could it
> create context switches?  It's just lying there, the only thing it's
> occupying is address space and some kernel memory.

Consider two applications on a system with 2 cores. One uses 20
threads to wait for each of 20 things to happen. One uses 1 thread to
wait for events and then dispatches to a pool. The first one will
require each of the 20 threads to run to service those 20 events,
requiring 20 context switches. The second one will only require three
threads to run to service those 20 events, requiring 3 context
switches.

This has the greatest real-world effect with something like a web
server. Imagine a web server that is handling 20,000 clients. To send
1,500 bytes to each of those 20,000 clients, an application with
20,000 threads all blocking on 'write' to their sockets will need
20,000 context switches.

> > Instead, have one thread that waits until anything is possible. When
> > something is possible, it wakes another thread to wait for the next
> > thing to be possible and it does X, Y, Z, or whatever it was just told
> > is now possible to do.

> In this case you need code to decide what to do when woken up in your
> application.  A switch and a call to the corresponding worker routine,
> passing matching context data via stack to that routine.

You need that in the other case too. It's just that you're pushing it
to the kernel and scheduler.

> > This results in far fewer context switches and better utilization of
> > CPU code and data caches. (Of course, if the web part is an
> > insignificant fraction of resource usage, it might not matter.)

> This doesn't result in more context switches (see above) but in more
> application code which puts a heavier load on code and data caches.

It does result in *way* more context switches. Orders of magnitude
more context switches.

> If you start those worker routines as threads, the decision making about
> what worker to run is moved into the kernel which is highly optimised for
> that kind of work.

No, it's not. It cannot be in principle because you have taken away
from it the discretion it would need to find an optimal solution. You
have said "when event X occurs, you must run thread Y". You have
forced the solution on the scheduler. If thread Y is inconvenient to
run, too bad.

>  Additionally your worker threads keep their context
> data local and may hide that data structure from other threads/modules
> which give a much cleaner, simpler and safer design.

I don't follow this argument.

> Give threads a chance, they are there for a reason.

Yes, they are there for many reasons, this is just not one of them.
Those reasons include:

1) If you unexpectedly get delayed, say due to a page fault, the
process as a whole can continue to make forward progress.

2) Some kinds of I/O cannot easily be done asynchronously or you
cannot easily pend more than one at a time. Threads let you do this.

3) Without threads, you cannot easily take advantage of more than one
core.

4) In the 90% of the code for a typical application that's not
performance critical, doing it in a thread means you don't have to
worry that *any* blocking from *any* cause will destroy your
performance. You can just write it in a simple, straightforward way
and block if you need to. Without threads, every line of code is
performance critical because if it blocks, the process blocks.

DS
From: David Schwartz on
On Apr 1, 5:09 pm, sc...(a)slp53.sl.home (Scott Lurndal) wrote:

> A single write or pwrite call on a file with O_APPEND
> is required by the SUS to ensure that the write is performed
> atomically with respect to other writes to the same file which also have
> the O_APPEND flag set.    The order, of course, is not
> guaranteed.

Where do you find this? The only thing I can see is SUS guaranteeing
that another file operation can't sneak in between the implied seek
and the start of the write.

Suppose I write two programs. They each open the same file with
O_APPEND and also memory map a randomly-selected 64MB file from a
remote NFS server. Each then does a write to the file opened with
O_APPEND to write the 64MB of memory-mapped data from the remote file
to the local file. Where does SUS require that one process not write
one byte of data until the other process has written every single
byte?

DS
From: Peter Olcott on

"Ersek, Laszlo" <lacos(a)caesar.elte.hu> wrote in message
news:Pine.LNX.4.64.1004022206050.1774(a)login01.caesar.elte.hu...
> On Fri, 2 Apr 2010, Peter Olcott wrote:
>
>> "Scott Lurndal" <scott(a)slp53.sl.home> wrote in message
>> news:Bwrtn.1$rJ2.0(a)news.usenetserver.com...
>
>>> With mmap, you can explicitly call 'msync(2)' on a
>>> specific address range to flush that range to the
>>> backing device if required.
>
>> Someone else also brought up the possible issue on
>> flushing the disk's own onboard buffer. Is this really a
>> problem?
>
> The Linux manual pages for close(2) and fsync(2) allude to
> this.
>
> http://www.kernel.org/doc/man-pages/online/pages/man2/close.2.html
> http://www.kernel.org/doc/man-pages/online/pages/man2/fsync.2.html
>
> I think your server will run on a UPS that will be able
> to notify the kernel (via a serial port or so) to shut
> down cleanly if power is failing, so I wouldn't worry
> about the disk hardware. If you do wish to protect against
> hardware failures, that's a different weight class.
>
> http://sqlite.org/atomiccommit.html
>
> ----v----
> 2.0 Hardware Assumptions
>
> [...]
>
> SQLite does not assume that a sector write is atomic.
> However, it does assume that a sector write is linear. By
> "linear" we mean that SQLite assumes that when writing a
> sector, the hardware begins at one end of the data and
> writes byte by byte until it gets to the other end. The
> write might go from beginning to end or from end to
> beginning. If a power failure occurs in the middle of a
> sector write it might be that part of the sector was
> modified and another part was left unchanged. The key
> assumption by SQLite is that if any part of the sector
> gets changed, then either the first or the last bytes will
> be changed. So the hardware will never start writing a
> sector in the middle and work towards the ends. We do not
> know if this assumption is always true but it seems
> reasonable.
>
> [...]
> ----^----
>
> I wouldn't even think of reimplementing this. People have
> dedicated their lives to research it.
>
> I didn't re-read the linked-to page now, but since SQLite
> is used by many applications run by non-privileged users,
> I doubt SQLite tries to access any hardware directly.
> "SQLite is Transactional" nonetheless
> <http://sqlite.org/transactional.html>, so I'd assume you
> don't need hardware sync either.
>
> ... What about using SQLite for safe job storage, and
> using the other mechanisms only for notification, so you
> don't have to poll?
> <http://www.sqlite.org/threadsafe.html> I apologize if
> this has already been discussed.
>
> lacos

That looks like a good idea. I just bought the book on
Amazon. What other IPC mechanisms might you suggest?


From: Ersek, Laszlo on
On Fri, 2 Apr 2010, Peter Olcott wrote:

> "Ersek, Laszlo" <lacos(a)caesar.elte.hu> wrote in message
> news:Pine.LNX.4.64.1004022206050.1774(a)login01.caesar.elte.hu...

>> ... What about using SQLite for safe job storage, and
>> using the other mechanisms only for notification, so you
>> don't have to poll?
>> <http://www.sqlite.org/threadsafe.html> I apologize if
>> this has already been discussed.
>
> That looks like a good idea. I just bought the book on
> Amazon. What other IPC mechanisms might you suggest?

One idea might be: write a long-lived daemon, restarted by the init
process if it crashes. The daemon would do the following:

1. create a PID file
2. block SIGUSR1 in the main thread and then install a simplistic handler
3. spawn N worker threads (with SIGUSR1 blocked)
4. pull jobs out of the database and hand them off to the worker
threads until there are no more recently added jobs left
5. wait for SIGUSR1 with sigsuspend()
6. go back to step 4.

Worker threads would process the requests and store the result back into
the DB for later retrieval. (Same or different table.) You have to be very
careful when designing and implementing the state transitions for
individual jobs. Make sure it is no problem to pick up any job in any
state (except the successful termination state) and to continue / retry
from there. There don't need to be many states. Invent as few as possible.
Don't try to prevent redundant operations after a crash, or in case a
second daemon instance is started erroneously. Rather, make sure the
operations (e.g. storing the result) are idempotent. This is more robust.
Don't rely on the daemon's presence, rely on persistent job states and
clearly defined elementary state transitions. Treat your daemon as a
single-shot batch utility that happens to have a sometimes functional loop
in it.

The queue between the main thread and the worker threads can have limited
depth. It is no problem if the main thread blocks in step 4 for some time.

Make another, short-lived CGI program invoked by the web server that
stores the new job in the DB (with some unique ID strictly greater than
ID's generated before), one that sends a SIGUSR1 to the process identified
by the PID file thereafter. (Or implement this in PHP or whatever.) If a
SIGUSR1 was pending on that process anyway (e.g. due to jobs arriving
quickly in parallel, in a burst), this is idempotent; SIGUSR1 is not
queued. If the daemon was already selecting jobs from the table, it will
wake up immediately in step 5 after finishing the loop and then make a
possibly empty round, but that's no problem.

If you don't trust the PID file to be valid (perhaps you try to send
SIGUSR1 while init is reaping and restarting the crashed daemon), you
could fold the daemon's role into the CGI program itself: if O_CREAT |
O_EXCL succeeds in creating the PID file, become the daemon; otherwise,
send a signal to the existing daemon. This is not infallible, but seems
good enough. A variation:
try to bind a unix domain datagram socket. EADDRINUSE -> send message to
listening daemon (perhaps after setting O_NONBLOCK); success -> become
listening daemon.

Just my $0.02. Sorry if I misunderstood what you intend to do.

lacos