From: Danmath on 12 Aug 2010 11:59
On 12 Aug, 00:24, gordonb.a9...(a)burditt.org (Gordon Burditt) wrote:
> I strongly recommend that the application filter out the files ".",
> "..", directories, special files, sockets, FIFOs, and anything
> matching "*.core" as candidates for input files. While you're at
> it, you could filter out "*.tmp" as well.
Filenames are matched with parametered file interfaces defined in
Using .tmp files is OK, although this requires modification of the
I wanted to know if there is a standard, safe way to do this that
requires no coordination.
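For what it's worth, the filtering Gordon suggests above could be sketched in C roughly as follows. This is only an illustration under assumptions: the function name is_candidate is made up, and rejecting symbolic links (by using lstat() rather than stat()) is a choice of the sketch, not something stated in the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <fnmatch.h>
#include <string.h>
#include <sys/stat.h>

/* Return 1 if `path` (whose base name is `name`) looks like a usable
 * input file, 0 otherwise. The function name is hypothetical. */
int is_candidate(const char *path, const char *name)
{
    struct stat st;

    /* "." and ".." are directories; skip them without a system call. */
    if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
        return 0;

    /* Skip core dumps and temporary files by name. */
    if (fnmatch("*.core", name, 0) == 0 || fnmatch("*.tmp", name, 0) == 0)
        return 0;

    /* lstat() does not follow symbolic links, so a link to a regular
     * file is rejected too. That is a choice made by this sketch. */
    if (lstat(path, &st) != 0)
        return 0;

    /* S_ISREG() excludes directories, devices, sockets and FIFOs. */
    return S_ISREG(st.st_mode) ? 1 : 0;
}
```

A directory scan would call this for each entry returned by readdir() and ignore anything for which it returns 0.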
> Question: with your method of file transfer, imagine a file is
> halfway transferred, then the network cable is cut. Does the partial
> file get left there indefinitely, or does the (FTP, perhaps) daemon
> eventually detect that the transfer has failed and *DELETE* the
> partial incoming file? If not, can it be made to do so? How quickly
> do failed partial incoming files vanish? That's a ballpark figure
> for any timestamp age thresholds.
I would have to look at that, although it's difficult. Many different
sending on one side. Not easy to get the info on the other side, I
from outside the company. That's why I wanted to know if there is a
safe way to do this that requires no coordination. Apparently not, but
the insight has been interesting at least.
> >It wouldn't fix the OWCOWC problem, but the current version doesn't
> >either. I just don't like this modification time checking. If I could
> >open the file knowing it will return error if some other application
> >is no writing to it already,
> You want it to return error if some other application is *NOT* writing
> to it already?
No, typing error, meant 'if some other application is writing'.
> Look up fdopen(). The point of this would most likely be to call open()
> with various exotic flags, then proceed with the file copy using stdio
> functions if the open succeeded. You can also go the other way with
> fileno() to get the underlying file descriptor number if, say, you want
> to put locks on it after fopen()ing it.
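A minimal C sketch of the two directions described in the quoted paragraph above. The choice of O_EXCL as the "exotic" flag and the function names are assumptions made for illustration. Note that the fcntl() lock is advisory: it only helps if every process touching the file also asks for the lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Create `path` for writing and wrap the descriptor in a stdio stream.
 * With O_CREAT | O_EXCL, open() fails if the file already exists.
 * Returns NULL on failure. The function name is hypothetical. */
FILE *create_exclusive(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        return NULL;

    FILE *fp = fdopen(fd, "w");
    if (fp == NULL)
        close(fd);  /* fdopen() failed, so the descriptor is still ours */
    return fp;
}

/* Going the other way: take an advisory write lock on a stream that is
 * open for writing. Returns 0 on success, -1 if another process holds
 * a conflicting lock or on error. The function name is hypothetical. */
int try_lock_stream(FILE *fp)
{
    struct flock fl = {0};

    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;  /* l_start = 0 and l_len = 0: whole file */
    return fcntl(fileno(fp), F_SETLK, &fl) == 0 ? 0 : -1;
}
```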
> I still think the transfer-and-rename approach has a lot to be said
> for it. That could also include initially transferring the file
> into a subdirectory, then renaming it to the top-level directory.
Yes, as a general rule I think that writing to another directory
on the same file system and then moving the file atomically to the
input directory is the best solution. Of course the application
creating and filling the file should be in charge of the move/rename
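The write-elsewhere-then-rename idea could look something like this on the producing side. It is only a sketch: the function name and both path arguments are hypothetical, and it assumes the two paths are on the same file system, since that is what lets rename() replace the name in one step.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Write `len` bytes of `data` to `tmp_path`, then rename the finished
 * file to `final_path`. Both paths are assumed to be on the same file
 * system. Returns 0 on success, -1 on failure. Names are hypothetical. */
int publish_file(const char *tmp_path, const char *final_path,
                 const char *data, size_t len)
{
    FILE *fp = fopen(tmp_path, "wb");
    if (fp == NULL)
        return -1;

    if (fwrite(data, 1, len, fp) != len) {
        fclose(fp);
        remove(tmp_path);
        return -1;
    }
    /* fclose() flushes buffered data, so write errors can surface here. */
    if (fclose(fp) != 0) {
        remove(tmp_path);
        return -1;
    }
    /* Readers of the input directory see either no file or the whole file. */
    if (rename(tmp_path, final_path) != 0) {
        remove(tmp_path);
        return -1;
    }
    return 0;
}
```

The consumer never has to look at modification times, because a file only appears under its final name after it has been completely written.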
> Another approach, used by things like print spoolers and UUCP, is
> to transfer one or more data files, then transfer a "job" file which
> names the data files to use and what to do with them. The "job"
> file doesn't get created until the associated data files have been
> transferred successfully. The "job" file also tends to be very
> short (fits in one packet, contains a few lines mostly consisting
> of filename(s) ).
That's a method used in other processes. Token files are used. These
are empty files. The format of the name describes the type of token
and the name contains fields which give certain information. These
files are created after leaving the data files in the input directory.
In other cases where there are batch process chains, each batch checks
that the previous or next batch in line is not running. "ps -ef | grep
is used for this. So if you have a chain of batches A->B->C->D only A
or B with D can run at the same time. A directory is shared only
batches and each batch uses a temporary directory and moves the file
when it's finished.
Tokens are included in the same scheme in some cases.
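As an illustration of the token-file scheme described above, here is a small C sketch. The real token names in the thread encode a type and several fields that are not specified here, so the "<data file>.ok" naming and the function names are purely assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Build the token name for a data file. The ".ok" suffix is an
 * assumption of this sketch. Returns 0 on success, -1 if too long. */
static int token_name(char *buf, size_t size, const char *data_path)
{
    int n = snprintf(buf, size, "%s.ok", data_path);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Producer: create the empty token once the data file is complete. */
int drop_token(const char *data_path)
{
    char token[4096];

    if (token_name(token, sizeof token, data_path) != 0)
        return -1;
    FILE *fp = fopen(token, "w");
    if (fp == NULL)
        return -1;
    return fclose(fp) == 0 ? 0 : -1;
}

/* Consumer: return 1 if the token for `data_path` exists, else 0. */
int token_present(const char *data_path)
{
    char token[4096];
    struct stat st;

    if (token_name(token, sizeof token, data_path) != 0)
        return 0;  /* name too long: treat the file as not ready */
    return (stat(token, &st) == 0 && S_ISREG(st.st_mode)) ? 1 : 0;
}
```

The consumer skips any data file for which token_present() returns 0, so a partially transferred file is never picked up.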
From: Rick Jones on 12 Aug 2010 13:51
Gordon Burditt <gordonb.a9lxp(a)burditt.org> wrote:
> Assuming the files are being transferred in with FTP, this isn't
> enough. It is easy for a network hiccup, like one dropped packet,
> to cause the modification time to get a few seconds old. It could
> get a minute old before the TCP connection starts giving errors on
> either end.
In fact it does not have to be FTP, it could be rcp, scp and so on.
And, if this is strictly on the receiving end, it could be much longer
than a minute. It could be two hours or more if the file transfer
application is relying on SO_KEEPALIVE to detect that the connection
is no longer viable. If a TCP connection endpoint is receive-only
then it is only an application-layer timeout or the SO_KEEPALIVE that
will cause it to be terminated when the remote goes away without a
trace - without those, the endpoint will remain open until the system
thankfully, more versed on TCP than O_EXCL :)
Wisdom Teeth are impacted, people are affected by the effects of events.
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
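To illustrate the two mechanisms Rick mentions for a receive-only endpoint, here is a hedged C sketch. The function names are made up. SO_RCVTIMEO is used as one possible form of application-layer timeout, which is an assumption, since the thread does not say how such a timeout would be implemented. Shortening the keepalive interval itself is platform-specific and not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/socket.h>
#include <sys/time.h>

/* Ask TCP to probe an idle connection. Returns 0 on success, -1 on
 * error. With default system settings the first probe may not be sent
 * for a long time, which is the two-hours-or-more case noted above. */
int enable_keepalive(int sock)
{
    int on = 1;
    return setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on);
}

/* One possible application-layer timeout: make blocking receives fail
 * after `seconds` without data, so the receiver can give up and clean
 * up the partial file. Returns 0 on success, -1 on error. */
int set_receive_timeout(int sock, long seconds)
{
    struct timeval tv;

    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    return setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}
```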