From: writeson on
Hi all,

I'm writing some code that monitors a directory for the appearance of
files from a workflow. When those files appear I write a command file
to a device that tells the device how to process the file. The
appearance of the command file triggers the device to grab the
original file. My problem is I don't want to write the command file to
the device until the original file from the workflow has been copied
completely. Since these files are large, my program has a good chance
of scanning the directory while they are mid-copy, so I need to
determine which files are finished being copied and which are still
mid-copy.

I haven't seen anything on Google talking about this, and I don't see
an obvious way of doing this using the os.stat() method on the
filepath. Anyone have any ideas about how I might accomplish this?

Thanks in advance!
Doug
From: Larry Bates on

The best way to do this is to have the program that copies the files copy them
to a temporarily named file and rename it when it is completed. That way you
know when it is done by scanning for files with a specific mask.
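A minimal sketch of that idea in Python, assuming the copying program can be changed. The function names and the ".part" suffix are made up for illustration; os.replace() is only atomic when the temporary and final names are on the same filesystem.

```python
import fnmatch
import os
import shutil


def copy_then_rename(src, dest_dir, tmp_suffix=".part"):
    """Copy src into dest_dir under a temporary name, then rename it.

    The ".part" suffix is an arbitrary choice for this sketch.
    """
    final_path = os.path.join(dest_dir, os.path.basename(src))
    tmp_path = final_path + tmp_suffix
    shutil.copyfile(src, tmp_path)
    # The rename happens only after the copy has finished, so the final
    # name never refers to a partially written file.
    os.replace(tmp_path, final_path)
    return final_path


def finished_files(dest_dir, mask="*.pdf"):
    """Return names in dest_dir matching the mask; temp names do not match."""
    return sorted(n for n in os.listdir(dest_dir) if fnmatch.fnmatch(n, mask))
```

The scanner only ever calls finished_files(), so anything still named "something.pdf.part" is ignored until the writer renames it.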

If that is not possible you might be able to use pyinotify
(http://pyinotify.sourceforge.net/) to watch for IN_CLOSE_WRITE events on the
directory and then process the files. Note that pyinotify is Linux-only.

-Larry

From: Manuel Vazquez Acosta on
This seems a synchronization problem. A scenario description could clear
things up so we can help:

Program W (The workflow) copies file F to directory B
Program D (the dog) polls directory B to find if there's any new file F

In this scenario, program D does not know whether F has been fully
copied, but W does.

Solution:
Create a custom lock mechanism. Program W writes a file B/F.lock to
indicate that file F is not complete, and removes it when F is fully copied.
If program W crashes in mid-copy, both F and F.lock are kept, so program D
does not bother to process F. Recovery from a crash in W would be another
issue to tackle.
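The lock-file scheme could be sketched roughly as follows. This assumes program W can be modified and that F.lock sits next to F in directory B; the function names and the ".lock" suffix are illustrative, not part of any existing tool.

```python
import os
import shutil

LOCK_SUFFIX = ".lock"  # hypothetical naming convention for this sketch


def copy_with_lock(src, dest_dir):
    """Program W side: create F.lock, copy F, then remove F.lock."""
    final_path = os.path.join(dest_dir, os.path.basename(src))
    lock_path = final_path + LOCK_SUFFIX
    # Create the lock before F appears, so D never sees F without it.
    open(lock_path, "w").close()
    shutil.copyfile(src, final_path)
    # Reached only if the copy succeeded; a crash leaves the lock behind.
    os.remove(lock_path)
    return final_path


def ready_files(dest_dir):
    """Program D side: names in dest_dir that have no matching lock file."""
    names = set(os.listdir(dest_dir))
    return sorted(
        n for n in names
        if not n.endswith(LOCK_SUFFIX) and n + LOCK_SUFFIX not in names
    )
```

If W dies mid-copy, the stale F.lock keeps F out of ready_files() until someone cleans up, which matches the crash behaviour described above.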

Best regards,
Manuel.

From: norseman on

Also available:
pgm-W copies/creates and fills B/dummy
when done, pgm-W renames B/dummy to B/F
pgm-D only scouts for B/F and does its thing when found

Steve
norseman(a)hughes.net



From: writeson on
Guys,

Thanks for your replies, they are helpful. I should have mentioned in
my initial question that I don't have as much control over the program
that writes (pgm-W) as I'd like. Otherwise, writing to a different
filename and then renaming would work great. Is there no way to tell
from os.stat() when the file is finished being copied? I ran some test
programs, one of which continuously copies big files from one directory
to another, and another that continuously does a glob.glob("*.pdf") on
those files and looks at the st_atime and st_mtime fields of the return
value of os.stat(filename). From that experiment it looks like st_atime
and st_mtime equal each other until the file has finished being copied.
Nothing in the documentation about st_atime or st_mtime leads me to
think this is guaranteed; it's just what I observed with the two test
programs I've described.

Any thoughts? Thanks!
Doug
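
One os.stat()-only approach, when the writer cannot be changed, is to treat a file as finished once its size and modification time stop changing for a while. This is a heuristic that nobody in the thread proposed, not a guarantee: a stalled copy looks the same as a finished one. The function names and default intervals below are assumptions for illustration.

```python
import os
import time


def snapshot(path):
    """Return the (size, mtime) pair that os.stat() currently reports."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime)


def looks_finished(path, quiet_seconds=5.0, poll_interval=1.0,
                   sleep=time.sleep):
    """Return True if size and mtime stay unchanged for quiet_seconds.

    Heuristic only: a copy that pauses longer than quiet_seconds will be
    misjudged as complete. The sleep parameter exists so the wait can be
    replaced in tests.
    """
    last = snapshot(path)
    waited = 0.0
    while waited < quiet_seconds:
        sleep(poll_interval)
        waited += poll_interval
        if snapshot(path) != last:
            return False  # still being written; check again later
    return True
```

The watcher would call looks_finished() on each candidate and write the command file only for those that return True, picking a quiet_seconds value comfortably longer than any pause expected from the copying program.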