From: nolo contendere on 23 Apr 2008 14:29

Scenario: I am expecting 3 files in a drop directory. They won't necessarily all arrive at the same time. I want to begin processing each file as soon as it arrives (or as close to arrival time as is reasonable).

Would the best way to go about this be to simply have a script that takes a filename as a parameter and marks the file as 'currently processing' when it begins to process the file (or moves the file to a different directory)?

I could kick off 3 daemon processes that look in the drop directory and sleep for 5 secs between checks, for instance.

That seems to me to be a straightforward, if clumsy, approach. I was wondering if there was a module that could accomplish this task more elegantly--Parallel::ForkManager, at least in my experience, doesn't seem entirely suited to this particular task.

Or I could code my own fork, exec, wait/waitpid.

I know TMTOWTDI, but I was seeking to benefit from others' experience, and for a 'best practice'.

Sorry there's no tangible code; this is more of a conceptual question, I guess.
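For reference, the polling approach described in the question could look roughly like the sketch below. It uses only core Perl. The names new_files and poll_dir and the $handler callback are hypothetical, and the sketch handles files one after another in a single process rather than in three daemons.

```perl
use strict;
use warnings;

# Return the names of regular files in $dir that are not yet in %$seen,
# and record them in %$seen so each file is reported only once.
sub new_files {
    my ($dir, $seen) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @new = sort grep { -f "$dir/$_" && !$seen->{$_}++ } readdir($dh);
    closedir($dh);
    return @new;
}

# Poll $dir every $interval seconds until at least $expected files have
# been passed to $handler (a hypothetical callback taking the full path).
# Returns the number of files handled.
sub poll_dir {
    my ($dir, $expected, $interval, $handler) = @_;
    my %seen;
    my $handled = 0;
    while ($handled < $expected) {
        for my $name (new_files($dir, \%seen)) {
            $handler->("$dir/$name");
            $handled++;
        }
        sleep $interval if $handled < $expected;
    }
    return $handled;
}
```

A call such as poll_dir('/path/to/drop', 3, 5, \&process) would match the "3 files, sleep 5 secs" scenario, assuming a process() subroutine exists. Note that this does nothing about files that are still being written when they are first seen.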
From: Peter Makholm on 23 Apr 2008 16:51

nolo contendere <simon.chao(a)fmr.com> writes:

> I know TMTOWTDI, but I was seeking to benefit from others' experience,
> and for a 'best practice'.

If portability isn't an issue, your platform might support some kind of monitoring of parts of the filesystem. Then you can get events when files are created in your spool directory or moved there.

Linux::Inotify2 is a Linux-only solution I'm using for a couple of scripts. Another usable module could be SGI::FAM, which should be supported on a broader range of unices.

I have looked a couple of times for something like Net::Server for spool dirs without finding anything really useful.

//Makholm
From: xhoster on 23 Apr 2008 17:24

nolo contendere <simon.chao(a)fmr.com> wrote:

> Scenario:
> I am expecting 3 files in a drop directory. They won't
> necessarily all arrive at the same time. I want to begin processing
> each file as soon as it arrives (or as close to arrival time as is
> reasonable).

What is the relationship between the 3 files? Presumably, this whole thing will happen more than once, right? Otherwise you wouldn't need to automate it. So what is the difference between "3 files show up, and that happens 30 times" and just "90 files show up"?

> Would the best way to go about this be to simply have a
> script that takes a filename as a parameter and marks the file as
> 'currently processing' when it begins to process the file (or could
> move the file to a different directory)?
>
> I could kick off 3 daemon processes looking in the drop directory, and
> sleep every 5 secs, for instance.

Do the file's contents show up atomically with the file's name? If not, the process could see the file is there and start processing it, even though it is not completely written yet.

> That seems to me, to be a straightforward, if clumsy, approach. I was
> wondering if there was a module that could accomplish this task more
> elegantly--Parallel::ForkManager, at least in my experience, doesn't
> seem entirely suited to this particular task.

Why don't you think it is suited? It seems well suited, unless there are details that I am missing (or maybe you are on Windows or something where forking isn't as robust).

    use strict;
    use warnings;
    use Parallel::ForkManager;

    my $pm = Parallel::ForkManager->new(3);   # at most 3 children at once
    foreach my $file (@ARGV) {
        $pm->start() and next;   # parent: fork a child, go on to the next file
        process($file);          # child: handle one file
        $pm->finish();           # child: exit
    }
    $pm->wait_all_children();

Here the process() subroutine first waits for the named $file to exist, then processes it.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
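The "wait for the named file to exist" step, together with the concern about files that are not completely written yet, might be sketched as follows. This is an assumption-laden illustration, not Xho's code: wait_for_file is a hypothetical name, and treating an unchanged size across two checks as "finished" is only a heuristic.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep time);

# Wait until $file exists and its size is the same on two consecutive
# checks made $interval seconds apart. Returns 1 on success, 0 if
# $timeout seconds pass first. A stable size is only a heuristic: it
# cannot prove that the writer has actually finished.
sub wait_for_file {
    my ($file, $interval, $timeout) = @_;
    my $deadline = time() + $timeout;
    my $last_size;
    while (time() < $deadline) {
        my $size = (stat $file)[7];    # undef while the file is missing
        return 1
            if defined $size && defined $last_size && $size == $last_size;
        $last_size = $size;
        sleep $interval;
    }
    return 0;
}
```

A process() subroutine could call wait_for_file($file, 1, 3600) before opening the file and give up if it returns 0. If the sender can instead write under a temporary name and rename the file when it is complete, the size check becomes unnecessary.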
From: Ted Zlatanov on 23 Apr 2008 17:35

On Wed, 23 Apr 2008 11:29:42 -0700 (PDT) nolo contendere <simon.chao(a)fmr.com> wrote:

nc> I am expecting 3 files in a drop directory. They won't
nc> necessarily all arrive at the same time. I want to begin processing
nc> each file as soon as it arrives (or as close to arrival time as is
nc> reasonable). Would the best way to go about this be to simply have a
nc> script that takes a filename as a parameter and marks the file as
nc> 'currently processing' when it begins to process the file (or could
nc> move the file to a different directory)?

nc> I could kick off 3 daemon processes looking in the drop directory, and
nc> sleep every 5 secs, for instance.

nc> That seems to me, to be a straightforward, if clumsy, approach. I was
nc> wondering if there was a module that could accomplish this task more
nc> elegantly--Parallel::ForkManager, at least in my experience, doesn't
nc> seem entirely suited to this particular task.

nc> Or I could code my own fork,exec,wait/waitpid.

Get Tie::ShareLite from CPAN. In each process, when a new file is noticed in the idle loop, lock a shared hash and insert an entry for the file. If the file already has an entry in the hash, do nothing: the first process to notice the file wins. Now unlock the hash and work with the file. When done, move the file out, lock the hash again, and remove the entry you inserted. The advantage is that you can store much more in the hash than just the filename, so this is handy for complex processing. Also, no file renaming is needed.

A simpler version is just to rename the file to "$file.$$", where $$ is your PID. If, after the rename, the renamed file is there, your process won against the others and you can work with the file. Note there could be a name collision with an existing file, but since PIDs are unique on the machine, any such file is bogus and you can just remove it. Just be aware this is the quick and dirty solution.
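A minimal sketch of the rename-to-"$file.$$" idea, assuming all processes run on the same machine and the file stays on one filesystem. The name claim_file is hypothetical.

```perl
use strict;
use warnings;

# Try to claim $file by renaming it to "$file.$$" ($$ is this process's PID).
# rename() is atomic within one filesystem, so when several processes race,
# only one rename succeeds; the others fail because the source is gone.
# Returns the new name on success, or undef if the file could not be claimed.
sub claim_file {
    my ($file) = @_;
    my $claimed = "$file.$$";
    return rename($file, $claimed) ? $claimed : undef;
}
```

A worker would call claim_file($path) and process the returned name only when it is defined. A directory scan has to skip names that already carry a PID suffix, otherwise another worker could try to claim an already claimed file. Note also that rename() silently replaces an existing "$file.$$", which is the collision case mentioned above.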
Another approach is to use a Maildir structure, which can handle multiple readers and writers atomically, even over NFS. You just need to map your incoming queue into a Maildir structure; there's no need to actually have mail in the files. This is good if you expect lots of volume, network access, or similar complications to your original model.

Ted
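A simplified sketch of a Maildir-style queue, using the tmp/, new/ and cur/ subdirectories that Maildir defines. The subroutine names init_queue, deliver and take are hypothetical, and the sketch leaves out the unique file naming and flag suffixes that real Maildir implementations use.

```perl
use strict;
use warnings;
use File::Path qw(make_path);

# Create the three Maildir-style subdirectories under $root.
sub init_queue {
    my ($root) = @_;
    make_path(map { "$root/$_" } qw(tmp new cur));
    return;
}

# Writer side: write the whole file into tmp/, then rename it into new/.
# Readers only look in new/, so they never see a partially written file.
sub deliver {
    my ($root, $name, $content) = @_;
    open(my $fh, '>', "$root/tmp/$name") or die "open: $!";
    print {$fh} $content;
    close($fh) or die "close: $!";
    rename("$root/tmp/$name", "$root/new/$name") or die "rename: $!";
    return "$root/new/$name";
}

# Reader side: claim a file by moving it from new/ to cur/.
# Returns the claimed path, or undef if another reader took it first.
sub take {
    my ($root, $name) = @_;
    my $dest = "$root/cur/$name";
    return rename("$root/new/$name", $dest) ? $dest : undef;
}
```

The point of the layout is that both steps are single rename() calls: the writer publishes a complete file in one step, and whichever reader's rename succeeds owns the file.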
From: Ben Morrow on 23 Apr 2008 17:54

Quoth Peter Makholm <peter(a)makholm.net>:
> nolo contendere <simon.chao(a)fmr.com> writes:
>
> > I know TMTOWTDI, but I was seeking to benefit from others' experience,
> > and for a 'best practice'.
>
> If portability isn't an issue, your platform might support some kind of
> monitoring of parts of the filesystem. Then you can get events when
> files are created in your spool directory or moved there.
>
> Linux::Inotify2 is a Linux-only solution I'm using for a couple of
> scripts. Another usable module could be SGI::FAM, which should be
> supported on a broader range of unices.

SGI::FAM only works under Irix. I've been meaning to port it to other systems that support fam (and gamin, the GNU rewrite) but haven't got round to it yet. There is Sys::Gamin, but it doesn't have any tests and doesn't appear to be maintained.

Other OS-specific alternatives include IO::KQueue for BSDish systems, and Win32::ChangeNotify for Win32. This seems like a perfect opportunity for someone to write an OS-independent wrapper module, but AFAIK no-one has yet.

Ben