From: Stephane CHAZELAS on 17 Jun 2008 10:14
2008-06-16, 21:32(-04), Albretch Mueller:
> I also need the full path to the file, the modification times of the files,
> which find gives you via:
> ~
> sh-3.1# find /ramdisk -type f -printf "%A@ %C@ %T@"
> ~
> I have tried different things using "find" and "ls" and you can always do
> it in two steps:
> ~
> sh-3.1# md5sum `find /ramdisk -type f ` | sort
> ~
> and
> ~
> sh-3.1# ls /ramdisk -lRa
> ~
> The closest I have gone (I think) is getting all the info I need (except
> the md5sum) from
> ~
> sh-3.1# find /ramdisk -type f -printf "%A@ %C@ %T@ %s " -exec ls -l "{}" \;
> ~
> But I haven't been able to successfully cobble up my second snippet with the
> last one to get all I want in an efficient way with one piece of Linux/Unix
> bash/OS-level script, without having to actually code in a high level lang
> ~
> What am I missing, both in my code and conceptually?
> ~
[...]

What information of ls -l can't you obtain with GNU find's
-printf?

find . -type f -printf '%M %n %u %g %T@ %A@ %C@ ' -exec md5sum {} \;

for instance.

--
Stéphane
From: Stephane CHAZELAS on 17 Jun 2008 12:18
2008-06-16, 23:46(-04), Albretch Mueller:
> Stephane CHAZELAS wrote:
>
>> find . -type f -printf '%M %n %u %g %T@ %A@ %C@ ' -exec md5sum {} \;
>
> Well, I was missing the file length "%s" which I included myself ;-) thank
> you
> ~
> sh-3.1# find . -type f -printf '%M %n %u %g %T@ %A@ %C@ %s ' -exec md5sum -b
> {} \;
> ~
> but what I still don't get right is the part about making sure that this
> script is not going to stumble on file names containing spaces and other
> non-standard characters; for that I have read you must use "-print0 |
> xargs -0" declarations

The problem is when you post-process that output. How do you
want it to be post-processed?

It's true that the output of the above command cannot be
post-processed reliably if file names may contain newline
characters.

You can add a

-printf '\0'

or

-printf '//\n'

(NUL and "//" are things that won't appear otherwise in the
output) to make it clear where each filename ends.

[...]
> Also, is it safe once you mount a fs that is not native to Linux, say a fs
> based on MacOS, BSD, FAT32 or ntfs?
> ~
> How are properties of one fs represented by "find" once you mount it
> within Linux/Unix? I know the answer to that q will take more than a script
> fix. Could you, please, point me to some good info pertaining to the
> interplay among these issues?
[...]

Here, you're only listing their name and anyway, you would be
using the same API to access the files on any of those FS. I'm
not sure what kind of problem you're concerned about.

--
Stéphane
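
A rough sketch of the -printf '\0' idea, assuming GNU find, GNU md5sum and an xargs that supports -0. The scratch directory, the file names and the emit_records helper are hypothetical; the point is only that records are delimited by NUL rather than by newline, so a newline inside a file name cannot be mistaken for the end of a record.

```shell
# Sketch only: assumes GNU find (-printf), GNU md5sum, and xargs with -0.
dir=$(mktemp -d)                 # hypothetical scratch directory
printf 'x' > "$dir/plain"
printf 'y' > "$dir/two
lines"                           # hypothetical name with an embedded newline

# One record per file: size, md5sum's output line, then a NUL terminator.
emit_records() {
    (cd "$dir" && find . -type f -printf '%s ' -exec md5sum {} \; -printf '\0')
}

# Count records by counting NUL bytes, not lines.
record_count=$(emit_records | tr -cd '\000' | wc -c)
record_count=$((record_count + 0))
echo "records: $record_count"

# Hand each NUL-terminated record to a command as one argument.
emit_records | xargs -0 -n 1 printf 'record: %s'

rm -rf "$dir"
```

Any post-processing tool that can split on NUL would do in place of xargs here; which one fits depends on what is to be done with each record.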
From: Stephane CHAZELAS on 17 Jun 2008 12:59
2008-06-17, 00:21(-04), Albretch Mueller:
> Also, I have read somewhere that coding like this:
> ~
> sh-3.1# md5sum `find . -type f -print0 | xargs -0`
> ~
> is better than doing it like:
> ~
> sh-3.1# find . -type f -print0 | xargs -0 md5sum
> ~
> I actually read what this guy said. (S)He didn't say "faster" or "less
> memory taxing", which are both measurable, but "better" because md5sum is
> loaded into memory only once
> ~
> I don't really know how the OS handles this, so I am asking
[...]

That's nonsense.

find ... -print0 | xargs -0 cmd

tells find to output each filename followed by the NUL
character. The NUL character is the one character that cannot
occur in a file path on Unix. xargs -0 tells xargs to split its
input on the NUL character and then pass each element resulting
from the splitting to the command. So cmd gets one argument per
file found by find, which is fine. The only improvement one
might suggest is to also use the -r option to xargs (also GNU
specific) so that it doesn't run cmd if its input is empty (if
find didn't find any file).

sh-3.1# md5sum `find . -type f -print0 | xargs -0`

couldn't be more wrong.

Here, as the cmd is not provided, xargs calls the "echo" command
instead. So the files found by find will be passed as arguments
to echo. echo is a command that outputs its arguments separated
by the space character. It also performs some transformations on
those arguments, for instance it transforms the "\n" string into
a newline character.

Then that output of echo (there can be several instances of
echo called) is gathered by the shell (because of `...`) and
stored in memory. When xargs has finished, the *shell* will
split all that output. The splitting in `...` is done by default
on spaces, tabs and newline characters.
Then, for every word resulting from that splitting, the shell
performs globbing, that is, for every word that contains wildcard
characters such as *, ?, [...], the shell will try to expand
that to the matching files relative to the current directory.

And then, it will pass that big list as arguments to the md5sum
command (and contrary to xargs, it will not work around the
limitation on the number of arguments).

As an example, if you do:

touch 'some
file with *a* newline character in it, \n and plenty of spaces'

find . -type f -print0

will output:

some<NL>file with *a* newline character in it, \n and plenty of spaces<NUL>

xargs -0 reading that will split it in one argument to echo:

some<NL>file with *a* newline character in it, \n and plenty of spaces

echo will output:

some<NL>file with *a* newline character in it, <NL> and plenty of spaces<NL>

`...` will split that into those elements:

1 some
2 file
3 with
4 *a*
5 newline
6 character
7 in
8 it,
9 and
10 plenty
11 of
12 spaces

The 4th one contains wildcards, so is subject to globbing. *a*
means any file name that contains "a". And the file happens to
match, so the list becomes:

1 some
2 file
3 with
4 some<NL>file with *a* newline character in it, \n and plenty of spaces
5 newline
6 character
7 in
8 it,
9 and
10 plenty
11 of
12 spaces

And those will be passed as arguments to md5sum.

--
Stéphane
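
The difference between the two forms can be seen by counting arguments instead of running md5sum. This is only a sketch: the scratch directory and the file name 'two words' are made up, and it assumes a find with -print0 and an xargs with -0.

```shell
# Sketch only: assumes find -print0 and xargs -0 are available.
old=$PWD
dir=$(mktemp -d)          # hypothetical scratch directory
cd "$dir" || exit 1
: > 'two words'           # one file whose name contains a space

# Safe form: xargs -0 hands the name to the command as a single argument.
safe_count=$(find . -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

# Unsafe form: xargs runs echo, and the shell then re-splits that output.
set -- `find . -type f -print0 | xargs -0`
unsafe_count=$#

echo "xargs -0 cmd: $safe_count argument(s); backticks: $unsafe_count argument(s)"

cd "$old" || exit 1
rm -rf "$dir"
```

With that single file, the xargs form should report one argument and the backtick form two, since the shell splits "./two words" at the space.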
From: Andre Majorel on 18 Jun 2008 02:27
On 2008-06-17, Albretch Mueller <lbrtchx(a)gmail.com> wrote:
> Also, I have read somewhere that coding like this:
>
> sh-3.1# md5sum `find . -type f -print0 | xargs -0`
>
> is better than doing it like:
>
> sh-3.1# find . -type f -print0 | xargs -0 md5sum
>
> I actually read what this guy said. (S)He didn't say "faster"
> or "less memory taxing", which are both measurable, but
> "better" because md5sum is loaded into memory only once

Without going into what's wrong with the first command (Stéphane
took care of that)...

« cmd2 `cmd1` » does generally NOT create fewer cmd2 processes
than « cmd1 | xargs cmd2 ».

When the size of cmd1's output is below xargs' limit, both
commands spawn exactly one cmd2 process.

When the size of cmd1's output happens to fall between xargs'
limit and the system's limit, « cmd1 | xargs cmd2 » will create
one cmd2 process too many. There's no reason for xargs' limit to
be different from the system's limit, so this is not a common
occurrence.

When the size of cmd1's output is above the system limit,
« cmd1 | xargs cmd2 » will create two or more cmd2 processes and
« cmd2 `cmd1` » will just fail. Some would consider that a show
stopper. :->

--
André Majorel <URL:http://www.teaser.fr/~amajorel/>
"This supposition is rather reminiscent of that of the preacher
who, from the pulpit, pointed out to his flock the goodness of
God, who had placed rivers next to towns."
-- Alexandre Dumas
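
The batching behaviour described above can be observed by counting how many times xargs starts the command. In this sketch, -n (a cap on the number of arguments per invocation) is used as a stand-in for xargs' size limit, which is awkward to reach in a small demo; count_runs is a hypothetical helper and seq is assumed to be available.

```shell
# Sketch only: -n caps the number of arguments per invocation and is used
# here as a stand-in for xargs' size limit, which is harder to hit in a demo.
count_runs() {
    # Print one line per invocation of the command, then count the lines.
    runs=$(seq 1 100 | xargs "$@" sh -c 'echo run' sh | wc -l)
    echo $((runs + 0))
}

runs_default=$(count_runs)        # 100 short arguments fit in one batch
runs_limited=$(count_runs -n 10)  # at most 10 arguments per batch
echo "default: $runs_default run(s), with -n 10: $runs_limited run(s)"
```

Below the limit there is a single invocation, just as with backticks; once the limit is exceeded, xargs starts further invocations rather than failing.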