From: Stephane CHAZELAS on
2008-06-16, 21:32(-04), Albretch Mueller:
> I also need the full path to the file, the modification times of the files,
> which find gives you via:
> ~
> sh-3.1# find /ramdisk -type f -printf "%A@ %C@ %T@"
> ~
> I have tried different things using "find" and "ls" and you can always do
> it in two steps:
> ~
> sh-3.1# md5sum `find /ramdisk -type f ` | sort
> ~
> and
> ~
> sh-3.1# ls /ramdisk -lRa
> ~
> The closest I have gone (I think) is getting all the info I need (except
> the md5sum) from
> ~
> sh-3.1# find /ramdisk -type f -printf "%A@ %C@ %T@ %s " -exec ls -l "{}" \;
> ~
> But I haven't been able to successfully cobble up my second snippet with the
> last one to get all I want in an efficient way with one piece of Linux/Unix
> bash/OS-level script, without having to actually code in a high level lang
> ~
> What am I missing, both in my code and conceptually?
> ~
[...]

What information of ls -l can't you obtain with GNU find's
-printf?

find . -type f -printf '%M %n %u %g %T@ %A@ %C@ ' -exec md5sum {} \;

for instance.

--
Stéphane
From: Stephane CHAZELAS on
2008-06-16, 23:46(-04), Albretch Mueller:
> Stephane CHAZELAS wrote:
>
>> find . -type f -printf '%M %n %u %g %T@ %A@ %C@ ' -exec md5sum {} \;
>
> Well, I was missing the file length "%s" which I included myself ;-) thank
> you
> ~
> sh-3.1# find . -type f -printf '%M %n %u %g %T@ %A@ %C@ %s ' -exec md5sum -b
> {} \;
> ~
> but what I still don't get right is the part about making sure that this
> script is not going to stumble on file names containing spaces and other
> non-standard characters; for that, I have read, you must use "-print0 |
> xargs -0" declarations

The problem is when you post-process that output.

How do you want it to be post-processed?

It's true that the output of the above command cannot be post
processed reliably if file names may contain newline characters.

You can add a -printf '\0'

Or -printf '//\n'

(NUL and "//" are things that won't appear otherwise in the
output)

to make it clear where each filename ends.
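
As a rough illustration of the two terminators suggested above, here is a small sketch. It assumes GNU find (for -printf); the scratch directory and the file names are made up for the demonstration, and the '%s %p' format is just a stand-in for whatever fields you actually print.

```shell
# Sketch only: assumes GNU find (-printf) and a throwaway scratch directory.
dir=$(mktemp -d)
printf 'x' > "$dir/plain"
printf 'yy' > "$dir/two
lines"                      # hypothetical file whose name contains a newline

# Counting newlines overcounts: the second name spans two lines.
lines=$(find "$dir" -type f -printf '%s %p\n' | wc -l)

# A '//' terminator marks where each record really ends.
slash_records=$(find "$dir" -type f -printf '%s %p//\n' | grep -c '//$')

# A NUL terminator does the same job; count the NUL bytes.
nul_records=$(find "$dir" -type f -printf '%s %p\0' | tr -cd '\0' | wc -c)

echo "lines=$lines slash_records=$slash_records nul_records=$nul_records"
rm -rf "$dir"
```

With two files there are three output lines but only two records, whichever terminator is used.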

[...]
> Also, is it safe once you mount a fs that is not native to Linux, say a fs
> based on MacOS, BSD, FAT32 or ntfs?
> ~
> How are properties of one fs represented by "find" once you mount it
> within Linux/Unix? I know the answer to that q will take more than a script
> fix. Could you, please, point me to some good info pertaining to the
> interplay among these issues?
[...]

Here, you're only listing their name and anyway, you would be
using the same API to access the files on any of those FS. I'm
not sure what kind of problem you're concerned about.

--
Stéphane
From: Stephane CHAZELAS on
2008-06-17, 00:21(-04), Albretch Mueller:
> Also, I have read somewhere that coding like this:
> ~
> sh-3.1# md5sum `find . -type f -print0 | xargs -0`
> ~
> is better than doing it like:
> ~
> sh-3.1# find . -type f -print0 | xargs -0 md5sum
> ~
> I actually read what this guy said. (S)He didn't say "faster" or "less
> memory taxing", which are both measurable, but "better" because md5sum is
> loaded into memory only once
> ~
> I don't really know how the OS handles this, so I am asking
[...]

That's nonsense.

find ... -print0 | xargs -0 cmd

Tells find to output each filename followed by the NUL
character. The NUL character is the one character that cannot
occur in a file path on Unix. xargs -0 tells xargs to split its
input on the NUL character and then pass each element resulting
from the splitting to the command, so that cmd gets one argument
per file found by find, which is fine. The only improvement one
might suggest is to also use the -r (also GNU specific) option
to xargs so that it doesn't run cmd if its input is empty (if
find didn't find any file).
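
A minimal sketch of that pipeline with -r added, assuming GNU find and GNU xargs; the scratch directory and the file name are invented for the example.

```shell
# Sketch only: assumes GNU find and GNU xargs (-0 and -r are not POSIX).
dir=$(mktemp -d)            # hypothetical scratch directory

# No files yet: with -r, md5sum is never started, so nothing is printed.
empty_lines=$(find "$dir" -type f -print0 | xargs -0 -r md5sum | wc -l)

printf 'hello\n' > "$dir/name with spaces"

# One file whose name has spaces: md5sum still receives it as one argument.
one_lines=$(find "$dir" -type f -print0 | xargs -0 -r md5sum | wc -l)

echo "empty_lines=$empty_lines one_lines=$one_lines"
rm -rf "$dir"
```

Without -r, GNU xargs would run md5sum once even on empty input.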

sh-3.1# md5sum `find . -type f -print0 | xargs -0`

couldn't be more wrong.

Here, as the cmd is not provided, xargs calls the "echo" command
instead. So the files found by find will be passed as arguments
to echo. echo is a command that outputs its arguments separated
by the space character. It also performs some transformations on
those arguments, for instance it transforms the "\n" string into
a newline character.

Then that output of echo (there can be several instances of echo
called) is gathered by the shell (because of `...`) and stored
in memory. When xargs has finished, then the *shell* will split
all that output. The splitting in `...` is done by default on
spaces, tabs and newline characters. Then, for every word
resulting from that splitting, the shell performs globbing, that
is for every word that contains wildcard characters such as *,
?, [...], the shell will try to expand that to the matching
files relative to the current directory.

And then, it will pass that big list as arguments to the md5sum
command (and contrary to xargs, it will not work around the
limitation on the number of arguments).

As an example, if you do:

touch 'some
file with *a* newline character in it, \n and plenty of spaces'

find . -type f -print0 will output:

some<NL>file with *a* newline character in it, \n and plenty of spaces<NUL>

xargs -0

reading that will split it in one argument to echo:
some<NL>file with *a* newline character in it, \n and plenty of spaces

echo will output:

some<NL>file with *a* newline character in it, <NL> and plenty of spaces<NL>

`...` will split that into those elements:
1 some
2 file
3 with
4 *a*
5 newline
6 character
7 in
8 it,
9 and
10 plenty
11 of
12 spaces

The 4th one contains wildcards, so is subject to globbing. *a*
means any file name that contains "a". And the file happens to
match, so the list becomes:

1 some
2 file
3 with
4 some<NL>file with *a* newline character in it, \n and plenty of spaces
5 newline
6 character
7 in
8 it,
9 and
10 plenty
11 of
12 spaces

And those will be passed as arguments to md5sum.
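
The splitting part of that walk-through can be checked with a much smaller case. The sketch below assumes GNU find and xargs; the scratch directory and the file name 'two words' are made up, and "set --" is used only as a convenient way to count the words the shell ends up with.

```shell
# Sketch only: assumes GNU find and GNU xargs, run in a throwaway directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
: > 'two words'             # hypothetical file with a space in its name

# Broken form: the shell re-splits echo's output on whitespace.
set -- `find . -type f -print0 | xargs -0`
split_count=$#

# Sound form: xargs hands each NUL-terminated name over as one argument.
xargs_count=$(find . -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "command substitution: $split_count argument(s)"
echo "xargs -0:             $xargs_count argument(s)"
cd / && rm -rf "$dir"
```

One file turns into two arguments in the backtick form and stays one argument with xargs -0.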

--
Stéphane
From: Andre Majorel on
On 2008-06-17, Albretch Mueller <lbrtchx(a)gmail.com> wrote:

> Also, I have read somewhere that coding like this:
>
> sh-3.1# md5sum `find . -type f -print0 | xargs -0`
>
> is better than doing it like:
>
> sh-3.1# find . -type f -print0 | xargs -0 md5sum
>
> I actually read what this guy said. (S)He didn't say "faster"
> or "less memory taxing", which are both measurable, but
> "better" because md5sum is loaded into memory only once

Without going into what's wrong with the first command (Stéphane
took care of that)...

« cmd2 `cmd1` » does generally NOT create fewer cmd2 processes
than « cmd1 | xargs cmd2 ».

When the size of cmd1's output is below xargs' limit, both
commands spawn exactly one cmd2 process.

When the size of cmd1's output happens to fall between xargs'
limit and the system's limit, « cmd1 | xargs cmd2 » will create
one cmd2 process too many. There's no reason for xargs' limit to
be different from the system's limit so this is not a common
occurrence.

When the size of cmd1's output is above the system limit, « cmd1
| xargs cmd2 » will create two or more cmd2 processes and « cmd2
`cmd1` » will just fail. Some would consider that a show stopper. :->
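
To watch xargs split its input over several processes without generating megabytes of arguments, one can lower the limit artificially. In this sketch, -n 2 (at most two arguments per invocation) stands in for the size limit discussed above; the five letters are placeholder input.

```shell
# Sketch only: -n 2 is an artificially small per-invocation limit, standing
# in for the size limit discussed above.
runs=$(printf '%s\n' a b c d e |
       xargs -n 2 sh -c 'echo "run with $# argument(s)"' sh |
       wc -l)
echo "invocations: $runs"

# The system limit on the combined size of arguments and environment, in bytes.
getconf ARG_MAX
```

Five arguments at two per invocation give three runs of the command, where the command substitution form would always attempt a single one.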

--
André Majorel <URL:http://www.teaser.fr/~amajorel/>
"Cette supposition rappelle assez celle de ce prédicateur qui, en
pleine chaire, faisait remarquer à ses fidèles la bonté de Dieu qui
avait placé les rivières auprès des villes." -- Alexandre Dumas