From: Shurik on
Hi

I have the command below (it runs on HP/Sun/AIX servers):

cd ${DIRECTORY_TO_CHECK}

find . -follow -type f ! -name "*.css" ! -name "*.jpg" ! -name "*.hpp" \
  ! -name "*.gif" ! -name "*.c" ! -name "*.cpp" ! -name "*.txt" \
  ! -name "*.log" ! -name "*.java" ! -name "*.h" ! -name "*.html" \
  -perm -o+r -exec ls -la {} \; -exec cksum {} \; |
  awk '{ A=$1; B=$9; getline; C=$1; print A"|"C"|"B }'

It produces a file with the following structure:

-rw-r--r--|214890729|./ACEXML/appans/svcconf/ACEXML_XML_Svc_Conf_Parser.pc.in
-rw-r--r--|1370781355|./ACEXML/apps/svcconf/ACEXML_XML_Svc_Conf_Parser.bor
-rw-r--r--|3618598382|./ACEXML/apps/svcconf/ACEXML_XML_Svc_Conf_Parser_Static.vcproj

The find takes a lot of time, can I change something in order to
improve performance?
From: Stephane CHAZELAS on
2010-04-25, 12:46(-07), Shurik:
> The find takes a lot of time, can I change something in order to
> improve performance?

What is taking time is executing 2 commands per file.
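
To make the per-file cost concrete, here is a minimal sketch that counts how many child processes find starts with "-exec ... \;" versus "-exec ... {} +". The demo directory and file names are hypothetical, and mktemp -d is assumed to be available.

```shell
# Hypothetical demo tree; names are for illustration only.
demo=$(mktemp -d)
i=1
while [ "$i" -le 50 ]; do
  : > "$demo/file$i.dat"
  i=$((i + 1))
done

# "\;" starts one child process per file, so one line is printed per file.
per_file=$(find "$demo" -type f -exec sh -c 'echo run' sh {} \; | wc -l)

# "{} +" passes as many names as fit to each child process.
batched=$(find "$demo" -type f -exec sh -c 'echo run' sh {} + | wc -l)

echo "per-file invocations: $((per_file))"
echo "batched invocations:  $((batched))"
rm -rf "$demo"
```

With two -exec actions per file, the original command starts twice as many processes as there are files, which is the cost the batched form avoids.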

mkfifo ls.fifo
find -L . -type f ! -name "*.css" ! -name "*.jpg" ! -name \
"*.hpp" ! -name "*.gif" ! -name "*.c" ! -name "*.cpp" ! -name \
"*.txt" ! -name "*.log" ! -name "*.java" ! -name "*.h" ! -name \
"*.html" -perm -o+r -exec sh -c '
ls -lLd "$@" >&3 & cksum "$@" & wait' sh {} + 3> ls.fifo |
paste ls.fifo - | awk -vOFS='|' '{print $1,$10,$9}'

(Note that with -L (formerly -follow), -type f returns true for
symlinks to regular files (unless that file has already been
accounted for), and cksum most probably checksums the pointed-to
file, so you probably want the -L option to ls (without which it
won't work anyway because of the extra "-> ..." fields in the ls
output). Also, that solution won't work for filenames containing
blanks or newline characters.)
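
One more caveat when pairing the two outputs by position: ls sorts its operands, while cksum handles them in the order given, so a batch that find did not emit in sorted order can end up mis-paired. A minimal sketch of pairing by file name instead, assuming names without blanks; the sample tree, the work directory and the file names are hypothetical:

```shell
# Hypothetical sample tree; all names here are for illustration only.
tree=$(mktemp -d)
work=$(mktemp -d)
mkdir "$tree/sub"
printf 'one\n' > "$tree/zeta.bin"
printf 'two\n' > "$tree/sub/alpha.bin"
chmod 644 "$tree/zeta.bin" "$tree/sub/alpha.bin"

cd "$tree" || exit 1
LC_ALL=C; export LC_ALL   # same collation for sort and join, plain ls dates

find -L . -type f -perm -o+r > "$work/list"

# "name checksum", sorted by name (cksum prints: checksum size name)
xargs cksum < "$work/list" | awk '{print $3, $1}' | sort -k1,1 > "$work/ck"

# "name permissions", sorted by name (the name is field 9 of ls -l)
xargs ls -lLd < "$work/list" | awk '{print $9, $1}' | sort -k1,1 > "$work/perm"

# Pair the two lists by name and print permissions|checksum|name.
report=$(join "$work/perm" "$work/ck" | awk '{print $2 "|" $3 "|" $1}')
printf '%s\n' "$report"

cd / && rm -rf "$tree" "$work"
```

This trades the FIFO for temporary files and two sorts, but both ls and cksum still run in batches rather than once per file.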

--
Stéphane
From: Shurik on
On Apr 25, 11:09 pm, Stephane CHAZELAS <stephane_chaze...(a)yahoo.fr>
wrote:
> What is taking time is executing 2 commands per file.

Stephane, thanks a lot, but I didn't get any output from your command :(
From: Shurik on
On Apr 26, 3:34 pm, Shurik <shurikgef...(a)gmail.com> wrote:
> Stephane, thanks a lot, but I didn't get any output from your command :(


I split my find into the following code:

TEMP_FILE=/tmp/check_3th_$$

cd ${DIRECTORY_TO_CHECK}

find . -follow -type f ! -name "*.css" ! -name "*.jpg" ! -name "*.hpp" \
  ! -name "*.gif" ! -name "*.c" ! -name "*.cpp" ! -name "*.txt" \
  ! -name "*.log" ! -name "*.java" ! -name "*.h" ! -name "*.html" \
  -perm -o+r > ${TEMP_FILE}

cat ${TEMP_FILE} | xargs -n 20 cksum > ${TEMP_FILE}_1

cat ${TEMP_FILE} | xargs -n 1 ls -la > ${TEMP_FILE}_2
paste ${TEMP_FILE}_1 ${TEMP_FILE}_2 | awk '{ print $4"|"$1"|"$3}'

rm -f ${TEMP_FILE}_1 ${TEMP_FILE}_2 ${TEMP_FILE}

Before the split it took 9 minutes to run; after the split it takes 2
minutes. Can I still improve performance?
From: Jon LaBadie on
Shurik wrote:
> Before the split it took 9 minutes to run; after the split it takes 2
> minutes. Can I still improve performance?

Separating the file finding from the name exclusion "may" help,
particularly on a multi-CPU system, e.g.:

find . -follow -type f -perm -o+r |
grep -E -v '(\.css|\.jpg|\.hpp|\.gif ... |\.html)$' > ${TEMP_FILE}

But I suspect that cksum is using the bulk of your time. It might be
worthwhile to use 'time' to check how long each statement takes so you
know where to try and optimize.
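
A minimal sketch of that timing check, assuming a shell where time is a keyword that can time a whole pipeline (bash, ksh or zsh). The sample tree and file names are hypothetical, and the exclusion list is cut down to two suffixes for brevity:

```shell
# Hypothetical sample tree and file names, for illustration only.
tree=$(mktemp -d)
list=$(mktemp)
printf 'a\n' > "$tree/keep.bin"
printf 'b\n' > "$tree/skip.txt"
chmod 644 "$tree/keep.bin" "$tree/skip.txt"
cd "$tree" || exit 1

# Stage 1: walk the tree and drop the excluded suffixes.
time find -L . -type f -perm -o+r |
  grep -E -v '(\.txt|\.log)$' > "$list"

# Stage 2: checksums, 20 names per cksum process as in the split script.
time xargs -n 20 cksum < "$list" > "${list}_1"

# Stage 3: one ls process per file, as in the split script.
time xargs -n 1 ls -la < "$list" > "${list}_2"

# The three list files are left in place for inspection; remove them with
#   rm -f "$list" "${list}_1" "${list}_2"
cd / && rm -rf "$tree"
```

The timings go to standard error, one report per stage, so the slowest stage on the real tree stands out before any further tuning.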