From: Jesper Rønn-Jensen on 18 Jun 2008 09:01

Hi there.

I'm using this fine script to find all duplicate files in a project:

    OUTF=rem-duplicates.sh;
    echo "#! /bin/sh" > $OUTF;
    find "$@" -type f -print0 |
      xargs -0 -n1 md5sum |
      sort --key=1,32 | uniq -w 32 -d --all-repeated=separate |
      sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/' >> $OUTF;
    chmod a+x $OUTF; ls -l $OUTF

[from http://elonen.iki.fi/code/misc-notes/remove-duplicate-files/ ]

However, it must ignore .svn folders (because basically there is a duplicate file in the hidden svn folder for every versioned file).

So my idea was to pipe the find output into "grep -v .svn" and then add the --null flag to make sure xargs -0 gets the appropriate input.

However, I can't get it to work. Running grep through -exec makes it filter the lines inside each file, not the filename:

    #! /bin/sh
    OUTF=rem-duplicates.sh;
    echo "#! /bin/sh" > $OUTF;
    find "$@" -type f -print0 -exec grep -v .svn '{}' \; |
      xargs -0 -n1 md5sum |
      sort --key=1,32 | uniq -w 32 -d --all-repeated=separate |
      sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/' >> $OUTF;
    chmod a+x $OUTF; ls -l $OUTF

How do I change this to grep on the filename and not on the content of the file?

PS. I also tried to grep in the final file, but at that point it's too late: grep only removes the duplicate in the svn folder, not the file it duplicates, which leaves me with a long list of all files in the directory structure.

Any help appreciated!

Thanks!
/Jesper Rønn-Jensen
blog: http://justaddwater.dk/
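
The filename filtering asked about above can be sketched with GNU grep's -z (--null-data) option, which reads and writes NUL-terminated records and so fits between find -print0 and xargs -0. This is only a sketch under assumptions: it needs GNU grep and mktemp, and the scratch directory and file names are made up for illustration.

```shell
#!/bin/sh
# Scratch tree (hypothetical names): one versioned file plus its .svn copy.
demo=$(mktemp -d)
mkdir -p "$demo/src/.svn"
echo hello > "$demo/src/a.txt"
echo hello > "$demo/src/.svn/a.txt.svn-base"

# Filter the NUL-separated file names, not the file contents.
# -z makes GNU grep use NUL as the record separator on input and output.
# tr is only here to make the result readable.
kept=$(find "$demo" -type f -print0 | grep -z -v '/\.svn/' | tr '\0' '\n')
echo "$kept"

rm -rf "$demo"
```

In the duplicate-finding pipeline, the tr step would be dropped and the grep output fed straight into xargs -0 -n1 md5sum.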
From: Stephane CHAZELAS on 18 Jun 2008 09:06

2008-06-18, 06:01(-07), Jesper Rønn-Jensen:
[...]
> I'm using this fine script to find all duplicate files in a project:
>
> OUTF=rem-duplicates.sh;
> echo "#! /bin/sh" > $OUTF;
> find "$@" -type f -print0 |
> xargs -0 -n1 md5sum |
> sort --key=1,32 | uniq -w 32 -d --all-repeated=separate |
> sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/' >> $OUTF;
> chmod a+x $OUTF; ls -l $OUTF
>
> [from http://elonen.iki.fi/code/misc-notes/remove-duplicate-files/ ]
>
> However, it must ignore .svn folders (because basically there is a
> duplicate file in the hidden svn folder for every versioned file)
[...]

Change the find cmd to

    find "$@" \( ! -name .svn -o -prune \) -type f -print0

It prevents find from descending in the .svn directories.

--
Stéphane
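
To see what the -prune expression changes, here is a small comparison on a throwaway tree. It is a sketch, not part of the original script: the directory layout is invented, mktemp is assumed to be available, and -print is used instead of -print0 so the output can be read directly.

```shell
#!/bin/sh
# Scratch tree (hypothetical names) with a .svn directory in it.
demo=$(mktemp -d)
mkdir -p "$demo/src/.svn"
echo hello > "$demo/src/a.txt"
echo hello > "$demo/src/.svn/a.txt.svn-base"

# Plain search: also lists the copy under .svn.
all=$(find "$demo" -type f -print | LC_ALL=C sort)

# Expression from the reply: anything named .svn is pruned,
# so find never descends into that directory.
pruned=$(find "$demo" \( ! -name .svn -o -prune \) -type f -print)

printf 'without prune:\n%s\n\nwith prune:\n%s\n' "$all" "$pruned"

rm -rf "$demo"
```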
From: Jesper Rønn-Jensen on 18 Jun 2008 09:27

Stephane CHAZELAS>
> find "$@" \( ! -name .svn -o -prune \) -type f -print0
>
> It prevents find from descending in the .svn directories.

Works like a charm! Thanks a lot for your precise and quick answer!

/Jesper
From: Dan Stromberg on 18 Jun 2008 15:24

On Wed, 18 Jun 2008 13:06:52 +0000, Stephane CHAZELAS wrote:

> 2008-06-18, 06:01(-07), Jesper Rønn-Jensen:
> [...]
>> I'm using this fine script to find all duplicate files in a project:
>>
>> OUTF=rem-duplicates.sh;
>> echo "#! /bin/sh" > $OUTF;
>> find "$@" -type f -print0 |
>> xargs -0 -n1 md5sum |
>> sort --key=1,32 | uniq -w 32 -d --all-repeated=separate |
>> sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/' >> $OUTF;
>> chmod a+x $OUTF; ls -l $OUTF
>>
>> [from http://elonen.iki.fi/code/misc-notes/remove-duplicate-files/ ]
>>
>> However, it must ignore .svn folders (because basically there is a
>> duplicate file in the hidden svn folder for every versioned file)
> [...]
>
> Change the find cmd to
>
> find "$@" \( ! -name .svn -o -prune \) -type f -print0
>
> It prevents find from descending in the .svn directories.

I often use the slightly more concise:

    find "$@" -name .svn -prune -o -type f -print0
From: Stephane CHAZELAS on 18 Jun 2008 16:30

2008-06-18, 19:24(+00), Dan Stromberg:
[...]
>> find "$@" \( ! -name .svn -o -prune \) -type f -print0
>>
>> It prevents find from descending in the .svn directories.
>
> I often use the slightly more concise:
>
> find "$@" -name .svn -prune -o -type f -print0

The less concise way does print the non-directory files called .svn though.

--
Stéphane
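
The difference between the two forms only shows up when something named .svn is not a directory. A minimal sketch, assuming mktemp is available and using an invented layout in which other/.svn is a regular file:

```shell
#!/bin/sh
# Scratch tree (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/proj/.svn" "$demo/other"
echo data > "$demo/proj/.svn/entries"
echo data > "$demo/proj/file.txt"
echo oops > "$demo/other/.svn"      # a regular file that happens to be called .svn

# Longer form: -prune evaluates to true, so a regular file named .svn
# still reaches -type f and gets printed.
long=$(find "$demo" \( ! -name .svn -o -prune \) -type f -print | LC_ALL=C sort)

# Shorter form: anything named .svn takes the left branch of -o
# and is never printed, whether it is a directory or not.
short=$(find "$demo" -name .svn -prune -o -type f -print | LC_ALL=C sort)

printf 'longer form:\n%s\n\nshorter form:\n%s\n' "$long" "$short"

rm -rf "$demo"
```

For a Subversion working copy the two forms should behave the same, since .svn is normally a directory there.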