How to find files with the same name -- different case? [Shell]

Prev: is there a bash equivalent of "this" ...
Next: Unix Script to process records in group

From: Randal L. Schwartz on 8 Feb 2010 14:27

>>>>> "laredotornado" == laredotornado <laredotornado(a)zipmail.com> writes:

laredotornado> How would I search for files that have the same name, but potentially
laredotornado> different case, living in the same directory? For example, I would
laredotornado> want to find files like

laredotornado> /dir1/image1.gif
laredotornado> /dir1/IMAGE1.gif

Untested, but I think this'll do:

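#!/usr/bin/perl

use strict;
use warnings;
use File::Find;

# Map each lowercased path to every original path that folds to it.
my %names;

find sub {
    push @{$names{lc $File::Find::name}}, $File::Find::name;
}, "/";

# Print only the groups that have more than one spelling.
for (values %names) {
    next unless @$_ > 1;
    print "@$_\n";
}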

This'll list all the files with identical name mappings on the
same line, separated by space.

print "Just another Perl hacker,"; # the original

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn(a)stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion

From: Maxwell Lol on 12 Feb 2010 06:57

laredotornado <laredotornado(a)zipmail.com> writes:

> Hi,
>
> How would I search for files that have the same name, but potentially
> different case, living in the same directory? For example, I would
> want to find files like
>
> /dir1/image1.gif
> /dir1/IMAGE1.gif

You could also use the less elegant approach
find . | tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr
and look for names that have a count of 2 or more.
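
If you would rather see only the colliding names than scan the counts by eye, uniq -d prints one copy of each repeated line. A minimal sketch, assuming no filename contains a newline; the function name case_collisions is made up for illustration. Like the pipeline above, it lowercases the whole path, so directories that differ only in case collide too.

```shell
# Read one path per line on stdin and print each lowercased path that
# occurs more than once. case_collisions is a hypothetical helper name.
case_collisions() {
    tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

# Typical use would be: find . | case_collisions
# Here the names are fed in directly so the demo does not depend on the filesystem.
printf '%s\n' ./dir1/image1.gif ./dir1/IMAGE1.gif ./dir1/other.txt | case_collisions
# prints: ./dir1/image1.gif
```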

From: David Combs on 21 Feb 2010 20:59

In article <slrnhmuiu8.2v8.usenet-nospam(a)guild.seebs.net>,
Seebs <usenet-nospam(a)seebs.net> wrote:
>On 2010-02-07, laredotornado <laredotornado(a)zipmail.com> wrote:
>> How would I search for files that have the same name, but potentially
>> different case, living in the same directory? For example, I would
>> want to find files like
>
>Within a directory:
>
> for file in *
> do
> if ls | grep -v "^$file\$" | grep -qi "^$file\$"
> then echo "found similar matches for '$file'."
> fi
> done
>

Good Lord, that's O(n^2), right off the bat, an
ls inside an "effective ls", assuming those greps
somehow do what you want.

Why not simply sort them all, using a sort that lets
you pass a comparison expression that decides whether
a or b is bigger, in which you do a tr or toLower
or whatever? The UNmodified values are what come out,
in an order that ignores case. Then make one final
run-through of those "sorted" results: within every
"run" of names that are equal once tr'd or toLowered,
if any differ in their original form, you've found at
least one of what you're looking for.

What's that so far, n log n + n?
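
The sort-then-scan idea can be sketched with standard tools. sort(1) takes no comparison function, so this swaps in a decorate-and-sort step instead: each name is prefixed with its lowercased form, the lines are sorted, and one awk pass reports every run of equal keys. It is only a sketch under assumptions: names contain no tabs or newlines, and case_runs is a made-up function name.

```shell
# Read one name per line; print the original spellings of every group of
# names that are equal once lowercased. case_runs is a hypothetical name.
case_runs() {
    awk '{ print tolower($0) "\t" $0 }' |   # decorate: lowercased key, tab, original
    LC_ALL=C sort |                         # equal keys become adjacent
    awk -F'\t' '
        # Same key as the previous line: we are inside a run of duplicates.
        NR > 1 && $1 == prev { if (!shown) print held; print $2; shown = 1; next }
        # New key: remember the first original in case a run follows.
        { prev = $1; held = $2; shown = 0 }'
}

printf '%s\n' ABC.xyz readme abc.xyz abc.XYZ | case_runs
# prints:
#   ABC.xyz
#   abc.XYZ
#   abc.xyz
```

The cost is one sort plus one linear scan, which is the n log n + n estimate.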

Or, if you want more info, then for each of
those toLower'd runs, you sort THAT, "straight",
then basically run a "uniq" on that.

(If you're running a million strings through it,
I suppose you'd code the uniq by hand, "in line",
because of all the process start-up time.)

Or maybe depending on the length of the run you
choose which to do.

Maybe ditto for that inner sort.

Question: does ANY of that make sense? As soon
as I post this, I'll probably realize it's all wet!

David

From: David Combs on 21 Feb 2010 21:15

In article <hkp3cb$3bh$1(a)news.eternal-september.org>,
Ed Morton <mortonspam(a)gmail.com> wrote:
>On 2/7/2010 7:14 PM, Janis Papanagnou wrote:
>> Janis Papanagnou wrote:
>>> laredotornado wrote:
>>>> Hi,
>>>>
>>>> How would I search for files that have the same name, but potentially
>>>> different case, living in the same directory? For example, I would
>>>> want to find files like
>>>>
>>>> /dir1/image1.gif
>>>> /dir1/IMAGE1.gif
>>>>
>>>> but I don't care about
>>>>
>>>> /dir1/image1.gif
>>>> /dir1/dir2/image1.gif
>
>How do you feel about:
>
> /dir1/image1.gif
> /DIR1/image1.gif
>
>or:
>
> /dir1/dir2
> /dir1/DIR2
>
>where dir2 and DIR2 are directories?
>
>>>> ? Hope this question makes sense, - Dave
>>>
>>> This awk program stores the converted filenames that it reads from stdin
>>> in lowercase and prints any new filename that matches case-insensitive...
>>>
>>> awk 'tolower($0) in f ; { f[tolower($0)] }'
>>>
>>> You can feed files from a current directory
>>>
>>> awk 'tolower($0) in f ; { f[tolower($0)] }' *
>>
>> Didn't know what I was thinking with the previous line; should have been
>>
>> ls | awk '...' or ls */* | awk '...' or somesuch.
>>
>>>
>>> or (if case insensitive directories are not a problem) from a directory
>>> tree
>>>
>>> find . | awk 'tolower($0) in f ; { f[tolower($0)] }'
>>>
>>> Just one way to approach the task.
>>
>> And since I am posting anyway I can point to the terse "golf version" as
>> well...
>>
>> find . | awk 'f[tolower($1)]++'
>
>ITYM:
>
> find . | awk 'f[tolower($0)]++'
>
>or if the OP really only cares about files with matching names but does care
>about differentiating directories with different case:
>
> find . -type f |
> awk -F'/' '{file=tolower($NF); sub(/[/][^/]+$/,"",$0)} f[$0 "/" file]++'
>
GOLF is nice, but I'm no Tiger Woods, so maybe you could explain
that a bit.

(To understand how little I understand awk, I'm searching for the
curly brackets I (mistakenly, obviously) thought were required!)

>Usual caveat about file names that contain newlines.
That is a joke, yes? Although I wouldn't
put it past Windows to allow it!

Thanks,

David

From: Janis Papanagnou on 21 Feb 2010 22:02

David Combs wrote:
> In article <hkp3cb$3bh$1(a)news.eternal-september.org>,
> Ed Morton <mortonspam(a)gmail.com> wrote:
>> On 2/7/2010 7:14 PM, Janis Papanagnou wrote:
>>> Janis Papanagnou wrote:
>>>> laredotornado wrote:
>>>>> Hi,
>>>>>
>>>>> How would I search for files that have the same name, but potentially
>>>>> different case, living in the same directory? For example, I would
>>>>> want to find files like
>>>>>
>>>>> /dir1/image1.gif
>>>>> /dir1/IMAGE1.gif
>>>>>
>>>>> but I don't care about
>>>>>
>>>>> /dir1/image1.gif
>>>>> /dir1/dir2/image1.gif
>> How do you feel about:
>>
>> /dir1/image1.gif
>> /DIR1/image1.gif
>>
>> or:
>>
>> /dir1/dir2
>> /dir1/DIR2
>>
>> where dir2 and DIR2 are directories?
>>
>>>>> ? Hope this question makes sense, - Dave
>>>> This awk program stores the converted filenames that it reads from stdin
>>>> in lowercase and prints any new filename that matches case-insensitive...
>>>>
>>>> awk 'tolower($0) in f ; { f[tolower($0)] }'
>>>>
>>>> You can feed files from a current directory
>>>>
>>>> awk 'tolower($0) in f ; { f[tolower($0)] }' *
>>> Didn't know what I was thinking with the previous line; should have been
>>>
>>> ls | awk '...' or ls */* | awk '...' or somesuch.
>>>
>>>> or (if case insensitive directories are not a problem) from a directory
>>>> tree
>>>>
>>>> find . | awk 'tolower($0) in f ; { f[tolower($0)] }'
>>>>
>>>> Just one way to approach the task.
>>> And since I am posting anyway I can point to the terse "golf version" as
>>> well...
>>>
>>> find . | awk 'f[tolower($1)]++'
>> ITYM:
>>
>> find . | awk 'f[tolower($0)]++'
>>
>> or if the OP really only cares about files with matching names but does care
>> about differentiating directories with different case:
>>
>> find . -type f |
>> awk -F'/' '{file=tolower($NF); sub(/[/][^/]+$/,"",$0)} f[$0 "/" file]++'
>>
> GOLF is nice, but I'm no Tiger Woods, so maybe you could explain
> that a bit.

I'll take the minimalist version...

find . -type f | awk 'f[tolower($0)]++'

find(1) lists the ordinary (regular) files and passes them to awk.
Awk reads the filenames from stdin; each line from find is stored
in $0. The filename in $0 is then lowercased: tolower($0).
f[abc] is an associative array access with key abc; if the array
element does not exist it will be created.
Now if the lowercased filename isn't yet in the array, the element
will be created and incremented; for a file, say xyz.txt, the array
element f["xyz.txt"] (which is initially 0) will be set to 1 (by
the ++ operator). Because ++ is a post-increment here, the expression
yields the value from before the increment, so on the first insert it
evaluates to 0 and the element is incremented to 1; each subsequent
access to the same array element (i.e. with an equivalent filename)
will evaluate to a value greater than 0 and increment it further.
Now considering that awk programs are generally of the form

condition { action }

which means that if the condition is true (non-zero) the respective
action will be executed, and that the above awk program is a short form
for

f[tolower($0)]++ { print $0 }

you see that the filename in $0 will be printed for every occurrence
after the first.
And if we recall that the filename that is used as key is always used
in a normalized lowercase form then an input sequence of

ABC.xyz
abc.xyz
abc.XYZ
ABC.XYZ

will increment the array element f["abc.xyz"] four times and the elements
that come as #2, #3, #4 will be printed as duplicates.
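
The walk-through can be checked by feeding the four names to the one-liner directly. The second part is only a sketch of a same-directory variant: it keeps the directory part as-is and lowercases just the last path component, so ./dir1 and ./DIR1 stay distinct. The function name same_dir_dups is made up for illustration, and filenames are assumed to contain no newlines.

```shell
# The four example names: every name after the first is reported.
printf '%s\n' ABC.xyz abc.xyz abc.XYZ ABC.XYZ | awk 'f[tolower($0)]++'
# prints:
#   abc.xyz
#   abc.XYZ
#   ABC.XYZ

# Hypothetical variant: only names within the same directory can collide.
same_dir_dups() {
    awk '{
        dir = $0;  sub("/[^/]*$", "", dir)   # path before the last slash, case kept
        base = $0; sub("^.*/", "", base)     # last path component
    }
    seen[dir "/" tolower(base)]++'           # true from the second occurrence on
}

printf '%s\n' ./dir1/image1.gif ./DIR1/image1.gif ./dir1/IMAGE1.gif | same_dir_dups
# prints: ./dir1/IMAGE1.gif
```

Typical use would be: find . -type f | same_dir_dups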

Janis

>
> (To understand how little I understand awk, I'm searching for the
> curly brackets I (mistakenly, obviously) thought were required!)
>
>
>
>> Usual caveat about file names that contain newlines.
> That is a joke, yes? Although I wouldn't
> put it past Windows to allow it!
>
>
> Thanks,
>
> David
>
>
