Parsing file names with spaces [Perl]

Prev: FAQ 4.11 How do I get a random number between X and Y?
Next: Path to another server

From: Uri Guttman on 30 Jun 2010 23:28

>>>>> "BM" == Ben Morrow <ben(a)morrow.me.uk> writes:

BM> Quoth "Uri Guttman" <uri(a)StemSystems.com>:
>> >>>>> "BM" == Ben Morrow <ben(a)morrow.me.uk> writes:
>>
BM> ls -l output intentionally uses fixed-width columns, except for the
BM> filename. So
>>
>> normally that is true, but very large files can cause the name column to
>> be shifted over. some ls flavors or options will change the size to use
>> a suffix but you can't count on fixed width there. as i posted it is
>> best to assume fixed width until the size but that is always a number
>> with a possible size suffix so it is easy to match and the rest is the
>> file name.

BM> Meh. Yes, you're probably right. (Now I check ls(1), at least on my
BM> system, it appears most of the fields are variable-width.) Since modern
BM> systems (OS X, at least) allow user- and group names with spaces in,
BM> splitting on space doesn't work either.

yow, that is annoying then. do they use tabs for separators or just more
spaces? if so, you can't really parse ls -l there. a group name could be
a number (or multiple numbers!) which is confused with the size, etc. blecch.

BM> The correct answer, of course, is 'go back several steps and get the
BM> data in a more reasonable format'.

yep. which i have suggested. oh well.

uri

--
Uri Guttman ------ uri(a)stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
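
One way to read "get the data in a more reasonable format" is to skip ls entirely when the files sit on a locally readable filesystem (an assumption; the thread does not say where the listing comes from). The sketch below uses only core Perl; the helper name spacy_names is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the entries of $dir whose names contain
# at least one space, read straight from the directory instead of ls.
sub spacy_names {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names = grep { $_ ne '.' && $_ ne '..' && / / } readdir $dh;
    closedir $dh;
    my @sorted = sort @names;
    return @sorted;
}

my $dir = '.';    # assumption: the directory of interest
for my $name (spacy_names($dir)) {
    # lstat reports on a symlink itself rather than on its target
    my ($size, $mtime) = (lstat "$dir/$name")[7, 9];
    next unless defined $size;    # entry vanished since readdir
    printf "%10d  %s  %s\n", $size, scalar localtime($mtime), $name;
}
```

Since no text is parsed, spaces in file names, user names or group names cannot shift any columns.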

From: Ben Morrow on 30 Jun 2010 23:31

Quoth John Kelly <jak(a)isp2dial.com>:
> On Wed, 30 Jun 2010 19:24:07 -0700, Jürgen Exner <jurgenex(a)hotmail.com>
> wrote:
>
> >>I want to extract just the file which contain spaces to work with like:
> >>
> >>file1.zip
> >>file2 onespace.zip
> >>file3 two spaces.zip
> >
> >Easy. split() the line into its 9 elements at any non-empty sequence of
> >spaces and then pick the last one:
> >
> > my $file = (split(/ +/, $_, 9))[8];
> >
>
> That handles blanks, but this will handle all whitespace, such as tabs.
>
> my $file = (split(' ', $_, 9))[8];

...which do not appear in the output of ls(1), except possibly as part
of a filename.

It's also worth noting that none of the solutions offered (except
perhaps File::Listing) handle symlinks.

Ben
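
A minimal sketch of the split approach with symlinks taken into account, assuming the nine-field layout of the sample lines in this thread and that ls shows a link as "name -> target". The helper name name_from_ls is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return (name, link target) for one "ls -l" line.
# The target is undef for anything that is not a symlink.
sub name_from_ls {
    my ($line) = @_;
    chomp $line;
    # limit 9: the ninth field keeps every space after the time column
    my @field = split ' ', $line, 9;
    return unless @field == 9;    # e.g. the "total N" line
    my ($name, $target) = ($field[8], undef);
    if ($field[0] =~ /^l/) {
        # ambiguous if the link name itself contains " -> "
        ($name, $target) = split / -> /, $name, 2;
    }
    return ($name, $target);
}

my ($name, $target) = name_from_ls(
    "lrwxrwxrwx 1 777 22000 9 Jan 29 13:28 my link -> file1.zip");
print "$name points to $target\n";
```

This still breaks on user or group names that contain spaces, since those change the field count.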

From: Jürgen Exner on 30 Jun 2010 23:34

James Egan <jegan473(a)comcast.net> wrote:
>On Wed, 30 Jun 2010 20:05:43 -0400, Uri Guttman wrote:
>>>>>>> "JE" == James Egan <jegan473(a)comcast.net> writes:
>> JE> I should have mentioned that the dates, sizes, names, of the
>> JE> files, might be different, so they won't always start at position
>> JE> 50.
>>
>> so use a regex! it isn't hard to write one to parse out the file from ls
>> output.
>
>Assume the files vary greatly in size. Then the file names may
>not start at position 50 like:

Which is completely irrelevant for the vast majority of regular
expressions.

jue
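
To illustrate why the starting column does not matter, here is a sketch of a regex that keys on the size and date fields and treats the rest of the line as the file name. It assumes English month abbreviations and the "Mon DD HH:MM" (or "Mon DD YYYY") date style of the sample lines; $ls_re and parse_ls_line are names invented for this example.

```perl
use strict;
use warnings;

# Assumed layout: ... size month day time-or-year name
my $ls_re = qr/
    \s  (\d+(?:\.\d+)?[KMGTP]?)    # size, optionally with a unit suffix
    \s+ ([A-Z][a-z][a-z])          # month abbreviation
    \s+ (\d{1,2})                  # day of month
    \s+ (\d{1,2}:\d\d|\d{4})       # time, or a year for older files
    \s  (.+) $                     # the rest is the file name
/x;

# Hypothetical helper: return a hash ref with size and name, or nothing.
sub parse_ls_line {
    my ($line) = @_;
    return unless $line =~ $ls_re;
    return { size => $1, name => $5 };
}

for my $line (
    "-rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip",
    "-rwxrwxrwx 1 777 22000 3002969941 Jan 29 13:28 file3 two spaces.zip",
) {
    my $info = parse_ls_line($line) or next;
    print "$info->{size}\t$info->{name}\n";
}
```

The match is unanchored on the left, so the regex engine takes the leftmost position where a number is followed by a date, however wide the earlier columns are.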

From: Jürgen Exner on 30 Jun 2010 23:38

Ben Morrow <ben(a)morrow.me.uk> wrote:
>It's also worth noting that none of the solutions offered (except
>perhaps File::Listing) handle symlinks.

Which on the other hand weren't part of his problem statement....

A poorly specified problem necessarily leads to arbitrary guesses and
widely diverging 'solutions'.

jue

From: Tad McClellan on 30 Jun 2010 23:44

James Egan <jegan473(a)comcast.net> wrote:
> On Wed, 30 Jun 2010 20:05:43 -0400, Uri Guttman wrote:
>
>>>>>>> "JE" == James Egan <jegan473(a)comcast.net> writes:
>>
>> JE> I should have mentioned that the dates, sizes, names, of the
>> JE> files, might be different, so they won't always start at position
>> JE> 50.
>>
>> so use a regex! it isn't hard to write one to parse out the file from ls
      ^^^^^^^^^^^
>> output.

> Assume the files vary greatly in size. Then the file names may
> not start at position 50 like:
>
> -rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip
> -rwxrwxrwx 1 777 22000 9941 Jan 28 18:10 file2 onespace.zip
> -rwxrwxrwx 1 777 22000 3002969941 Jan 29 13:28 file3 two spaces.zip

How's about *you* assume that, and then attempt to use a regex?

We are here to help you with your Perl problem.

We are not here to write your Perl program for you.

It is expected that you will try and do that once we have pointed
you in the right direction.

Oh hell, have a fish.

----------------
#!/usr/bin/perl
use warnings;
use strict;

my @ra = (
    "-rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip",
    "-rwxrwxrwx 1 777 22000 9941 Jan 28 18:10 file2 onespace.zip",
    "-rwxrwxrwx 1 777 22000 3002969941 Jan 29 13:28 file3 two spaces.zip",
);

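my @spacy;
foreach my $ls (@ra) {
    # $ls aliases the array element, so this strips the first eight
    # whitespace-separated fields from @ra itself
    $ls =~ s/^(\S+\s+){8}//;
    push @spacy, $ls if $ls =~ / /;    # keep only names containing a space
}
print "$_\n" for @spacy;

# same thing, but done all at once
# (@ra already holds bare names here, so the s/// finds nothing more to strip)
@spacy = map {s/^(\S+\s+){8}//; / / ? $_ : ()} @ra;
print "$_\n" for @spacy;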
----------------

--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
The above message is a Usenet post.
I don't recall having given anyone permission to use it on a Web site.
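
The substitution in the program above edits the elements of @ra in place. A sketch of a variant that leaves the input untouched, assuming Perl 5.14 or later for the /r modifier (which is newer than this thread):

```perl
use strict;
use warnings;
use 5.014;    # s///r returns the changed copy instead of editing $_

my @ra = (
    "-rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip",
    "-rwxrwxrwx 1 777 22000 9941 Jan 28 18:10 file2 onespace.zip",
    "-rwxrwxrwx 1 777 22000 3002969941 Jan 29 13:28 file3 two spaces.zip",
);

# map strips the first eight fields from a copy of each line;
# grep then keeps only the names that contain a space
my @spacy = grep { / / } map { s/^(?:\S+\s+){8}//r } @ra;
print "$_\n" for @spacy;
```

Here @ra still holds the full ls lines afterwards, so it can be parsed again for other fields.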
