Parsing file names with spaces [Perl]

Prev: FAQ 4.11 How do I get a random number between X and Y?
Next: Path to another server

From: John Kelly on 1 Jul 2010 12:58

On Thu, 1 Jul 2010 18:41:12 +0200, "Peter J. Holzer"
<hjp-usenet2(a)hjp.at> wrote:

>On 2010-07-01 03:46, John Kelly <jak(a)isp2dial.com> wrote:
>> On Wed, 30 Jun 2010 23:26:59 -0400, "Uri Guttman" <uri(a)StemSystems.com>
>> wrote:
>[nothing of importance]
>
>Can you two please take your bickering elsewhere? This is getting
>tiresome.

You're right, it's gone too far.

I will try to exert more willpower and resist the urge. I hope people
understand that this means I won't answer when they ask why I posted
this or that.

--
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php

From: Peter J. Holzer on 1 Jul 2010 13:02

On 2010-07-01 00:38, Ben Morrow <ben(a)morrow.me.uk> wrote:
>
> Quoth James Egan <jegan473(a)comcast.net>:
>> Assuming an array named @myfiles contained three elements like:
>>
>> -rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip
>> -rwxrwxrwx 1 777 22000 2969941 Jan 28 18:10 file2 onespace.zip
>> -rwxrwxrwx 1 777 22000 2969941 Jan 29 13:28 file3 two spaces.zip
>>
>> I want to extract just the file names, which may contain spaces, to work with, like:
>>
>> file1.zip
>> file2 onespace.zip
>> file3 two spaces.zip
>>
>>
>> How can I extract the file names which have spaces?
>
> ls -l output intentionally uses fixed-width columns, except for the
> filename. So

That depends on the version of ls. Recent versions of GNU ls adjust the
column widths to fit the entries being listed, so the output is always
nicely aligned, but the widths differ from one listing to the next.
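Because column positions cannot be trusted, one alternative is to count whitespace-separated fields instead of character offsets. The Perl sketch below is not from the thread; it assumes the nine-field layout of the OP's listing (mode, links, owner, group, size, month, day, time or year, name) and owner and group names without spaces. The helper name name_from_ls is hypothetical, and the sample lines are the OP's entries with the size column padded the way a width-adjusting ls might pad it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch only: assumes the 9-field "ls -l" layout from the original post
# and owner/group names that contain no whitespace.
sub name_from_ls {
    my ($line) = @_;
    chomp $line;
    # split ' ' skips leading whitespace and splits on runs of whitespace,
    # so it does not care how wide each column is. The limit of 9 leaves
    # the name, internal spaces included, unsplit in the last field.
    my @fields = split ' ', $line, 9;
    return @fields == 9 ? $fields[8] : undef;
}

my @myfiles = (
    '-rwxrwxrwx 1 777 22000    2971201 Jan 24 18:17 file1.zip',
    '-rwxrwxrwx 1 777 22000       9941 Jan 28 18:10 file2 onespace.zip',
    '-rwxrwxrwx 1 777 22000 3002969941 Jan 29 13:28 file3 two spaces.zip',
);

for my $line (@myfiles) {
    my $name = name_from_ls($line);
    print "$name\n" if defined $name;
}
```

The limit argument is what keeps the spaces inside the name intact. One trade-off: because split ' ' treats any run of whitespace as a single separator, a name that begins with a space would lose that leading space.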

hp

From: Dr.Ruud on 1 Jul 2010 13:15

John Kelly wrote:

> Sometimes people wander into a newsgroup looking for ideas,
> and need a friendly helping hand more than elegant code.

Gentle healers make stinking wounds.

--
Ruud

From: Peter J. Holzer on 1 Jul 2010 16:35

On 2010-07-01 03:17, Tad McClellan <tadmc(a)seesig.invalid> wrote:
> James Egan <jegan473(a)comcast.net> wrote:
>
> [ snip where a nice soul has tried to solve the OP's poorly specified problem ]
>
>> I should have mentioned that the dates, sizes, names, of the files,
>> might be different, so they won't always start at position 50.
>
>
> No, you should not have mentioned that.
>
> You should have provided test data that reflects your real data.

I disagree. He should have mentioned that and quite a few things more
(for example the different date formats, whether user and group names
are always numeric, and if not, whether they can contain spaces, etc.)

Test data is nice, but you can never assume that it covers all possible
cases, and requirements reverse-engineered from a few lines of test
data are almost guaranteed to be incomplete. Besides, why should
everyone in this group have to figure out the requirements when the OP
can do it once?
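To make the point concrete, here is a hedged Perl sketch (not from the thread) that copes with two of the variations listed above: it never parses the owner and group columns, and it accepts either a time or a year in the date. Instead it searches for a size followed by a date, borrowing the size-anchoring idea Uri Guttman suggested in this thread. Every format detail is an assumption stated in the comments; parse_ls_line and the sample lines are invented for illustration, and other date styles (for example GNU ls --time-style=long-iso) or non-English month names would need a different pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical matcher, not from the thread. Assumptions:
#   - size is plain digits or an "ls -h" style value such as 2.9M;
#   - the date is "Mon DD" followed by HH:MM (recent files) or by a
#     four-digit year (older files), i.e. the default English format;
#   - exactly one space separates the date from the name.
# Owner and group are never parsed, so their width and content do not matter.
my $ls_line = qr/
    \s (\d+ (?:\.\d+)? [KMGTPE]?)        # size
    \s+ ([A-Z][a-z]{2}) \s+ (\d{1,2})    # month and day
    \s+ (\d{1,2}:\d{2} | \d{4})          # time or year
    \s (.+) \z                           # the rest of the line is the name
/x;

# Return a hash ref of the parsed columns, or undef if the line does not fit.
sub parse_ls_line {
    my ($line) = @_;
    chomp $line;
    return unless $line =~ $ls_line;
    return { size => $1, month => $2, day => $3, time_or_year => $4, name => $5 };
}

# Invented sample lines: a numeric owner and group, an "ls -h" size,
# and an older file whose date shows a year instead of a time.
my @lines = (
    '-rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip',
    '-rw-r--r-- 1 user staff 2.9M Jan 28 18:10 file2 onespace.zip',
    '-rw-r--r-- 1 user staff 2969941 Jan 29  2009 file3 two spaces.zip',
);

for my $line (@lines) {
    my $f = parse_ls_line($line);
    if (defined $f) {
        print "$f->{name}\n";
    }
    else {
        warn "unrecognised line: $line\n";
    }
}
```

A line that does not fit, such as the "total" line at the top of ls -l output, is reported with a warning rather than silently mangled, which is in the spirit of pinning down the requirements first.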

hp

From: sln on 1 Jul 2010 16:51

On Thu, 01 Jul 2010 11:14:33 -0000, Justin C <justin.0911(a)purestblue.com> wrote:

>On 2010-07-01, Uri Guttman <uri(a)StemSystems.com> wrote:
>>>>>>> "BM" == Ben Morrow <ben(a)morrow.me.uk> writes:
>>
>> BM> ls -l output intentionally uses fixed-width columns, except for the
>> BM> filename. So
>>
>> normally that is true, but very large files can cause the name column to
>> be shifted over. some ls flavors or options will change the size to use
>> a suffix but you can't count on fixed width there. as i posted it is
>> best to assume fixed width until the size but that is always a number
>> with a possible size suffix so it is easy to match and the rest is the
>> file name.
>
>An observation (that may be erroneous) of the output of ls: The second
>to last field is always the time, which contains a colon. How about
>matching /:\d{2}\s+.*\s+.+\b/ ?
The culprit is the .* in that pattern: ' ' is in the class defined by
both . and \s.

Given "18:17\040\040file1.zip",
:\d{2}\s+ will match ":17\040", ".*" will match nothing,
and "\s+.+\b" will match "\040file1.zip".

Equally, /:\d{2}\s+.+\s+.+\b/ (this time the first .+ is the culprit)
will produce the same problem given "18:17\040\040\040file1.zip".

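The solution is to anchor both ends of the filename with a single
character of the class \S, then let backtracking take over the middle
with the zero-or-more quantifier: \S.*\s.*\S. Because of the \s in the
middle, this only matches names that contain at least one whitespace
character.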

Test case:

"Jan 24 18:17 file1.zip" =~ /:\d{2}\s+(.*\s+.+)\b/
and print "$1\n";

"Jan 24 18:17 file2.zip" =~ /:\d{2}\s+(.+\s+.+)\b/
and print "$1\n";

"Jan 24 18:17 file3.zip" =~ /:\d{2}\s+(\S.*\s.*\S)\b/
and print "$1\n";

"Jan 24 18:17 file4.zip" =~ /:\d{2}\s+(\S.*\s.*\S)\b/
and print "$1\n";
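The test cases above appear to have lost their runs of spaces in the archive, so they no longer show the difference they were meant to show. The hedged Perl sketch below restates them with the spaces written as \040 escapes and prints each capture in brackets so leading whitespace is visible. The input lines and the helper capture_name are invented for illustration, and the third pattern ($anyname) is a variation that is not from this post, included for the case where names without spaces are wanted too.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical inputs; runs of spaces are spelled out with \040
# (an octal escape for a space) so they cannot be collapsed.
my @lines = (
    "Jan 24 18:17\040file1.zip",               # single space, no space in name
    "Jan 24 18:17\040\040file1.zip",           # two spaces before the name
    "Jan 24 18:17\040file3 two spaces.zip",    # name containing spaces
);

my $justin  = qr/:\d{2}\s+(.*\s+.+)\b/;     # pattern from the quoted post
my $sln     = qr/:\d{2}\s+(\S.*\s.*\S)\b/;  # \S-anchored pattern from this post
my $anyname = qr/:\d{2}\s+(\S(?:.*\S)?)/;   # variation: spaces in the name optional

# Return the captured name, or undef when the pattern does not match.
sub capture_name {
    my ($line, $re) = @_;
    return $line =~ $re ? $1 : undef;
}

for my $p (['justin', $justin], ['sln', $sln], ['anyname', $anyname]) {
    my ($label, $re) = @$p;
    for my $line (@lines) {
        my $name = capture_name($line, $re);
        # Brackets make any leading whitespace in the capture visible.
        printf "%-8s %s\n", $label, defined $name ? "[$name]" : 'no match';
    }
}
```

Run as is, Justin's pattern matches nothing on the first line and captures " file1.zip", leading space included, on the second; the \S-anchored pattern matches only the name that contains spaces; and the $anyname variation returns all three names without surrounding whitespace.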

>
>#!/usr/bin/perl
>
>use strict;
>use warnings;
>
>while (<DATA>) {
> if (/:\d{2}\s+(.*\s+.+)\b/) {
^^^ should be /:\d{2}\s+(\S.*\s.*\S)\b/
> print $1, "\n";
> }
>}
>
>__DATA__
>-rwxrwxrwx 1 777 22000 2971201 Jan 24 18:17 file1.zip
>-rwxrwxrwx 1 777 22000 9941 Jan 28 18:10 file2 onespace.zip
>-rwxrwxrwx 1 777 22000 3002969941 Jan 29 13:28 file3 two spaces.zip
>

-sln
