From: Markus Dehmann on
I have a convenient way to open possibly gzip'ed files:

open(F, ($f =~ m/\.gz$/) ? "gunzip -c $f |" : "$f");

So, if the file name ends in .gz I send it through gunzip. So far, so
good. (I don't want to use the PerlIO::gzip module because it's not
installed by default, so it's a hassle.)

But now, my script should be callable in the following ways:
$ cat data | ./script.pl
$ ./script.pl data.gz
$ ./script.pl data

Usually, I would just use the while loop: while(<>){...}. But that does
not read gzip'ed data.

How would you handle that? I could think of the following code, but it's
long and not nice ...

if(defined $ARGV[0] && -f $ARGV[0]){
    readFromFile($ARGV[0]);
}else{
    readFromStdin();
}

sub readFromFile{
    my ($f) = @_;
    open(F, ($f =~ m/\.gz$/) ? "gunzip -c $f |" : "$f")
        or die("Could not open $f: $!");
    while(<F>){
        processLine($_);
    }
    close F;
}

sub readFromStdin{
    while(<>){
        processLine($_);
    }
}

sub processLine{ ... }


Thanks!
Markus
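
On the module concern above, a possible alternative to shelling out, offered only as a sketch: IO::Uncompress::Gunzip ships with the core distribution from Perl 5.10 on, and in its default transparent mode it passes through input that is not gzip-compressed. The helper name open_maybe_gzipped is made up for this illustration.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: return a handle that yields decompressed lines.
# With no file name (or '-') it reads from STDIN instead.
sub open_maybe_gzipped {
    my ($file) = @_;
    my $source = (defined $file && $file ne '-') ? $file : \*STDIN;

    # Transparent => 1 (also the default) lets plain, uncompressed
    # input pass through unchanged.
    my $z = IO::Uncompress::Gunzip->new($source, Transparent => 1)
        or die "Can't read input: $GunzipError\n";
    return $z;
}

# Typical use:
#   my $in = open_maybe_gzipped($ARGV[0]);
#   while (defined(my $line = $in->getline)) { processLine($line) }
```

This decides by content rather than by the .gz suffix, so it also covers the piped case (cat data.gz | ./script.pl).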
From: attn.steven.kuo@gmail.com on
Markus Dehmann wrote:
> I have a convenient way to open possibly gzip'ed files:
>
> open(F, ($f =~ m/\.gz$/) ? "gunzip -c $f |" : "$f");
>
> So, if the file name ends in .gz I send it through gunzip. So far, so
> good. (I don't want to use the PerlIO::gzip module because it's not
> installed by default, so it's a hassle.)
>
> But now, my script should be callable in the following ways:
> $ cat data | ./script.pl
> $ ./script.pl data.gz
> $ ./script.pl data
>
> Usually, I would just use the while loop: while(<>){...}. But that does
> not read gzip'ed data.
>
> How would you handle that? I could think of the following code, but it's
> long and not nice ...
>
> if(defined $ARGV[0] && -f $ARGV[0]){
> readFromFile($ARGV[0]);
> }else{
> readFromStdin();
> }

(snipped)

Look under 'perldoc perlopentut'
where the minus (-) file is discussed:

my $input = defined($ARGV[0]) ? $ARGV[0] : '-';
$input = $input =~ /\.gz$/
    ? "gunzip -c $input |"
    : $input ;

open (FH, $input)
    or die $!;

process_line($_) while (<FH>);

close FH;

--
Hope this helps,
Steven

From: jgraber on

"attn.steven.kuo(a)gmail.com" <attn.steven.kuo(a)gmail.com> writes:
> Markus Dehmann wrote:
> > I have a convenient way to open possibly gzip'ed files:
> > open(F, ($f =~ m/\.gz$/) ? "gunzip -c $f |" : "$f");
> >
> > So, if the file name ends in .gz I send it through gunzip. So far, so
> > good. (I don't want to use the PerlIO::gzip module because it's not
> > installed by default, so it's a hassle.)
> >
> > But now, my script should be callable in the following ways:
> > $ cat data | ./script.pl
> > $ ./script.pl data.gz
> > $ ./script.pl data
> >
> > Usually, I would just use the while loop: while(<>){...}. But that does
> > not read gzip'ed data.
> (snipped)
>
> Look under 'perldoc perlopentut'
> where the minus (-) file is discussed:
>
> my $input = defined($ARGV[0]) ? $ARGV[0] : '-';
> $input = $input =~ /\.gz$/
> ? "gunzip -c $input |"
> : $input ;
> open (FH, $input)
> or die $!;
> process_line($_) while (<FH>);
> close FH;

I discovered that my currently installed version of gzip -d
(with -f and -c, as in the command below) correctly reads
plain files, gzipped files (.gz), and even packed files (.Z).
So now I use gzip -d for everything. According to top, it uses
only 1% of the CPU when called uselessly. It also works
for the occasional file that is gzipped without a .gz
extension, or vice versa. I remember it working for
$infile = "-" as well, for those gzipped output pipes.

I've been recommending this as the "universal input pipe",
$gzip_pid = open( FH, $fp="/usr/local/bin/gzip -dfc $infile |" )
    || die "Can't open input pipe '$fp' : $!\n";

I'm primarily used to writing in perl4 style.
I'd welcome the likely followup to this post with an
example of a more modern style.
Is this a security hole for the occasional
maliciously named file like "x;rm -rf / "?
--
Joel
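
One way the "universal input pipe" could look in a more modern style, as a sketch only: a lexical filehandle plus the list form of a piped open (available since Perl 5.8 on Unix-like systems). The list form hands each argument to gzip directly instead of going through a shell, so a name like "x;rm -rf / " is treated as nothing more than an odd file name. The helper names gzip_command and open_input are hypothetical, and gzip is assumed to be found via $PATH.

```perl
use strict;
use warnings;

# Hypothetical helper: build the argument list for gzip.
# With no file name (or '-'), gzip is left to read standard input.
sub gzip_command {
    my ($infile) = @_;
    my @cmd = ('gzip', '-dfc');
    # '--' ends option parsing, so a name starting with '-' stays a name.
    push @cmd, '--', $infile if defined $infile && $infile ne '-';
    return @cmd;
}

# Hypothetical helper: open a read pipe from gzip without using a shell.
sub open_input {
    my ($infile) = @_;
    my @cmd = gzip_command($infile);
    # List-form piped open: the elements of @cmd are passed to gzip as-is.
    open(my $fh, '-|', @cmd)
        or die "Can't start '@cmd': $!\n";
    return $fh;
}

# Typical use:
#   my $fh = open_input($ARGV[0]);
#   while (my $line = <$fh>) { process_line($line) }
#   close $fh or warn "gzip exited with status $?\n";
```

With the string form used above, the same file name would be interpolated into a shell command line, which is where the security concern comes from.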
From: Anno Siegel on
Markus Dehmann <markus.dehmann(a)gmail.com> wrote in comp.lang.perl.misc:
> I have a convenient way to open possibly gzip'ed files:
>
> open(F, ($f =~ m/\.gz$/) ? "gunzip -c $f |" : "$f");
>
> So, if the file name ends in .gz I send it through gunzip. So far, so
> good. (I don't want to use the PerlIO::gzip module because it's not
> installed by default, so it's a hassle.)
>
> But now, my script should be callable in the following ways:
> $ cat data | ./script.pl
> $ ./script.pl data.gz
> $ ./script.pl data
>
> Usually, I would just use the while loop: while(<>){...}. But that does
> not read gzip'ed data.
>
> How would you handle that? I could think of the following code, but it's
> long and not nice ...

[snip]

/\.gz$/ and $_ = "gunzip -c $_ |" for @ARGV;
print while <>;

Anno
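
The one-liner above works because <> opens each element of @ARGV with the two-argument form of open, so an element ending in "|" is run as a command. The file name is still interpolated into a shell command, though. A hedged variant that single-quotes the name first might look like this; shell_quote and rewrite_argv are hypothetical helper names, and a POSIX-style shell is assumed.

```perl
use strict;
use warnings;

# Hypothetical helper: single-quote a string for a POSIX-style shell.
# Embedded single quotes are written as '\'' (close, escaped quote, reopen).
sub shell_quote {
    my ($s) = @_;
    $s =~ s/'/'\\''/g;
    return "'$s'";
}

# Hypothetical helper: turn names ending in .gz into "command |" entries
# and leave all other names untouched.
sub rewrite_argv {
    return map {
        /\.gz$/ ? 'gunzip -c -- ' . shell_quote($_) . ' |' : $_
    } @_;
}

# Typical use:
#   @ARGV = rewrite_argv(@ARGV);
#   print while <>;
```

Names that do not end in .gz are still opened by the two-argument open, so unusual plain names (leading "<" or ">", trailing "|") keep their special meaning there.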
From: Markus Dehmann on
jgraber(a)ti.com wrote:
> "attn.steven.kuo(a)gmail.com" <attn.steven.kuo(a)gmail.com> writes:
>
>>Markus Dehmann wrote:
>>
>>>I have a convenient way to open possibly gzip'ed files:
>>>open(F, ($f =~ m/\.gz$/) ? "gunzip -c $f |" : "$f");
>>>
>>>So, if the file name ends in .gz I send it through gunzip. So far, so
>>>good. (I don't want to use the PerlIO::gzip module because it's not
>>>installed by default, so it's a hassle.)
>>>
>>>But now, my script should be callable in the following ways:
>>>$ cat data | ./script.pl
>>>$ ./script.pl data.gz
>>>$ ./script.pl data
>>>
>>>Usually, I would just use the while loop: while(<>){...}. But that does
>>>not read gzip'ed data.
>>
>>(snipped)
>>
>>Look under 'perldoc perlopentut'
>>where the minus (-) file is discussed:
>>
>>my $input = defined($ARGV[0]) ? $ARGV[0] : '-';
>> $input = $input =~ /\.gz$/
>> ? "gunzip -c $input |"
>> : $input ;
>>open (FH, $input)
>> or die $!;
>>process_line($_) while (<FH>);
>>close FH;
>
>
> I discovered that my currently installed version of gzip -d
> would correctly read plain files, gzipped files (.gz),
> and even packed files (.Z). So now I use gzip -d
> for everything. According to top, it uses only 1%
> of the CPU when called uselessly. It also works
> for the occasional file that is gzipped without a .gz
> extension, or vice-versa. I remember it working for
> $infile = "-" as well, for those gzipped output pipes.
>
> I've been recommending this as the "universal input pipe",
> $gzip_pid = open( FH, $fp="/usr/local/bin/gzip -dfc $infile |" )

Now, a slightly offtopic question:

Why do people often use the full path to an application (like here,
/usr/local/bin/gzip)? That just makes it less likely to work, since
my gzip might be in /usr/bin.

Why not just: open(F, "gzip -dfc $infile |");


Same thing with the perl command: Why don't we write
#!perl -w

as the first line of a perl program, and let the $PATH variable figure
out which perl is meant?

Thanks!
Markus