From: sln on
On Thu, 11 Mar 2010 04:43:45 -0800 (PST), jis <jismagic(a)gmail.com> wrote:

>On Mar 11, 11:15 am, "Uri Guttman" <u...(a)StemSystems.com> wrote:
>> >>>>> "j" == jis <jisma...(a)gmail.com> writes:
>>
>Uri,
>
>I have used the script you posted with only a change in the input file;
>I get the following results.
> (warning: too few iterations for a reliable count)
> (warning: too few iterations for a reliable count)
> (warning: too few iterations for a reliable count)
>             s/iter unpacking regex substring
> unpacking     9.06        --  -27%      -34%
> regex         6.59       37%    --       -9%
> substring     6.01       51%   10%        --
>
>Unpacking still remains the longest to finish.
>
>I use Windows XP Professional with 2 GB of RAM. I also have 45 GB of
>free space on my C drive.
>
>Do you see anything else that is different?
>
>thanks,
>jis

You have Windows!
Try this test below. It uses timethis() for $count iterations.
You don't want a partial-iteration result from a small time interval.

After you run the code as written, run it again with your file
information plugged in and $count changed to 3 iterations.
Go for a coffee break. Post back.

My results:

Unpacking: 12.7929 wallclock secs ( 9.94 usr + 2.84 sys = 12.78 CPU) @ 0.08/s (n=1)
Regex: 29.6103 wallclock secs (29.53 usr + 0.08 sys = 29.61 CPU) @ 0.03/s (n=1)
Substring: 2.85185 wallclock secs ( 2.81 usr + 0.03 sys = 2.84 CPU) @ 0.35/s (n=1)

-sln

-----------------
use strict;
use warnings;

use Benchmark qw(:all :hireswallclock) ;

#---- Uncomment, plug in filename ---------
# use File::Slurp ;
# my $file_name = '/boot/vmlinuz-2.6.28-15-generic' ;
# my $data = read_file( $file_name, binary => 1 ) ;
# #$data = "\x00\x10" ;
# my $hex = unpack 'H*', $data;
#------------------------------------------

my $count = 1; # increase count to 3 after first testing 1

#---- Comment out $hex -------------------
my $hex = 'a0b0c1d2e3f411aabbcc' x 200_000; # about 4 MB
#-----------------------------------------

timethis ($count, \&unpacking, "Unpacking");
timethis ($count, \&regex, "Regex");
timethis ($count, \&substring, "Substring");

sub unpacking {
my @arr = unpack( '(A2)*' , $hex) ;
# print "@arr\n"
}

sub regex {
my @arr = $hex =~ /.{2}/g ; # regex modified
# print "@arr\n"
}

sub substring {
my ($val, $offs, @arr) = ('',0);
while ($val=substr( $hex, $offs, 2)) {
push @arr, $val;
$offs+=2;
}
# print "@arr\n"
}
__END__
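Before trusting any of the timings, it's worth a quick check that the three
approaches actually agree. A minimal sketch on a short string (standalone,
not part of the benchmark above; the substr loop is written with a for so it
doesn't depend on a global):

```perl
use strict;
use warnings;

# Sanity check: all three splitting approaches should yield
# the same list of hex pairs on a short test string.
my $hex = 'a0b0c1d2e3f411aabbcc';

my @by_unpack = unpack '(A2)*', $hex;
my @by_regex  = $hex =~ /.{2}/g;    # two chars per match, no capture
my @by_substr;
for (my $offs = 0; $offs < length $hex; $offs += 2) {
    push @by_substr, substr $hex, $offs, 2;
}

print "unpack: @by_unpack\n";    # unpack: a0 b0 c1 d2 e3 f4 11 aa bb cc
print "regex:  @by_regex\n";
print "substr: @by_substr\n";
```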

From: Uri Guttman on
>>>>> "j" == jis <jismagic(a)gmail.com> writes:

j> On Mar 11, 11:15 am, "Uri Guttman" <u...(a)StemSystems.com> wrote:
>> >>>>> "j" == jis <jisma...(a)gmail.com> writes:
>>
>>   j> if i uncomment the regex portion and comment unpack it would take
>>   j> 1 minute 25 sec
>>
>>   j> print "bye";
>>   j> print $arr[2];    This would take only 9 seconds.
>>
>>   j> I have used a stopwatch to calculate time.
>>
>> as i said, that is a silly way to time programs. and there is no way it
>> would take minutes to do this unless you are on a severely slow cpu or
>> you are low on ram and are disk thrashing. here is my benchmarked
>> version which shows that unpacking (fixed to use A and not C) is the
>> fastest and regex (also fixed to do the simplest but correct thing which
>> is grab 2 chars) ties your code.
>>
>> uncomment those commented lines to see that this does the same,
>> correct thing in all cases.
>>
>> here is the timing result run for 10 seconds each:
>>
>>           s/iter     regex substring unpacking
>> regex       2.11        --       -0%      -25%
>> substring   2.11        0%        --      -25%
>> unpacking   1.58       33%       33%        --
>>
>> uri
>>
>> use strict;
>> use warnings;
>>
>> use File::Slurp ;
>> use Benchmark qw(:all) ;
>>
>> my $duration = shift || -2 ;
>>
>> my $file_name = '/boot/vmlinuz-2.6.28-15-generic' ;
>>
>> my $data = read_file( $file_name, binary => 1 ) ;
>>
>> #$data = "\x00\x10" ;
>>
>> my $hex = unpack 'H*', $data;
>>
>> # unpacking() ;
>> # regex() ;
>> # substring() ;
>> # exit ;
>>
>> cmpthese( $duration, {
>>
>>         unpacking       => \&unpacking,
>>         regex           => \&regex,
>>         substring       => \&substring,
>>
>> } ) ;
>>
>> sub unpacking {
>>         my @arr = unpack( '(A2)*' , $hex) ;
>> #       print "@arr\n"
>>
>> }
>>
>> sub regex {
>>         my @arr = $hex =~ /(..{2})/g ;
>> #       print "@arr\n"
>>
>> }
>>
>> sub substring {
>>
>>         my ($val, $offs, @arr) = ('',0);
>>         while ($val=substr( $hex, $offs, 2)){
>>                 push @arr, $val;
>>                 $offs+=2;
>>         }
>>
>> #       print "@arr\n"
>>
>> }
>>
>> --
>> Uri Guttman  ------  u...(a)stemsystems.com  --------  http://www.sysarch.com --
>> -----  Perl Code Review , Architecture, Development, Training, Support ------
>> ---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------

j> Uri,

j> I have used the script you posted with only a change in the input
j> file; I get the following results.
j> (warning: too few iterations for a reliable count)
j> (warning: too few iterations for a reliable count)
j> (warning: too few iterations for a reliable count)
j>             s/iter unpacking regex substring
j> unpacking     9.06        --  -27%      -34%
j> regex         6.59       37%    --       -9%
j> substring     6.01       51%   10%        --

j> Unpacking still remains the longest to finish.

j> I use Windows XP Professional with 2 GB of RAM. I also have 45 GB of
j> free space on my C drive.

j> Do you see anything else that is different?

i don't have 45GB files nor do i intend to do that. you are disk
thrashing which is the cause of your slowdowns. you are not properly
testing the perl code as your OS I/O is the limiting factor here. learn
how to understand benchmarks better. your test is not legitimate in
comparing the algorithms as the disk I/O dominates.

try it with smaller files that will fit in your ram. not more than .5 gb
given your system. and with files that large, i would do the conversion
in large chunks in a loop to mitigate the i/o and then see which does
better.
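A sketch of that chunked approach (the chunk size here is just a placeholder
to tune to whatever fits comfortably in ram, and the sub name is made up for
illustration):

```perl
use strict;
use warnings;

# Hex-split a large binary file in fixed-size chunks instead of
# slurping it all at once. Since every byte becomes exactly two
# hex chars, a chunk boundary can never split a pair.
my $chunk_size = 1024 * 1024;    # 1 MB of binary per read (arbitrary)

sub hex_pairs_by_chunk {
    my ($file_name, $callback) = @_;
    open my $fh, '<:raw', $file_name
        or die "can't open $file_name: $!";
    while (read $fh, my $data, $chunk_size) {
        # unpack the chunk to a hex string, then to 2-char pairs
        my @pairs = unpack '(A2)*', unpack 'H*', $data;
        $callback->(\@pairs);
    }
    close $fh;
}
```

The callback keeps only one chunk's worth of pairs in memory at a time, so
the timing measures the conversion rather than the paging.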

uri

--
Uri Guttman ------ uri(a)stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
From: Uri Guttman on
>>>>> "JWK" == John W Krahn <someone(a)example.com> writes:

JWK> Uri Guttman wrote:
>>
>> sub regex {
>> my @arr = $hex =~ /(..{2})/g ;
>> # print "@arr\n"
>> }

JWK> Shouldn't that be:

JWK> my @arr = $hex =~ /../g ;

JWK> Or:

JWK> my @arr = $hex =~ /.{2}/g ;

JWK> You are capturing *three* characters instead of two.
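The difference shows up immediately on a short string:

```perl
use strict;
use warnings;

my $hex = 'a0b0c1d2';

# (..{2}) matches one "any" char plus two more: three per capture
my @three = $hex =~ /(..{2})/g;
# .{2} alone matches (and returns) exactly two chars at a time
my @two   = $hex =~ /.{2}/g;

print "three: @three\n";    # three: a0b 0c1
print "two:   @two\n";      # two:   a0 b0 c1 d2
```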

true. i did my output test and must have optimized this without running
the tests again. anyhow, this whole thing is moot. the OP never said he
had a 25GB file on a 2gb system. slurping in the whole file and then
processing it is disk bound and the 2 char algorithm is irrelevant. i am
out of this thread. the OP doesn't seem to get the concept of
benchmarking or optimizing. let him stick to his substr and stopwatch.

uri

From: Peter J. Holzer on
On 2010-03-11 18:30, Uri Guttman <uri(a)StemSystems.com> wrote:
>>>>>> "JWK" == John W Krahn <someone(a)example.com> writes:
> anyhow, this whole thing is moot. the OP never said he had a 25GB file
> on a 2gb system.

Right. He never said that. So where did you get that information?

He said he had a 4 MB file and 45 GB of free space (the latter is rather
irrelevant, of course).

hp

From: Uri Guttman on
>>>>> "PJH" == Peter J Holzer <hjp-usenet2(a)hjp.at> writes:

PJH> On 2010-03-11 18:30, Uri Guttman <uri(a)StemSystems.com> wrote:
>>>>>>> "JWK" == John W Krahn <someone(a)example.com> writes:
>> anyhow, this whole thing is moot. the OP never said he had a 25GB file
>> on a 2gb system.

PJH> Right. He never said that. So where did you get that information?

PJH> He said he had a 4 MB file and 45 GB of free space (the latter is rather
PJH> irrelevant, of course).

i misread the 45Gb free disk as the file size. he still never mentioned
the file size. as i showed, the unpack is fastest with the data in
ram. i still would want to know his setup (file size included) to see
why his substr would be fastest. it has to be some very odd thing he is
doing and not telling us. there is no way a substr loop could be faster
than a single call to unpack.

uri
