From: jis on


I too would like to believe it should take very little time.
Here are the scripts I used for testing.

1. Using unpack (the regex alternative is commented out):

#!/usr/bin/perl
use strict;
use warnings;
my $binary_file="28247101.bin";
open FILE, $binary_file or die "Can't open $binary_file $!\n";
# binmode FILE to suppress conversion of line endings
binmode FILE;
undef $/;
my $data = <FILE>;
close FILE;
# convert data to hex form
my $hex = unpack 'H*', $data;
my ($val, $offs, @arr) = ('',0);
#@arr = $hex =~ /[[:xdigit:]]{2}/g;
@arr = unpack("(C2)*",$hex);
print "bye";
print $arr[2];

This took 3 minutes 25 seconds.

If I uncomment the regex portion and comment out the unpack, it takes
1 minute 25 seconds.
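A small sketch of what the two `unpack` templates in this thread return on a hex string (the two-byte sample is an assumption for illustration): `C` yields the numeric code of each character, so every hex digit becomes its own number and the list is twice as long as intended, while `A2`, the template used in the benchmark later in the thread, yields the two-character strings that the regex and substr versions produce.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample: the hex form of the two bytes "\x00\x10"
my $hex = unpack 'H*', "\x00\x10";    # "0010"

# 'C' yields the numeric code (ord) of each character, one per hex digit
my @codes = unpack '(C2)*', $hex;     # (48, 48, 49, 48)

# 'A2' yields two-character strings, one per original byte
my @pairs = unpack '(A2)*', $hex;     # ("00", "10")

print "C2: @codes\n";
print "A2: @pairs\n";
```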

2. Using substr:

#!/usr/bin/perl
use strict;
use warnings;
my $binary_file="28247101.bin";
open FILE, $binary_file or die "Can't open $binary_file $!\n";
# binmode FILE to suppress conversion of line endings
binmode FILE;
undef $/;
my $data = <FILE>;
close FILE;
# convert data to hex form
my $hex = unpack 'H*', $data;
my $i=0;

my ($val, $offs, @arr) = ('',0);
while ($val=substr( $hex, $offs, 2)){
    push @arr, $val;
    $offs+=2;
}
print "bye";
print $arr[2];

This took only 9 seconds.
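A sketch of the substr loop with an explicit bound, assuming `$hex` holds an even-length hex string as produced by `unpack 'H*'` (the sample bytes are illustrative). The original `while ($val = substr(...))` works here because every chunk is two characters and "00" is true in Perl, but it would stop early on any chunk that is false, such as a lone "0"; comparing the offset with `length $hex` avoids relying on that.

```perl
use strict;
use warnings;

# Hypothetical input standing in for the hex string built from the file
my $hex = unpack 'H*', "\x00\x10\xff";    # "0010ff"

my @arr;
# Step through the string two characters at a time until its end,
# instead of relying on the truthiness of each extracted chunk.
for (my $offs = 0; $offs < length $hex; $offs += 2) {
    push @arr, substr($hex, $offs, 2);
}

print "@arr\n";    # 00 10 ff
```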

I used a stopwatch to measure the times.

I would appreciate your help in finding out how it can be improved.

thanks,
jis

On Mar 10, 12:51 pm, "Uri Guttman" <u...(a)StemSystems.com> wrote:
> >>>>> "j" == jis  <jisma...(a)gmail.com> writes:
>
>   j> As said regex and unpack took longer time than substr.
>   j> I use Windows. The following are the time taken.
>
>   j> 1. Regex : @arr = $hex =~ /[[:xdigit:]]{2}/g;  - To read  4Mb file
>   j> into an array it took  1min 7 seconds.
>   j> 2. Unpack : @arr = unpack("(C2)*",$hex);    - To read  4Mb file into
>   j> an array it took  3min 26seconds.
>   j> 3. Substr: while ($val=substr( $hex, $offs, 2))
>   j>     {
>   j>         push @arr, $val;
>   j>         $offs+=2;
>   j>     } -  To read  4Mb file into an array it took  11 seconds.
>
> i am sorry, i can't believe it took on the order of minutes to read in a
> file and convert from binary to hex. this is not possible on anything
> but an abacus. given you haven't shown the complete script for each
> version i have to assume your code is broken in some way. also there is
> no way a substr loop would be faster than unpack or a regex. both of
> those would spend all their time in perl's guts while the substr version
> spends most of its time doing slow perl ops in a loop. i say this from
> plenty of experience benchmarking perl code. you can easily write an
> incorrect test of this so i must ask you to post complete working
> programs that exhibit the slowness you claim. i will wager large amounts
> of quatloos i can fix them so the substr will be outed as the slowest
> one.
>
> uri
>
> --
> Uri Guttman  ------  u...(a)stemsystems.com  --------  http://www.sysarch.com--
> -----  Perl Code Review , Architecture, Development, Training, Support ------
> ---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com---------

From: Uri Guttman on
>>>>> "j" == jis <jismagic(a)gmail.com> writes:

j> Even I want to beleive it should take very less time.
j> I post the scripts I used for testing.

j> 1. #!/usr/bin/perl

j> # convert data to hex form
j> my $hex = unpack 'H*', $data;
j> my ($val, $offs, @arr) = ('',0);
j> #@arr = $hex =~ /[[:xdigit:]]{2}/g;
j> @arr = unpack("(C2)*",$hex);

j> my $data = <FILE>;
j> close FILE;
j> # convert data to hex form
j> my $hex = unpack 'H*', $data;
j> my $i=0;

j> my ($val, $offs, @arr) = ('',0);
j> while ($val=substr( $hex, $offs, 2)){
j> push @arr, $val;
j> $offs+=2;
j> }
j> print "bye";
j> print $arr[2]; This would take only 9 seconds.

j> I have used a stopwatch to calculate time.

a stopwatch? you need to learn how to use the Benchmark.pm module.
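For timing a single run of one approach, which is what the stopwatch measured, Benchmark's `timeit` and `timestr` report wall-clock and CPU time from inside the script. A minimal sketch; the 100,000 random bytes stand in for the real file and are an assumption.

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);

# Hypothetical stand-in for the file contents: 100,000 pseudo-random bytes
my $data = join '', map { chr int rand 256 } 1 .. 100_000;
my $hex  = unpack 'H*', $data;

my @arr;
# Run the code once and record wall-clock and CPU times
my $t = timeit(1, sub { @arr = unpack '(A2)*', $hex });

print "unpack (A2)*: ", timestr($t), "\n";
print scalar(@arr), " elements\n";
```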

j> Appreciate your help in finding how it can be improved.

easy. let me do a proper benchmark.

and you should learn how to properly bottom post and not leave my entire
post in the message.

uri


From: Uri Guttman on
>>>>> "j" == jis <jismagic(a)gmail.com> writes:

j> if i uncommment regex protion and comment unpack it would take
j> 1minute 25 sec

j> print "bye";
j> print $arr[2]; This would take only 9 seconds.

j> I have used a stopwatch to calculate time.

as i said, that is a silly way to time programs. and there is no way it
would take minutes to do this unless you are on a severely slow cpu or
you are low on ram and are disk thrashing. here is my benchmarked
version which shows that unpacking (fixed to use A and not C) is the
fastest and regex (also fixed to do the simplest but correct thing which
is grab 2 chars) ties your code.

uncomment those commented-out lines to check that all three versions do
the same, correct thing.
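Instead of printing the arrays and comparing by eye, a sketch that checks programmatically that the three approaches agree, on an assumed four-byte sample. The regex here matches exactly two characters per element; a check like this flags any version that splits the string differently, for example a pattern that grabs three characters.

```perl
use strict;
use warnings;

# Hypothetical small input; any binary string works here
my $hex = unpack 'H*', "\x00\x10\xab\xff";    # "0010abff"

# The three approaches, with the regex written as a two-character match
my @by_unpack = unpack '(A2)*', $hex;
my @by_regex  = $hex =~ /../g;
my @by_substr;
for (my $offs = 0; $offs < length $hex; $offs += 2) {
    push @by_substr, substr($hex, $offs, 2);
}

# Compare the element lists by joining them with a separator
my $ok = join(',', @by_unpack) eq join(',', @by_regex)
      && join(',', @by_unpack) eq join(',', @by_substr);

print $ok ? "all three agree: @by_unpack\n" : "MISMATCH\n";
```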

here is the timing result run for 10 seconds each:

          s/iter     regex substring unpacking
regex       2.11        --       -0%      -25%
substring   2.11        0%        --      -25%
unpacking   1.58       33%       33%        --

uri


use strict;
use warnings;

use File::Slurp ;
use Benchmark qw(:all) ;

my $duration = shift || -2 ;

my $file_name = '/boot/vmlinuz-2.6.28-15-generic' ;

my $data = read_file( $file_name, binary => 1 ) ;

#$data = "\x00\x10" ;

my $hex = unpack 'H*', $data;

# unpacking() ;
# regex() ;
# substring() ;
# exit ;

cmpthese( $duration, {

        unpacking       => \&unpacking,
        regex           => \&regex,
        substring       => \&substring,
} ) ;

sub unpacking {
        my @arr = unpack( '(A2)*' , $hex) ;
#       print "@arr\n"
}

sub regex {
        my @arr = $hex =~ /(..{2})/g ;
#       print "@arr\n"
}

sub substring {

        my ($val, $offs, @arr) = ('',0);
        while ($val=substr( $hex, $offs, 2)){
                push @arr, $val;
                $offs+=2;
        }

#       print "@arr\n"
}
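A further option, not one of the three benchmarked here: `unpack` can produce the two-digit hex strings directly from the binary data with the `(H2)*` template, skipping the intermediate `$hex` string. A sketch on assumed sample bytes; its speed relative to the versions above has not been measured in this thread.

```perl
use strict;
use warnings;

# Hypothetical binary data standing in for the file contents
my $data = "\x00\x10\xab\xff";

# Each H2 consumes one byte and returns it as two hex digits (high nybble first)
my @arr = unpack '(H2)*', $data;

# Same result as going through the intermediate hex string
my @via_hex = unpack '(A2)*', unpack('H*', $data);

print "@arr\n";    # 00 10 ab ff
```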


From: jis on
On Mar 11, 11:15 am, "Uri Guttman" <u...(a)StemSystems.com> wrote:
> here is the timing result run for 10 seconds each:
>
>           s/iter     regex substring unpacking
> regex       2.11        --       -0%      -25%
> substring   2.11        0%        --      -25%
> unpacking   1.58       33%       33%        --

Uri,

I ran the script you posted, changing only the input file, and got
the following results:
(warning: too few iterations for a reliable count)
(warning: too few iterations for a reliable count)
(warning: too few iterations for a reliable count)
          s/iter unpacking     regex substring
unpacking   9.06        --      -27%      -34%
regex       6.59       37%        --       -9%
substring   6.01       51%       10%        --

Unpacking is still the slowest to finish.

I use Windows XP Professional with 2 GB of RAM, and I have 45 GB of
free space on my C: drive.

Do you see anything else that is different?
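On the "too few iterations" warnings: with a negative count, `cmpthese` runs each sub for at least that many CPU seconds, so when a single call takes 6 to 9 seconds, as in the results above, each sub gets very few iterations. Passing a longer duration such as `-30` as the script's argument, or giving `cmpthese` a positive iteration count, yields more samples. A sketch with a fixed count of 5 on smaller, randomly generated data (both assumptions); on data this small Benchmark may still print the warning, since each run finishes quickly.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-in data (smaller than a real file, to keep the run short)
my $hex = unpack 'H*', join '', map { chr int rand 256 } 1 .. 200_000;

# A positive count runs each sub exactly that many times; a negative count
# such as -2 means "run for at least 2 CPU seconds" per sub instead.
my $rows = cmpthese(5, {
    unpacking => sub { my @arr = unpack '(A2)*', $hex },
    regex     => sub { my @arr = $hex =~ /../g },
    substring => sub {
        my @arr;
        for (my $offs = 0; $offs < length $hex; $offs += 2) {
            push @arr, substr($hex, $offs, 2);
        }
    },
});
```

`cmpthese` also returns the comparison table as an array reference (a header row plus one row per sub), which is handy for further processing.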

thanks,
jis

From: John W. Krahn on
Uri Guttman wrote:
>
> sub regex {
> my @arr = $hex =~ /(..{2})/g ;
> # print "@arr\n"
> }

Shouldn't that be:

my @arr = $hex =~ /../g ;

Or:

my @arr = $hex =~ /.{2}/g ;

You are capturing *three* characters instead of two.
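A quick check of the difference on an assumed eight-character hex string: `(..{2})` is one `.` followed by `.{2}`, so each capture is three characters and a trailing pair that cannot form a full match is dropped, while `/../g` returns every two-character pair.

```perl
use strict;
use warnings;

my $hex = "0010abff";    # hypothetical hex string (4 bytes)

my @three = $hex =~ /(..{2})/g;    # '.' plus '.{2}': captures 3 characters
my @two   = $hex =~ /../g;         # exactly 2 characters per match

print "@three\n";    # 001 0ab
print "@two\n";      # 00 10 ab ff
```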



John
--
The programmer is fighting against the two most
destructive forces in the universe: entropy and
human stupidity. -- Damian Conway