Help on String to array ! [Perl]

From: Don Piven on 9 Mar 2010 10:44

John W. Krahn wrote:
> Don Piven wrote:

> No need for a loop:
>
> my @arr = $hex =~ /[[:xdigit:]]{2}/g;
>
> Also, you don't use capturing parentheses in your regular expression so
> $1 will always be empty.

So much for my proofreading :-P You're right, of course.
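
For anyone following along, here is a minimal sketch of both points from the quoted reply. It assumes the sample string from the original question. A list-context match with /g and no capture groups returns every whole match, while a loop that wants $1 needs the parentheses.

```perl
use strict;
use warnings;

my $hex = '0012345689abcd';   # sample string from the original question

# List context, /g, no capture groups: the match returns every whole match.
my @arr = $hex =~ /[[:xdigit:]]{2}/g;
print "@arr\n";               # 00 12 34 56 89 ab cd

# Scalar-context loop: $1 is only set because the pattern has parentheses.
my @looped;
while ($hex =~ /([[:xdigit:]]{2})/g) {
    push @looped, $1;
}
print "@looped\n";            # 00 12 34 56 89 ab cd
```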

From: sln on 9 Mar 2010 12:57

On Tue, 9 Mar 2010 03:34:48 -0800 (PST), jis <jismagic(a)gmail.com> wrote:

>Guys,
>
>I have a string $hex which has lets assume "0012345689abcd"

>[snip]

>Unfortunately when i read big files of 4MB size it takes
>like 10mins before it completes execution. No good.
>(i couldnt split it like 00,12 but only like 0,0,1,2)
>
>Then I thought unpack wud be a better idea.
> @arr = unpack("H2",$data); or
>@arr = unpack("H2*",$data);
>
Perl distributions for win32 have a problem with
native realloc(). On these, the larger the dynamic list
generated by the function, the longer it takes.
Linux doesn't have this problem.

In general, if you expect to be splitting up very
large data segments, it's better to build the list
yourself, outside the function, with push().

Of the 3 basic methods (substr, unpack, regexp),
the fastest seems to be substr().
Additionally, on win32 platforms and for large inputs,
any method that builds the array with push is far better
than its list-returning counterpart.
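
A minimal sketch of the substr/push idea, assuming the input is a plain string of hex digits. The helper name split_pairs is hypothetical and only there for illustration. It bounds the loop on the string length instead of on the truth of each chunk, so an odd-length input ending in "0" is still handled.

```perl
use strict;
use warnings;

# Hypothetical helper: split a string into 2-character chunks with substr/push.
sub split_pairs {
    my ($str) = @_;
    my @pairs;
    my $len = length $str;
    # Bound the loop on length, not on the chunk being true,
    # so a trailing "0" chunk is not mistaken for end of input.
    for (my $offs = 0; $offs < $len; $offs += 2) {
        push @pairs, substr($str, $offs, 2);
    }
    return @pairs;
}

my @pairs = split_pairs('0012345689abcd');
print scalar(@pairs), " pairs: @pairs\n";   # 7 pairs: 00 12 34 56 89 ab cd
```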

The data below was generated on Windows.
If you have Linux, your results will be different.
Post your numbers if you can.

-sln

Output:
--------------------
Size of bigstring = 560

Substr/push took: 0.00030303 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Unpack/list took: 0.000344038 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Unpack/push took: 0.000586033 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Regexp/list took: 0.000608206 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Regexp/push took: 0.000404835 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)

--------------------
Size of bigstring = 5600

Substr/push took: 0.002841 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Unpack/list took: 0.00334311 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Unpack/push took: 0.00657105 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Regexp/list took: 0.00673795 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Regexp/push took: 0.004076 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)

--------------------
Size of bigstring = 56000

Substr/push took: 0.0301139 wallclock secs ( 0.03 usr + 0.00 sys = 0.03 CPU)
Unpack/list took: 0.0458951 wallclock secs ( 0.05 usr + 0.00 sys = 0.05 CPU)
Unpack/push took: 0.0644789 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
Regexp/list took: 0.07149 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
Regexp/push took: 0.03965 wallclock secs ( 0.03 usr + 0.00 sys = 0.03 CPU)

--------------------
Size of bigstring = 560000

Substr/push took: 0.309315 wallclock secs ( 0.30 usr + 0.02 sys = 0.31 CPU)
Unpack/list took: 0.723145 wallclock secs ( 0.61 usr + 0.11 sys = 0.72 CPU)
Unpack/push took: 0.640141 wallclock secs ( 0.64 usr + 0.00 sys = 0.64 CPU)
Regexp/list took: 0.927701 wallclock secs ( 0.92 usr + 0.00 sys = 0.92 CPU)
Regexp/push took: 0.516143 wallclock secs ( 0.52 usr + 0.00 sys = 0.52 CPU)

--------------------
Size of bigstring = 5600000

Substr/push took: 3.79988 wallclock secs ( 3.75 usr + 0.06 sys = 3.81 CPU)
Unpack/list took: 40.0264 wallclock secs (34.97 usr + 5.06 sys = 40.03 CPU)
Unpack/push took: 6.71793 wallclock secs ( 6.70 usr + 0.01 sys = 6.72 CPU)
Regexp/list took: 34.6208 wallclock secs (34.56 usr + 0.06 sys = 34.63 CPU)
Regexp/push took: 7.93654 wallclock secs ( 7.89 usr + 0.05 sys = 7.94 CPU)

=======
# Needs at the top of the script (see the follow-up post):
#   use strict; use warnings; use Benchmark ':hireswallclock';
for my $multiplier (40, 400, 4_000, 40_000, 400_000)
{
    my $bigstring = '0012345689abcd' x $multiplier;
    print "\n", '-' x 20, "\nSize of bigstring = ", length($bigstring), "\n\n";

    ## substr in a loop, one push per pair
    {
        my ($val, $offs, @pairs) = ('', 0);
        my $t0 = Benchmark->new;
        # Stops on the empty string returned at end of input
        # (a lone trailing "0" would also end the loop).
        while ($val = substr($bigstring, $offs, 2))
        {
            push @pairs, $val;
            $offs += 2;
        }
        my $t1 = Benchmark->new;
        print "Substr/push took: ", timestr(timediff($t1, $t0)), "\n";
    }
    ## unpack returning the whole list at once
    {
        my $t0 = Benchmark->new;
        my @pairs = unpack '(a2)*', $bigstring;
        my $t1 = Benchmark->new;
        print "Unpack/list took: ", timestr(timediff($t1, $t0)), "\n";
    }
    ## unpack one pair at a time (skip $offs bytes, take 2), one push per pair
    {
        my ($val, $offs, @pairs) = ('', 0);
        my $t0 = Benchmark->new;
        while ($val = unpack("x$offs a2", $bigstring))
        {
            push @pairs, $val;
            $offs += 2;
        }
        my $t1 = Benchmark->new;
        print "Unpack/push took: ", timestr(timediff($t1, $t0)), "\n";
    }
    ## regexp in list context returning the whole list at once
    {
        my $t0 = Benchmark->new;
        my @pairs = $bigstring =~ /[0-9a-f]{2}/g;
        my $t1 = Benchmark->new;
        print "Regexp/list took: ", timestr(timediff($t1, $t0)), "\n";
    }
    ## regexp in a scalar-context loop, one push per captured pair
    {
        my @pairs;
        my $t0 = Benchmark->new;
        while ($bigstring =~ /([0-9a-f]{2})/g) {
            push @pairs, $1;
        }
        my $t1 = Benchmark->new;
        print "Regexp/push took: ", timestr(timediff($t1, $t0)), "\n";
    }
}

__END__

From: sln on 9 Mar 2010 12:59

On Tue, 09 Mar 2010 09:57:23 -0800, sln(a)netherlands.com wrote:
>=======
use strict;
use warnings;
use Benchmark ':hireswallclock';

>for my $multiplier (40, 400, 4_000, 40_000, 400_000)

From: jis on 10 Mar 2010 00:09

On Mar 9, 10:59 pm, s...(a)netherlands.com wrote:
> On Tue, 09 Mar 2010 09:57:23 -0800, s...(a)netherlands.com wrote:
> >=======
>
> use strict;
> use warnings;
> use Benchmark ':hireswallclock';
>
>
>
> >for my $multiplier (40, 400, 4_000, 40_000, 400_000)

Thanks for the replies.

As said, regex and unpack took longer than substr.
I use Windows. These are the times taken to read a 4MB file
into an array.

1. Regex: @arr = $hex =~ /[[:xdigit:]]{2}/g;
   took 1 min 7 seconds.
2. Unpack: @arr = unpack("(C2)*",$hex);
   took 3 min 26 seconds.
3. Substr:
       while ($val=substr( $hex, $offs, 2))
       {
           push @arr, $val;
           $offs+=2;
       }
   took 11 seconds.

thanks,
jis
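
One thing worth checking in the numbers above: the unpack template in item 2 is not the same as the '(a2)*' used in sln's benchmark. A minimal sketch of the difference, using a short made-up sample string. 'a2' yields 2-character strings, 'C2' yields the numeric value of each byte (so twice as many elements), and 'H2' yields the hex digits of each byte's character code.

```perl
use strict;
use warnings;

my $hex = '0012ab';   # short sample string, chosen only for illustration

my @a = unpack '(a2)*', $hex;   # two-character strings
my @c = unpack '(C2)*', $hex;   # one number per byte (ASCII codes here)
my @h = unpack '(H2)*', $hex;   # hex digits of each byte's character code

print "a2: @a\n";   # a2: 00 12 ab
print "C2: @c\n";   # C2: 48 48 49 50 97 98
print "H2: @h\n";   # H2: 30 30 31 32 61 62
```

So the three timings in the post may not be comparing the same work, since the '(C2)*' version builds an array twice as long and with different contents.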

From: Uri Guttman on 10 Mar 2010 02:51

>>>>> "j" == jis <jismagic(a)gmail.com> writes:

j> As said regex and unpack took longer time than substr.
j> I use Windows. The following are the time taken.

j> 1. Regex : @arr = $hex =~ /[[:xdigit:]]{2}/g; - To read 4Mb file
j> into an array it took 1min 7 seconds.
j> 2. Unpack : @arr = unpack("(C2)*",$hex); - To read 4Mb file into
j> an array it took 3min 26seconds.
j> 3. Substr: while ($val=substr( $hex, $offs, 2))
j> {
j> push @arr, $val;
j> $offs+=2;
j> } - To read 4Mb file into an array it took 11 seconds.

i am sorry, i can't believe it took on the order of minutes to read in a
file and convert from hex to binary. this is not possible on anything
but an abacus. given you haven't shown the complete script for each
version i have to assume your code is broken in some way. also there is
no way a substr loop would be faster than unpack or a regex. both of
those would spend all their time in perl's guts while the substr version
spends most of its time doing slow perl ops in a loop. i say this from
plenty of experience benchmarking perl code. you can easily write an
incorrect test of this so i must ask you to post complete working
programs that exhibit the slowness you claim. i will wager large amounts
of quatloos i can fix them so the substr will be outed as the slowest
one.

uri

--
Uri Guttman ------ uri(a)stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
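
A complete, self-contained comparison of the kind asked for above could look like the sketch below. It is only one possible harness, not anyone's actual script. The input size and the method labels are arbitrary choices, and it uses cmpthese from the core Benchmark module after first checking that all three methods return the same list.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# 56,000 characters; the size is an arbitrary choice for this sketch.
my $hex = '0012345689abcd' x 4_000;

# Each method returns a reference to its array of 2-character pairs.
my %method = (
    substr => sub {
        my @pairs;
        my $len = length $hex;
        for (my $offs = 0; $offs < $len; $offs += 2) {
            push @pairs, substr($hex, $offs, 2);
        }
        return \@pairs;
    },
    unpack => sub { return [ unpack '(a2)*', $hex ] },
    regex  => sub { return [ $hex =~ /[[:xdigit:]]{2}/g ] },
);

# Make sure every method produces the same result before timing anything.
my $expected = join ',', @{ $method{substr}->() };
for my $name (sort keys %method) {
    my $got = join ',', @{ $method{$name}->() };
    die "$name disagrees with substr\n" unless $got eq $expected;
}

# Run each method for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%method);
```

The ranking it prints depends on the platform and the perl build, which is the point of contention in this thread, so run it on both Windows and Linux before drawing conclusions.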
