how to speed up a string-substitution loop? [Perl]


From: Adam Kellas on 3 Mar 2010 11:04

On Mar 3, 10:21 am, "Uri Guttman" <u...(a)StemSystems.com> wrote:
> >>>>> "AK" == Adam Kellas <adam.kel...(a)gmail.com> writes:
> AK> Really, we have no disagreement here. It's just that when one is
> AK> trying desperately to speed something up it's reasonable to try
> AK> everything you know of. What I posted was not a published, supported
> AK> piece of code, it was the result of a tuning exercise.
>
> desperation is not a reason to try everything under the sun for
> speedups. just applying the benchmark module is a saner way to find out
> what is actually faster and by how much.

OK, if you insist, we do have a disagreement. I will simply note that
the benchmark module is of little use when the code you're fixing is
so slow as to never finish.

AK

From: Uri Guttman on 3 Mar 2010 11:09

>>>>> "AK" == Adam Kellas <adam.kellas(a)gmail.com> writes:

AK> On Mar 3, 10:21 am, "Uri Guttman" <u...(a)StemSystems.com> wrote:
>> >>>>> "AK" == Adam Kellas <adam.kel...(a)gmail.com> writes:
>> AK> Really, we have no disagreement here. It's just that when one is
>> AK> trying desperately to speed something up it's reasonable to try
>> AK> everything you know of. What I posted was not a published, supported
>> AK> piece of code, it was the result of a tuning exercise.
>>
>> desperation is not a reason to try everything under the sun for
>> speedups. just applying the benchmark module is a saner way to find out
>> what is actually faster and by how much.

AK> OK, if you insist, we do have a disagreement. I will simply note that
AK> the benchmark module is of little use when the code you're fixing is
AK> so slow as to never finish.

no, it is useful when you think using $_ will save you massive amounts
of time when it doesn't. so that will lead you to better coding style
and to focus on your real slowdown. also profiling modules will help
(even in code that never finishes) to find out where the time is spent.
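
For reference, a minimal sketch of the kind of Benchmark comparison meant here. The data and the two sub names (count_with_topic, count_with_named) are made up for illustration; only cmpthese from the core Benchmark module is real API.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical input: 1000 make-style command lines.
my @lines = map { "gcc -c -g -O2 -Wall -DNDEBUG file$_.c" } 1 .. 1000;

# Variant 1: loop over the implicit topic variable $_.
sub count_with_topic {
    my $hits = 0;
    for (@lines) {
        $hits++ if index($_, '-Wall') >= 0;
    }
    return $hits;
}

# Variant 2: the same loop with a named lexical.
sub count_with_named {
    my $hits = 0;
    for my $line (@lines) {
        $hits++ if index($line, '-Wall') >= 0;
    }
    return $hits;
}

# A negative count asks Benchmark to run each variant for at least
# that many CPU seconds, then prints a comparison table.
cmpthese(-1, {
    topic => \&count_with_topic,
    named => \&count_with_named,
});
```

The table shows the rate of each variant and the percentage difference, which is usually enough to tell whether a micro-change like this is worth keeping.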

uri

--
Uri Guttman ------ uri(a)stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------

From: sln on 3 Mar 2010 11:42

On Tue, 2 Mar 2010 08:39:16 -0800 (PST), Adam Kellas <adam.kellas(a)gmail.com> wrote:

>Hi,
>
>Looking for suggestions on how to speed up the function below. It's
>intended to "re-macroize" the output of make; in other words given a
>sequence of command lines generated by make, and a set of make macros,
>I need to substitute in the make variables such that "gcc -c -g -O2
>-Wall -DNDEBUG" might become (say) "$(CC) -c $(CFLAGS) $(DFLAGS)", as
>much as possible as it was in the original Makefile. The %Variables
>hash maps strings to macro names; thus with the above example we would
>have
>
> $Variables{'gcc'} = '$(CC)';
> $Variables{'-g -O2 -Wall'} = '$(CFLAGS)';
> $Variables{'-DNDEBUG'} = '$(DFLAGS)';
>
>Anyway, the function below seems to work but scales very badly and
>becomes unusable when enough variables are in use. Any ideas on how to
>write it better?
>
>sub varify {
>    my $word = shift;
>    $word =~ s%\$%\$\$%g;
>    for my $substr (keys %Variables) {
>        while ((my $start = index($word, $substr)) >= 0) {
>            substr($word, $start, length($substr)) = $Variables{$substr};
>        }
>    }
>    return $word;
>}
>
>Thanks,
>AK
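
On the speed question itself, one approach that is often suggested for this kind of loop (it is not taken from this thread) is to compile all the keys into a single alternation and make one pass over the string. A rough sketch, assuming the %Variables layout from the post; the name varify_single_pass is made up here. Note that it does not rescan text it has already replaced, so it is not an exact drop-in for the original loop.

```perl
use strict;
use warnings;

my %Variables = (
    'gcc'          => '$(CC)',
    '-g -O2 -Wall' => '$(CFLAGS)',
    '-DNDEBUG'     => '$(DFLAGS)',
);

# Build one alternation from all keys, longest first, so that a longer
# value is preferred over a shorter one that is a prefix of it.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) || $a cmp $b }
    keys %Variables;
my $re = qr/($alternation)/;

sub varify_single_pass {
    my $word = shift;
    $word =~ s/\$/\$\$/g;               # escape literal dollars, as the original does
    $word =~ s/$re/$Variables{$1}/g;    # one scan, hash lookup per match
    return $word;
}

print varify_single_pass('gcc -c -g -O2 -Wall -DNDEBUG'), "\n";
# prints: $(CC) -c $(CFLAGS) $(DFLAGS)
```

The regex is built once, so the cost per command line no longer grows with one full rescan per variable.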

Just a comment that nowhere is it written that you can reconstruct
variables from the output of variable substitution.

$(a) = a
$(b) = -b$(a)

cc a.obj $(b) ab.obj ->
cc a.obj -ba ab.obj
----
cc a.obj -ba ab.obj ->
cc $(a).obj -b$(a) $(a)b.obj ->
cc $(a).obj $(b) $(a)b.obj != cc a.obj $(b) ab.obj

Even if you could be guaranteed distinction in the final
output, the order in which you do the reverse substitution
has to start from the first variable defined and progress
to the last defined, ie: FIFO.

This means you can't use a hash, whose key order is unspecified
and so can't be FIFO. Instead you have to store the pseudo variables
and their data in an array:

@Variables = (
    '$(a)' , 'a',
    '$(b)' , '-b$(a)',
);

then read each pair as you progress down the list.
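
A small sketch of what reading the pairs in order could look like. It stores each pair as an array ref instead of the flat list above (my variation), and remacro_once is a made-up name. It is one naive pass with no protection against re-matching inside an already inserted $(...).

```perl
use strict;
use warnings;

# [ macro name, expanded value ] pairs, kept in definition order.
my @macros = (
    [ '$(a)', 'a'      ],
    [ '$(b)', '-b$(a)' ],
);

# One pass over the pairs, first-defined first.
sub remacro_once {
    my ($text, $pairs) = @_;
    for my $pair (@$pairs) {
        my ($name, $value) = @$pair;
        $text =~ s/\Q$value\E/$name/g;   # \Q..\E: treat the value literally
    }
    return $text;
}

print remacro_once('cc a.obj -ba ab.obj', \@macros), "\n";
# prints: cc $(a).obj $(b) $(a)b.obj
```

The order matters here: '$(b)' can only be recognised after 'a' has been turned back into '$(a)'.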

Then, to be complete, you have to repeat the substitution
reconstruction process as many times as the deepest nesting
of the $Variables. This could be accomplished by repeating
until there is no difference between the old and new strings.
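
Roughly, the repeat-until-stable idea could be sketched like this. The macro names (ALL, CC, FLAGS), the sub name and the pass limit are invented for the example, and it assumes no macro name contains its own value (otherwise the text keeps growing, which is why there is a cap).

```perl
use strict;
use warnings;

# Hypothetical macros; $(ALL) is nested one level deeper than the others.
my @macros = (
    [ '$(ALL)',   '$(CC) $(FLAGS)' ],
    [ '$(CC)',    'gcc'            ],
    [ '$(FLAGS)', '-g -O2'         ],
);

sub remacro_fixed_point {
    my ($text, $pairs, $max_passes) = @_;
    $max_passes = 20 unless defined $max_passes;   # arbitrary safety cap
    for (1 .. $max_passes) {
        my $before = $text;
        for my $pair (@$pairs) {
            my ($name, $value) = @$pair;
            $text =~ s/\Q$value\E/$name/g;
        }
        return $text if $text eq $before;          # nothing changed: done
    }
    die "no fixed point after $max_passes passes\n";
}

print remacro_fixed_point('gcc -g -O2 -c x.c', \@macros), "\n";
# prints: $(ALL) -c x.c
```

The first pass recovers $(CC) and $(FLAGS), the second folds them into $(ALL), and the third finds nothing left to do.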

But, if you can overcome these hurdles, you might get a
broken up static snapshot of makefile state, which can
dynamically generate multiple states.

In this case it would be:

$(a) = a
$(b) = -b$(a)
cc $(a).obj $(b) $(a)b.obj
(but this was designed to fail)

-sln

From: sln on 3 Mar 2010 12:10

On Wed, 03 Mar 2010 08:42:02 -0800, sln(a)netherlands.com wrote:

>Just a comment that nowhere is it written that you can reconstruct
>variables from the output of variable substitution.
>
>$(a) = a
>$(b) = -b$(a)

When checking if the value is in the target,
there must be a check that the value itself
is not a variable name.

    'this is the t$(a)rget '
                    ^
    Found value 'a' here.
    Avoid substitution if it's part of a variable name,
    enclosed with $().

$target =~ s/ \$\(.*?\) \K | (\Q$value\E) / defined $1 ? $varname : '' /xeg;

Probably in the simplest cases, this will never happen, but it's possible.
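
To make the difference visible, here is a small sketch on the sample string above, comparing a plain replacement with the guarded one. The variable names $naive and $guarded are just for the demo; \K needs Perl 5.10 or later.

```perl
use strict;
use warnings;
use 5.010;    # \K was added in Perl 5.10

my ($varname, $value) = ('$(a)', 'a');
my $target = 'this is the t$(a)rget ';

# Plain replacement: also rewrites the "a" inside the existing $(a).
(my $naive = $target) =~ s/\Q$value\E/$varname/g;

# Guarded replacement: the first alternative consumes a whole $(...)
# reference and \K drops it from the match, so it is left as it was.
# Only the second alternative sets $1 and triggers a real replacement.
(my $guarded = $target)
    =~ s/ \$\(.*?\) \K | (\Q$value\E) / defined $1 ? $varname : '' /xeg;

print "naive:   $naive\n";      # naive:   this is the t$($(a))rget
print "guarded: $guarded\n";    # guarded: this is the t$(a)rget
```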

-sln

From: sln on 3 Mar 2010 19:02

On Wed, 03 Mar 2010 09:10:17 -0800, sln(a)netherlands.com wrote:

>When checking if the value is in the target,
>there must be a check that the value itself
>is not a variable name.
>
>    'this is the t$(a)rget '
>                    ^
>    Found value 'a' here.
>    Avoid substitution if it's part of a variable name,
>    enclosed with $().
>
>$target =~ s/ \$\(.*?\) \K | (\Q$value\E) / defined $1 ? $varname : '' /xeg;
>
This could be done using regexp, something like this (untested):

use strict;
use warnings;

my ($target, @macros);

##
$target = 'cc a.obj -ba ab.obj';
@macros = (
    '$(a)' , 'a',
    '$(b)' , '-b$(a)',
    '$(c)' , '-c$(a)',
);
print "\n",$target,"\n";
for (my $ndx = 0; $ndx <= $#macros; $ndx+=2) {
    print +($macros[$ndx] =~ /\$\((.*?)\)/), " = $macros[$ndx+1]\n";
}
print varify($target, \@macros),"\n";

##
$target = 'gcc -c -g -O2 -Wall -DNDEBUG';
@macros = (
    '$(CC)' , 'gcc',
    '$(CFLAGS)', '-g -O2 -Wall',
    '$(DFLAGS)', '-DNDEBUG',
);
print "\n",$target,"\n";
for (my $ndx = 0; $ndx <= $#macros; $ndx+=2) {
    print +($macros[$ndx] =~ /\$\((.*?)\)/), " = $macros[$ndx+1]\n";
}
print varify($target, \@macros),"\n";

##
sub varify
{
    my ($matched, $newtarget, $macref) = (1, @_);
    # repeat until a full pass over the macros changes nothing
    while ($matched) {
        $matched = 0;
        for my $ndx (0 .. $#{$macref}/2) {
            $ndx *= 2;
            my ($varname, $value) = @$macref[$ndx, $ndx+1];
            # first alternative skips an existing $(...) (needs \K, perl 5.10+)
            $newtarget =~ s/ \$\(.*?\)\K | (\Q$value\E) /
                if (defined $1) {
                    $matched = 1;
                    $varname
                }
                else {''}
            /xeg;
        }
    }
    return $newtarget;
}
__END__

-sln
