From: Adam Kellas on
Hi,

Looking for suggestions on how to speed up the function below. It's
intended to "re-macroize" the output of make; in other words given a
sequence of command lines generated by make, and a set of make macros,
I need to substitute in the make variables such that "gcc -c -g -O2
-Wall -DNDEBUG" might become (say) "$(CC) -c $(CFLAGS) $(DFLAGS)", as
much as possible as it was in the original Makefile. The %Variables
hash maps strings to macro names; thus with the above example we would
have

$Variables{'gcc'} = '$(CC)';
$Variables{'-g -O2 -Wall'} = '$(CFLAGS)';
$Variables{'-DNDEBUG'} = '$(DFLAGS)';

Anyway, the function below seems to work but scales very badly and
becomes unusable when enough variables are in use. Any ideas on how to
write it better?

sub varify {
    my $word = shift;
    $word =~ s%\$%\$\$%g;
    for my $substr (keys %Variables) {
        while ((my $start = index($word, $substr)) >= 0) {
            substr($word, $start, length($substr)) = $Variables{$substr};
        }
    }
    return $word;
}

Thanks,
AK
From: Jim Gibson on
In article
<688f37dd-7719-4944-a19f-77a60c572804(a)d2g2000yqa.googlegroups.com>,
Adam Kellas <adam.kellas(a)gmail.com> wrote:

> Hi,
>
> Looking for suggestions on how to speed up the function below. It's
> intended to "re-macroize" the output of make; in other words given a
> sequence of command lines generated by make, and a set of make macros,
> I need to substitute in the make variables such that "gcc -c -g -O2
> -Wall -DNDEBUG" might become (say) "$(CC) -c $(CFLAGS) $(DFLAGS)", as
> much as possible as it was in the original Makefile. The %Variables
> hash maps strings to macro names; thus with the above example we would
> have
>
> $Variables{'gcc'} = '$(CC)';
> $Variables{'-g -O2 -Wall'} = '$(CFLAGS)';
> $Variables{'-DNDEBUG'} = '$(DFLAGS)';
>
> Anyway, the function below seems to work but scales very badly and
> becomes unusable when enough variables are in use. Any ideas on how to
> write it better?
>
> sub varify {
> my $word = shift;
> $word =~ s%\$%\$\$%g;
> for my $substr (keys %Variables) {
> while ((my $start = index($word, $substr)) >= 0) {
> substr($word, $start, length($substr)) = $Variables{$substr};
> }
> }
> return $word;
> }

I don't see how you are going to get much of a speedup. It seems you
are already doing the minimum amount of work with no wasted steps.

You might try using the each() function instead of keys. That saves
generating the key array and the hash lookup for the replacement
string:

    while (my ($key, $replace) = each %Variables) {
        ...
        substr($word, $start, length($key)) = $replace;
    }

You could also pre-compute the search string lengths so you don't
have to call the length() function for each replacement. Thus, you
might be better off using three arrays or a two-dimensional (N,3) array
to hold (key, replacement, length(key)) values.
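
A minimal sketch of the precomputed-table idea, assuming the sample %Variables from the original post. The @Table name and the rule of resuming the search after the inserted text are illustrative choices, not something proposed in the thread; empty search strings are skipped because index() would match them at every position.

```perl
use strict;
use warnings;

# Sample data from the original post.
my %Variables = (
    'gcc'          => '$(CC)',
    '-g -O2 -Wall' => '$(CFLAGS)',
    '-DNDEBUG'     => '$(DFLAGS)',
);

# Hypothetical table of [key, replacement, length(key)] rows, built once.
# Empty keys are dropped so that index() cannot match at every position.
my @Table = map  { [ $_, $Variables{$_}, length($_) ] }
            grep { length($_) }
            keys %Variables;

sub varify {
    my $word = shift;
    $word =~ s%\$%\$\$%g;    # escape '$' for make, as in the original
    for my $row (@Table) {
        my ($key, $replace, $len) = @$row;
        my $pos = 0;
        while (($pos = index($word, $key, $pos)) >= 0) {
            substr($word, $pos, $len) = $replace;
            $pos += length($replace);    # resume after the inserted text
        }
    }
    return $word;
}

print varify('gcc -c -g -O2 -Wall -DNDEBUG'), "\n";
```

Whether this is measurably faster than the original is something only a benchmark on real data can show.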

How many key:replacement pairs are you using? I am surprised this
doesn't scale very well. It would appear to be O(n) in the number of
search strings.

As always, only benchmarking can ensure you are getting any speedups or
tell you where the actual bottlenecks are.

--
Jim Gibson
From: Uri Guttman on
>>>>> "AK" == Adam Kellas <adam.kellas(a)gmail.com> writes:

AK> Hi,
AK> Looking for suggestions on how to speed up the function below. It's
AK> intended to "re-macroize" the output of make; in other words given a
AK> sequence of command lines generated by make, and a set of make macros,
AK> I need to substitute in the make variables such that "gcc -c -g -O2
AK> -Wall -DNDEBUG" might become (say) "$(CC) -c $(CFLAGS) $(DFLAGS)", as
AK> much as possible as it was in the original Makefile. The %Variables
AK> hash maps strings to macro names; thus with the above example we would
AK> have

AK> $Variables{'gcc'} = '$(CC)';
AK> $Variables{'-g -O2 -Wall'} = '$(CFLAGS)';
AK> $Variables{'-DNDEBUG'} = '$(DFLAGS)';

AK> sub varify {
AK> my $word = shift;
AK> $word =~ s%\$%\$\$%g;
AK> for my $substr (keys %Variables) {
AK> while ((my $start = index($word, $substr)) >= 0) {
AK> substr($word, $start, length($substr)) = $Variables{$substr};
AK> }
AK> }
AK> return $word;
AK> }

can you make a pattern that would match ANY of the strings you want to
match? even an alternation might do well enough. then you can do a
single s/// with the replacement value being looked up in the %Variables
(poor name) hash.
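
A rough sketch of that single-pass approach, assuming the sample %Variables from the original post. The $Pattern name, the use of quotemeta(), and the longest-first ordering are illustrative assumptions rather than anything spelled out in the thread.

```perl
use strict;
use warnings;

# Sample data from the original post.
my %Variables = (
    'gcc'          => '$(CC)',
    '-g -O2 -Wall' => '$(CFLAGS)',
    '-DNDEBUG'     => '$(DFLAGS)',
);

# Build one alternation from all non-empty keys. Longer strings come first
# so that a long value is preferred over a shorter one that is its prefix.
my @Keys = sort { length($b) <=> length($a) || $a cmp $b }
           grep { length($_) }
           keys %Variables;
my $Alternation = join '|', map { quotemeta($_) } @Keys;
my $Pattern     = qr/($Alternation)/;

sub varify {
    my $word = shift;
    $word =~ s/\$/\$\$/g;          # escape '$' for make, as in the original
    return $word unless @Keys;     # nothing to substitute
    $word =~ s/$Pattern/$Variables{$1}/g;
    return $word;
}

print varify('gcc -c -g -O2 -Wall -DNDEBUG'), "\n";
```

Because s///g continues scanning after each replacement, text that has just been inserted is never searched again, so this version makes exactly one pass over the input regardless of how many variables are defined.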

also why would speed be an issue here? it looks like it would be done
off line to fix up makefile output.

uri

From: Jürgen Exner on
Adam Kellas <adam.kellas(a)gmail.com> wrote:
>Looking for suggestions on how to speed up the function below. It's
>intended to "re-macroize" the output of make; in other words given a
>sequence of command lines generated by make, and a set of make macros,
>I need to substitute in the make variables such that "gcc -c -g -O2
>-Wall -DNDEBUG" might become (say) "$(CC) -c $(CFLAGS) $(DFLAGS)", as
>much as possible as it was in the original Makefile. The %Variables
>hash maps strings to macro names; thus with the above example we would
>have
>
> $Variables{'gcc'} = '$(CC)';
> $Variables{'-g -O2 -Wall'} = '$(CFLAGS)';
> $Variables{'-DNDEBUG'} = '$(DFLAGS)';
>
>Anyway, the function below seems to work but scales very badly and
>becomes unusable when enough variables are in use. Any ideas on how to
>write it better?
>
>sub varify {
> my $word = shift;
> $word =~ s%\$%\$\$%g;
> for my $substr (keys %Variables) {
> while ((my $start = index($word, $substr)) >= 0) {
> substr($word, $start, length($substr)) = $Variables{$substr};

Probably I am missing the obvious, but why are you doing the replacements
manually, thus incurring a lot of string copying, instead of simply doing
an s///?
I would replace the whole while loop with a straightforward
    s/$substr/$Variables{$substr}/g;
You may have to add \Q...\E if the search strings contain regex
metacharacters.
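
A minimal sketch of that variant with \Q...\E applied, assuming the sample %Variables from the original post. The loop variable is renamed to $search here, and the check for empty keys is an added assumption: an empty pattern in s/// has special meaning in Perl, so such keys are skipped.

```perl
use strict;
use warnings;

# Sample data from the original post.
my %Variables = (
    'gcc'          => '$(CC)',
    '-g -O2 -Wall' => '$(CFLAGS)',
    '-DNDEBUG'     => '$(DFLAGS)',
);

sub varify {
    my $word = shift;
    $word =~ s/\$/\$\$/g;    # escape '$' for make, as in the original
    for my $search (keys %Variables) {
        next unless length($search);    # skip empty search strings
        # \Q...\E makes the search string match literally.
        $word =~ s/\Q$search\E/$Variables{$search}/g;
    }
    return $word;
}

print varify('gcc -c -g -O2 -Wall -DNDEBUG'), "\n";
```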

BTW: $substr is an awful name considering there is a function substr()
and capitalized names (%Variables) normally indicate file handles.

jue
From: Adam Kellas on
On Mar 2, 1:06 pm, "Uri Guttman" <u...(a)StemSystems.com> wrote:
> can you make a pattern that would match ANY of the strings you want to
> match? even an alternation might do well enough. then you can do a
> single s/// with the replacement value being looked up in the %Variables
> (poor name) hash.

Thanks, I will try this.

> also why would speed be an issue here? it looks like it would be done
> off line to fix up makefile output.

You're right, this is done more or less off line and in theory should
not be too performance sensitive. But this exhibits really
pathological behavior - for reasons I don't understand, though it
works fine in small unit-test setups it can appear to hang for hours
on end in real-world situations. During that time strace shows that
perl is calling the brk() system call over and over.
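
One possible explanation, not confirmed anywhere in this thread: the inner while loop in varify() never ends if a replacement contains its own search string, or if a search string is empty, and the word then grows on every iteration. The sketch below reproduces the posted loop with a hypothetical $limit cap (and hypothetical sample values) so that the demonstration terminates.

```perl
use strict;
use warnings;

# Same loop body as the posted varify(), plus a cap on the number of
# replacements. The cap exists only so that this demonstration stops.
sub count_replacements {
    my ($word, $search, $replace, $limit) = @_;
    my $count = 0;
    while ((my $start = index($word, $search)) >= 0) {
        substr($word, $start, length($search)) = $replace;
        last if ++$count >= $limit;
    }
    return ($count, length($word));
}

# Ordinary case: one occurrence, one replacement, then the loop ends.
my ($ok_count) = count_replacements('gcc -c foo.c', 'gcc', '$(CC)', 1000);

# The replacement '$(CC)' contains the search string 'CC', so index()
# finds it again after every substitution and the word keeps growing.
my ($loop_count, $loop_len) =
    count_replacements('CC -c foo.c', 'CC', '$(CC)', 1000);

# An empty search string is found at offset 0 every time.
my ($empty_count) =
    count_replacements('gcc -c foo.c', '', '$(EMPTY)', 1000);

print "normal: $ok_count, self-containing: $loop_count, ",
      "empty: $empty_count\n";
```

Without the cap, the last two cases would loop until memory runs out, which would be consistent with the repeated brk() calls described above.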

Thanks,
AK