From: Adam Kellas on
On Mar 2, 1:07 pm, Jürgen Exner <jurge...(a)hotmail.com> wrote:
> Adam Kellas <adam.kel...(a)gmail.com> wrote:
> >Looking for suggestions on how to speed up the function below. It's
> >intended to "re-macroize" the output of make; in other words given a
> >sequence of command lines generated by make, and a set of make macros,
> >I need to substitute in the make variables such that
> >"gcc -c -g -O2 -Wall -DNDEBUG" might become (say)
> >"$(CC) -c $(CFLAGS) $(DFLAGS)", as
> >much as possible as it was in the original Makefile. The %Variables
> >hash maps strings to macro names; thus with the above example we would
> >have
>
> >  $Variables{'gcc'} = '$(CC)';
> >  $Variables{'-g -O2 -Wall'} = '$(CFLAGS)';
> >  $Variables{'-DNDEBUG'} = '$(DFLAGS)';
>
> >Anyway, the function below seems to work but scales very badly and
> >becomes unusable when enough variables are in use. Any ideas on how to
> >write it better?
>
> >sub varify {
> >    my $word = shift;
> >    $word =~ s%\$%\$\$%g;
> >    for my $substr (keys %Variables) {
> >    while ((my $start = index($word, $substr)) >= 0) {
> >        substr($word, $start, length($substr)) = $Variables{$substr};
>
> Probably I am missing the obvious, but why are you doing the replacements
> manually, thus incurring a lot of string copy, instead of simply doing a
> s///?
> I would replace the whole while loop with a straight-forward
>         s/$substr/$Variables{$substr}/g;
> May have to add \Q...\E if needed.
>
> BTW: $substr is an awful name considering there is a function substr()
> and capitalized names ($Variables) normally indicate file handles.

I figured that s/// would be slower than the index/substr technique,
which in theory should resolve to a strstr(), strcpy, and realloc() in
the perl binary. Or at least so I thought.

AK
From: Jürgen Exner on
Adam Kellas <adam.kellas(a)gmail.com> wrote:
[...]
>> I would replace the whole while loop with a straight-forward
>>         s/$substr/$Variables{$substr}/g;
>> May have to add \Q...\E if needed.
>>
>I figured that s/// would be slower than the index/substr technique,

I'd guess it's a simple enough change to just give it a try. If you
really need hard data then there's always the Benchmark module.

jue
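
A minimal sketch of such a comparison, using cmpthese() from the core
Benchmark module. The sample data comes from the original post; the
names varify_index and varify_subst are made up for this illustration,
and the dollar-doubling step of the original function is left out to
keep the two variants short. Timings will depend on the real data, so
this only shows how to measure, not which variant wins.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample data taken from the original post.
my %Variables = (
    'gcc'          => '$(CC)',
    '-g -O2 -Wall' => '$(CFLAGS)',
    '-DNDEBUG'     => '$(DFLAGS)',
);
my $line = 'gcc -c -g -O2 -Wall -DNDEBUG';

# The original approach: index() plus lvalue substr() in a loop.
sub varify_index {
    my $word = shift;
    for my $key (keys %Variables) {
        while ((my $start = index($word, $key)) >= 0) {
            substr($word, $start, length($key)) = $Variables{$key};
        }
    }
    return $word;
}

# The suggested approach: one s///g per key. \Q...\E makes characters
# such as "." or "+" inside a key match literally.
sub varify_subst {
    my $word = shift;
    for my $key (keys %Variables) {
        $word =~ s/\Q$key\E/$Variables{$key}/g;
    }
    return $word;
}

# A negative count means "run each variant for at least that many CPU
# seconds"; cmpthese() then prints a comparison table.
cmpthese(-1, {
    index_substr => sub { varify_index($line) },
    subst        => sub { varify_subst($line) },
});
```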
From: Ben Morrow on

Quoth Adam Kellas <adam.kellas(a)gmail.com>:
> On Mar 2, 1:07 pm, Jürgen Exner <jurge...(a)hotmail.com> wrote:
> > Adam Kellas <adam.kel...(a)gmail.com> wrote:
> > >Looking for suggestions on how to speed up the function below. It's
> > >intended to "re-macroize" the output of make; in other words given a
> > >sequence of command lines generated by make, and a set of make macros,
> > >I need to substitute in the make variables such that
> > >"gcc -c -g -O2 -Wall -DNDEBUG" might become (say)
> > >"$(CC) -c $(CFLAGS) $(DFLAGS)", as
> > >much as possible as it was in the original Makefile. The %Variables
> > >hash maps strings to macro names; thus with the above example we would
> > >have
> >
> > >  $Variables{'gcc'} = '$(CC)';
> > >  $Variables{'-g -O2 -Wall'} = '$(CFLAGS)';
> > >  $Variables{'-DNDEBUG'} = '$(DFLAGS)';
> >
> > >Anyway, the function below seems to work but scales very badly and
> > >becomes unusable when enough variables are in use. Any ideas on how to
> > >write it better?
> >
> > >sub varify {
> > >    my $word = shift;
> > >    $word =~ s%\$%\$\$%g;
> > >    for my $substr (keys %Variables) {
> > >    while ((my $start = index($word, $substr)) >= 0) {
> > >        substr($word, $start, length($substr)) = $Variables{$substr};
> >
> > Probably I am missing the obvious, but why are you doing the replacements
> > manually, thus incurring a lot of string copy, instead of simply doing a
> > s///?
> > I would replace the whole while loop with a straight-forward
> >         s/$substr/$Variables{$substr}/g;
> > May have to add \Q...\E if needed.
> >
> > BTW: $substr is an awful name considering there is a function substr()
> > and capitalized names ($Variables) normally indicate file handles.
>
> I figured that s/// would be slower than the index/substr technique,
> which in theory should resolve to a strstr(), strcpy, and realloc() in
> the perl binary. Or at least so I thought.

As a rule of thumb, any single perl op (s/// is a single op) is likely
to be faster than the equivalent algorithm using multiple ops (substr,
index, the assignment and the while loop add up to quite a lot of ops
between them). Lvalue substr is also likely to be slower than 4-arg
substr (since it's much more magical).
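
For illustration, the two substr forms side by side. This is only a
sketch: the command string 'gcc -c foo.c' is invented for the example,
and nothing here measures the speed difference.

```perl
use strict;
use warnings;

my $lvalue  = 'gcc -c foo.c';
my $fourarg = 'gcc -c foo.c';

# Lvalue form, as used in the original function: substr() is assigned to.
substr($lvalue, 0, 3) = '$(CC)';

# 4-arg form: the replacement string is passed as the fourth argument.
substr($fourarg, 0, 3, '$(CC)');

# Both strings now hold the same text.
print "$lvalue\n$fourarg\n";
```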

Your post xthread (about the process sitting in brk(2)) suggests a lot
of string copying. You could try running through the source string and
building a single result string as an alternative, which should avoid
most of the copying. If you have some idea of an upper bound on the
length of the destination string (if your de-substitutions always make
the string shorter, for instance) you could even try preallocating the
destination string like this:

my $dest = "x" x $dest_size;
$dest = "";

If you now only append to $dest, it won't ever be re-allocated (assuming
you don't go over the original size).

Ben
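
One way the single-pass idea above might look, as a rough sketch rather
than a tuned implementation. The name varify_build and the size guess
of twice the source length are assumptions made for this example (in
the sample data 'gcc' becomes the longer '$(CC)', so the source length
alone is not a safe upper bound). Keys are tried longest first so that
a longer string wins over a shorter key it happens to contain.

```perl
use strict;
use warnings;

# Sample data taken from the original post.
my %Variables = (
    'gcc'          => '$(CC)',
    '-g -O2 -Wall' => '$(CFLAGS)',
    '-DNDEBUG'     => '$(DFLAGS)',
);

sub varify_build {
    my ($src, $map) = @_;

    # Try longer keys before shorter ones at each position.
    my @keys = sort { length($b) <=> length($a) } keys %$map;

    # Preallocate the destination buffer, then empty it (the trick
    # described above). The factor of 2 is a guess, not a proven bound.
    my $dest = "x" x (2 * length $src);
    $dest = "";

    my $pos = 0;
    my $len = length $src;
    POSITION: while ($pos < $len) {
        for my $key (@keys) {
            if (substr($src, $pos, length $key) eq $key) {
                $dest .= $map->{$key};    # append the macro name
                $pos  += length $key;     # skip past the matched text
                next POSITION;
            }
        }
        # No key matches here: copy one character and move on.
        $dest .= substr($src, $pos, 1);
        $pos++;
    }
    return $dest;
}

print varify_build('gcc -c -g -O2 -Wall -DNDEBUG', \%Variables), "\n";
```

The source string is only read and the result is only appended to, so
no text is ever shifted around inside an existing string.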

From: Uri Guttman on
>>>>> "AK" == Adam Kellas <adam.kellas(a)gmail.com> writes:

AK> On Mar 2, 1:06 pm, "Uri Guttman" <u...(a)StemSystems.com> wrote:
>> can you make a pattern that would match ANY of the strings you want to
>> match? even an alternation might do well enough. then you can do a
>> single s/// with the replacement value being looked up in the $variables
>> (poor name) hash.

AK> Thanks, I will try this.

>> also why would speed be an issue here? it looks like it would be done
>> off line to fix up makefile output.

AK> You're right, this is done more or less off line and in theory should
AK> not be too performance sensitive. But this exhibits really
AK> pathological behavior - for reasons I don't understand, though it
AK> works fine in small unit-test setups it can appear to hang for hours
AK> on end in real-world situations. During that time strace shows that
AK> perl is calling the brk() system call over and over.

you would have to post all the code and some data and hopefully that
would exhibit the problem. calling brk() means you are doing serious
data allocation which shouldn't happen in this style of code. maybe you
are doing something else that you didn't tell us. s/// ops on a string
may add some ram needs but not insane amounts. massive unnecessary ram
needs are usually leaks, either a rare perl bug or some poorly designed
data structure that leaks.

uri

--
Uri Guttman ------ uri(a)stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
From: Uri Guttman on
>>>>> "BM" == Ben Morrow <ben(a)morrow.me.uk> writes:

BM> As a rule of thumb, any single perl op (s/// is a single op) is likely
BM> to be faster than the equivalent algorithm using multiple ops (substr,
BM> index, the assignment and the while loop add up to quite a lot of ops
BM> between them). Lvalue substr is also likely to be slower than 4-arg
BM> substr (since it's much more magical).

totally agree. perl guts are almost always faster than the equivalent in
perl lang. the rule is to try to stay inside perl as much as
possible. there are a few exceptions but not in this case.

BM> Your post xthread (about the process sitting in brk(2)) suggests a lot
BM> of string copying. You could try running through the source string and
BM> building a single result string as an alternative, which should avoid
BM> most of the copying. If you have some idea of an upper bound on the
BM> length of the destination string (if your de-substitutions always make
BM> the string shorter, for instance) you could even try preallocating the
BM> destination string like this:

BM> my $dest = "x" x $dest_size;
BM> $dest = "";

and my solution of a single s///g with all the keys and lookup in a hash
for the replacement will speed it up. fewer ops and much less realloc as
it will build up a single new string as it does the replacements. it may
realloc some but nowhere like with the lvalue substr and such the OP
used.

uri
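
a minimal sketch of the single s///g idea, assuming the sample data
from the original post. the $pattern variable and the longest-first
sort are choices made for this example, not something spelled out in
the thread; the sort is there so that a longer key wins over a shorter
key it contains. the dollar doubling is kept from the original function.

```perl
use strict;
use warnings;

# Sample data taken from the original post.
my %Variables = (
    'gcc'          => '$(CC)',
    '-g -O2 -Wall' => '$(CFLAGS)',
    '-DNDEBUG'     => '$(DFLAGS)',
);

# Build one alternation from all keys, longest first, with each key
# escaped by quotemeta() so that it matches literally.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) }
    keys %Variables;
my $pattern = qr/($alternation)/;

sub varify {
    my $word = shift;
    $word =~ s/\$/\$\$/g;                    # double literal dollars first
    $word =~ s/$pattern/$Variables{$1}/g;    # one pass, hash lookup per match
    return $word;
}

print varify('gcc -c -g -O2 -Wall -DNDEBUG'), "\n";
```

the pattern is compiled once, outside the function, so each call costs
one pass over the string however many keys the hash holds.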
