From: Helmut Richter on
On Fri, 12 Feb 2010, Dr.Ruud wrote:

> Date: Fri, 12 Feb 2010 19:41:58 +0100
> From: Dr.Ruud <rvtol+usenet(a)xs4all.nl>
> Newsgroups: comp.lang.perl.misc
> Subject: Re: know-how(-not) about regular expressions
>
> Helmut Richter wrote:
>
> > [again parsing the wrong way]
>
> Is there a newsgroup or mailing list that we can refer "them" to?
> I am sure that we are well past our monthly share already.
>
> --
> Ruud
>

--
Helmut Richter
From: Peter J. Holzer on
On 2010-02-13 17:15, sln(a)netherlands.com <sln(a)netherlands.com> wrote:
> On Fri, 12 Feb 2010 15:22:20 -0800, sln(a)netherlands.com wrote:
>
>>On Fri, 12 Feb 2010 12:40:14 +0100, Helmut Richter <hhr-m(a)web.de> wrote:
>>
> [snip]
>
>>Yea, this is better. Slow but maybe try to reduce copying with a while(/.../g) type
>>of thing.
>
> [snip]
>
>>>This version runs considerably slower, by a factor of three
>
> [snip]
>
>>I didn't bench the code, its probably fairly quick.
>
> [snip]
>
> I did bench the code on a 7 mbyte file 'mscore.xml'.
> What really makes it slow on large files is the constant
> "appending" to a variable. Its roughly 2 times + slower doing
> it this way.
>
> The fastest way to do it, is to write it to the disk as you
> get it. Pass in a filehandle, or some other method.
>
> Perl would have to spend all its time on realloc() because
> of all the appending.

That's a surprising result. Perl doubles the size of a string every time
it needs to expand it, so it shouldn't have to realloc much
(only O(log(length($MarkupNew))) times).

As it is, I cannot reproduce your result. Trying it on a 22 MB file I
get these times:

append 9.031 9.041 9.150
tempfile 9.285 9.370 9.479

As you can see, appending is consistently faster than writing to a
temporary file and reading it back.

According to Devel::NYTProf nearly all of the time is spent in these
lines:


while ($$markup =~ /$Rxmarkup/g)

$begin_pos = pos($$markup);

while ($$strref =~ /$Rxent/g) {

where the second is the end of the loop started in the first, so I
suspect that the time attributed to the second line is really spent in
the match, not the pos call.

hp

PS: The nytprofhtml output is at http://www.hjp.at/junk/nytprof/

From: sln on
On Sun, 14 Feb 2010 13:11:13 +0100, "Peter J. Holzer" <hjp-usenet2(a)hjp.at> wrote:

>On 2010-02-13 17:15, sln(a)netherlands.com <sln(a)netherlands.com> wrote:
>> On Fri, 12 Feb 2010 15:22:20 -0800, sln(a)netherlands.com wrote:
>>
>>>On Fri, 12 Feb 2010 12:40:14 +0100, Helmut Richter <hhr-m(a)web.de> wrote:
>>>
>> [snip]
>>
>>>Yea, this is better. Slow but maybe try to reduce copying with a while(/.../g) type
>>>of thing.
>>
>> [snip]
>>
>>>>This version runs considerably slower, by a factor of three
>>
>> [snip]
>>
>>>I didn't bench the code, its probably fairly quick.
>>
>> [snip]
>>
>> I did bench the code on a 7 mbyte file 'mscore.xml'.
>> What really makes it slow on large files is the constant
>> "appending" to a variable. Its roughly 2 times + slower doing
>> it this way.
>>
>> The fastest way to do it, is to write it to the disk as you
>> get it. Pass in a filehandle, or some other method.
>>
>> Perl would have to spend all its time on realloc() because
>> of all the appending.
>
>That's a surprising result. Perl doubles the size of a string every time
>it needs to expand it, so it shouldn't have to realloc much
>(only O(log(length($MarkupNew))) times).
>
>As it is, I cannot reproduce your result. Trying it on a 22 MB file I
>get these times:
>
>append 9.031 9.041 9.150
>tempfile 9.285 9.370 9.479
>
>As you can see, appending is consistently faster than writing to a
>temporary file and reading it back.
>
>According to Devel::NYTProf nearly all of the time is spent in these
>lines:
>
>
> while ($$markup =~ /$Rxmarkup/g)
>
> $begin_pos = pos($$markup);
>
> while ($$strref =~ /$Rxent/g) {
>
>where the second is the end of the loop started in the first, so I
>suspect that the time attributed to the second line is really spent in
>the match, not the pos call.
>
> hp
>
>PS: The nytprofhtml output is at http://www.hjp.at/junk/nytprof/

I looked at that profiling result. Impressive utility. Is it free?

To isolate what I am seeing, I am posting a benchmark that simulates
what I found on the other code. It shows huge performance degredation.
I don't know if its the Perl build 5.10.0 (from ActiveState) or what.

Run this and compare the relative numbers with your build.
I'd feel better knowing Perl is not like this and there is a grave error
on my part/and or build.

Thanks.

-sln

-----------------------
## bench.pl
## ----------
use strict;
use warnings;
use Benchmark ':hireswallclock';

my ($t0,$t1);
my @limit = (
0, # 0
1_000_000, # 1 MB
2_000_000, # 2 MB
3_000_000, # 3 MB
4_000_000, # 4 MB
5_000_000 # 5 MB
);
my @buf = ('') x scalar(@limit);
my $append = '<RXZWQ>sdfgg<oo/>';

print "Starting ...\n";

for (1 .. 2)
{
print "\n",'-' x 30,"\n>> Pass $_:\n";
for my $ndx (0 .. $#limit)
{
my ($t0,$t1);

$buf[$ndx] = 'P' x $limit[$ndx]; # pre-allocate buffer from limit array
$buf[$ndx] = ''; # clear buffer

$t0 = new Benchmark;
for ( 1 .. 235_000 ) { # simulate 235,000 segment appends
$buf[$ndx] .= $append; # from 'mscorlib.xml'
}
$t1 = new Benchmark;

printf STDERR "\nBuf[%d]", $ndx;
printf STDERR ", start size = %.0fmb", $limit[$ndx]/1_000_000;
printf STDERR ", current size = %d bytes\n", length $buf[$ndx];
print STDERR "code metrics: ",timestr( timediff($t1, $t0) ),"\n";
}
}

print "\n", '-' x 30, "\n";
system ('perl -V');

__END__

Output =

Starting ...

------------------------------
>> Pass 1:

Buf[0], start size = 0mb, current size = 3995000 bytes
code metrics: 2.32798 wallclock secs ( 1.52 usr + 0.81 sys = 2.33 CPU)

Buf[1], start size = 1mb, current size = 3995000 bytes
code metrics: 2.23181 wallclock secs ( 1.47 usr + 0.77 sys = 2.23 CPU)

Buf[2], start size = 2mb, current size = 3995000 bytes
code metrics: 1.7917 wallclock secs ( 1.34 usr + 0.45 sys = 1.80 CPU)

Buf[3], start size = 3mb, current size = 3995000 bytes
code metrics: 1.0548 wallclock secs ( 0.78 usr + 0.28 sys = 1.06 CPU)

Buf[4], start size = 4mb, current size = 3995000 bytes
code metrics: 0.0685248 wallclock secs ( 0.08 usr + 0.00 sys = 0.08 CPU)

Buf[5], start size = 5mb, current size = 3995000 bytes
code metrics: 0.0682061 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)

------------------------------
>> Pass 2:

Buf[0], start size = 0mb, current size = 3995000 bytes
code metrics: 0.0659492 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)

Buf[1], start size = 1mb, current size = 3995000 bytes
code metrics: 0.0691559 wallclock secs ( 0.08 usr + 0.00 sys = 0.08 CPU)

Buf[2], start size = 2mb, current size = 3995000 bytes
code metrics: 0.069617 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)

Buf[3], start size = 3mb, current size = 3995000 bytes
code metrics: 0.0686679 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)

Buf[4], start size = 4mb, current size = 3995000 bytes
code metrics: 0.0811398 wallclock secs ( 0.08 usr + 0.00 sys = 0.08 CPU)

Buf[5], start size = 5mb, current size = 3995000 bytes
code metrics: 0.068722 wallclock secs ( 0.08 usr + 0.00 sys = 0.08 CPU)

------------------------------
Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
Platform:
osname=MSWin32, osvers=5.00, archname=MSWin32-x86-multi-thread
uname=''
config_args='undef'
hint=recommended, useposix=true, d_sigaction=undef
useithreads=define, usemultiplicity=define
useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
use64bitint=undef, use64bitall=undef, uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cl', ccflags ='-nologo -GF -W3 -MD -Zi -DNDEBUG -O1 -DWIN32 -D_CONSOLE -
DNO_STRICT -DHAVE_DES_FCRYPT -DUSE_SITECUSTOMIZE -DPRIVLIB_LAST_IN_INC -DPERL_IM
PLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPERL_MSVCRT_READFIX',
optimize='-MD -Zi -DNDEBUG -O1',
cppflags='-DWIN32'
ccversion='12.0.8804', gccversion='', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=10
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', lseeksi
ze=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf -libpath:"C:
\Perl\lib\CORE" -machine:x86'
libpth=\lib
libs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32
..lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_
32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
perllibs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comd
lg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib
ws2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
libc=msvcrt.lib, so=dll, useshrplib=true, libperl=perl510.lib
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -opt:ref,icf -
libpath:"C:\Perl\lib\CORE" -machine:x86'


Characteristics of this binary (from libperl):
Compile-time options: MULTIPLICITY PERL_DONT_CREATE_GVSV
PERL_IMPLICIT_CONTEXT PERL_IMPLICIT_SYS
PERL_MALLOC_WRAP PL_OP_SLAB_ALLOC USE_ITHREADS
USE_LARGE_FILES USE_PERLIO USE_SITECUSTOMIZE
Locally applied patches:
ActivePerl Build 1004 [287188]
33741 avoids segfaults invoking S_raise_signal() (on Linux)
33763 Win32 process ids can have more than 16 bits
32809 Load 'loadable object' with non-default file extension
32728 64-bit fix for Time::Local
Built under MSWin32
Compiled at Sep 3 2008 13:16:37
@INC:
C:/Perl/site/lib
C:/Perl/lib
.

From: Peter J. Holzer on
On 2010-02-14 22:22, sln(a)netherlands.com <sln(a)netherlands.com> wrote:
> On Sun, 14 Feb 2010 13:11:13 +0100, "Peter J. Holzer" <hjp-usenet2(a)hjp.at> wrote:
>
>>On 2010-02-13 17:15, sln(a)netherlands.com <sln(a)netherlands.com> wrote:
>>> I did bench the code on a 7 mbyte file 'mscore.xml'.
>>> What really makes it slow on large files is the constant
>>> "appending" to a variable. Its roughly 2 times + slower doing
>>> it this way.
>>>
>>> The fastest way to do it, is to write it to the disk as you
>>> get it. Pass in a filehandle, or some other method.
>>>
>>> Perl would have to spend all its time on realloc() because
>>> of all the appending.
>>
>>That's a surprising result. Perl doubles the size of a string every time
>>it needs to expand it, so it shouldn't have to realloc much
>>(only O(log(length($MarkupNew))) times).
>>
>>As it is, I cannot reproduce your result. Trying it on a 22 MB file I
>>get these times:
>>
>>append 9.031 9.041 9.150
>>tempfile 9.285 9.370 9.479
>>
>>As you can see, appending is consistently faster than writing to a
>>temporary file and reading it back.
>>
>>According to Devel::NYTProf nearly all of the time is spent in these
>>lines:
[...]
>>PS: The nytprofhtml output is at http://www.hjp.at/junk/nytprof/
>
> I looked at that profiling result. Impressive utility. Is it free?

Yes. Available from CPAN.

Devel::NYTProf is really nice. However, it adds a rather large overhead
(smaller than most other Perl profilers, but still large), so it is
impractical for programs which run for a long time and sometimes the
overhead hides the real performance bottleneck.


> To isolate what I am seeing, I am posting a benchmark that simulates
> what I found on the other code. It shows huge performance degredation.
> I don't know if its the Perl build 5.10.0 (from ActiveState) or what.
[...]
> ------------------------------
>>> Pass 1:
>
> Buf[0], start size = 0mb, current size = 3995000 bytes
> code metrics: 2.32798 wallclock secs ( 1.52 usr + 0.81 sys = 2.33 CPU)
>
> Buf[1], start size = 1mb, current size = 3995000 bytes
> code metrics: 2.23181 wallclock secs ( 1.47 usr + 0.77 sys = 2.23 CPU)
>
> Buf[2], start size = 2mb, current size = 3995000 bytes
> code metrics: 1.7917 wallclock secs ( 1.34 usr + 0.45 sys = 1.80 CPU)
>
> Buf[3], start size = 3mb, current size = 3995000 bytes
> code metrics: 1.0548 wallclock secs ( 0.78 usr + 0.28 sys = 1.06 CPU)
>
> Buf[4], start size = 4mb, current size = 3995000 bytes
> code metrics: 0.0685248 wallclock secs ( 0.08 usr + 0.00 sys = 0.08 CPU)
>
> Buf[5], start size = 5mb, current size = 3995000 bytes
> code metrics: 0.0682061 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
>
> ------------------------------
>>> Pass 2:
>
> Buf[0], start size = 0mb, current size = 3995000 bytes
> code metrics: 0.0659492 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
>
> Buf[1], start size = 1mb, current size = 3995000 bytes
> code metrics: 0.0691559 wallclock secs ( 0.08 usr + 0.00 sys = 0.08 CPU)
>
> Buf[2], start size = 2mb, current size = 3995000 bytes
> code metrics: 0.069617 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
>
> Buf[3], start size = 3mb, current size = 3995000 bytes
> code metrics: 0.0686679 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
>
> Buf[4], start size = 4mb, current size = 3995000 bytes
> code metrics: 0.0811398 wallclock secs ( 0.08 usr + 0.00 sys = 0.08 CPU)
>
> Buf[5], start size = 5mb, current size = 3995000 bytes
> code metrics: 0.068722 wallclock secs ( 0.08 usr + 0.00 sys = 0.08 CPU)
>
> ------------------------------

Ouch. That's really a ludicrous slowdown.

Here are the results from my home system:

------------------------------
>> Pass 1:

Buf[0], start size = 0mb, current size = 3995000 bytes
code metrics: 0.093436 wallclock secs ( 0.08 usr + 0.01 sys = 0.09 CPU)

Buf[1], start size = 1mb, current size = 3995000 bytes
code metrics: 0.105453 wallclock secs ( 0.10 usr + 0.01 sys = 0.11 CPU)

Buf[2], start size = 2mb, current size = 3995000 bytes
code metrics: 0.10132 wallclock secs ( 0.07 usr + 0.03 sys = 0.10 CPU)

Buf[3], start size = 3mb, current size = 3995000 bytes
code metrics: 0.10031 wallclock secs ( 0.05 usr + 0.04 sys = 0.09 CPU)

Buf[4], start size = 4mb, current size = 3995000 bytes
code metrics: 0.0609372 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)

Buf[5], start size = 5mb, current size = 3995000 bytes
code metrics: 0.060972 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)

------------------------------
>> Pass 2:

Buf[0], start size = 0mb, current size = 3995000 bytes
code metrics: 0.058821 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)

Buf[1], start size = 1mb, current size = 3995000 bytes
code metrics: 0.0602 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)

Buf[2], start size = 2mb, current size = 3995000 bytes
code metrics: 0.060935 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)

Buf[3], start size = 3mb, current size = 3995000 bytes
code metrics: 0.0601468 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)

Buf[4], start size = 4mb, current size = 3995000 bytes
code metrics: 0.0608931 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)

Buf[5], start size = 5mb, current size = 3995000 bytes
code metrics: 0.0607629 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)

------------------------------

The base time (0.06 seconds) is about the same as for you so I assume
that we use a processor of roughly the same speed (Intel Core2 6300 @
1.86GHz in my case). But I have only a slowdown of less than 2
(0.10/0.06), and you have a slowdown of almost 35 (2.33/0.068).

I don't have a plausible explanation for that. It seems most likely that
Activestate perl extends strings linearly instead of exponentially but
why it would do such a stupid thing is beyond me.

hp